Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-11 Thread Quentin Monnet
2018-04-11 16:44 UTC+0100 ~ Quentin Monnet 
> 2018-04-10 09:58 UTC-0700 ~ Yonghong Song 
>> On 4/10/18 7:41 AM, Quentin Monnet wrote:
>>> Add documentation for eBPF helper functions to bpf.h user header file.
>>> This documentation can be parsed with the Python script provided in
>>> another commit of the patch series, in order to provide a RST document
>>> that can later be converted into a man page.
>>>
>>> The objective is to make the documentation easily understandable and
>>> accessible to all eBPF developers, including beginners.
>>>
>>> This patch contains descriptions for the following helper functions:
>>>
>>> Helpers from Lawrence:
>>> - bpf_setsockopt()
>>> - bpf_getsockopt()
>>> - bpf_sock_ops_cb_flags_set()
>>>
>>> Helpers from Yonghong:
>>> - bpf_perf_event_read_value()
>>> - bpf_perf_prog_read_value()
>>>
>>> Helper from Josef:
>>> - bpf_override_return()
>>>
>>> Helper from Andrey:
>>> - bpf_bind()
>>>
>>> Cc: Lawrence Brakmo 
>>> Cc: Yonghong Song 
>>> Cc: Josef Bacik 
>>> Cc: Andrey Ignatov 
>>> Signed-off-by: Quentin Monnet 
>>> ---

> [...]
> 
> Thanks Yonghong for the review!
> 
> I have a favor to ask of you. I got a bounce for Kaixu Xia's email
> address, and I don't know what alternative email address I could use. I
> CC-ed you in the hope of getting a review for helper bpf_perf_event_read() (in patch 6 of
> this series), which is rather close to bpf_perf_event_read_value().
> Would you mind having a look at that one too, please? The description is
> not long.

Well, I read again the description I wrote, and actually the one for
bpf_perf_event_read() is nearly a subset of the one for
bpf_perf_event_read_value(). So the same comments that you raised earlier
apply, there's probably nothing more to review. But if you notice that
some important info is missing for bpf_perf_event_read(), I'm interested
too!

Quentin


Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-11 Thread Quentin Monnet
2018-04-10 10:50 UTC-0700 ~ Andrey Ignatov 
> Quentin Monnet  [Tue, 2018-04-10 07:43 -0700]:
>> + * int bpf_bind(struct bpf_sock_addr_kern *ctx, struct sockaddr *addr, int addr_len)
>> + *  Description
>> + *  Bind the socket associated to *ctx* to the address pointed by
>> + *  *addr*, of length *addr_len*. This allows for making outgoing
>> + *  connection from the desired IP address, which can be useful for
>> + *  example when all processes inside a cgroup should use one
>> + *  single IP address on a host that has multiple IP configured.
>> + *
>> + *  This helper works for IPv4 and IPv6, TCP and UDP sockets. The
>> + *  domain (*addr*\ **->sa_family**) must be **AF_INET** (or
>> + *  **AF_INET6**). Looking for a free port to bind to can be
>> + *  expensive, therefore binding to port is not permitted by the
>> + *  helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
>> + *  must be set to zero.
>> + *
>> + *  As for the remote end, both parts of it can be overridden,
>> + *  remote IP and remote port. This can be useful if an application
>> + *  inside a cgroup wants to connect to another application inside
>> + *  the same cgroup or to itself, but knows nothing about the IP
>> + *  address assigned to the cgroup.
> 
> The last paragraph ("As for the remote end ...") is not relevant to
> bpf_bind() and should be removed. It's about the sys_connect hook itself
> that can call bpf_bind() but also has other functionality (and that
> other functionality is described by this paragraph).

Thanks Andrey, I will remove this paragraph.

Quentin
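
A minimal sketch of how a program could use bpf_bind() as discussed above may help
readers following the thread. It is an illustration only, not part of the patch: it
assumes clang with the BPF target, a recent libbpf (bpf_helpers.h, bpf_endian.h), and
attachment to a cgroup as BPF_CGROUP_INET4_CONNECT. The section name, source address
and return policy are placeholders.

#include <linux/bpf.h>
#include <linux/in.h>
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("cgroup/connect4")
int bind_to_src_ip(struct bpf_sock_addr *ctx)
{
	struct sockaddr_in sa = {};

	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = bpf_htonl(0xc0a80001);	/* 192.168.0.1, placeholder source IP */
	sa.sin_port = 0;				/* must be zero, binding to a port is not allowed */

	/* Bind the socket to the chosen source address before connect() proceeds. */
	if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)))
		return 0;	/* reject the connect() if binding failed */

	return 1;		/* let the connect() go through */
}

char _license[] SEC("license") = "GPL";

Attached to a cgroup, such a program makes every connect() issued by processes in that
cgroup leave from the chosen source address, which is the use case described in the
helper documentation.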





Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-11 Thread Quentin Monnet
2018-04-10 09:58 UTC-0700 ~ Yonghong Song 
> On 4/10/18 7:41 AM, Quentin Monnet wrote:
>> Add documentation for eBPF helper functions to bpf.h user header file.
>> This documentation can be parsed with the Python script provided in
>> another commit of the patch series, in order to provide a RST document
>> that can later be converted into a man page.
>>
>> The objective is to make the documentation easily understandable and
>> accessible to all eBPF developers, including beginners.
>>
>> This patch contains descriptions for the following helper functions:
>>
>> Helpers from Lawrence:
>> - bpf_setsockopt()
>> - bpf_getsockopt()
>> - bpf_sock_ops_cb_flags_set()
>>
>> Helpers from Yonghong:
>> - bpf_perf_event_read_value()
>> - bpf_perf_prog_read_value()
>>
>> Helper from Josef:
>> - bpf_override_return()
>>
>> Helper from Andrey:
>> - bpf_bind()
>>
>> Cc: Lawrence Brakmo 
>> Cc: Yonghong Song 
>> Cc: Josef Bacik 
>> Cc: Andrey Ignatov 
>> Signed-off-by: Quentin Monnet 
>> ---
>>   include/uapi/linux/bpf.h | 184 +++
>>   1 file changed, 184 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 15d9ccafebbe..7343af4196c8 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h

[...]

>> @@ -1255,6 +1277,168 @@ union bpf_attr {
>>    * performed again.
>>    * Return
>>    * 0 on success, or a negative error in case of failure.
>> + *
>> + * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
>> + * Description
>> + * Read the value of a perf event counter, and store it into *buf*
>> + * of size *buf_size*. This helper relies on a *map* of type
>> + * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf
>> + * event counter is selected at the creation of the *map*. The
> 
> The nature of the perf event counter is selected when *map* is updated
> with perf_event fd's.
> 

Thanks, I will fix it.

>> + * *map* is an array whose size is the number of available CPU
>> + * cores, and each cell contains a value relative to one core. The
> 
> It is confusing to mix core/cpu here. Maybe just use perf_event
> convention, always using cpu?
> 

Right, I'll remove occurrences of "core".

>> + * value to retrieve is indicated by *flags*, that contains the
>> + * index of the core to look up, masked with
>> + * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
>> + * **BPF_F_CURRENT_CPU** to indicate that the value for the
>> + * current CPU core should be retrieved.
>> + *
>> + * This helper behaves in a way close to
>> + * **bpf_perf_event_read**\ () helper, save that instead of
>> + * just returning the value observed, it fills the *buf*
>> + * structure. This allows for additional data to be retrieved: in
>> + * particular, the enabled and running times (in *buf*\
>> + * **->enabled** and *buf*\ **->running**, respectively) are
>> + * copied.
>> + *
>> + * These values are interesting, because hardware PMU (Performance
>> + * Monitoring Unit) counters are limited resources. When there are
>> + * more PMU based perf events opened than available counters,
>> + * kernel will multiplex these events so each event gets certain
>> + * percentage (but not all) of the PMU time. In case that
>> + * multiplexing happens, the number of samples or counter value
>> + * will not reflect the case compared to when no multiplexing
>> + * occurs. This makes comparison between different runs difficult.
>> + * Typically, the counter value should be normalized before
>> + * comparing to other experiments. The usual normalization is done
>> + * as follows.
>> + *
>> + * ::
>> + *
>> + * normalized_counter = counter * t_enabled / t_running
>> + *
>> + * Where t_enabled is the time enabled for event and t_running is
>> + * the time running for event since last normalization. The
>> + * enabled and running times are accumulated since the perf event
>> + * open. To achieve scaling factor between two invocations of an
>> + * eBPF program, users can use CPU id as the key (which is
>> + * typical for perf array usage model) to remember the previous
>> + * value and do the calculation inside the eBPF program.
>> + * Return
>> + * 0 on success, or a negative error in case of failure.
>> + *

[...]

Thanks Yonghong for the review!

I have a favor to ask of you. I got a bounce for Kaixu Xia's email
address, and I don't know what alternative email address I could use. I
CC-ed to have a review for helper 

Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-10 Thread Andrey Ignatov
Quentin Monnet  [Tue, 2018-04-10 07:43 -0700]:
> + * int bpf_bind(struct bpf_sock_addr_kern *ctx, struct sockaddr *addr, int addr_len)
> + *   Description
> + *   Bind the socket associated to *ctx* to the address pointed by
> + *   *addr*, of length *addr_len*. This allows for making outgoing
> + *   connection from the desired IP address, which can be useful for
> + *   example when all processes inside a cgroup should use one
> + *   single IP address on a host that has multiple IP configured.
> + *
> + *   This helper works for IPv4 and IPv6, TCP and UDP sockets. The
> + *   domain (*addr*\ **->sa_family**) must be **AF_INET** (or
> + *   **AF_INET6**). Looking for a free port to bind to can be
> + *   expensive, therefore binding to port is not permitted by the
> + *   helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
> + *   must be set to zero.
> + *
> + *   As for the remote end, both parts of it can be overridden,
> + *   remote IP and remote port. This can be useful if an application
> + *   inside a cgroup wants to connect to another application inside
> + *   the same cgroup or to itself, but knows nothing about the IP
> + *   address assigned to the cgroup.

The last paragraph ("As for the remote end ...") is not relevant to
bpf_bind() and should be removed. It's about the sys_connect hook itself
that can call bpf_bind() but also has other functionality (and that
other functionality is described by this paragraph).


-- 
Andrey Ignatov


Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-10 Thread Yonghong Song



On 4/10/18 7:41 AM, Quentin Monnet wrote:

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()

Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()

Helper from Josef:
- bpf_override_return()

Helper from Andrey:
- bpf_bind()

Cc: Lawrence Brakmo 
Cc: Yonghong Song 
Cc: Josef Bacik 
Cc: Andrey Ignatov 
Signed-off-by: Quentin Monnet 
---
  include/uapi/linux/bpf.h | 184 +++
  1 file changed, 184 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 15d9ccafebbe..7343af4196c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1208,6 +1208,28 @@ union bpf_attr {
   *Return
   *0
   *
+ * int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed by *optval*.
+ *
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ *   **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ *   **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ *   **TCP_CONGESTION**, **TCP_BPF_IW**,
+ *   **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
  * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
   *Description
   *Grow or shrink the room for data in the packet associated to
@@ -1255,6 +1277,168 @@ union bpf_attr {
   *performed again.
   *Return
   *0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf
+ * event counter is selected at the creation of the *map*. The


The nature of the perf event counter is selected when *map* is updated 
with perf_event fd's.



+ * *map* is an array whose size is the number of available CPU
+ * cores, and each cell contains a value relative to one core. The


It is confusing to mix core/cpu here. Maybe just use perf_event 
convention, always using cpu?



+ * value to retrieve is indicated by *flags*, that contains the
+ * index of the core to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU core should be retrieved.
+ *
+ * This helper behaves in a way close to
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied.
+ *
+ * These values are interesting, because hardware PMU (Performance
+ * Monitoring Unit) counters are limited resources. When there are
+ * more PMU based perf events opened than available counters,
+ * kernel will multiplex these events so each event gets certain
+ * percentage (but not all) of the PMU time. In case that
+ * multiplexing happens, the number of samples or counter value
+ *

[RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()

Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()

Helper from Josef:
- bpf_override_return()

Helper from Andrey:
- bpf_bind()

Cc: Lawrence Brakmo 
Cc: Yonghong Song 
Cc: Josef Bacik 
Cc: Andrey Ignatov 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 184 +++
 1 file changed, 184 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 15d9ccafebbe..7343af4196c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1208,6 +1208,28 @@ union bpf_attr {
  * Return
  * 0
  *
+ * int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed by *optval*.
+ *
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ *   **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ *   **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ *   **TCP_CONGESTION**, **TCP_BPF_IW**,
+ *   **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
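
A minimal usage sketch for the bpf_setsockopt() entry above (an editorial illustration,
not part of the patch): it assumes clang with the BPF target, a recent libbpf, and a
program attached with BPF_CGROUP_SOCK_OPS. Note that the program sees its context as a
struct bpf_sock_ops pointer; the *_kern structure in the prototype is the kernel-internal
type. Section name, congestion control name and buffer size are placeholders.

#include <linux/bpf.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>

SEC("sockops")
int tune_socket(struct bpf_sock_ops *skops)
{
	char cong[] = "cubic";		/* placeholder congestion control name */
	int sndbuf = 150000;		/* placeholder send buffer size, in bytes */

	/* Act once the connection is established on the active side. */
	if (skops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB) {
		/* IPPROTO_TCP level: select the congestion control algorithm. */
		bpf_setsockopt(skops, IPPROTO_TCP, TCP_CONGESTION,
			       cong, sizeof(cong));
		/* SOL_SOCKET level: enlarge the send buffer. */
		bpf_setsockopt(skops, SOL_SOCKET, SO_SNDBUF,
			       &sndbuf, sizeof(sndbuf));
	}
	return 1;
}

char _license[] SEC("license") = "GPL";
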
  * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
  * Description
  * Grow or shrink the room for data in the packet associated to
@@ -1255,6 +1277,168 @@ union bpf_attr {
  * performed again.
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf
+ * event counter is selected at the creation of the *map*. The
+ * *map* is an array whose size is the number of available CPU
+ * cores, and each cell contains a value relative to one core. The
+ * value to retrieve is indicated by *flags*, that contains the
+ * index of the core to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU core should be retrieved.
+ *
+ * This helper behaves in a way close to
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied.
+ *
+ * These values are interesting, because hardware PMU (Performance
+ * Monitoring Unit) counters are limited resources. When there are
+ * more PMU based perf events opened than available counters,
+ * kernel will multiplex these events so each event gets certain
+ * percentage (but not all) of the PMU time. In case that
+ * multiplexing happens, the number of samples or counter value
+ * will not reflect the case compared to when no multiplexing
+ * occurs. This makes comparison between different runs difficult.
+ * Typically, the counter value should be normalized before
+ * comparing to other experiments. The usual normalization is done
+ * as follows.
+ *
+ * ::
+ *
+ * normalized_counter = counter * t_enabled / t_running
+ *
+ * Where t_enabled is the time enabled for event and t_running is
+ * the time running for event since last normalization. The
+ * enabled and running times are accumulated since the perf event
+ * open. To achieve scaling factor between two invocations of an
+ * eBPF program, users can use CPU id as the key (which is
+ * typical for perf array usage model) to remember the previous
+ * value and do the calculation inside the eBPF program.
+ * Return
+ * 0 on success, or a negative error in case of failure.
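
To complement the bpf_perf_event_read_value() description and the normalization above,
here is a minimal sketch (an editorial illustration, not part of the patch). It assumes
clang with the BPF target, a recent libbpf with BTF-defined map support, and that user
space has populated the counters map with one perf event file descriptor per CPU. The
attach point, map size and bpf_printk() output are placeholders.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
	__uint(max_entries, 128);	/* placeholder; should cover the number of possible CPUs */
} counters SEC(".maps");

SEC("kprobe/vfs_read")			/* placeholder attach point */
int read_counter(void *ctx)
{
	struct bpf_perf_event_value val = {};
	__u64 normalized;

	/* Read the counter for the current CPU, with enabled and running times. */
	if (bpf_perf_event_read_value(&counters, BPF_F_CURRENT_CPU,
				      &val, sizeof(val)))
		return 0;

	/* normalized_counter = counter * t_enabled / t_running */
	normalized = val.running ? val.counter * val.enabled / val.running
				 : val.counter;

	bpf_printk("counter=%llu normalized=%llu\n", val.counter, normalized);
	return 0;
}

char _license[] SEC("license") = "GPL";

With illustrative numbers: a raw counter of 1,000,000 with t_enabled = 10 ms and
t_running = 2.5 ms normalizes to 4,000,000, i.e. the value the counter would have
reached had the event stayed on the PMU the whole time.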