On 07/23/2015 01:23 PM, Michael Kerrisk (man-pages) wrote:
...
Btw, a user obviously can close() the map fds if he
wants to, but ultimatively they're freed when the program unloads.
Okay. (Not sure if you meant that something should be added to the page.)
I think not necessary.
[...]
The attributes key_size and value_size will be used by the
attribute's?
Nope. But I changed this to "The key_size and value_size attributes will be",
which may read clearer.
Sorry, true, I was a bit confused. :)
[...]
The type __u64 is kernel internal, so if there's no strict reason to use it,
we should just use what's provided by stdint.h.
Agreed. Done. (By the way, what about all the __u32 and __u64 elements in the
bpf_attr union?)
I wouldn't change the bpf_attr from the uapi.
Just the provided example code here, I presume people might copy from here when
they build their own library and in userspace uint64_t seems to be more natural.
[...]
* map_update_elem() replaces elements in an non-atomic
fashion; for atomic updates, a hash-table map should be
used instead.
This point here is most important, i.e. to not have false user expecations.
Maybe it's also worth mentioning that when you have a value_size of
sizeof(long),
you can however use __sync_fetch_and_add() atomic builtin from the LLVM backend.
I think I'll leave out that detail for the moment.
Ok, I guess we could revisit/clarify that at a later point in time. I'd add
a TODO comment to the source or the like, as this also is related to the 2nd
below use case (aggregation/accounting), where an array is typically used.
Among the uses for array maps are the following:
* As "global" eBPF variables: an array of 1 element whose
key is (index) 0 and where the value is a collection of
'global' variables which eBPF programs can use to keep
state between events.
* Aggregation of tracing events into a fixed set of buck‐
ets.
[...]
* license is a license string, which must be GPL compatible to
call helper functions marked gpl_only.
Not strictly. So here, the same rules apply as with kernel modules. I.e. what
the kernel checks for are the following license strings:
static inline int license_is_gpl_compatible(const char *license)
{
return (strcmp(license, "GPL") == 0
|| strcmp(license, "GPL v2") == 0
|| strcmp(license, "GPL and additional rights") == 0
|| strcmp(license, "Dual BSD/GPL") == 0
|| strcmp(license, "Dual MIT/GPL") == 0
|| strcmp(license, "Dual MPL/GPL") == 0);
}
With any of them, the eBPF program is declared GPL compatible. Maybe of interest
for those that want to use dual licensing of some sort.
So, I'm a little unclear here. What text do you suggest for the page?
Maybe we should mention in addition that the same licensing rules apply as
in case with kernel modules, so also dual licenses could be used.
* log_buf is a pointer to a caller-allocated buffer in which the
in-kernel verifier can store the verification log. This log
is a multi-line string that can be checked by the program
author in order to understand how the verifier came to the
conclusion that the BPF program is unsafe. The format of the
output can change at any time as the verifier evolves.
* log_size size of the buffer pointed to by log_bug. If the
size of the buffer is not large enough to store all verifier
messages, -1 is returned and errno is set to ENOSPC.
* log_level verbosity level of the verifier. A value of zero
means that the verifier will not provide a log.
Note that the log buffer is optional as mentioned here log_level = 0. The
above example code of bpf_prog_load() suggests that it always needs to be
provided.
I once ran indeed into an issue where the program itself was correct, but
it got rejected by the kernel, because my log buffer size was too small, so
in tc, we now have it larger as bpf_log_buf[65536] ...
So, I'm not clear. Do you mean that some piece of text here in the page
should be changed? If so, could elaborate?
I'd maybe only mention in addition that in log_level=0 case, we also must not
provide a log_buf and log_size, otherwise we get EINVAL.
[...]
I had to read this twice. ;) Maybe this needs to be reworded slightly.
It just means that depending on the program type that the author selects,
you might end up with a different subset of helper functions, and a
different program input/context. For example tracing does not have the
exact same helpers as socket filters (it might have some that can be used
by both). Also, the eBPF program input (context) for socket filters is a
network packet, wheras for tracing you operate on a set of registers.
Changed. Now we have:
eBPF program types
The eBPF program type (prog_type) determines the subset of a ker‐
nel helper functions that the program may call. The program type
s/a//
also determines dthe program input (context)—the format of struct
s/dthe/the/
bpf_context (which is the data blob passed into the eBPF program
as the first argument).
For example, a tracing program does not have the exact same sub‐
set of helper functions as a socket filter program (though they
may have some helpers in common). Similarly, the input (context)
for a tracing program is a set of register values, while for a
socket filter it is a network packet.
The set of functions available to eBPF programs of a given type
may increase in the future.
That's fine with me.
[...]
I would also make a note about the JIT compiler here, i.e. that it's disabled
by default, and can be enabled via:
* Normal mode: echo 1 > /proc/sys/net/core/bpf_jit_enable
* Debugging mode: echo 2 > /proc/sys/net/core/bpf_jit_enable
[opcodes dumped in hex into the kernel log, which can then be disassembled
Here, I assume you mean thet the generated (native) opcodes are dumpeed, right?
Yes.
with tools/net/bpf_jit_disasm.c from the kernel tree]
When enabled, after a eBPF program gets loaded, it's transparently compiled /
translated inside the kernel into machine opcodes for better performance,
currently on x86_64, arm64 and s390.
According to Documentation/networking/filter.txt the JIT compiler supports
many more architectures:
The Linux kernel has a built-in BPF JIT compiler for x86_64,
SPARC, PowerPC, ARM, ARM64, MIPS and s390 and can be enabled
through CONFIG_BPF_JIT.
Or am I misunderstanding something?
The others only work for cBPF and have not (yet) be converted over to eBPF.
For the three mentioned above, the kernel internally migrates cBPF into eBPF
instructions and then JITs the eBPF result eventually.
I added the following:
The kernel contains a just-in-time (JIT) compiler that translates
eBPF bytecode into native machine code for better performance.
The JIT compiler is disabled by default, but its operation can be
controlled by writing one of the following values to
/proc/sys/net/core/bpf_jit_enable:
0 Disable JIT compilation (default).
1 Normal compilation.
2 Debugging mode. The generated opcodes are dumped in hexadeci‐
mal into the kernel log. These opcodes can then be disassem‐
bled using the program tools/net/bpf_jit_disasm.c provided in
the kernel source tree.
SEE ALSO
seccomp(2), socket(7), tc(8), tc-bpf(8)
Both classic and extended BPF are explained in the kernel source
file Documentation/networking/filter.txt.
Rest looks good for an initial version!
Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/