[PATCH] netlink: put module reference if dump start fails
Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc. This patch expands the error path to simply call module_put if cb->start() fails. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> --- This probably should be queued up for stable. net/netlink/af_netlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 2ad445c1d27c..07e8478068f0 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2308,7 +2308,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, if (cb->start) { ret = cb->start(cb); if (ret) - goto error_unlock; + goto error_put; } nlk->cb_running = true; @@ -2328,6 +2328,8 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, */ return -EINTR; +error_put: + module_put(control->module); error_unlock: sock_put(sk); mutex_unlock(nlk->cb_mutex); -- 2.16.1
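The goto-unwind pattern the patch uses can be sketched in plain userspace C. This is a toy model, not the real af_netlink.c: the counters and fake_start() below are stand-ins for the kernel's module/socket reference counting and cb->start(), added here only to illustrate why the failure path must now release the module reference too.

```c
/* Toy model of __netlink_dump_start()'s error unwinding. The counters
 * stand in for try_module_get()/module_put() and sock_hold()/sock_put();
 * they are not kernel APIs. */
static int module_refs; /* stand-in for the module reference count */
static int sock_refs;   /* stand-in for the socket reference count */

static int fake_start(int should_fail)
{
	return should_fail ? -1 : 0; /* stand-in for cb->start(cb) */
}

static int dump_start(int start_fails)
{
	int ret;

	module_refs++; /* try_module_get() succeeded */
	sock_refs++;   /* sock_hold() */

	ret = fake_start(start_fails);
	if (ret)
		goto error_put; /* the patch's new label: drop the module ref */

	/* cb_running = true; the running dump now owns the module ref */
	return 0;

error_put:
	module_refs--; /* module_put(control->module) -- previously leaked */
	sock_refs--;   /* falls through to sock_put()/mutex_unlock() */
	return ret;
}
```

With the extra label, a failing start() leaves both counters back at zero instead of leaking the module reference.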
Re: [PATCH] random: always fill buffer in get_random_bytes_wait
Hi Ted, Can you apply this? Thanks, Jason
[PATCH] random: always fill buffer in get_random_bytes_wait
In the unfortunate event that a developer fails to check the return value of get_random_bytes_wait, or simply wants to make a "best effort" attempt, for whatever that's worth, it's much better to still fill the buffer with _something_ rather than catastrophically failing in the case of an interruption. This is both a defense in depth measure against inevitable programming bugs, as well as a means of making the API a bit more useful. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> --- include/linux/random.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/include/linux/random.h b/include/linux/random.h index 4024f7d9c77d..2ddf13b4281e 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -85,10 +85,8 @@ static inline unsigned long get_random_canary(void) static inline int get_random_bytes_wait(void *buf, int nbytes) { int ret = wait_for_random_bytes(); - if (unlikely(ret)) - return ret; get_random_bytes(buf, nbytes); - return 0; + return ret; } #define declare_get_random_var_wait(var) \ -- 2.16.1
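The before/after behavior can be sketched outside the kernel. The helpers below are stand-ins (the real wait_for_random_bytes() and get_random_bytes() are kernel internals); the point is that the patched variant fills the buffer unconditionally while still propagating the wait's error code:

```c
#include <string.h>

/* Stand-in for get_random_bytes(): always fills the buffer. */
static void fake_get_random_bytes(void *buf, int nbytes)
{
	memset(buf, 0xab, nbytes); /* pretend this is random output */
}

/* Sketch of the patched get_random_bytes_wait(): wait_ret simulates
 * the result of wait_for_random_bytes() (e.g. -EINTR on interruption). */
static int patched_get_random_bytes_wait(void *buf, int nbytes, int wait_ret)
{
	int ret = wait_ret;

	fake_get_random_bytes(buf, nbytes); /* fill even on interruption */
	return ret;                         /* but still report the error */
}
```

A caller that ignores the return value now gets best-effort bytes instead of an untouched (possibly uninitialized) buffer, while a careful caller still sees the error.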
Re: [kernel-hardening] Re: [PATCH v2] cpu: do not leak vulnerabilities to unprivileged users
On Fri, Jan 26, 2018 at 5:43 PM, Alan Cox wrote: > a) The info is already trivially accessible via /proc/cpuinfo No, /proc/cpuinfo shows if the CPU itself has these bugs, but doesn't show whether or not the kernel has gone to lengths to mitigate these bugs. # grep -o 'bugs.*cpu_meltdown' -m1 /proc/cpuinfo bugs: cpu_meltdown # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: PTI > or by measurement to an attacker Right, so without this, an attacker has to measure. The purpose of this patchset is to require the attacker to perform an additional measurement. That seems worthwhile, especially if measurements are or would ever become non-trivial to make. > b) Some JIT and other environments need to know Shouldn't JITs do the best they can with the environment they're in? And for that, isn't /proc/cpuinfo enough? Jason
[PATCH v2] cpu: do not leak vulnerabilities to unprivileged users
While it's public information if the CPU in general has spectre/meltdown bugs, it probably shouldn't be as globally obvious to all unprivileged users whether or not the kernel is doing something to mitigate those bugs. While an attacker can obviously probe and try, there frequently is a trade-off attackers make of how much probing around they're willing to do versus the certainty of an attack working, in order to reduce detection. By making it loud and clear that the kernel _is_ vulnerable, we're simply aiding the trade-off calculations attackers have to make when choosing which vectors to target. So, this patch changes the permissions to 0400 to make the attacker's job slightly less easy. While we're at it, we clean up the leak in dmesg too. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> --- v2 handles dmesg too. arch/x86/kernel/cpu/bugs.c | 1 - drivers/base/cpu.c | 6 +++--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 390b3dc3d438..e512ae82f201 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -230,7 +230,6 @@ static void __init spectre_v2_select_mitigation(void) } spectre_v2_enabled = mode; - pr_info("%s\n", spectre_v2_strings[mode]); /* * If neither SMEP or KPTI are available, there is a risk of diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index d99038487a0d..a3a8e008f957 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -531,9 +531,9 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev, return sprintf(buf, "Not affected\n"); } -static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); -static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); -static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); +static DEVICE_ATTR(meltdown, 0400, cpu_show_meltdown, NULL); +static DEVICE_ATTR(spectre_v1, 0400, cpu_show_spectre_v1, NULL); +static DEVICE_ATTR(spectre_v2, 0400, cpu_show_spectre_v2, NULL); static struct attribute 
*cpu_root_vulnerabilities_attrs[] = { &dev_attr_meltdown.attr, -- 2.16.1
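The whole patch boils down to which permission bits the sysfs attributes carry. A quick sanity check of the octal modes involved, using the standard <sys/stat.h> bit names (this is just arithmetic on mode bits, not kernel code):

```c
#include <sys/stat.h>

/* 0444 grants read to owner, group, and other; 0400 grants it to the
 * owner (root, for these sysfs files) only. */
static int world_readable(unsigned int mode)
{
	return (mode & (S_IRGRP | S_IROTH)) != 0;
}
```

So after the patch, unprivileged users get -EACCES when reading the vulnerabilities files, while root still sees the mitigation status.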
Re: [kernel-hardening] Re: [PATCH] cpu: do not leak vulnerabilities to unprivileged users
On Thu, Jan 25, 2018 at 2:34 PM, Alan Cox wrote: > As you observe any attacker can already trivially ascertain whether > protection is on, so there is no point pretending file permissions > magically stop that. In fact the information is already in cpuinfo. Actually the other place it leaks is in dmesg, which would need to be patched too. My understanding about cpuinfo was that it showed whether or not the processor family is generally vulnerable to it, independent of whether or not the kernel has been patched. What this patch does relates to whether or not the kernel has been patched.
[PATCH] cpu: do not leak vulnerabilities to unprivileged users
While it's public information if the CPU in general has spectre/meltdown bugs, it probably shouldn't be as globally obvious to all unprivileged users whether or not the kernel is doing something to mitigate those bugs. While an attacker can obviously probe and try, there frequently is a trade-off attackers make of how much probing around they're willing to do versus the certainty of an attack working, in order to reduce detection. By making it loud and clear that the kernel _is_ vulnerable, we're simply aiding the trade-off calculations attackers have to make when choosing which vectors to target. So, this patch changes the permissions to 0400 to make the attacker's job slightly less easy. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> --- drivers/base/cpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index d99038487a0d..a3a8e008f957 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -531,9 +531,9 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev, return sprintf(buf, "Not affected\n"); } -static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); -static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); -static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); +static DEVICE_ATTR(meltdown, 0400, cpu_show_meltdown, NULL); +static DEVICE_ATTR(spectre_v1, 0400, cpu_show_spectre_v1, NULL); +static DEVICE_ATTR(spectre_v2, 0400, cpu_show_spectre_v2, NULL); static struct attribute *cpu_root_vulnerabilities_attrs[] = { &dev_attr_meltdown.attr, -- 2.16.1
[PATCH] arm64: support __int128 with clang
Commit fb8722735f50 ("arm64: support __int128 on gcc 5+") added support for arm64 __int128 with gcc with a version-conditional, but neglected to enable this for clang, which in fact appears to support aarch64 __int128. This commit therefore enables it if the compiler is clang, using the same type of makefile conditional used elsewhere in the tree. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> --- arch/arm64/Makefile | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index b481b4a7c011..16c2e8b58546 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -57,7 +57,11 @@ KBUILD_AFLAGS+= $(lseinstr) $(brokengasinst) KBUILD_CFLAGS += $(call cc-option,-mabi=lp64) KBUILD_AFLAGS += $(call cc-option,-mabi=lp64) +ifeq ($(cc-name),clang) +KBUILD_CFLAGS += -DCONFIG_ARCH_SUPPORTS_INT128 +else KBUILD_CFLAGS += $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128) +endif ifeq ($(CONFIG_CPU_BIG_ENDIAN), y) KBUILD_CPPFLAGS+= -mbig-endian -- 2.15.1
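What CONFIG_ARCH_SUPPORTS_INT128 unlocks is the compiler's native 128-bit type, which turns a 64x64-to-128 multiply into a single mul/umulh pair on aarch64 rather than an open-coded long multiplication. A minimal illustration (plain C, works with both gcc and clang on 64-bit targets):

```c
/* High 64 bits of a 64x64-bit product, via the __int128 extension that
 * this patch enables kernel code to use on arm64/clang builds. */
static unsigned long long mul_hi64(unsigned long long a, unsigned long long b)
{
	unsigned __int128 product = (unsigned __int128)a * b;

	return (unsigned long long)(product >> 64);
}
```

Crypto code (e.g. Poly1305- and Curve25519-style field arithmetic) is the main beneficiary of this pattern.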
[PATCH] arch/um: compile with modern headers
Recent libcs have gotten a bit more strict, so we actually need to include the right headers and use the right types. This enables UML to compile again. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> Cc: sta...@vger.kernel.org --- Rebased on linux-next, by request. arch/um/os-Linux/file.c | 1 + arch/um/os-Linux/signal.c | 1 + arch/x86/um/stub_segv.c | 1 + 3 files changed, 3 insertions(+) diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c index 2db18cbbb0ea..c0197097c86e 100644 --- a/arch/um/os-Linux/file.c +++ b/arch/um/os-Linux/file.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c index a5c0c909c48b..bf0acb8aad8b 100644 --- a/arch/um/os-Linux/signal.c +++ b/arch/um/os-Linux/signal.c @@ -16,6 +16,7 @@ #include #include #include +#include void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = { [SIGTRAP] = relay_signal, diff --git a/arch/x86/um/stub_segv.c b/arch/x86/um/stub_segv.c index fd6825537b97..27361cbb7ca9 100644 --- a/arch/x86/um/stub_segv.c +++ b/arch/x86/um/stub_segv.c @@ -6,6 +6,7 @@ #include #include #include +#include void __attribute__ ((__section__ (".__syscall_stub"))) stub_segv_handler(int sig, siginfo_t *info, void *p) -- 2.15.1
[PATCH] arch/um: compile with modern headers
Recent libcs have gotten a bit more strict, so we actually need to include the right headers and use the right types. This enables UML to compile again. Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com> Cc: sta...@vger.kernel.org --- arch/um/os-Linux/file.c | 1 + arch/um/os-Linux/signal.c | 3 ++- arch/x86/um/stub_segv.c | 3 ++- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c index 2db18cbbb0ea..c0197097c86e 100644 --- a/arch/um/os-Linux/file.c +++ b/arch/um/os-Linux/file.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c index a86d7cc2c2d8..bf0acb8aad8b 100644 --- a/arch/um/os-Linux/signal.c +++ b/arch/um/os-Linux/signal.c @@ -16,6 +16,7 @@ #include #include #include +#include void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = { [SIGTRAP] = relay_signal, @@ -159,7 +160,7 @@ static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = { static void hard_handler(int sig, siginfo_t *si, void *p) { - struct ucontext *uc = p; + ucontext_t *uc = p; mcontext_t *mc = &uc->uc_mcontext; unsigned long pending = 1UL << sig; diff --git a/arch/x86/um/stub_segv.c b/arch/x86/um/stub_segv.c index 1518d2805ae8..27361cbb7ca9 100644 --- a/arch/x86/um/stub_segv.c +++ b/arch/x86/um/stub_segv.c @@ -6,11 +6,12 @@ #include #include #include +#include void __attribute__ ((__section__ (".__syscall_stub"))) stub_segv_handler(int sig, siginfo_t *info, void *p) { - struct ucontext *uc = p; + ucontext_t *uc = p; GET_FAULTINFO_FROM_MC(*((struct faultinfo *) STUB_DATA), &uc->uc_mcontext); -- 2.15.1
Re: WireGuard Upstreaming Roadmap (November 2017)
Hi Dave, On Fri, Dec 8, 2017 at 4:38 PM, David Miller wrote: > Sorry, you cannot force the discussion of a feature which will be submitted > upstream to occur on a private mailing list. > > It is _ABSOLUTELY_ appropriate to discuss this on netdev since it is the > netdev community which must consider issues like this when looking at > whether to accept WireGuard upstream. > > Jason, this action and response was entirely inappropriate, and please > I'd like you to reply properly to questions about your feature here. Whoops, okay! Very sorry. I'm actually kind of happy to hear that. I had assumed that you'd be annoyed if WireGuard crypto discussion spewed over into netdev adding even more message volume there for something perhaps not directly relevant. But in fact, you're interested and find it important to discuss there. So, good news. And sorry for trying to shoo it away before. I'll copy and paste the response I had made on the other list: > This is covered in the paper: > https://www.wireguard.com/papers/wireguard.pdf > > The basic answer is that WireGuard has message type identifiers, and > the handshake also hashes into it an identifier of the primitives > used. If there's ever a problem with those primitives chosen, it will > be possible to introduce new message type identifiers, if that kind of > "support everything even the broken garbage" approach is desired by > misguided people. However, a better approach, of course, is to keep > your secure network separate from your insecure network, and to not > allow insecure nodes on secure segments; when you mix the two, > disaster tends to strike. So, in other words, both approaches to "upgrading" > are possible, in this fantasy wireguardalypse. Take note, though, that > neither one of these approaches (support new and retain old protocol > too for old nodes, or only support new) are "agile" or are anything at > all like the 90s "cipher agility" -- the user still is not permitted > to "choose" ciphers. Regards, Jason
Re: WireGuard Upstreaming Roadmap (November 2017)
On Thu, Dec 7, 2017 at 11:22 AM, Stefan Tatschner wrote: > I have a question which is related to the involved crypto. As far as I > have understood the protocol and the concept of wireguard > What's your opinion on this? This thread has been picked up on the WireGuard mailing list, not here. Since this concerns the interworkings of the protocol and cryptography as a whole, as opposed to implementation details of Linux, please do not send these inquiries to LKML. Additionally, please start new threads for new topics in the future, rather than hijacking a roadmap thread. Look for my answer on the other mailing list. I'll CC you too. Regards, Jason
Essential get_user fix missing from 3.10 aarch64
Hi stable/arm/Willy, 1f65c13efef69b6dc908e588f91a133641d8475c is an important commit, because it involves evaluation of pointers from userspace. I'm running into issues with RNDADDTOENTCNT reading bogus values, because p is incremented twice as much as it should in this random.c block: case RNDADDENTROPY: if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (get_user(ent_count, p++)) return -EFAULT; if (ent_count < 0) return -EINVAL; if (get_user(size, p++)) return -EFAULT; retval = write_pool(&input_pool, (const char __user *)p, size); That seems reasonable, but on aarch64, get_user is defined as: #define get_user(x, ptr) \ ({ \ might_sleep(); \ access_ok(VERIFY_READ, (ptr), sizeof(*(ptr))) ? \ __get_user((x), (ptr)) : \ ((x) = 0, -EFAULT); \ }) Notice the multiple use of ptr. I thought I had found something breathtakingly bad, until I realized that it was already fixed in 2013 by Takahiro. It just wasn't marked for stable. Not sure if there's ever going to be another stable 3.10 release, but if so, this would be an important one to backport. Regards, Jason
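The double-evaluation bug described above is easy to reproduce in userspace. The macros below are toys (range_ok() stands in for access_ok(), and the fix uses the same statement-expression technique as the patched kernel macro, evaluating the pointer argument exactly once):

```c
/* Stand-in for access_ok(): any non-NULL pointer "passes". */
static int range_ok(const int *p)
{
	return p != (const int *)0;
}

/* Buggy: mentions ptr twice, so get(x, p++) advances p twice and
 * reads the wrong slot -- the same shape as the old aarch64 get_user(). */
#define BUGGY_GET(x, ptr) \
	(range_ok(ptr) ? ((x) = *(ptr), 0) : ((x) = 0, -1))

/* Fixed: a GNU statement expression captures ptr into a local once. */
#define FIXED_GET(x, ptr)                                       \
	({                                                      \
		const int *__p = (ptr);                         \
		range_ok(__p) ? ((x) = *__p, 0) : ((x) = 0, -1); \
	})
```

With `BUGGY_GET(v, p++)` on an array, p ends up two elements ahead and v holds the second element's value, which is exactly the RNDADDTOENTCNT symptom.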
[PATCH v2] arm: detect buggy binutils when in thumb2 mode
On older versions of binutils, \sym points to an aligned address. On newer versions of binutils, \sym sometimes points to the unaligned thumb address in certain circumstances. In order to homogenize this behavior, rather than adding 1, we could simply OR in 1, so that already unaligned instructions don't change. While that works, the downside is that we have to add an `orr` instruction to a fast path. The assembler can't do this at assemble time via "|1" because "invalid operands (.text and *ABS* sections) for `|'". A better solution would be to have consistent binutils behavior, but that ship has sailed. So, this commit adds a detection mechanism, which began as a small thing from Russell King that I then rewrote to use pure bash instead of shelling out, so that it doesn't slow down the build process. The detection mechanism _could_ be used to modify the assembly we generate, but for now it's just being used to catch buggy binutils and abort the build process in that case. The rest of this commit message contains all of the relevant information about the boot bug when compiled in thumb2 mode. 
My tests concerned these versions:

  broken:  GNU ld (Gentoo 2.29.1 p3) 2.29.1
  working: GNU ld (GNU Binutils for Ubuntu) 2.26.1

These produced the following code:

--- broken	2017-11-21 17:44:14.523416082 +0100
+++ working	2017-11-21 17:44:44.548461234 +0100
@@ -133,7 +133,7 @@
  160:	f01a 0ff0	tst.w	sl, #240	; 0xf0
  164:	d111		bne.n	18a <__sys_trace>
  166:	f5b7 7fc8	cmp.w	r7, #400	; 0x190
- 16a:	f2af 1e6a	subw	lr, pc, #362	; 0x16a
+ 16a:	f2af 1e6b	subw	lr, pc, #363	; 0x16b
  16e:	bf38		it	cc
  170:	f858 f027	ldrcc.w	pc, [r8, r7, lsl #2]
  174:	a902		add	r1, sp, #8

The differing instruction corresponds with this actual line in arch/arm/kernel/entry-common.S:

	badr	lr, ret_fast_syscall		@ return address

Running the broken kernel results in a runtime OOPS with:

	PC is at ret_fast_syscall+0x4/0x52
	LR is at ret_fast_syscall+0x2/0x52

The disassembly of that function for the crashing kernel is:

	.text:	ret_fast_syscall	; CODE XREF: sys_syscall+1C↓j
	.text:	CPSID	I		; jumptable 0840 cases 15,18-376
	.text:0002
	.text:0002	loc_2		; DATA XREF: sys_syscall-6BA↓o
	.text:0002	LDR.W	R2, [R9,#8]
	.text:0006	CMP.W	R2, #0xBF00

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: ni...@redhat.com
Cc: sta...@vger.kernel.org
---
Had the file mode wrong on the submission from a second ago, sorry about that.
 arch/arm/Makefile        |  7 ++--
 arch/arm/tools/Makefile  |  5 +-
 arch/arm/tools/toolcheck | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100755 arch/arm/tools/toolcheck

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 80351e505fd5..bd4e248a7f8f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -319,16 +319,19 @@ all:	$(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/arm/tools uapi
 
-archprepare:
+archprepare: toolcheck
 	$(Q)$(MAKE) $(build)=arch/arm/tools kapi
 
+toolcheck:
+	$(Q)$(MAKE) $(build)=arch/arm/tools $@
+
 # Convert bzImage to zImage
 bzImage: zImage
 
 BOOT_TARGETS	= zImage Image xipImage bootpImage uImage
 INSTALL_TARGETS	= zinstall uinstall install
 
-PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS)
+PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS) toolcheck
 
 bootpImage uImage: zImage
 zImage: Image

diff --git a/arch/arm/tools/Makefile b/arch/arm/tools/Makefile
index ddb89a7db36f..0a283756f1c5 100644
--- a/arch/arm/tools/Makefile
+++ b/arch/arm/tools/Makefile
@@ -23,12 +23,15 @@ uapi-hdrs-y += $(uapi)/unistd-eabi.h
 
 targets += $(addprefix ../../../,$(gen-y) $(kapi-hdrs-y) $(uapi-hdrs-y))
 
-PHONY += kapi uapi
+PHONY += kapi uapi toolcheck
 
 kapi:	$(kapi-hdrs-y) $(gen-y)
 
 uapi:	$(uapi-hdrs-y)
 
+toolcheck:
+	@'$(srctree)/$(src)/toolcheck'
+
 # Create output directory if not already present
 _dummy := $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)') \
 	  $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')

diff --git a/arch/arm/tools/toolcheck b/arch/arm/tools/toolcheck
new file mode 100755
index ..04fc44b750d2
--- /dev/null
+++ b/arch/arm/tools/toolcheck
@@ -0,0 +1,44 @@
+#!/bin/bash
+#
+# Copyright 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved.
+#
+
+set -e
+
+cleanup() {
+	[[ ! -d $temp ]] || rm -rf "$temp"
+	exit
+}
+trap cleanup INT TERM EXIT
+temp="$(mktemp -d)"
+
+check_thumb2_address() {
+	local disassembly
+
+	$CC $KBUILD_AFLAGS -o "$temp/a.out" -c -xassembler - <<-_EOF
+	.syntax unified
+	.thumb
[PATCH] arm: detect buggy binutils when in thumb2 mode
On older versions of binutils, \sym points to an aligned address. On newer versions of binutils, \sym sometimes points to the unaligned thumb address in certain circumstances. In order to homogenize this behavior, rather than adding 1, we could simply OR in 1, so that already unaligned instructions don't change. While that works, the downside is that we have to add an `orr` instruction to a fast path. The assembler can't do this at assemble time via "|1" because "invalid operands (.text and *ABS* sections) for `|'". A better solution would be to have consistent binutils behavior, but that ship has sailed. So, this commit adds a detection mechanism, which began as a small thing from Russell King that I then rewrote to use pure bash instead of shelling out, so that it doesn't slow down the build process. The detection mechanism _could_ be used to modify the assembly we generate, but for now it's just being used to catch buggy binutils and abort the build process in that case. The rest of this commit message contains all of the relevant information about the boot bug when compiled in thumb2 mode. 
My tests concerned these versions:

  broken:  GNU ld (Gentoo 2.29.1 p3) 2.29.1
  working: GNU ld (GNU Binutils for Ubuntu) 2.26.1

These produced the following code:

--- broken	2017-11-21 17:44:14.523416082 +0100
+++ working	2017-11-21 17:44:44.548461234 +0100
@@ -133,7 +133,7 @@
  160:	f01a 0ff0	tst.w	sl, #240	; 0xf0
  164:	d111		bne.n	18a <__sys_trace>
  166:	f5b7 7fc8	cmp.w	r7, #400	; 0x190
- 16a:	f2af 1e6a	subw	lr, pc, #362	; 0x16a
+ 16a:	f2af 1e6b	subw	lr, pc, #363	; 0x16b
  16e:	bf38		it	cc
  170:	f858 f027	ldrcc.w	pc, [r8, r7, lsl #2]
  174:	a902		add	r1, sp, #8

The differing instruction corresponds with this actual line in arch/arm/kernel/entry-common.S:

	badr	lr, ret_fast_syscall		@ return address

Running the broken kernel results in a runtime OOPS with:

	PC is at ret_fast_syscall+0x4/0x52
	LR is at ret_fast_syscall+0x2/0x52

The disassembly of that function for the crashing kernel is:

	.text:	ret_fast_syscall	; CODE XREF: sys_syscall+1C↓j
	.text:	CPSID	I		; jumptable 0840 cases 15,18-376
	.text:0002
	.text:0002	loc_2		; DATA XREF: sys_syscall-6BA↓o
	.text:0002	LDR.W	R2, [R9,#8]
	.text:0006	CMP.W	R2, #0xBF00

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: ni...@redhat.com
Cc: sta...@vger.kernel.org
---
 arch/arm/Makefile        |  7 ++--
 arch/arm/tools/Makefile  |  5 +-
 arch/arm/tools/toolcheck | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/tools/toolcheck

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 80351e505fd5..bd4e248a7f8f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -319,16 +319,19 @@ all:	$(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/arm/tools uapi
 
-archprepare:
+archprepare: toolcheck
 	$(Q)$(MAKE) $(build)=arch/arm/tools kapi
 
+toolcheck:
+	$(Q)$(MAKE) $(build)=arch/arm/tools $@
+
 # Convert bzImage to zImage
 bzImage: zImage
 
 BOOT_TARGETS	= zImage Image xipImage bootpImage uImage
 INSTALL_TARGETS	= zinstall uinstall install
 
-PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS)
+PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS) toolcheck
 
 bootpImage uImage: zImage
 zImage: Image

diff --git a/arch/arm/tools/Makefile b/arch/arm/tools/Makefile
index ddb89a7db36f..0a283756f1c5 100644
--- a/arch/arm/tools/Makefile
+++ b/arch/arm/tools/Makefile
@@ -23,12 +23,15 @@ uapi-hdrs-y += $(uapi)/unistd-eabi.h
 
 targets += $(addprefix ../../../,$(gen-y) $(kapi-hdrs-y) $(uapi-hdrs-y))
 
-PHONY += kapi uapi
+PHONY += kapi uapi toolcheck
 
 kapi:	$(kapi-hdrs-y) $(gen-y)
 
 uapi:	$(uapi-hdrs-y)
 
+toolcheck:
+	@'$(srctree)/$(src)/toolcheck'
+
 # Create output directory if not already present
 _dummy := $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)') \
 	  $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')

diff --git a/arch/arm/tools/toolcheck b/arch/arm/tools/toolcheck
new file mode 100644
index ..04fc44b750d2
--- /dev/null
+++ b/arch/arm/tools/toolcheck
@@ -0,0 +1,44 @@
+#!/bin/bash
+#
+# Copyright 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved.
+#
+
+set -e
+
+cleanup() {
+	[[ ! -d $temp ]] || rm -rf "$temp"
+	exit
+}
+trap cleanup INT TERM EXIT
+temp="$(mktemp -d)"
+
+check_thumb2_address() {
+	local disassembly
+
+	$CC $KBUILD_AFLAGS -o "$temp/a.out" -c -xassembler - <<-_EOF
+	.syntax unified
+	.thumb
+	.macro badr, reg, sym
+	adr	\reg, \sym + 1
Fwd: [PATCH] arm: ensure symbol is a thumb symbol in new binutils
Hello Nick & Binutils developers,

It would appear that recent changes in the binutils assembler (perhaps 52a86f843b6dee1de9977293da9786649b146b05? perhaps not?) have broken the kernel when compiled in thumb2 mode. We currently do not have a way of working around your breaking changes without adding additional runtime instructions, which isn't acceptable for us. Details are included in the thread below.

Thanks,
Jason

---------- Forwarded conversation ----------
Subject: [PATCH] arm: ensure symbol is a thumb symbol in new binutils
From: Jason A. Donenfeld <ja...@zx2c4.com>
Date: Tue, Nov 21, 2017 at 6:27 PM
To: li...@armlinux.org.uk, linux-arm-ker...@lists.infradead.org
Cc: "Jason A. Donenfeld" <ja...@zx2c4.com>, sta...@vger.kernel.org

On older versions of binutils, \sym points to an aligned address. On newer versions of binutils, \sym sometimes points to the unaligned thumb address in mysterious and buggy circumstances. In order to homogenize this behavior, rather than adding 1, we simply OR in 1, so that already unaligned instructions don't change. This fix is required for a pedestrian THUMB2_KERNEL to boot without crashing when built with non-old binutils. While it works, the downside is that we have to add an `orr` instruction to a fast path. The assembler can't do this at assemble time via "|1" because "invalid operands (.text and *ABS* sections) for `|'", so we're forced to do this. A better solution would be to have consistent binutils behavior, or to have some kind of \sym feature detection that won't turn into a maze of version comparisons. However, it's at the moment unclear how to achieve this. The rest of this commit message contains all of the relevant information.
My tests concerned these versions:

  broken:  GNU ld (Gentoo 2.29.1 p3) 2.29.1
  working: GNU ld (GNU Binutils for Ubuntu) 2.26.1

These produced the following code:

--- broken	2017-11-21 17:44:14.523416082 +0100
+++ working	2017-11-21 17:44:44.548461234 +0100
@@ -133,7 +133,7 @@
  160:	f01a 0ff0	tst.w	sl, #240	; 0xf0
  164:	d111		bne.n	18a <__sys_trace>
  166:	f5b7 7fc8	cmp.w	r7, #400	; 0x190
- 16a:	f2af 1e6a	subw	lr, pc, #362	; 0x16a
+ 16a:	f2af 1e6b	subw	lr, pc, #363	; 0x16b
  16e:	bf38		it	cc
  170:	f858 f027	ldrcc.w	pc, [r8, r7, lsl #2]
  174:	a902		add	r1, sp, #8

The differing instruction corresponds with this actual line in arch/arm/kernel/entry-common.S:

	badr	lr, ret_fast_syscall		@ return address

Running the broken kernel results in a runtime OOPS with:

	PC is at ret_fast_syscall+0x4/0x52
	LR is at ret_fast_syscall+0x2/0x52

The disassembly of that function for the crashing kernel is:

	.text:	ret_fast_syscall	; CODE XREF: sys_syscall+1C↓j
	.text:	CPSID	I		; jumptable 0840 cases 15,18-376
	.text:0002
	.text:0002	loc_2		; DATA XREF: sys_syscall-6BA↓o
	.text:0002	LDR.W	R2, [R9,#8]
	.text:0006	CMP.W	R2, #0xBF00

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: sta...@vger.kernel.org
---
 arch/arm/include/asm/assembler.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index ad301f107dd2..c62a3b6b0a3e 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -194,10 +194,9 @@
  */
 	.irp	c,,eq,ne,cs,cc,mi,pl,vs,vc,hi,ls,ge,lt,gt,le,hs,lo
 	.macro	badr\c, rd, sym
-#ifdef CONFIG_THUMB2_KERNEL
-	adr\c	\rd, \sym + 1
-#else
 	adr\c	\rd, \sym
+#ifdef CONFIG_THUMB2_KERNEL
+	orr\c	\rd, \rd, 1
 #endif
 	.endm
 	.endr
--
2.15.0

----------
From: Russell King - ARM Linux <li...@armlinux.org.uk>
Date: Tue, Nov 21, 2017 at 6:38 PM
To: "Jason A. Donenfeld" <ja...@zx2c4.com>, Will Deacon <will.dea...@arm.com>
Cc: linux-arm-ker...@lists.infradead.org, l...@vger.kernel.org, sta...@vger.kernel.org

As it just seems to be a limited range of binutils versions that are affected, I'd rather not impact the kernel fast-paths with extra cycles just because binutils decided to change behaviour. I'd prefer to inform people about the problem and get them to change to a non-buggy binutils. This seems to be the second binutils bug that's biting us within the last month... what's going on with binutils QA?

 arch/arm/Makefile        |  7 ++--
 arch/arm/tools/Makefile  |  5 +-
 arch/arm/tools/toolcheck | 24 ++++++++++++++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 1cfac5119545..9e70d0435121 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -319,16 +319,19 @@ all:	$(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/arm/tools uapi
 
-archprepare:
+archprepare: toolcheck
Re: objtool - what if I want to clobber rbp?
Hi Josh,

On Wed, Nov 22, 2017 at 4:31 AM, Josh Poimboeuf wrote:
> - Make your feature conflict with CONFIG_FRAME_POINTER on x86_64. The
> ORC unwinder is now the default anyway for 4.15, and we renamed the
> configs, so most people will be actively switching to ORC.
> BTW, since CONFIG_FRAME_POINTER is no longer the default and is becoming
> deprecated, there has been some talk of disabling objtool with
> CONFIG_FRAME_POINTER. That would make your life easier. However I
> think we're not quite ready for that, so it might be a few more release
> cycles before that happens.

Fortunately my code won't be ready to be merged for a few release cycles, so thanks for letting me know about future plans. I think my patch set and these changes will synchronize nicely.

> - Add some ifdefs so your code only uses %rbp as a scratch register when
> CONFIG_FRAME_POINTER is disabled.
> - If one of the registers is used less often than the others, you could
> spill it to the stack. I know you said you need all the registers,
> but I'd be willing to review the code for ideas, if that would help.
> Sometimes it helps to get fresh eyes on the problem. We were able to
> fix this problem with all the existing crypto code without affecting
> performance measurably. We had to get creative with a few of those.

That actually could be interesting, if you're curious about looking at it.

Jason
objtool - what if I want to clobber rbp?
Hi Josh,

We're working on some highly optimized assembly crypto primitive implementations for WireGuard. The last 24 hours have been spent trying to make objtool happy with a variety of tricks, some more unfortunate than others. There's still one issue remaining, however, and I just can't figure out how to make it go away:

poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx uses BP as a scratch register
poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2 uses BP as a scratch register
poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx512 uses BP as a scratch register

The messages are right. We're using %rbp as a general purpose register, writing into it all sorts of crypto gibberish certainly not suitable for walking stack frames. It's hard to find a way of not using it without incurring a speed penalty. We really do just need all the registers.

Of course the "problem" goes away with the new slick ORC unwinding, which is great. But for frame pointer unwinding, this problem remains. I'm wondering if you can think of any clever way of marking a function or the like that will make this issue go away, somehow. Is there any path forward without sacrificing %rbp and hence performance to a useless frame pointer?

Thanks,
Jason
Re: [GIT PULL] usercopy whitelisting for v4.15-rc1
Hi Linus,

On Fri, Nov 17, 2017 at 10:13 PM, Linus Torvalds wrote:
> As a security person, you need to repeat this mantra:
>
> "security problems are just bugs"
>
> and you need to _internalize_ it, instead of scoff at it.

It might be news to you that actually some security people scoff at other security people's obsession with "security bugs". Security people like to rate the "utility" of a bug in a security setting, usually ranging from "does nothing" (benign) to "allows code execution" (dangerous/useful), with something like "crashes system" somewhere in between those two poles. (You may not agree with this scale, but indeed security people tend to think that a crash is safer than allowing malicious code execution.)

The security industry is largely obsessed by finding (and selling/using/patching/reporting/showcasing/stockpiling/detecting/stealing) this "dangerous/useful" variety of bugs. And this obsession is continually fulfilled because bugs keep happening -- which is just the nature of software development -- and so this "security bug" infatuation continues. It even leads to people upset with you that you don't care about CVEs and so forth, because they're so fixated on individual bugs and their security impact.

So, as I mentioned, it may come as a surprise to you that some security people don't really care for that mentality. As Bas Alberts wrote [1], "Bugs are irrelevant. Yet our industry is fatally focused on what is essentially vulnerability masturbation."

So what's the alternative to obsessing over each individual software bug? In the context of the kernel, the solution from Spender and Pipacs, and more recently "adopted" by Kees and his KSPP project, has been to try to eliminate the "security utility" of bugs. In the best case, this means making bugs that were formerly "dangerous/useful" be "benign". In the second best case, it usually means making bugs that were formerly "dangerous/useful" be "unreliable" or "crashes system".
The idea is to completely eliminate the security impact of entire *classes* of bugs. It acknowledges that programmers will continue to make programming bugs, as they do, and seeks to make the kernel safer, from a security perspective, in spite of the continued existence of these bugs.

> Because the primary focus should be "debugging". The primary focus
> should be "let's make sure the kernel released in a year is better
> than the one released today".

In light of the above, this then is where there might be an unfortunate disconnect. Sure, many hardening features that turn dangerous bugs into less dangerous bugs have the quality of unearthing individual bugs. But many other aspects are rightfully focused on mitigating the security impact of future unforeseen, undetected bugs.

From a security perspective, please try to see beyond the individual bug obsession, and consider the larger picture. After all, if the security impact of all kernel bugs is reduced to nil, your mantra that all bugs are just bugs will be even more true.

Jason

[1] https://lists.immunityinc.com/pipermail/dailydave/2015-August/000976.html
Re: [PATCH] fork: fix incorrect fput of ->exe_file causing use-after-free
Hey Eric,

This is a pretty late response to the thread, but in case you're curious: since Ubuntu forgot to backport this (I already emailed them about it), I actually experienced a set of boxes that hit this bug in the wild and were crashing every 2 weeks or so when under high load. It's pretty amazing that syzkaller unearthed this, resulting in a nice test case and making it possible to debug and fix the error. Otherwise it'd be a real heisenbug.

So, thanks.

Jason
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
On Sat, Nov 11, 2017 at 11:18 PM, Johannes Berg wrote:
>> > If you're handling this by forcing another read() to produce the
>> > NLMSG_DONE, then you have no reason to WARN_ON() here.
>> >
>> > In fact you are adding a WARN_ON() which is trivially triggerable by
>> > any user.
>>
>> I added this in my suggestion for how this could work, but I don't
>> think you're right, since we previously check if there's enough space.
>
> Or perhaps I should say this differently:
>
> Forcing another read happens through the
>
>     skb_tailroom(skb) < nlmsg_total_size(...)
>
> check, so the nlmsg_put_answer() can't really fail.
>
> Handling nlmsg_put_answer() failures by forcing another read would have
> required jumping to the existing if code with a goto, or restructuring
> the whole thing completely somehow, and I didn't see how to do that.

Exactly. And if something _does_ go wrong in our logic, and we can't add NLMSG_DONE, we really do want people to report this to us, since dumps must always end that way. We'd probably have caught this a number of years ago, when userspace developers were twiddling with their receive buffers, if we had had the WARN_ON. Nice suggestion from Johannes.
WireGuard Upstreaming Roadmap (November 2017)
Hi folks,

This relates to WireGuard [0]. Following a very nice conference with the Linux kernel networking subsystem community [1,2], I thought it might be a good idea to spell out the roadmap for the coming months and the trajectory into upstream, based on my discussions with several developers and maintainers. There are several threads of this, the biggest of which surrounds the kernel code base, but there are some other ends of the WireGuard project as a whole that are also relevant.

The current biggest blocker is issues with the crypto API. Before WireGuard can go upstream, I intend to embark on a multi-pronged effort to overhaul the crypto API. I very much need to sync up with Herbert regarding my plans for this, and start spec'ing things out a bit more formally, so I can begin concrete discussions with him. I intend to base my work both on feedback from linux-crypto/Herbert and from the cryptographic research community. I hope to go to RWC2018 [3] and the subsequent HACS workshop for the academic engagement side, but of course, like all the work I do on the kernel, things will be highly based in engineering practices, rather than purely academic ones.

Dave has strongly encouraged me to post patches sooner rather than later. So I think before the crypto API is ready to go, I'll likely post a [RFG] -- request for grumbles -- patch set to netdev, in order to have some code review, so as to gauge where we're at. This patch set will use my current crypto API, not the kernel's crypto API, with it mentioned in the opening that I intend to switch to the kernel's crypto API when it looks like the one used here. This way we'll get early feedback so that the later real [PATCH] series will go more smoothly.

There are a few WireGuard features that some of you have been waiting for. At the urging of some folks at the conference, I intend to submit a "core" WireGuard to upstream, and then use future releases to build on top of that, to make the initial upstreaming process go as easily as possible. Therefore, there are three big TODO items that may _or_ may not go in after upstreaming:

- In-band control messages [possibly not]
- Netlink multicast event notification
- GRO integration

Of these, it's most likely I'll implement GRO and leave the other two until after, but this may change. Since WireGuard already does GSO, it would make sense to implement the other side of things too. There's also a longer, more ambitious roadmap [4], filled with both easy coding-related things and longer term design things, but that's out of scope for this email, though many of those items will likely be completed before submission time.

There are also six other threads of development that are ongoing, which I intend to put a bit more focus on too in the near future:

- The userspace implementation. I'd like to bring this up to deployment quality, which naturally fits into the next area.

- Mobile apps. It's fairly easy to integrate the userspace implementation with existing APIs. The current Android app already works well with the kernel module, but of course people want this more easily deployed.

- Mac and Windows support for the userspace implementation. These are already mostly done, but the APIs used may in fact change, so there may still be a bit of work to do here before we're satisfied.

- Bindings and libraries. Now that we have a stable Netlink API, we can start making nice wrappers for the various languages people like to use. It remains to be seen whether or not a C "libwireguard" is needed, since in that domain, talking Netlink directly is often a better choice, but I do see some potential for a pywireguard and the like. This will also be essential when the already mentioned plans for event notification and the possibly excluded control messages materialize.

- More formal verification. While we have the cryptographic protocol verified, there are still more places where formalism is quite helpful, proving more state machines, and even proving C implementations to be correct. Work and research is ongoing in this domain.

- Integration into network managers and routing daemons (mesh and traditional). Work has already begun here on systemd-networkd, and others are looking into daemons like babel and bird.

So that's where we're at. I hope to have a RFG submitted in the next several months, and hopefully we gather some nice momentum and get the work upstreamed and the project completed soon, for some definition of "complete". If you'd like to work on WireGuard, or simply have any questions, don't hesitate to email me.

Regards,
Jason

[0] https://www.wireguard.com/
[1] https://www.netdevconf.org/2.2/
[2] https://www.wireguard.com/presentations/
[3] https://rwc.iacr.org
[4] https://www.wireguard.com/todo/
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
On Sat, Nov 11, 2017 at 11:37 AM, David Miller wrote:
> Jason I'm already pushing my luck as-is with the pull request I made
> yesterday.
>
> I've seen your original request to get this in, you don't have to say
> it multiple times.
>
> We can get this into the merge window and submit it for -stable, so
> please relax.

Whoops, sorry! Okay, no problem.

Jason
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
IIRC, 4.14 comes tomorrow-ish? If possible, it would be nice to get this in 4.14 before then, so it doesn't have to take time to trickle down through stable.

Jason

On Thu, Nov 9, 2017 at 1:04 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> The way people generally use netlink_dump is that they fill in the skb
> as much as possible, breaking when nla_put returns an error. Then, they
> get called again and start filling out the next skb, and again, and so
> forth. The mechanism at work here is the ability for the iterative
> dumping function to detect when the skb is filled up and not fill it
> past the brim, waiting for a fresh skb for the rest of the data.
>
> However, if the attributes are small and nicely packed, it is possible
> that a dump callback function successfully fills in attributes until the
> skb is of size 4080 (libmnl's default page-sized receive buffer size).
> The dump function completes, satisfied, and then, if it happens to be
> that this is actually the last skb, and no further ones are to be sent,
> then netlink_dump will add on the NLMSG_DONE part:
>
>   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>
> It is very important that netlink_dump does this, of course. However, in
> this example, that call to nlmsg_put_answer will fail, because the
> previous filling by the dump function did not leave it enough room. And
> how could it possibly have done so? All of the nla_put variety of
> functions simply check to see if the skb has enough tailroom,
> independent of the context it is in.
>
> In order to keep the important assumptions of all netlink dump users, it
> is therefore important to give them an skb that has this end part of the
> tail already reserved, so that the call to nlmsg_put_answer does not
> fail. Otherwise, library authors are forced to find some bizarre sized
> receive buffer that has a large modulo relative to the common sizes of
> messages received, which is ugly and buggy.
>
> This patch thus saves the NLMSG_DONE for an additional message, for the
> case that things are dangerously close to the brim. This requires
> keeping track of the errno from ->dump() across calls.
>
> Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
> ---
> Can we get this into 4.14? Is there still time? It should also be queued
> up for stable.
>
> Changes v3->v4:
>   - I like long lines. The kernel does not. Wrapped at 80 now.
>
>  net/netlink/af_netlink.c | 17 +++++++++++------
>  net/netlink/af_netlink.h |  1 +
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index b93148e8e9fb..15c99dfa3d72 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
>  	struct sk_buff *skb = NULL;
>  	struct nlmsghdr *nlh;
>  	struct module *module;
> -	int len, err = -ENOBUFS;
> +	int err = -ENOBUFS;
>  	int alloc_min_size;
>  	int alloc_size;
>
> @@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
>  	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
>  	netlink_skb_set_owner_r(skb, sk);
>
> -	len = cb->dump(skb, cb);
> +	if (nlk->dump_done_errno > 0)
> +		nlk->dump_done_errno = cb->dump(skb, cb);
>
> -	if (len > 0) {
> +	if (nlk->dump_done_errno > 0 ||
> +	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
>  		mutex_unlock(nlk->cb_mutex);
>
>  		if (sk_filter(sk, skb))
> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>  		return 0;
>  	}
>
> -	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> -	if (!nlh)
> +	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> +			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> +	if (WARN_ON(!nlh))
>  		goto errout_skb;
>
>  	nl_dump_check_consistent(cb, nlh);
>
> -	memcpy(nlmsg_data(nlh), &len, sizeof(len));
> +	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
> +	       sizeof(nlk->dump_done_errno));
>
>  	if (sk_filter(sk, skb))
>  		kfree_skb(skb);
> @@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
>  	}
>
>  	nlk->cb_running = true;
> +	nlk->dump_done_errno = INT_MAX;
>
>  	mutex_unlock(nlk->cb_mutex);
>
> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
> index 028188597eaa..962de7b3c023 100644
> --- a/net/netlink/af_netlink.h
> +++ b/net/netlink/af_netlink.h
> @@ -34,6 +34,7 @@ struct netlink_sock {
>  	wait_queue_head_t	wait;
>  	bool			bound;
>  	bool			cb_running;
> +	int			dump_done_errno;
>  	struct netlink_callback	cb;
>  	struct mutex		*cb_mutex;
>  	struct mutex		cb_def_mutex;
> --
> 2.15.0
[PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued
up for stable.

Changes v3->v4:
  - I like long lines. The kernel does not. Wrapped at 80 now.

 net/netlink/af_netlink.c | 17 +++++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..15c99dfa3d72 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;

@@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);

-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);

-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 ||
+	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);

 		if (sk_filter(sk, skb))
@@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}

-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;

 	nl_dump_check_consistent(cb, nlh);

-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
+	       sizeof(nlk->dump_done_errno));

 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}

 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;

 	mutex_unlock(nlk->cb_mutex);

diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
--
2.15.0
[PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the case that things are dangerously close to the brim. This requires keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld
---
Can we get this into 4.14? Is there still time? It should also be queued up for stable.

Changes v3->v4:
 - I like long lines. The kernel does not. Wrapped at 80 now.

 net/netlink/af_netlink.c | 17 +++++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..15c99dfa3d72 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 ||
+	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
+	       sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0
Re: [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps
On Thu, Nov 9, 2017 at 11:02 AM, Johannes Berg wrote:
> nit: I think your line got a little long here :)

Ack. For v4.

> and here

Ack. For v4.

>> +	nlk->dump_done_errno = INT_MAX;
>
> I guess positive values aren't really returned from dump?

When a positive value is returned, the API uses this to signify that the dump is not done, and it should be called again. When 0 or negative is returned, it means the dump is complete and the return value should be returned in DONE. We therefore initialize dump_done_errno to some positive value, so that we can make calling ->dump() conditional on it being positive. When it becomes zero or negative, it's time to return. This mirrors the semantics of the actual ->dump() function precisely.
[PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps
The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the case that things are dangerously close to the brim. This requires keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued up for stable.

Changes v2->v3:
 - Johannes didn't like the subject line of the patch, so the only
   thing that's changed in this version is the new subject line.

 net/netlink/af_netlink.c | 14 ++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2274,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0
[PATCH v2] af_netlink: give correct bounds to dump skb for NLMSG_DONE
The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the case that things are dangerously close to the brim. This requires keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v1->v2:
 - This implements things without having to twiddle with the skb, at
   the expense of potentially having an extra back-and-forth and quite
   a bit of added complexity.

 net/netlink/af_netlink.c | 14 ++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2274,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0
Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
Erf, your patch doesn't handle what happens if len comes back negative, but I'll fix it up and send a v2 using this approach. I think I really prefer v1 though. Jason
Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
Hi Johannes, Yes indeed. It sacrifices 24 bytes for making things much less complex. However, if you prefer increasing the complexity of the state machine a bit instead, I suppose we could roll with this approach instead... Jason
Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
By the way, in case you're curious, here's the {up,down,cross}stream WireGuard commit that works around it via its compat layer (a rat's nest of hideous backports for all the weird kernels people want WireGuard to run on, which I cannot wait to remove): https://git.zx2c4.com/WireGuard/commit/?id=f689ea7acc23dc8e0968699d964ee382b04fbbe4 Particularly relevant here is the last chunk of that, which is part of the automated test suite, which reproduces the issue by finding the tightest possible packing.
[PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy.

This patch thus reserves and restores the required length for NLMSG_DONE during the call to the dump function.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
See you all at netdevconf tomorrow!

 net/netlink/af_netlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..b2d0a2fb1939 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2183,7 +2183,9 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
+	skb->end -= nlmsg_total_size(sizeof(len));
 	len = cb->dump(skb, cb);
+	skb->end += nlmsg_total_size(sizeof(len));
 
 	if (len > 0) {
 		mutex_unlock(nlk->cb_mutex);
-- 
2.15.0
Re: [PATCH v4] arm64: support __int128 on gcc 5+
Hi Will,

On Tue, Nov 7, 2017 at 11:13 AM, Will Deacon wrote:
> So I pushed a fixup patch on top of for-next/core, but I suggest we
> temporarily revert the -DCONFIG_ARCH_SUPPORTS_INT128 option if any other
> issues crop up.

The fixup looks good to me. If there are additional problems, I'm happy to keep providing patches until it's sorted. I just did an allyesconfig and didn't have any problems, though, so I think we're all set now.

Jason
Re: [PATCH v4] arm64: support __int128 on gcc 5+
On Tue, Nov 7, 2017 at 1:55 AM, Ard Biesheuvel wrote:
> It appears your v4 adds __ashlti3() and __ashrti3, whereas the error
> is about __lshrti3() being undefined.

Whoopsie. v5 adds the final missing function. Looks like it now compiles for -next with the config you provided me.
[PATCH v5] arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when dealing with TI types, resulting in failures when trying to link to libgcc, and more generally, bad performance. However, since gcc 5, the compiler supports actually emitting fast instructions, which means we can at long last enable this option and receive the speedups. The gcc commit that added proper AArch64 support is: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d This commit appears to be part of the gcc 5 release.

There are still a few functions, __ashlti3, __ashrti3, and __lshrti3, which require libgcc, which is fine. Rather than linking to libgcc, we simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v4->v5:
 - Add implementation of __lshrti3. (Note that __lshlti3 is not a function.)

 arch/arm64/Makefile      |  2 ++
 arch/arm64/lib/Makefile  |  2 +-
 arch/arm64/lib/tishift.S | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/lib/tishift.S

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS	+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 CHECKFLAGS	+= -D__AARCH64EB__
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 9a8cb96555d6..4e696f96451f 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -3,7 +3,7 @@ lib-y		:= bitops.o clear_user.o delay.o copy_from_user.o	\
 		   copy_to_user.o copy_in_user.o copy_page.o		\
 		   clear_page.o memchr.o memcpy.o memmove.o memset.o	\
 		   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o	\
-		   strchr.o strrchr.o
+		   strchr.o strrchr.o tishift.o
 
 # Tell the compiler to treat all general purpose registers (with the
 # exception of the IP registers, which are already handled by the caller
diff --git a/arch/arm64/lib/tishift.S b/arch/arm64/lib/tishift.S
new file mode 100644
index 000000000000..d3db9b2cd479
--- /dev/null
+++ b/arch/arm64/lib/tishift.S
@@ -0,0 +1,80 @@
+/*
+ * Copyright (C) 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+
+ENTRY(__ashlti3)
+	cbz	x2, 1f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	2f
+	lsl	x1, x1, x2
+	lsr	x3, x0, x3
+	lsl	x2, x0, x2
+	orr	x1, x1, x3
+	mov	x0, x2
+1:
+	ret
+2:
+	neg	w1, w3
+	mov	x2, #0
+	lsl	x1, x0, x1
+	mov	x0, x2
+	ret
+ENDPROC(__ashlti3)
+
+ENTRY(__ashrti3)
+	cbz	x2, 1f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	2f
+	lsr	x0, x0, x2
+	lsl	x3, x1, x3
+	asr	x2, x1, x2
+	orr	x0, x0, x3
+	mov	x1, x2
+1:
+	ret
+2:
+	neg	w0, w3
+	asr	x2, x1, #63
+	asr	x0, x1, x0
+	mov	x1, x2
+	ret
+ENDPROC(__ashrti3)
+
+ENTRY(__lshrti3)
+	cbz	x2, 1f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	2f
+	lsr	x0, x0, x2
+	lsl	x3, x1, x3
+	lsr	x2, x1, x2
+	orr	x0, x0, x3
+	mov	x1, x2
+1:
+	ret
+2:
+	neg	w0, w3
+	mov	x2, #0
+	lsr	x0, x1, x0
+	mov	x1, x2
+	ret
+ENDPROC(__lshrti3)
-- 
2.15.0
[PATCH v4] arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when dealing with TI types, resulting in failures when trying to link to libgcc, and more generally, bad performance. However, since gcc 5, the compiler supports actually emitting fast instructions, which means we can at long last enable this option and receive the speedups. The gcc commit that added proper Aarch64 support is:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d

This commit appears to be part of the gcc 5 release.

There are still a few functions, __ashlti3 and __ashrti3, which require libgcc, which is fine. Rather than linking to libgcc, we simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v3->v4:
  - Typo in comment
  - Relabel second function

 arch/arm64/Makefile      |  2 ++
 arch/arm64/lib/Makefile  |  2 +-
 arch/arm64/lib/tishift.S | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/lib/tishift.S

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS	+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 CHECKFLAGS	+= -D__AARCH64EB__
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index a0abc142c92b..55bdb01f1ea6 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -2,7 +2,7 @@ lib-y		:= bitops.o clear_user.o delay.o copy_from_user.o	\
 		   copy_to_user.o copy_in_user.o copy_page.o		\
 		   clear_page.o memchr.o memcpy.o memmove.o memset.o	\
 		   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o	\
-		   strchr.o strrchr.o
+		   strchr.o strrchr.o tishift.o
 
 # Tell the compiler to treat all general purpose registers (with the
 # exception of the IP registers, which are already handled by the caller
diff --git a/arch/arm64/lib/tishift.S b/arch/arm64/lib/tishift.S
new file mode 100644
index 000000000000..bffe03c478a5
--- /dev/null
+++ b/arch/arm64/lib/tishift.S
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include
+
+ENTRY(__ashlti3)
+	cbz	x2, 1f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	2f
+	lsl	x1, x1, x2
+	lsr	x3, x0, x3
+	lsl	x2, x0, x2
+	orr	x1, x1, x3
+	mov	x0, x2
+1:
+	ret
+2:
+	neg	w1, w3
+	mov	x2, #0
+	lsl	x1, x0, x1
+	mov	x0, x2
+	ret
+ENDPROC(__ashlti3)
+
+ENTRY(__ashrti3)
+	cbz	x2, 1f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	2f
+	lsr	x0, x0, x2
+	lsl	x3, x1, x3
+	asr	x2, x1, x2
+	orr	x0, x0, x3
+	mov	x1, x2
+1:
+	ret
+2:
+	neg	w0, w3
+	asr	x2, x1, #63
+	asr	x0, x1, x0
+	mov	x1, x2
+	ret
+ENDPROC(__ashrti3)
--
2.15.0
[PATCH v3] arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when dealing with TI types, resulting in failures when trying to link to libgcc, and more generally, bad performance. However, since gcc 5, the compiler supports actually emitting fast instructions, which means we can at long last enable this option and receive the speedups. The gcc commit that added proper Aarch64 support is:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d

This commit appears to be part of the gcc 5 release.

There are still a few functions, __ashlti3 and __ashrti3, which require libgcc, which is fine. Rather than linking to libgcc, we simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v2->v3:
  - We now just provide trivial implementations for the missing
    functions, rather than linking to libgcc.

 arch/arm64/Makefile      |  2 ++
 arch/arm64/lib/Makefile  |  2 +-
 arch/arm64/lib/tishift.S | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/lib/tishift.S

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS	+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 CHECKFLAGS	+= -D__AARCH64EB__
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index a0abc142c92b..55bdb01f1ea6 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -2,7 +2,7 @@ lib-y		:= bitops.o clear_user.o delay.o copy_from_user.o	\
 		   copy_to_user.o copy_in_user.o copy_page.o		\
 		   clear_page.o memchr.o memcpy.o memmove.o memset.o	\
 		   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o	\
-		   strchr.o strrchr.o
+		   strchr.o strrchr.o tishift.o
 
 # Tell the compiler to treat all general purpose registers (with the
 # exception of the IP registers, which are already handled by the caller
diff --git a/arch/arm64/lib/tishift.S b/arch/arm64/lib/tishift.S
new file mode 100644
index 000000000000..47b6f7ee2b11
--- /dev/null
+++ b/arch/arm64/lib/tishift.S
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Resreved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include
+
+ENTRY(__ashlti3)
+	cbz	x2, 1f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	2f
+	lsl	x1, x1, x2
+	lsr	x3, x0, x3
+	lsl	x2, x0, x2
+	orr	x1, x1, x3
+	mov	x0, x2
+1:
+	ret
+2:
+	neg	w1, w3
+	mov	x2, #0
+	lsl	x1, x0, x1
+	mov	x0, x2
+	ret
+ENDPROC(__ashlti3)
+
+ENTRY(__ashrti3)
+	cbz	x2, 3f
+	mov	x3, #64
+	sub	x3, x3, x2
+	cmp	x3, #0
+	b.le	4f
+	lsr	x0, x0, x2
+	lsl	x3, x1, x3
+	asr	x2, x1, x2
+	orr	x0, x0, x3
+	mov	x1, x2
+3:
+	ret
+4:
+	neg	w0, w3
+	asr	x2, x1, #63
+	asr	x0, x1, x0
+	mov	x1, x2
+	ret
+ENDPROC(__ashrti3)
--
2.14.2
Re: [PATCH v2] arm64: support __int128 on gcc 5+
On Fri, Nov 3, 2017 at 2:42 PM, Will Deacon wrote:
> We used to link against libgcc way back when, but that dependency was
> removed in commit d67703a8a69e ("arm64: kill off the libgcc dependency")
> and I'm really not keen to add it back. I also think that there might
> be licensing concerns if you link against it and make use of GCC plugins,
> but IANAL.

Considering many other architectures link to it, that seems strange to me...

> Shouldn't we just provide our own implementations of __ashlti3 and
> __ashrti3 instead?

I just coded up an implementation. I'm testing it now and I'll have v3 for you shortly.

Jason
Re: [PATCH] arm64: support __int128 on gcc 5+
Hi Will,

Nice catch. I posted a v2 that addresses that.

Thanks,
Jason
[PATCH v2] arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when dealing with TI types, resulting in failures when trying to link to libgcc, and more generally, bad performance. However, since gcc 5, the compiler supports actually emitting fast instructions, which means we can at long last enable this option and receive the speedups. The gcc commit that added proper Aarch64 support is:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d

This commit appears to be part of the gcc 5 release.

There are still a few functions, such as __ashlti3 and __ashrti3, that require libgcc, which is fine. So, we also link to libgcc for these functions when needed, which is what several other architectures have already been doing for a long time.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 arch/arm64/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..70c7c0c1bccb 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS	+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 CHECKFLAGS	+= -D__AARCH64EB__
@@ -108,7 +110,7 @@ core-$(CONFIG_NET) += arch/arm64/net/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
 core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
-libs-y		:= arch/arm64/lib/ $(libs-y)
+libs-y		:= arch/arm64/lib/ $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name) $(libs-y)
 core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
 
 # Default target when executing plain make
--
2.14.2
Re: [PATCH v7] printk: hash addresses printed with %p
On Mon, Oct 30, 2017 at 9:22 PM, Steven Rostedt wrote:
> How quickly do you need static_branch_disable() executed? You could
> always pass the work off to a worker thread (that can schedule).
>
> random_ready_callback -> initiates worker thread -> enables the static branch

I had already suggested that much earlier in the thread, but discounted it in the very same suggestion, because that branch turns out to be not that expensive anyway, and I think it's more important that we're able to print meaningful values as soon as we can, rather than waiting for the scheduler.
Re: [PATCH] arm64: support __int128 on gcc 5+
On Tue, Oct 31, 2017 at 1:17 PM, Will Deacon wrote:
>
> Sure, I'll wait for your patch that matches the relevant compiler versions.

That's the patch in this thread:
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/539943.html

As mentioned in there, gcc does the right thing for 5 and up:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
Re: [PATCH] arm64: support __int128 on gcc 5+
Hi Will,

On Tue, Oct 31, 2017 at 12:51 PM, Will Deacon wrote:
> Which code in the kernel actually uses 128-bit types directly? I know we
> have some unfortunate occurrences in our headers (including uapi) for the
> vector registers, but I thought we generally used asm or copy routines to
> access those.

math64.h provides it, and various things throughout use those functions. Notably, the scheduler and kvm use the __int128 functions. There's also an elliptic curve implementation that uses it, which makes a big difference. And soon I'll be adding an implementation of curve25519 that will make heavy use of these instructions for significant speedups as well.

Generally, adding this CONFIG_ARCH key is a hint for current and future bits of code, so that they can use the faster 128-bit implementations when available. Not providing it when it's there and available would be silly.

This wasn't originally added to the architecture, because when the Kconfig symbol was added, gcc didn't have sane support for it on aarch64. But now it does, so let's support it.

Jason
[PATCH] arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when dealing with TI types, resulting in failures when trying to link to libgcc, and more generally, horrible performance. However, since gcc 5, the compiler supports actually emitting fast instructions, which means we can at long last enable this option and receive the speedups. The gcc commit that added proper Aarch64 support is:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d

This commit appears to be part of the gcc 5 release.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 arch/arm64/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS	+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 CHECKFLAGS	+= -D__AARCH64EB__
--
2.14.2
Re: CONFIG_ARCH_SUPPORTS_INT128 for AArch64
Hi Mark,

On Tue, Oct 31, 2017 at 11:43 AM, Mark Rutland wrote:
> Judging by Documentation/kbuild/makefiles.txt, we could do something
> like the below in the arm64 Makefile:
>
> cflags-y += $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)

Okay, in that case, I'll re-research the gcc commit that made int128 work properly on aarch64, confirm the release version, and then submit a patch doing this.

Jason
CONFIG_ARCH_SUPPORTS_INT128 for AArch64
Hi folks, Older versions of gcc are horrible for int128_t on aarch64, emitting a __multi3 call, which is slow and defeats the purpose. However, more recent versions of gcc (since 5, IIRC), do the right thing and produce the proper inlined instructions. Do we have the infrastructure available to test for this, and conditionally set CONFIG_ARCH_SUPPORTS_INT128 if we do? Thanks, Jason
Re: [PATCH V8 2/2] printk: hash addresses printed with %p
On Thu, Oct 26, 2017 at 4:53 AM, Tobin C. Harding wrote:
> +static bool have_filled_random_ptr_key;

__read_mostly
Re: [PATCH v7] printk: hash addresses printed with %p
On Thu, Oct 26, 2017 at 12:27 AM, Tobin C. Harding wrote:
> How good is unlikely()?

It places that branch way at the bottom of the function so that it's less likely to pollute the icache.

> It doesn't _feel_ right adding a check on every call to printk just to
> check for a condition that was only true for the briefest time when the
> kernel booted. But if unlikely() is good then I guess it doesn't hurt.
>
> I'm leaning towards the option 1, but then all those text books I read
> are telling me to implement the simplest solution first then if we need
> to go faster implement the more complex solution.
>
> This is a pretty airy fairy discussion now, but if you have an opinion
> I'd love to hear it.

I don't think adding a single branch there really matters that much, considering how many random other branches there are all over the printk code. However, if you really want to optimize on the little bits, and sensibly don't want to go with the overcomplex workqueue-to-statickey thing, you could consider using a plain vanilla `bool has_gotten_random_ptr_secret` instead of using the atomic_t. The reason is that there's only ever one single writer, changing from a 0 to a 1. Basically the only thing the atomic_t got you was a cache flush surrounding the read (and the write), so that assigning has_gotten_random_ptr_secret=true would take effect _immediately_. However, since you might not necessarily care about that, going with a bool instead will save you an expensive cache flush, while potentially being a microsecond out of date the first time it's used. Seems like an okay trade-off to me. (That kind of cache latency, also, is a few orders of magnitude better than using a work queue for the statickey stuff.)
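A hypothetical userspace sketch of the plain-bool option described above (the function names and the "(ptrval)" placeholder are illustrative, not taken from the patch): a single writer flips the flag exactly once, and readers tolerate observing a briefly stale false.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* One writer, many readers: a plain bool suffices when readers may
 * harmlessly see the old value for a moment after the one-time flip. */
static bool has_gotten_random_ptr_secret;

static const char *format_ptr(const void *p)
{
	static char buf[32];

	if (unlikely(!has_gotten_random_ptr_secret))
		return "(ptrval)";	/* secret not ready; don't leak the pointer */

	snprintf(buf, sizeof(buf), "%p", p);	/* the kernel would hash here */
	return buf;
}

/* The single one-time writer, e.g. run from an early initcall. */
static void ptr_secret_ready(void)
{
	has_gotten_random_ptr_secret = true;
}
```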
Re: [PATCH] iscsi-target: Convert timers to use timer_setup()
On Wed, Oct 25, 2017 at 12:01 PM, Kees Cook wrote:
> sess->time2retain_timer.expires =
>     (get_jiffies_64() + sess->sess_ops->DefaultTime2Retain * HZ);
> add_timer(&sess->time2retain_timer);

> cmd->dataout_timer.expires = (get_jiffies_64() + na->dataout_timeout * HZ);
> add_timer(&cmd->dataout_timer);

> np->np_login_timer.expires = (get_jiffies_64() + TA_LOGIN_TIMEOUT * HZ);
> add_timer(&np->np_login_timer);

> + timeout.timer.expires = (get_jiffies_64() + TA_LOGIN_TIMEOUT * HZ);
> + add_timer(&timeout.timer);

> conn->nopin_response_timer.expires =
>     (get_jiffies_64() + na->nopin_response_timeout * HZ);
> add_timer(&conn->nopin_response_timer);

> conn->nopin_timer.expires = (get_jiffies_64() + na->nopin_timeout * HZ);
> add_timer(&conn->nopin_timer);

> conn->nopin_timer.expires = (get_jiffies_64() + na->nopin_timeout * HZ);
> add_timer(&conn->nopin_timer);

I assume you're trying to keep these diffs as small as possible, but I imagine at some point each of the above pairs can also be combined into a single mod_timer. Also, all the goofiness with the flags stuff seems like an antique replacement for timer_pending:

> conn->nopin_response_timer_flags &= ~ISCSI_TF_STOP;
> conn->nopin_response_timer_flags |= ISCSI_TF_RUNNING;
Re: [PATCH v7] printk: hash addresses printed with %p
On Wed, Oct 25, 2017 at 5:49 AM, Tobin C. Harding wrote:
> static_branch_disable(_ptr_secret) : Doesn't sleep, just atomic read
> and set and maybe a WARN_ONCE.

Are you sure about that? I just looked myself, and though there is a !HAVE_JUMP_LABEL ifdef that does what you described, there's also a HAVE_JUMP_LABEL path that takes a mutex, which sleeps:

static_branch_disable
  static_key_disable
    cpus_read_lock
      percpu_down_read
        percpu_down_read_preempt_disable
          might_sleep

> Now for the 'executes from process context' stuff.

Er, sorry, I meant to write non-process context in my original message, which is generally where you're worried about sleeping.

> If the callback mechanism is utilized (i.e. print before randomness is
> ready) then the callback will be executed the next time the randomness
> pool gets added to

So it sounds to me like this might be called in non-process context. Disaster. I realize the static_key thing was my idea in the original email, so sorry for leading you astray. But moving to do this in early_initcall wound up fixing other issues too, so all in all a net good in going this direction.

Two options: you stick with static_branch, because it's cool and speed is fun, and work around all of the above with a call to queue_work so that static_branch_enable is called only from process context. Or, you give up on static_key, because it's not actually super necessary, and instead just use an atomic, and reason that using `if (unlikely(!atomic_read()))` is probably good enough. In this option, the code would be pretty much the same as v7, except you'd s/static_branch/atomic_t/, and change the helpers, etc. This is probably the more reasonable way.
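For the second option, a hypothetical userspace sketch using C11 atomics in place of the kernel's atomic_t (all names invented for illustration): the callback side only performs an atomic store, which never sleeps and is therefore safe even outside process context, unlike static_branch_enable.

```c
#include <stdatomic.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Flag flipped exactly once, by the randomness-ready callback. */
static atomic_int random_ptr_secret_ready;

/* Reader side: one cheap load plus an unlikely() branch per print. */
static int ptr_secret_is_ready(void)
{
	if (unlikely(!atomic_load(&random_ptr_secret_ready)))
		return 0;	/* fall back to a placeholder in the caller */
	return 1;
}

/* Writer side: a plain atomic store -- no mutex, no sleeping -- so it
 * may safely run from non-process context, where static_branch_enable
 * (which can take a mutex under HAVE_JUMP_LABEL) may not. */
static void randomness_ready_callback(void)
{
	atomic_store(&random_ptr_secret_ready, 1);
}
```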