[PATCH] netlink: put module reference if dump start fails

2018-02-20 Thread Jason A. Donenfeld
Before, if cb->start() failed, the module reference would never be put,
because cb->cb_running is intentionally false at this point. Users are
generally annoyed by this because they can no longer unload modules that
leak references. Also, it may be possible to tediously wrap a reference
counter back to zero, especially since module.c still uses atomic_inc
instead of refcount_inc.

This patch expands the error path to simply call module_put if
cb->start() fails.
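
To see why this leaks, note that the normal teardown only drops the
module reference once a dump is actually running; roughly (a
paraphrased sketch of the surrounding mainline logic, not verbatim):

	/* netlink_dump()/netlink_sock_destruct(), paraphrased: the
	 * module reference is dropped only when cb_running is set. */
	if (nlk->cb_running) {
		if (nlk->cb.done)
			nlk->cb.done(&nlk->cb);
		module_put(nlk->cb.module);
		nlk->cb_running = false;
	}

A cb->start() failure happens before cb_running is set, so the
reference taken earlier in __netlink_dump_start() was never released.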

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
This probably should be queued up for stable.

 net/netlink/af_netlink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2ad445c1d27c..07e8478068f0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2308,7 +2308,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	if (cb->start) {
 		ret = cb->start(cb);
 		if (ret)
-			goto error_unlock;
+			goto error_put;
 	}
 
 	nlk->cb_running = true;
@@ -2328,6 +2328,8 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	 */
 	return -EINTR;
 
+error_put:
+	module_put(control->module);
 error_unlock:
 	sock_put(sk);
 	mutex_unlock(nlk->cb_mutex);
-- 
2.16.1



Re: [PATCH] random: always fill buffer in get_random_bytes_wait

2018-02-20 Thread Jason A. Donenfeld
Hi Ted,

Can you apply this?

Thanks,
Jason


[PATCH] random: always fill buffer in get_random_bytes_wait

2018-02-04 Thread Jason A. Donenfeld
In the unfortunate event that a developer fails to check the return
value of get_random_bytes_wait, or simply wants to make a "best effort"
attempt, for whatever that's worth, it's much better to still fill the
buffer with _something_ rather than catastrophically failing in the case
of an interruption. This is both a defense-in-depth measure against
inevitable programming bugs and a means of making the API a bit
more useful.
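
To illustrate the resulting semantics (hypothetical callers, not part
of this patch):

	u8 key[32];
	int ret;

	/* Best-effort caller: the buffer is filled with whatever the
	 * RNG has, even if the wait was interrupted by a signal. */
	get_random_bytes_wait(key, sizeof(key));

	/* Careful caller: a nonzero return still reports that the RNG
	 * may not be seeded, but the buffer is filled either way. */
	ret = get_random_bytes_wait(key, sizeof(key));
	if (ret)
		pr_warn("entropy wait interrupted\n");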

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 include/linux/random.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index 4024f7d9c77d..2ddf13b4281e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -85,10 +85,8 @@ static inline unsigned long get_random_canary(void)
 static inline int get_random_bytes_wait(void *buf, int nbytes)
 {
 	int ret = wait_for_random_bytes();
-	if (unlikely(ret))
-		return ret;
 	get_random_bytes(buf, nbytes);
-	return 0;
+	return ret;
 }
 
 #define declare_get_random_var_wait(var) \
-- 
2.16.1



Re: [kernel-hardening] Re: [PATCH v2] cpu: do not leak vulnerabilities to unprivileged users

2018-01-26 Thread Jason A. Donenfeld
On Fri, Jan 26, 2018 at 5:43 PM, Alan Cox wrote:
> a) The info is already trivially accessible via /proc/cpuinfo

No, /proc/cpuinfo shows if the CPU itself has these bugs, but doesn't
show whether or not the kernel has gone to lengths to mitigate these
bugs.

# grep -o 'bugs.*cpu_meltdown' -m1 /proc/cpuinfo
bugs: cpu_meltdown
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: PTI

> or by measurement to an attacker

Right, so without this, an attacker has to measure. The purpose of
this patchset is to require the attacker to perform an additional
measurement. That seems worthwhile, especially if measurements are or
would ever become non-trivial to make.

> b) Some JIT and other environments need to know

Shouldn't JITs do the best they can with the environment they're in?
And for that, isn't /proc/cpuinfo enough?

Jason


[PATCH v2] cpu: do not leak vulnerabilities to unprivileged users

2018-01-26 Thread Jason A. Donenfeld
While it's public information whether the CPU in general has spectre/meltdown
bugs, it probably shouldn't be as globally obvious to all unprivileged
users whether or not the kernel is doing something to mitigate those
bugs. While an attacker can obviously probe and try, there frequently is
a trade-off attackers make of how much probing around they're willing to
do versus the certainty of an attack working, in order to reduce
detection. By making it loud and clear that the kernel _is_ vulnerable,
we're simply aiding the trade-off calculations attackers have to make
when choosing which vectors to target.

So, this patch changes the permissions to 0400 to make the attacker's
job slightly less easy. While we're at it, we clean up the leak in dmesg
too.
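
For reference, the mode bits in the DEVICE_ATTR() calls below map as
follows (standard octal permission macros, stated for clarity):

	0444 == S_IRUSR | S_IRGRP | S_IROTH	/* world-readable */
	0400 == S_IRUSR				/* root-only read */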

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
v2 handles dmesg too.

 arch/x86/kernel/cpu/bugs.c | 1 -
 drivers/base/cpu.c | 6 +++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc3d438..e512ae82f201 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -230,7 +230,6 @@ static void __init spectre_v2_select_mitigation(void)
}
 
spectre_v2_enabled = mode;
-   pr_info("%s\n", spectre_v2_strings[mode]);
 
/*
 * If neither SMEP or KPTI are available, there is a risk of
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index d99038487a0d..a3a8e008f957 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -531,9 +531,9 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev,
return sprintf(buf, "Not affected\n");
 }
 
-static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
-static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
-static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
+static DEVICE_ATTR(meltdown, 0400, cpu_show_meltdown, NULL);
+static DEVICE_ATTR(spectre_v1, 0400, cpu_show_spectre_v1, NULL);
+static DEVICE_ATTR(spectre_v2, 0400, cpu_show_spectre_v2, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
-- 
2.16.1



Re: [kernel-hardening] Re: [PATCH] cpu: do not leak vulnerabilities to unprivileged users

2018-01-25 Thread Jason A. Donenfeld
On Thu, Jan 25, 2018 at 2:34 PM, Alan Cox wrote:
> As you observe any attacker can already trivially ascertain whether
> protection is on, so there is no point pretending file permissions
> magically stop that. In fact the information is already in cpuinfo.

Actually the other place it leaks is in dmesg, which would need to be
patched too.

My understanding about cpuinfo was that it showed whether or not the
processor family is generally vulnerable to it, independent of whether
or not the kernel has been patched. What this patch does relates to
whether or not the kernel has been patched.


[PATCH] cpu: do not leak vulnerabilities to unprivileged users

2018-01-25 Thread Jason A. Donenfeld
While it's public information whether the CPU in general has spectre/meltdown
bugs, it probably shouldn't be as globally obvious to all unprivileged
users whether or not the kernel is doing something to mitigate those
bugs. While an attacker can obviously probe and try, there frequently is
a trade-off attackers make of how much probing around they're willing to
do versus the certainty of an attack working, in order to reduce
detection. By making it loud and clear that the kernel _is_ vulnerable,
we're simply aiding the trade-off calculations attackers have to make
when choosing which vectors to target.

So, this patch changes the permissions to 0400 to make the attacker's
job slightly less easy.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 drivers/base/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index d99038487a0d..a3a8e008f957 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -531,9 +531,9 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev,
return sprintf(buf, "Not affected\n");
 }
 
-static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
-static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
-static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
+static DEVICE_ATTR(meltdown, 0400, cpu_show_meltdown, NULL);
+static DEVICE_ATTR(spectre_v1, 0400, cpu_show_spectre_v1, NULL);
+static DEVICE_ATTR(spectre_v2, 0400, cpu_show_spectre_v2, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
-- 
2.16.1



[PATCH] arm64: support __int128 with clang

2017-12-22 Thread Jason A. Donenfeld
Commit fb8722735f50 ("arm64: support __int128 on gcc 5+") added support
for arm64 __int128 on gcc, guarded by a version conditional, but neglected to
enable this for clang, which in fact appears to support aarch64 __int128.
This commit therefore enables it if the compiler is clang, using the
same type of makefile conditional used elsewhere in the tree.
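
For context, CONFIG_ARCH_SUPPORTS_INT128 lets code use a native
128-bit type, e.g. for a full 64x64->128 multiply; a minimal sketch
(hypothetical helper, not from this patch):

	#ifdef CONFIG_ARCH_SUPPORTS_INT128
	static inline void mul_64x64(u64 a, u64 b, u64 *hi, u64 *lo)
	{
		unsigned __int128 p = (unsigned __int128)a * b;

		*hi = (u64)(p >> 64);
		*lo = (u64)p;
	}
	#endif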

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 arch/arm64/Makefile | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index b481b4a7c011..16c2e8b58546 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -57,7 +57,11 @@ KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
+ifeq ($(cc-name),clang)
+KBUILD_CFLAGS	+= -DCONFIG_ARCH_SUPPORTS_INT128
+else
 KBUILD_CFLAGS	+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
+endif
 
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS	+= -mbig-endian
-- 
2.15.1



[PATCH] arch/um: compile with modern headers

2017-12-13 Thread Jason A. Donenfeld
Recent libcs have gotten a bit more strict, so we actually need to
include the right headers and use the right types. This enables UML to
compile again.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: sta...@vger.kernel.org
---
Rebased on linux-next, by request.

 arch/um/os-Linux/file.c   | 1 +
 arch/um/os-Linux/signal.c | 1 +
 arch/x86/um/stub_segv.c   | 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c
index 2db18cbbb0ea..c0197097c86e 100644
--- a/arch/um/os-Linux/file.c
+++ b/arch/um/os-Linux/file.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index a5c0c909c48b..bf0acb8aad8b 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
[SIGTRAP]   = relay_signal,
diff --git a/arch/x86/um/stub_segv.c b/arch/x86/um/stub_segv.c
index fd6825537b97..27361cbb7ca9 100644
--- a/arch/x86/um/stub_segv.c
+++ b/arch/x86/um/stub_segv.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void __attribute__ ((__section__ (".__syscall_stub")))
 stub_segv_handler(int sig, siginfo_t *info, void *p)
-- 
2.15.1



[PATCH] arch/um: compile with modern headers

2017-12-13 Thread Jason A. Donenfeld
Recent libcs have gotten a bit more strict, so we actually need to
include the right headers and use the right types. This enables UML to
compile again.
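
For background: glibc 2.26 removed the struct ucontext tag from its
headers, leaving only the POSIX typedef, which is why the hunks below
switch types; a minimal user-space illustration (assuming glibc):

	#include <ucontext.h>

	void handler_body(void *p)
	{
		ucontext_t *uc = p;	/* was: struct ucontext *uc = p; */
		(void)uc;	/* real code would read uc->uc_mcontext */
	}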

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: sta...@vger.kernel.org
---
 arch/um/os-Linux/file.c   | 1 +
 arch/um/os-Linux/signal.c | 3 ++-
 arch/x86/um/stub_segv.c   | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c
index 2db18cbbb0ea..c0197097c86e 100644
--- a/arch/um/os-Linux/file.c
+++ b/arch/um/os-Linux/file.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index a86d7cc2c2d8..bf0acb8aad8b 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
[SIGTRAP]   = relay_signal,
@@ -159,7 +160,7 @@ static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = {
 
 static void hard_handler(int sig, siginfo_t *si, void *p)
 {
-	struct ucontext *uc = p;
+	ucontext_t *uc = p;
 	mcontext_t *mc = &uc->uc_mcontext;
 	unsigned long pending = 1UL << sig;
 
diff --git a/arch/x86/um/stub_segv.c b/arch/x86/um/stub_segv.c
index 1518d2805ae8..27361cbb7ca9 100644
--- a/arch/x86/um/stub_segv.c
+++ b/arch/x86/um/stub_segv.c
@@ -6,11 +6,12 @@
 #include 
 #include 
 #include 
+#include 
 
 void __attribute__ ((__section__ (".__syscall_stub")))
 stub_segv_handler(int sig, siginfo_t *info, void *p)
 {
-   struct ucontext *uc = p;
+   ucontext_t *uc = p;
 
 	GET_FAULTINFO_FROM_MC(*((struct faultinfo *) STUB_DATA),
 			      &uc->uc_mcontext);
-- 
2.15.1



Re: WireGuard Upstreaming Roadmap (November 2017)

2017-12-08 Thread Jason A. Donenfeld
Hi Dave,

On Fri, Dec 8, 2017 at 4:38 PM, David Miller wrote:
> Sorry, you cannot force the discussion of a feature which will be submitted
> upstream to occur on a private mailing list.
>
> It is _ABSOLUTELY_ appropriate to discuss this on netdev since it is the
> netdev community which must consider issues like this when looking at
> whether to accept WireGuard upstream.
>
> Jason, this action and response was entirely inappropriate, and please
> I'd like you to reply properly to questions about your feature here.

Whoops, okay! Very sorry. I'm actually kind of happy to hear that. I
had assumed that you'd be annoyed if WireGuard crypto discussion
spewed over into netdev adding even more message volume there for
something perhaps not directly relevant. But in fact, you're
interested and find it important to discuss there. So, good news. And
sorry for trying to shew it away before. I'll copy and paste the
response I had made on the other list:

> This is covered in the paper:
> https://www.wireguard.com/papers/wireguard.pdf
>
> The basic answer is that WireGuard has message type identifiers, and
> the handshake also hashes into it an identifier of the primitives
> used. If there's ever a problem with those primitives chosen, it will
> be possible to introduce new message type identifiers, if that kind of
> "support everything even the broken garbage" approach is desired by
> misguided people. However, a better approach, of course, is to keep
> your secure network separate from your insecure network, and to not
> allow insecure nodes on secure segments; when you mix the two,
> disaster tends to strike. So, in other words, both approaches to "upgrading"
> are possible, in this fantasy wireguardalypse. Take note, though, that
> neither one of these approaches (support new and retain old protocol
> too for old nodes, or only support new) are "agile" or are anything at
> all like the 90s "cipher agility" -- the user still is not permitted
> to "choose" ciphers.

Regards,
Jason


Re: WireGuard Upstreaming Roadmap (November 2017)

2017-12-07 Thread Jason A. Donenfeld
On Thu, Dec 7, 2017 at 11:22 AM, Stefan Tatschner wrote:
> I have a question which is related to the involved crypto. As far as I
> have understood the protocol and the concept of wireguard
> What's your opinion on this?

This thread has been picked up on the WireGuard mailing list, not here.

Since this concerns the inner workings of the protocol and cryptography
as a whole, as opposed to implementation details of Linux, please do
not send these inquiries to LKML. Additionally, please start new
threads for new topics in the future, rather than hijacking a roadmap
thread.

Look for my answer on the other mailing list. I'll CC you too.

Regards,
Jason


Essential get_user fix missing from 3.10 aarch64

2017-12-01 Thread Jason A. Donenfeld
Hi stable/arm/Willy,

1f65c13efef69b6dc908e588f91a133641d8475c is an important commit,
because it involves evaluation of pointers from userspace. I'm running
into issues with RNDADDTOENTCNT reading bogus values, because p is
incremented twice as much as it should be in this random.c block:

	case RNDADDENTROPY:
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		if (get_user(ent_count, p++))
			return -EFAULT;
		if (ent_count < 0)
			return -EINVAL;
		if (get_user(size, p++))
			return -EFAULT;
		retval = write_pool(&input_pool, (const char __user *)p,
				    size);

That seems reasonable, but on aarch64, get_user is defined as:

#define get_user(x, ptr)					\
({								\
	might_sleep();						\
	access_ok(VERIFY_READ, (ptr), sizeof(*(ptr))) ?		\
		__get_user((x), (ptr)) :			\
		((x) = 0, -EFAULT);				\
})

Notice the multiple use of ptr.
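
Any side effect in the pointer argument therefore runs twice, once in
access_ok() and once in __get_user(). A toy user-space analogue of the
double evaluation (hypothetical, only to demonstrate the hazard):

	#include <stdio.h>

	/* Evaluates ptr twice, like the buggy get_user() above:
	 * once for the "check", once for the actual load. */
	#define toy_get_user(x, ptr) \
		((ptr) != NULL ? ((x) = *(ptr), 0) : -1)

	int main(void)
	{
		int buf[4] = { 11, 22, 33, 44 }, *p = buf, a, b;

		toy_get_user(a, p++);	/* p advances twice per call */
		toy_get_user(b, p++);
		printf("a=%d b=%d\n", a, b);	/* a=22 b=44, not 11/22 */
		return 0;
	}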

I thought I had found something breathtakingly bad, until I realized
that it was already fixed in 2013 by Takahiro. It just wasn't marked
for stable.

Not sure if there's ever going to be another stable 3.10 release, but
if so, this would be an important one to backport.

Regards,
Jason


[PATCH v2] arm: detect buggy binutils when in thumb2 mode

2017-11-23 Thread Jason A. Donenfeld
On older versions of binutils, \sym points to an aligned address. On
newer versions of binutils, \sym sometimes points to the unaligned thumb
address in certain circumstances. In order to homogenize this behavior,
rather than adding 1, we could simply OR in 1, so that already unaligned
instructions don't change. While that works, the downside is that we have
to add an `orr` instruction to a fast path. The assembler can't do this at
assemble time via "|1" because "invalid operands (.text and *ABS* sections)
for `|'". A better solution would be to have consistent binutils behavior,
but that ship has sailed.
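
The arithmetic behind the OR-in-1 idea is easy to check in isolation
(a standalone illustration using the addresses from the disassembly
below):

	#include <stdio.h>

	int main(void)
	{
		unsigned long aligned = 0x16a;	/* old binutils: \sym */
		unsigned long thumb = 0x16b;	/* new binutils: Thumb bit set */

		/* Adding 1 corrupts an address that already has the bit: */
		printf("%#lx %#lx\n", aligned + 1, thumb + 1);	/* 0x16b 0x16c */

		/* ORing in 1 is idempotent, so both stay correct: */
		printf("%#lx %#lx\n", aligned | 1, thumb | 1);	/* 0x16b 0x16b */
		return 0;
	}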

So, this commit adds a detection mechanism, which began as a small thing
from Russell King that I then rewrote to use pure bash instead of
shelling out, so that it doesn't slow down the build process. The detection
mechanism _could_ be used to modify the assembly we generate, but for now
it's just being used to catch buggy binutils and abort the build process in
that case.

The rest of this commit message contains all of the relevant information
about the boot bug when compiled in thumb2 mode.

My tests concerned these versions:
broken: GNU ld (Gentoo 2.29.1 p3) 2.29.1
working: GNU ld (GNU Binutils for Ubuntu) 2.26.1

These produced the following code:
--- broken  2017-11-21 17:44:14.523416082 +0100
+++ working 2017-11-21 17:44:44.548461234 +0100
@@ -133,7 +133,7 @@
  160:	f01a 0ff0	tst.w	sl, #240	; 0xf0
  164:	d111		bne.n	18a <__sys_trace>
  166:	f5b7 7fc8	cmp.w	r7, #400	; 0x190
- 16a:	f2af 1e6a	subw	lr, pc, #362	; 0x16a
+ 16a:	f2af 1e6b	subw	lr, pc, #363	; 0x16b
  16e:	bf38		it	cc
  170:	f858 f027	ldrcc.w	pc, [r8, r7, lsl #2]
  174:	a902		add	r1, sp, #8

The differing instruction corresponds with this actual line in
arch/arm/kernel/entry-common.S:
	badr	lr, ret_fast_syscall		@ return address

Running the broken kernel results in a runtime OOPS with:
PC is at ret_fast_syscall+0x4/0x52
LR is at ret_fast_syscall+0x2/0x52

The disassembly of that function for the crashing kernel is:
.text: ret_fast_syscall        ; CODE XREF: sys_syscall+1C↓j
.text: CPSID   I               ; jumptable 0840 cases 15,18-376
.text:0002
.text:0002 loc_2               ; DATA XREF: sys_syscall-6BA↓o
.text:0002 LDR.W   R2, [R9,#8]
.text:0006 CMP.W   R2, #0xBF00

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: ni...@redhat.com
Cc: sta...@vger.kernel.org
---
Had the file mode wrong on the submission from a second ago, sorry about
that.

 arch/arm/Makefile|  7 +--
 arch/arm/tools/Makefile  |  5 -
 arch/arm/tools/toolcheck | 44 
 3 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100755 arch/arm/tools/toolcheck

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 80351e505fd5..bd4e248a7f8f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -319,16 +319,19 @@ all:  $(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)
 archheaders:
$(Q)$(MAKE) $(build)=arch/arm/tools uapi
 
-archprepare:
+archprepare: toolcheck
$(Q)$(MAKE) $(build)=arch/arm/tools kapi
 
+toolcheck:
+   $(Q)$(MAKE) $(build)=arch/arm/tools $@
+
 # Convert bzImage to zImage
 bzImage: zImage
 
 BOOT_TARGETS   = zImage Image xipImage bootpImage uImage
 INSTALL_TARGETS= zinstall uinstall install
 
-PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS)
+PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS) toolcheck
 
 bootpImage uImage: zImage
 zImage: Image
diff --git a/arch/arm/tools/Makefile b/arch/arm/tools/Makefile
index ddb89a7db36f..0a283756f1c5 100644
--- a/arch/arm/tools/Makefile
+++ b/arch/arm/tools/Makefile
@@ -23,12 +23,15 @@ uapi-hdrs-y += $(uapi)/unistd-eabi.h
 
 targets += $(addprefix ../../../,$(gen-y) $(kapi-hdrs-y) $(uapi-hdrs-y))
 
-PHONY += kapi uapi
+PHONY += kapi uapi toolcheck
 
 kapi:  $(kapi-hdrs-y) $(gen-y)
 
 uapi:  $(uapi-hdrs-y)
 
+toolcheck:
+   @'$(srctree)/$(src)/toolcheck'
+
 # Create output directory if not already present
 _dummy := $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)') \
   $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')
diff --git a/arch/arm/tools/toolcheck b/arch/arm/tools/toolcheck
new file mode 100755
index ..04fc44b750d2
--- /dev/null
+++ b/arch/arm/tools/toolcheck
@@ -0,0 +1,44 @@
+#!/bin/bash
+#
+# Copyright 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved.
+#
+
+set -e
+
+cleanup() {
+   [[ ! -d $temp ]] || rm -rf "$temp"
+   exit
+}
+trap cleanup INT TERM EXIT
+temp="$(mktemp -d)"
+
+check_thumb2_address() {
+   local disassembly
+
+   $CC $KBUILD_AFLAGS -o "$temp/a.out" -c -xassembler - <<-_EOF
+   .syntax unified
+   .thumb

[PATCH] arm: detect buggy binutils when in thumb2 mode

2017-11-23 Thread Jason A. Donenfeld
On older versions of binutils, \sym points to an aligned address. On
newer versions of binutils, \sym sometimes points to the unaligned thumb
address in certain circumstances. In order to homogenize this behavior,
rather than adding 1, we could simply OR in 1, so that already unaligned
instructions don't change. While that works, the downside is that we have
to add an `orr` instruction to a fast path. The assembler can't do this at
assemble time via "|1" because "invalid operands (.text and *ABS* sections)
for `|'". A better solution would be to have consistent binutils behavior,
but that ship has sailed.

So, this commit adds a detection mechanism, which began as a small thing
from Russell King that I then rewrote to use pure bash instead of
shelling out, so that it doesn't slow down the build process. The detection
mechanism _could_ be used to modify the assembly we generate, but for now
it's just being used to catch buggy binutils and abort the build process in
that case.

The rest of this commit message contains all of the relevant information
about the boot bug when compiled in thumb2 mode.

My tests concerned these versions:
broken: GNU ld (Gentoo 2.29.1 p3) 2.29.1
working: GNU ld (GNU Binutils for Ubuntu) 2.26.1

These produced the following code:
--- broken  2017-11-21 17:44:14.523416082 +0100
+++ working 2017-11-21 17:44:44.548461234 +0100
@@ -133,7 +133,7 @@
  160:	f01a 0ff0	tst.w	sl, #240	; 0xf0
  164:	d111		bne.n	18a <__sys_trace>
  166:	f5b7 7fc8	cmp.w	r7, #400	; 0x190
- 16a:	f2af 1e6a	subw	lr, pc, #362	; 0x16a
+ 16a:	f2af 1e6b	subw	lr, pc, #363	; 0x16b
  16e:	bf38		it	cc
  170:	f858 f027	ldrcc.w	pc, [r8, r7, lsl #2]
  174:	a902		add	r1, sp, #8

The differing instruction corresponds with this actual line in
arch/arm/kernel/entry-common.S:
	badr	lr, ret_fast_syscall		@ return address

Running the broken kernel results in a runtime OOPS with:
PC is at ret_fast_syscall+0x4/0x52
LR is at ret_fast_syscall+0x2/0x52

The disassembly of that function for the crashing kernel is:
.text: ret_fast_syscall        ; CODE XREF: sys_syscall+1C↓j
.text: CPSID   I               ; jumptable 0840 cases 15,18-376
.text:0002
.text:0002 loc_2               ; DATA XREF: sys_syscall-6BA↓o
.text:0002 LDR.W   R2, [R9,#8]
.text:0006 CMP.W   R2, #0xBF00

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: ni...@redhat.com
Cc: sta...@vger.kernel.org
---
 arch/arm/Makefile|  7 +--
 arch/arm/tools/Makefile  |  5 -
 arch/arm/tools/toolcheck | 44 
 3 files changed, 53 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/tools/toolcheck

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 80351e505fd5..bd4e248a7f8f 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -319,16 +319,19 @@ all:  $(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)
 archheaders:
$(Q)$(MAKE) $(build)=arch/arm/tools uapi
 
-archprepare:
+archprepare: toolcheck
$(Q)$(MAKE) $(build)=arch/arm/tools kapi
 
+toolcheck:
+   $(Q)$(MAKE) $(build)=arch/arm/tools $@
+
 # Convert bzImage to zImage
 bzImage: zImage
 
 BOOT_TARGETS   = zImage Image xipImage bootpImage uImage
 INSTALL_TARGETS= zinstall uinstall install
 
-PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS)
+PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS) toolcheck
 
 bootpImage uImage: zImage
 zImage: Image
diff --git a/arch/arm/tools/Makefile b/arch/arm/tools/Makefile
index ddb89a7db36f..0a283756f1c5 100644
--- a/arch/arm/tools/Makefile
+++ b/arch/arm/tools/Makefile
@@ -23,12 +23,15 @@ uapi-hdrs-y += $(uapi)/unistd-eabi.h
 
 targets += $(addprefix ../../../,$(gen-y) $(kapi-hdrs-y) $(uapi-hdrs-y))
 
-PHONY += kapi uapi
+PHONY += kapi uapi toolcheck
 
 kapi:  $(kapi-hdrs-y) $(gen-y)
 
 uapi:  $(uapi-hdrs-y)
 
+toolcheck:
+   @'$(srctree)/$(src)/toolcheck'
+
 # Create output directory if not already present
 _dummy := $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)') \
   $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')
diff --git a/arch/arm/tools/toolcheck b/arch/arm/tools/toolcheck
new file mode 100644
index ..04fc44b750d2
--- /dev/null
+++ b/arch/arm/tools/toolcheck
@@ -0,0 +1,44 @@
+#!/bin/bash
+#
+# Copyright 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved.
+#
+
+set -e
+
+cleanup() {
+   [[ ! -d $temp ]] || rm -rf "$temp"
+   exit
+}
+trap cleanup INT TERM EXIT
+temp="$(mktemp -d)"
+
+check_thumb2_address() {
+   local disassembly
+
+   $CC $KBUILD_AFLAGS -o "$temp/a.out" -c -xassembler - <<-_EOF
+   .syntax unified
+   .thumb
+   .macro  badr, reg, sym
+   adr \reg, \sym + 1

Fwd: [PATCH] arm: ensure symbol is a thumb symbol in new binutils

2017-11-23 Thread Jason A. Donenfeld
Hello Nick & Binutils developers,

It would appear that recent changes in the binutils assembler (perhaps
52a86f843b6dee1de9977293da9786649b146b05? perhaps not?) have broken
the kernel when compiled in thumb2 mode. We currently do not have a
way of working around your breaking changes without adding additional
runtime instructions, which isn't acceptable for us. Details are
included in the thread below.

Thanks,
Jason

Forwarded conversation
Subject: [PATCH] arm: ensure symbol is a thumb symbol in new binutils


From: Jason A. Donenfeld <ja...@zx2c4.com>
Date: Tue, Nov 21, 2017 at 6:27 PM
To: li...@armlinux.org.uk, linux-arm-ker...@lists.infradead.org
Cc: "Jason A. Donenfeld" <ja...@zx2c4.com>, sta...@vger.kernel.org


On older versions of binutils, \sym points to an aligned address. On
newer versions of binutils, \sym sometimes points to the unaligned thumb
address in mysterious and buggy circumstances. In order to homogenize
this behavior, rather than adding 1, we simply OR in 1, so that already
unaligned instructions don't change. This fix is required for a
pedestrian THUMB2_KERNEL to boot without crashing when built with
non-old binutils.

While it works, the downside is that we have to add an `orr` instruction
to a fast path. The assembler can't do this at assemble time via "|1"
because "invalid operands (.text and *ABS* sections) for `|'", so we're
forced to do this. A better solution would be to have consistent
binutils behavior, or to have some kind of \sym feature detection that
won't turn into a maze of version comparisons. However, it's at the
moment unclear how to achieve this.

The rest of this commit message contains all of the relevant
information.

My tests concerned these versions:
broken: GNU ld (Gentoo 2.29.1 p3) 2.29.1
working: GNU ld (GNU Binutils for Ubuntu) 2.26.1

These produced the following code:
--- broken  2017-11-21 17:44:14.523416082 +0100
+++ working 2017-11-21 17:44:44.548461234 +0100
@@ -133,7 +133,7 @@
  160:	f01a 0ff0	tst.w	sl, #240	; 0xf0
  164:	d111		bne.n	18a <__sys_trace>
  166:	f5b7 7fc8	cmp.w	r7, #400	; 0x190
- 16a:	f2af 1e6a	subw	lr, pc, #362	; 0x16a
+ 16a:	f2af 1e6b	subw	lr, pc, #363	; 0x16b
  16e:	bf38		it	cc
  170:	f858 f027	ldrcc.w	pc, [r8, r7, lsl #2]
  174:	a902		add	r1, sp, #8

The differing instruction corresponds with this actual line in
arch/arm/kernel/entry-common.S:
	badr	lr, ret_fast_syscall		@ return address

Running the broken kernel results in a runtime OOPS with:
PC is at ret_fast_syscall+0x4/0x52
LR is at ret_fast_syscall+0x2/0x52

The disassembly of that function for the crashing kernel is:
.text: ret_fast_syscall        ; CODE XREF: sys_syscall+1C↓j
.text: CPSID   I               ; jumptable 0840 cases 15,18-376
.text:0002
.text:0002 loc_2               ; DATA XREF: sys_syscall-6BA↓o
.text:0002 LDR.W   R2, [R9,#8]
.text:0006 CMP.W   R2, #0xBF00

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: sta...@vger.kernel.org
---
 arch/arm/include/asm/assembler.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index ad301f107dd2..c62a3b6b0a3e 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -194,10 +194,9 @@
  */
 .irp	c,,eq,ne,cs,cc,mi,pl,vs,vc,hi,ls,ge,lt,gt,le,hs,lo
 	.macro	badr\c, rd, sym
-#ifdef CONFIG_THUMB2_KERNEL
-	adr\c	\rd, \sym + 1
-#else
 	adr\c	\rd, \sym
+#ifdef CONFIG_THUMB2_KERNEL
+	orr\c	\rd, \rd, 1
 #endif
 	.endm
 .endr
--
2.15.0


--
From: Russell King - ARM Linux <li...@armlinux.org.uk>
Date: Tue, Nov 21, 2017 at 6:38 PM
To: "Jason A. Donenfeld" <ja...@zx2c4.com>, Will Deacon <will.dea...@arm.com>
Cc: linux-arm-ker...@lists.infradead.org, l...@vger.kernel.org,
sta...@vger.kernel.org


As it just seems to be a limited range of binutils versions that are
affected, I'd rather not impact the kernel fast-paths with extra
cycles just because binutils decided to change behaviour.  I'd prefer
to inform people about the problem and get them to change to a non-
buggy binutils.

This seems to be the second binutils bug that's biting us within the
last month... what's going on with binutils QA?

 arch/arm/Makefile|  7 +--
 arch/arm/tools/Makefile  |  5 -
 arch/arm/tools/toolcheck | 24 
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 1cfac5119545..9e70d0435121 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -319,16 +319,19 @@ all:  $(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)


Re: objtool - what if I want to clobber rbp?

2017-11-22 Thread Jason A. Donenfeld
Hi Josh,

On Wed, Nov 22, 2017 at 4:31 AM, Josh Poimboeuf  wrote:
> - Make your feature conflict with CONFIG_FRAME_POINTER on x86_64.  The
>   ORC unwinder is now the default anyway for 4.15, and we renamed the
>   configs, so most people will be actively switching to ORC.
> BTW, since CONFIG_FRAME_POINTER is no longer the default and is becoming
> deprecated, there has been some talk of disabling objtool with
> CONFIG_FRAME_POINTER.  That would make your life easier.  However I
> think we're not quite ready for that, so it might be a few more release
> cycles before that happens.

Fortunately my code won't be ready to be merged for a few release
cycles, so thanks for letting me know about future plans. I think my
patch set and these changes will synchronize nicely.

>
> - Add some ifdefs so your code only uses %rbp as a scratch register when
>   CONFIG_FRAME_POINTER is disabled.
> - If one of the registers is used less often than the others, you could
>   spill it to the stack.  I know you said you need all the registers,
>   but I'd be willing to review the code for ideas, if that would help.
>   Sometimes it helps to get fresh eyes on the problem.  We were able to
>   fix this problem with all the existing crypto code without affecting
>   performance measurably.  We had to get creative with a few of those.

That actually could be interesting, if you're curious about looking at it.

Jason


objtool - what if I want to clobber rbp?

2017-11-21 Thread Jason A. Donenfeld
Hi Josh,

We're working on some highly optimized assembly crypto primitive
implementations for WireGuard. The last 24 hours have been spent
trying to make objtool happy with a variety of tricks, some more
unfortunate than others. There's still one issue remaining, however,
and I just can't figure out how to make it go away:

poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx uses BP as a
scratch register
poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2 uses BP as a
scratch register
poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx512 uses BP as
a scratch register

The messages are right. We're using %rbp as a general purpose
register, writing into it all sorts of crypto gibberish certainly not
suitable for walking stack frames. It's hard to find a way of not
using it, without incurring a speed penalty. We really do just need
all the registers.

Of course the "problem" goes away with the new slick ORC unwinding,
which is great. But for frame pointer unwinding, this problem remains.

I'm wondering if you can think of any clever way of marking a function
or the like that will make this issue go away, somehow. Is there any
path forward without sacrificing %rbp and hence performance to a
useless frame pointer?

Thanks,
Jason


Re: [GIT PULL] usercopy whitelisting for v4.15-rc1

2017-11-21 Thread Jason A. Donenfeld
Hi Linus,

On Fri, Nov 17, 2017 at 10:13 PM, Linus Torvalds
 wrote:
> As a security person, you need to repeat this mantra:
>
>   "security problems are just bugs"
>
> and you need to _internalize_ it, instead of scoff at it.

It might be news to you that actually some security people scoff at
other security peoples' obsession with "security bugs".

Security people like to rate the "utility" of a bug in a security
setting, usually from from "does nothing" (benign) to "allows code
execution" (dangerous/useful), with something like "crashes system"
somewhere in between those two poles. (You may not agree with this
scale, but indeed security people tend to think that a crash is safer
than allowing malicious code execution.)

The security industry is largely obsessed with finding (and
selling/using/patching/reporting/showcasing/stockpiling/detecting/stealing)
these "dangerous/useful" variety of bugs. And this obsession is
continually fulfilled because bugs keep happening -- which is just the
nature of software development -- and so this "security bug"
infatuation continues. It even leads to people upset with you that you
don't care about CVEs and so forth, because they're so fixated on
individual bugs and their security impact.

So, as I mentioned, it may come as a surprise to you that some
security people don't really care for that mentality. As Bas Alberts
wrote [1], "Bugs are irrelevant. Yet our industry is fatally focused
on what is essentially vulnerability masturbation." So what's the
alternative to obsessing over each individual software bug?

In the context of the kernel, the solution from Spender and Pipacs,
and more recently "adopted" by Kees and his KSPP project, has been to
try to eliminate the "security utility" of bugs. In the best case,
this means making bugs that were formerly "dangerous/useful" be
"benign". In the second best case, it usually means making bugs that
were formerly "dangerous/useful" be "unreliable" or "crashes system".
The idea is to completely eliminate the security impact of entire
*classes* of bugs. It acknowledges that programmers will continue to
make programming bugs, as they do, and seeks to make the kernel safer,
from a security perspective, in spite of the continued existence of
these bugs.

> Because the primary focus should be "debugging". The primary focus
> should be "let's make sure the kernel released in a year is better
> than the one released today".

In light of the above, this then is where there might be an
unfortunate disconnect. Sure, many hardening features that turn
dangerous bugs into less dangerous bugs have the quality of unearthing
individual bugs. But many other aspects are rightfully focused on
mitigating the security impact of future unforeseen, undetected bugs.
From a security perspective, please try to see beyond the individual
bug obsession, and consider the larger picture. After all, if the
security impact of all kernel bugs is reduced to nil, your mantra that
all bugs are just bugs will be even more true.

Jason

[1] https://lists.immunityinc.com/pipermail/dailydave/2015-August/000976.html


Re: [PATCH] fork: fix incorrect fput of ->exe_file causing use-after-free

2017-11-16 Thread Jason A. Donenfeld
Hey Eric,

This is a pretty late response to the thread, but in case you're
curious, since Ubuntu forgot to backport this (I already emailed them
about it), I actually experienced a set of boxes that hit this bug in
the wild and were crashing every 2 weeks or so, when under high load.
It's pretty amazing that syzkaller unearthed this, resulting in a nice
test case, making it possible to debug and fix the error. Otherwise
it'd be a real heisenbug.

So, thanks.

Jason


Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-11 Thread Jason A. Donenfeld
On Sat, Nov 11, 2017 at 11:18 PM, Johannes Berg
 wrote:
>
>> > If you're handling this by forcing another read() to produce the
>> > NLMSG_DONE, then you have no reason to WARN_ON() here.
>> >
>> > In fact you are adding a WARN_ON() which is trivially triggerable by
>> > any user.
>>
>> I added this in my suggestion for how this could work, but I don't
>> think you're right, since we previously check if there's enough space.
>
> Or perhaps I should say this differently:
>
> Forcing another read happens through the
>
> skb_tailroom(skb) < nlmsg_total_size(...)
>
> check, so the nlmsg_put_answer() can't really fail.
>
>
> Handling nlmsg_put_answer() failures by forcing another read would have
> required jumping to the existing if code with a goto, or restructuring
> the whole thing completely somehow, and I didn't see how to do that.

Exactly. And if something _does_ go wrong in our logic, and we can't
add NLMSG_DONE, we really do want people to report this to us, since
dumps must always end that way. We'd probably have caught this a
number of years ago when userspace developers were twiddling with
their receive buffers if we had had the WARN_ON. Nice suggestion from
Johannes.


WireGuard Upstreaming Roadmap (November 2017)

2017-11-10 Thread Jason A. Donenfeld
Hi folks,

This relates to WireGuard [0].

Following a very nice conference with the Linux kernel networking subsystem
community [1,2], I thought it might be a good idea to spell out the roadmap
for the coming months and the trajectory into upstream, based on my
discussions with several developers and maintainers. There are several threads
of this, the biggest of which surrounds the kernel code base, but there are
some other ends of the WireGuard project as a whole that are also relevant.

The current biggest blocker is issues with the crypto API. Before WireGuard
can go upstream, I intend to embark on a multi-pronged effort to overhaul the
crypto API. I very much need to sync up with Herbert regarding my plans for
this, and start spec'ing things out a bit more formally, so I can begin
concrete discussions with him. I intend to base my work both on feedback
from linux-crypto/Herbert and from the cryptographic research community. I
hope to go to RWC2018 [3] and the subsequent HACS workshop for the academic
engagement side, but of course like all the work I do on the kernel, things
will be highly based in engineering, rather than purely academic, practices.

Dave has strongly encouraged me to post patches sooner rather than later.
So I think before the crypto API is ready to go, I'll likely post a [RFG] --
request for grumbles -- patch set to netdev, in order to have some code
review, so as to gauge where we're at. This patch set will use my current
crypto API, not the kernel's crypto API, with it mentioned in the opening that
I intend to switch to the kernel's crypto API when it looks like the one used
here. This way we'll get early feedback so that the later real [PATCH] series
will go more smoothly.

There are a few WireGuard features that some of you have been waiting for. At
the urging of some folks at the conference, I intend to submit a "core"
WireGuard to upstream, and then use future releases to build on top of that,
to make the initial upstreaming process go as easily as possible. Therefore,
there are three big TODO items that may _or_ may not go in after upstreaming:

 - In-band control messages [possibly not]
 - Netlink multicast event notification
 - GRO integration

Of these, it's most likely I'll implement GRO and leave the other two until
after, but this may change. Since WireGuard already does GSO, it would make
sense to implement the other side of things too. There's also a longer, more
ambitious roadmap [4], filled with both easy coding-related things and longer
term design things, but that's out of scope for this email, though many of
those items will likely be completed before submission time.

There are also six other threads of development that are ongoing, which I
intend to put a bit more focus on too in the near future:

  - The userspace implementation. I'd like to bring this up to deployment
quality, which naturally fits into the next area.

  - Mobile apps. It's fairly easy to integrate the userspace implementation
with existing APIs. The current Android app already works well with the
kernel module, but of course people want this more easily deployed.

  - Mac and Windows support for the userspace implementation. These are
already mostly done, but the APIs used may in fact change, so there may
still be a bit of work to do here before we're satisfied.

  - Bindings and libraries. Now that we have a stable Netlink API, we can
start making nice wrappers for the various languages people like to use.
It remains to be seen whether or not a C "libwireguard" is needed, since in
that domain, talking Netlink directly is often a better choice, but I do see
some potential for a pywireguard and the like. This will also be essential
when the already mentioned plans for event notification and the possibly
excluded control messages materialize.

  - More formal verification. While we have the cryptographic protocol
verified, there are still more places where formalism is quite helpful,
proving more state machines, and even proving C implementations to be
correct. Work and research is ongoing in this domain.

  - Integration into network managers and routing daemons (mesh and
traditional). Work has already begun here on systemd-networkd, and others
are looking into daemons like babel and bird.

So that's where we're at. I hope to have a RFG submitted in the next several
months, and hopefully we gather some nice momentum and get the work
upstreamed and the project completed soon, for some definition of "complete".

If you'd like to work on WireGuard, or simply have any questions, don't
hesitate to email me.

Regards,
Jason


[0] https://www.wireguard.com/
[1] https://www.netdevconf.org/2.2/
[2] https://www.wireguard.com/presentations/
[3] https://rwc.iacr.org
[4] https://www.wireguard.com/todo/


Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-10 Thread Jason A. Donenfeld
On Sat, Nov 11, 2017 at 11:37 AM, David Miller  wrote:
> Jason I'm already pushing my luck as-is with the pull request I made
> yesterday.
>
> I've seen your original request to get this in, you don't have to say
> it multiple times.
>
> We can get this into the merge window and submit it for -stable, so
> please relax.

Whoops, sorry! Okay, no problem.

Jason


Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-10 Thread Jason A. Donenfeld
IIRC, 4.14 comes tomorrow-ish? If possible, it would be nice to get
this in 4.14 before then, so it doesn't have to take time to trickle
down through stable.

Jason

On Thu, Nov 9, 2017 at 1:04 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> The way people generally use netlink_dump is that they fill in the skb
> as much as possible, breaking when nla_put returns an error. Then, they
> get called again and start filling out the next skb, and again, and so
> forth. The mechanism at work here is the ability for the iterative
> dumping function to detect when the skb is filled up and not fill it
> past the brim, waiting for a fresh skb for the rest of the data.
>
> However, if the attributes are small and nicely packed, it is possible
> that a dump callback function successfully fills in attributes until the
> skb is of size 4080 (libmnl's default page-sized receive buffer size).
> The dump function completes, satisfied, and then, if it happens to be
> that this is actually the last skb, and no further ones are to be sent,
> then netlink_dump will add on the NLMSG_DONE part:
>
>   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>
> It is very important that netlink_dump does this, of course. However, in
> this example, that call to nlmsg_put_answer will fail, because the
> previous filling by the dump function did not leave it enough room. And
> how could it possibly have done so? All of the nla_put variety of
> functions simply check to see if the skb has enough tailroom,
> independent of the context it is in.
>
> In order to keep the important assumptions of all netlink dump users, it
> is therefore important to give them an skb that has this end part of the
> tail already reserved, so that the call to nlmsg_put_answer does not
> fail. Otherwise, library authors are forced to find some bizarre sized
> receive buffer that has a large modulo relative to the common sizes of
> messages received, which is ugly and buggy.
>
> This patch thus saves the NLMSG_DONE for an additional message, for the
> case that things are dangerously close to the brim. This requires
> keeping track of the errno from ->dump() across calls.
>
> Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
> ---
> Can we get this into 4.14? Is there still time? It should also be queued
> up for stable.
>
> Changes v3->v4:
>  - I like long lines. The kernel does not. Wrapped at 80 now.
>
>  net/netlink/af_netlink.c | 17 +++--
>  net/netlink/af_netlink.h |  1 +
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index b93148e8e9fb..15c99dfa3d72 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
> struct sk_buff *skb = NULL;
> struct nlmsghdr *nlh;
> struct module *module;
> -   int len, err = -ENOBUFS;
> +   int err = -ENOBUFS;
> int alloc_min_size;
> int alloc_size;
>
> @@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
> skb_reserve(skb, skb_tailroom(skb) - alloc_size);
> netlink_skb_set_owner_r(skb, sk);
>
> -   len = cb->dump(skb, cb);
> +   if (nlk->dump_done_errno > 0)
> +   nlk->dump_done_errno = cb->dump(skb, cb);
>
> -   if (len > 0) {
> +   if (nlk->dump_done_errno > 0 ||
> +   skb_tailroom(skb) < 
> nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
> mutex_unlock(nlk->cb_mutex);
>
> if (sk_filter(sk, skb))
> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
> return 0;
> }
>
> -   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> -   if (!nlh)
> +   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> +  sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> +   if (WARN_ON(!nlh))
> goto errout_skb;
>
> nl_dump_check_consistent(cb, nlh);
>
> -   memcpy(nlmsg_data(nlh), &len, sizeof(len));
> +   memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
> +  sizeof(nlk->dump_done_errno));
>
> if (sk_filter(sk, skb))
> kfree_skb(skb);
> @@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
> }
>
> nlk->cb_running = true;
> +   nlk->dump_done_errno = INT_MAX;
>
> mutex_unlock(nlk->cb_mutex);
>
> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
> index 028188597eaa..962de7b3c023 100644
> --- a/net/netlink/af_netlink.h
> +++ b/net/netlink/af_netlink.h
> @@ -34,6 +34,7 @@ struct netlink_sock {
> 	wait_queue_head_t	wait;
> 	bool			bound;
> 	bool			cb_running;
> +	int			dump_done_errno;
> 	struct netlink_callback	cb;
> 	struct mutex		*cb_mutex;
> 	struct mutex		cb_def_mutex;
> --
> 2.15.0


[PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-08 Thread Jason A. Donenfeld
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.
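
To make the failure mode concrete, here is a sketch of the usual
userspace receive loop for a dump -- illustrative only, not any
particular library's code; the 4096-byte buffer mirrors libmnl's
page-sized default mentioned above, and recv_dump() is a made-up name:

#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Returns the errno the kernel placed in NLMSG_DONE, or a negative
 * error. Assumes fd is a netlink socket on which a dump request has
 * already been sent. */
static int recv_dump(int fd)
{
	char buf[4096];
	struct nlmsghdr *nlh;
	ssize_t len;

	for (;;) {
		len = recv(fd, buf, sizeof(buf), 0);
		if (len < 0)
			return -errno;
		for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
		     nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_type == NLMSG_DONE)
				return *(int *)NLMSG_DATA(nlh);
			if (nlh->nlmsg_type == NLMSG_ERROR)
				return ((struct nlmsgerr *)NLMSG_DATA(nlh))->error;
			/* otherwise: a dumped object; parse its attributes */
		}
	}
}

A loop like this is only correct if the kernel can always append the
NLMSG_DONE message; that is the assumption this patch preserves.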

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued
up for stable.

Changes v3->v4:
 - I like long lines. The kernel does not. Wrapped at 80 now.

 net/netlink/af_netlink.c | 17 +++--
 net/netlink/af_netlink.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..15c99dfa3d72 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh;
struct module *module;
-   int len, err = -ENOBUFS;
+   int err = -ENOBUFS;
int alloc_min_size;
int alloc_size;
 
@@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
skb_reserve(skb, skb_tailroom(skb) - alloc_size);
netlink_skb_set_owner_r(skb, sk);
 
-   len = cb->dump(skb, cb);
+   if (nlk->dump_done_errno > 0)
+   nlk->dump_done_errno = cb->dump(skb, cb);
 
-   if (len > 0) {
+   if (nlk->dump_done_errno > 0 ||
+   skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
mutex_unlock(nlk->cb_mutex);
 
if (sk_filter(sk, skb))
@@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
return 0;
}
 
-   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-   if (!nlh)
+   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+  sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+   if (WARN_ON(!nlh))
goto errout_skb;
 
nl_dump_check_consistent(cb, nlh);
 
-   memcpy(nlmsg_data(nlh), &len, sizeof(len));
+   memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
+  sizeof(nlk->dump_done_errno));
 
if (sk_filter(sk, skb))
kfree_skb(skb);
@@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
}
 
nlk->cb_running = true;
+   nlk->dump_done_errno = INT_MAX;
 
mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0



Re: [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-08 Thread Jason A. Donenfeld
On Thu, Nov 9, 2017 at 11:02 AM, Johannes Berg
 wrote:
> nit: I think your line got a little long here :)

Ack. For v4.

> and here

Ack. For v4.

>
>> + nlk->dump_done_errno = INT_MAX;
>
> I guess positive values aren't really returned from dump?

When a positive value is returned, the API uses this to signify that
the dump is not done, and it should be called again. When 0 or
negative is returned, it means the dump is complete and the return
value should be returned in DONE. We therefore initialize
dump_done_errno to some positive value, so that we can make calling
->dump() conditional on it being positive. When it becomes zero or
negative, it's time to return. This mirrors the semantics of the
actual ->dump() function precisely.
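
In other words, a dump callback follows this shape -- a hedged sketch,
where my_next() and my_fill() are hypothetical stand-ins for a real
family's iteration and attribute-filling logic:

#include <net/netlink.h>

struct my_obj;
struct my_obj *my_next(struct netlink_callback *cb);	/* hypothetical */
int my_fill(struct sk_buff *skb, struct my_obj *obj);	/* hypothetical */

static int my_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct my_obj *obj;

	while ((obj = my_next(cb)) != NULL) {
		if (my_fill(skb, obj) < 0) {
			/* skb full: remember the position in cb->args and
			 * return positive so netlink_dump() calls us again */
			return skb->len;
		}
	}
	/* 0 (or a negative errno) means done; with this patch that value
	 * is stashed in dump_done_errno and later copied into NLMSG_DONE */
	return 0;
}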


[PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-08 Thread Jason A. Donenfeld
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued
up for stable.

Changes v2->v3:
  - Johannes didn't like the subject line of the patch, so the only
thing that's changed in this version is the new subject line.

 net/netlink/af_netlink.c | 14 --
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh;
struct module *module;
-   int len, err = -ENOBUFS;
+   int err = -ENOBUFS;
int alloc_min_size;
int alloc_size;
 
@@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
skb_reserve(skb, skb_tailroom(skb) - alloc_size);
netlink_skb_set_owner_r(skb, sk);
 
-   len = cb->dump(skb, cb);
+   if (nlk->dump_done_errno > 0)
+   nlk->dump_done_errno = cb->dump(skb, cb);
 
-   if (len > 0) {
+   if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
mutex_unlock(nlk->cb_mutex);
 
if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@ static int netlink_dump(struct sock *sk)
return 0;
}
 
-   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-   if (!nlh)
+   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+   if (WARN_ON(!nlh))
goto errout_skb;
 
nl_dump_check_consistent(cb, nlh);
 
-   memcpy(nlmsg_data(nlh), &len, sizeof(len));
+   memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));
 
if (sk_filter(sk, skb))
kfree_skb(skb);
@@ -2273,6 +2274,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
}
 
nlk->cb_running = true;
+   nlk->dump_done_errno = INT_MAX;
 
mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0



[PATCH v2] af_netlink: give correct bounds to dump skb for NLMSG_DONE

2017-11-07 Thread Jason A. Donenfeld
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre-sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v1->v2:
  - This implements things without having to twiddle with the skb, at
the expense of potentially having an extra back-and-forth and quite
a bit of added complexity.


 net/netlink/af_netlink.c | 14 --
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh;
struct module *module;
-   int len, err = -ENOBUFS;
+   int err = -ENOBUFS;
int alloc_min_size;
int alloc_size;
 
@@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
skb_reserve(skb, skb_tailroom(skb) - alloc_size);
netlink_skb_set_owner_r(skb, sk);
 
-   len = cb->dump(skb, cb);
+   if (nlk->dump_done_errno > 0)
+   nlk->dump_done_errno = cb->dump(skb, cb);
 
-   if (len > 0) {
+   if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < 
nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
mutex_unlock(nlk->cb_mutex);
 
if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@ static int netlink_dump(struct sock *sk)
return 0;
}
 
-   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-   if (!nlh)
+   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, 
sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+   if (WARN_ON(!nlh))
goto errout_skb;
 
nl_dump_check_consistent(cb, nlh);
 
-   memcpy(nlmsg_data(nlh), , sizeof(len));
+   memcpy(nlmsg_data(nlh), >dump_done_errno, 
sizeof(nlk->dump_done_errno));
 
if (sk_filter(sk, skb))
kfree_skb(skb);
@@ -2273,6 +2274,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff 
*skb,
}
 
nlk->cb_running = true;
+   nlk->dump_done_errno = INT_MAX;
 
mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
wait_queue_head_t   wait;
boolbound;
boolcb_running;
+   int dump_done_errno;
struct netlink_callback cb;
struct mutex*cb_mutex;
struct mutexcb_def_mutex;
-- 
2.15.0
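
For illustration, the dump-callback pattern the commit message describes
looks roughly like this; a minimal sketch, where foo_table,
foo_table_size, and ATTR_FOO are hypothetical placeholders rather than
any real netlink family:

#include <net/netlink.h>

static int foo_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	long i;

	for (i = cb->args[0]; i < foo_table_size; ++i) {
		if (nla_put_u32(skb, ATTR_FOO, foo_table[i]))
			break;	/* out of tailroom; resume here next call */
	}
	cb->args[0] = i;
	/* A positive return means "call me again with a fresh skb"; 0 lets
	 * netlink_dump() append the NLMSG_DONE trailer -- precisely the
	 * step that has no room left when the skb is packed tight. */
	return i < foo_table_size ? skb->len : 0;
}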



Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE

2017-11-07 Thread Jason A. Donenfeld
Erf, your patch doesn't handle what happens if len comes back
negative, but I'll fix it up and send a v2 using this approach.

I think I really prefer v1 though.

Jason


Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE

2017-11-07 Thread Jason A. Donenfeld
Hi Johannes,

Yes indeed. It sacrifices 24 bytes for making things much less
complex. However, if you prefer increasing the complexity of the state
machine a bit instead, I suppose we could roll with this approach
instead...

Jason


Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE

2017-11-07 Thread Jason A. Donenfeld
By the way, in case you're curious, here's the {up,down,cross}stream
WireGuard commit that works around it via its compat layer (a rat's
nest of hideous backports for all the weird kernels people want
WireGuard to run on, which I cannot wait to remove):

https://git.zx2c4.com/WireGuard/commit/?id=f689ea7acc23dc8e0968699d964ee382b04fbbe4

Particularly relevant here is the last chunk of that, which is part of
the automated test suite, which reproduces the issue by finding the
tightest possible packing.


[PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE

2017-11-07 Thread Jason A. Donenfeld
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre-sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus reserves and restores the required length for
NLMSG_DONE during the call to the dump function.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
See you all at netdevconf tomorrow!

 net/netlink/af_netlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..b2d0a2fb1939 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2183,7 +2183,9 @@ static int netlink_dump(struct sock *sk)
skb_reserve(skb, skb_tailroom(skb) - alloc_size);
netlink_skb_set_owner_r(skb, sk);
 
+   skb->end -= nlmsg_total_size(sizeof(len));
len = cb->dump(skb, cb);
+   skb->end += nlmsg_total_size(sizeof(len));
 
if (len > 0) {
mutex_unlock(nlk->cb_mutex);
-- 
2.15.0
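
To make the arithmetic concrete: the trailer that netlink_dump() must
append needs a fixed amount of tailroom. A rough sketch, assuming the
usual NLMSG_HDRLEN of 16 bytes and 4-byte netlink alignment:

#include <net/netlink.h>

/* Illustration only: room needed for the NLMSG_DONE trailer. */
static inline int nlmsg_done_room(void)
{
	return nlmsg_total_size(sizeof(int));	/* NLMSG_ALIGN(16 + 4) == 20 */
}

A dump callback that packs attributes to the brim can leave fewer than
these 20 bytes free, which is exactly the amount the patch above hides
from the callback and then restores.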



Re: [PATCH v4] arm64: support __int128 on gcc 5+

2017-11-06 Thread Jason A. Donenfeld
Hi Will,

On Tue, Nov 7, 2017 at 11:13 AM, Will Deacon  wrote:
> So I pushed a fixup patch on top of for-next/core, but I suggest we
> temporarily revert the -DCONFIG_ARCH_SUPPORTS_INT128 option if any other
> issues crop up.

The fixup looks good to me.

If there are additional problems, I'm happy to keep providing patches
until it's sorted. I just did an allyesconfig and didn't have any
problems, though, so I think we're all set now.

Jason


Re: [PATCH v4] arm64: support __int128 on gcc 5+

2017-11-06 Thread Jason A. Donenfeld
On Tue, Nov 7, 2017 at 1:55 AM, Ard Biesheuvel
 wrote:
> It appears your v4 adds __ashlti3() and __ashrti3, whereas the error
> is about __lshrti3() being undefined.

Whoopsie. v5 adds the final missing function. Looks like it now
compiles for -next with the config you provided me.


[PATCH v5] arm64: support __int128 on gcc 5+

2017-11-06 Thread Jason A. Donenfeld
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, bad performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.

The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.

There are still a few helper functions, __ashlti3, __ashrti3, and __lshrti3,
which require libgcc, which is fine. Rather than linking to libgcc, we
simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v4->v5:
   - Add implementation of __lshrti3. (Note that __lshlti3 is not
 a function.)

 arch/arm64/Makefile  |  2 ++
 arch/arm64/lib/Makefile  |  2 +-
 arch/arm64/lib/tishift.S | 80 
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/lib/tishift.S

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS  += $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS+= -mbig-endian
 CHECKFLAGS += -D__AARCH64EB__
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 9a8cb96555d6..4e696f96451f 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -3,7 +3,7 @@ lib-y   := bitops.o clear_user.o delay.o 
copy_from_user.o   \
   copy_to_user.o copy_in_user.o copy_page.o\
   clear_page.o memchr.o memcpy.o memmove.o memset.o\
   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o   \
-  strchr.o strrchr.o
+  strchr.o strrchr.o tishift.o
 
 # Tell the compiler to treat all general purpose registers (with the
 # exception of the IP registers, which are already handled by the caller
diff --git a/arch/arm64/lib/tishift.S b/arch/arm64/lib/tishift.S
new file mode 100644
index ..d3db9b2cd479
--- /dev/null
+++ b/arch/arm64/lib/tishift.S
@@ -0,0 +1,80 @@
+/*
+ * Copyright (C) 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights 
Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+
+ENTRY(__ashlti3)
+   cbz x2, 1f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le2f
+   lsl x1, x1, x2
+   lsr x3, x0, x3
+   lsl x2, x0, x2
+   orr x1, x1, x3
+   mov x0, x2
+1:
+   ret
+2:
+   neg w1, w3
+   mov x2, #0
+   lsl x1, x0, x1
+   mov x0, x2
+   ret
+ENDPROC(__ashlti3)
+
+ENTRY(__ashrti3)
+   cbz x2, 1f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le2f
+   lsr x0, x0, x2
+   lsl x3, x1, x3
+   asr x2, x1, x2
+   orr x0, x0, x3
+   mov x1, x2
+1:
+   ret
+2:
+   neg w0, w3
+   asr x2, x1, #63
+   asr x0, x1, x0
+   mov x1, x2
+   ret
+ENDPROC(__ashrti3)
+
+ENTRY(__lshrti3)
+   cbz x2, 1f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le2f
+   lsr x0, x0, x2
+   lsl x3, x1, x3
+   lsr x2, x1, x2
+   orr x0, x0, x3
+   mov x1, x2
+1:
+   ret
+2:
+   neg w0, w3
+   mov x2, #0
+   lsr x0, x1, x0
+   mov x1, x2
+   ret
+ENDPROC(__lshrti3)
-- 
2.15.0
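
For readers who prefer C to assembly, the __lshrti3 routine above
corresponds roughly to the following model, treating the 128-bit value
as two 64-bit halves (lo in x0, hi in x1, count in x2); an illustrative
sketch for shift counts below 128, not kernel code:

struct ti { unsigned long long lo, hi; };

static struct ti lshrti3_model(struct ti v, unsigned int n)
{
	struct ti r;

	if (n == 0)		/* cbz x2, 1f */
		return v;
	if (n < 64) {		/* x3 = 64 - n is positive */
		r.lo = (v.lo >> n) | (v.hi << (64 - n));
		r.hi = v.hi >> n;
	} else {		/* b.le 2f: shift crosses the halves */
		r.lo = v.hi >> (n - 64);
		r.hi = 0;
	}
	return r;
}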



[PATCH v4] arm64: support __int128 on gcc 5+

2017-11-06 Thread Jason A. Donenfeld
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, bad performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.

The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.

There are still a few helper functions, __ashlti3 and __ashrti3, which
require libgcc, which is fine. Rather than linking to libgcc, we
simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v3->v4:
  - Typo in comment
  - Relabel second function

 arch/arm64/Makefile  |  2 ++
 arch/arm64/lib/Makefile  |  2 +-
 arch/arm64/lib/tishift.S | 59 
 3 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/lib/tishift.S

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS  += $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS+= -mbig-endian
 CHECKFLAGS += -D__AARCH64EB__
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index a0abc142c92b..55bdb01f1ea6 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -2,7 +2,7 @@ lib-y   := bitops.o clear_user.o delay.o 
copy_from_user.o   \
   copy_to_user.o copy_in_user.o copy_page.o\
   clear_page.o memchr.o memcpy.o memmove.o memset.o\
   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o   \
-  strchr.o strrchr.o
+  strchr.o strrchr.o tishift.o
 
 # Tell the compiler to treat all general purpose registers (with the
 # exception of the IP registers, which are already handled by the caller
diff --git a/arch/arm64/lib/tishift.S b/arch/arm64/lib/tishift.S
new file mode 100644
index ..bffe03c478a5
--- /dev/null
+++ b/arch/arm64/lib/tishift.S
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights 
Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+
+ENTRY(__ashlti3)
+   cbz x2, 1f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le2f
+   lsl x1, x1, x2
+   lsr x3, x0, x3
+   lsl x2, x0, x2
+   orr x1, x1, x3
+   mov x0, x2
+1:
+   ret
+2:
+   neg w1, w3
+   mov x2, #0
+   lsl x1, x0, x1
+   mov x0, x2
+   ret
+ENDPROC(__ashlti3)
+
+ENTRY(__ashrti3)
+   cbz x2, 1f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le2f
+   lsr x0, x0, x2
+   lsl x3, x1, x3
+   asr x2, x1, x2
+   orr x0, x0, x3
+   mov x1, x2
+1:
+   ret
+2:
+   neg w0, w3
+   asr x2, x1, #63
+   asr x0, x1, x0
+   mov x1, x2
+   ret
+ENDPROC(__ashrti3)
-- 
2.15.0



[PATCH v3] arm64: support __int128 on gcc 5+

2017-11-03 Thread Jason A. Donenfeld
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, bad performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.

The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.

There are still a few helper functions, __ashlti3 and __ashrti3, which
require libgcc, which is fine. Rather than linking to libgcc, we
simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
Changes v2->v3:
  - We now just provide trivial implementations for the missing
functions, rather than linking to libgcc.

 arch/arm64/Makefile  |  2 ++
 arch/arm64/lib/Makefile  |  2 +-
 arch/arm64/lib/tishift.S | 59 
 3 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/lib/tishift.S

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS  += $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS+= -mbig-endian
 CHECKFLAGS += -D__AARCH64EB__
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index a0abc142c92b..55bdb01f1ea6 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -2,7 +2,7 @@ lib-y   := bitops.o clear_user.o delay.o 
copy_from_user.o   \
   copy_to_user.o copy_in_user.o copy_page.o\
   clear_page.o memchr.o memcpy.o memmove.o memset.o\
   memcmp.o strcmp.o strncmp.o strlen.o strnlen.o   \
-  strchr.o strrchr.o
+  strchr.o strrchr.o tishift.o
 
 # Tell the compiler to treat all general purpose registers (with the
 # exception of the IP registers, which are already handled by the caller
diff --git a/arch/arm64/lib/tishift.S b/arch/arm64/lib/tishift.S
new file mode 100644
index ..47b6f7ee2b11
--- /dev/null
+++ b/arch/arm64/lib/tishift.S
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2017 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights 
Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+
+ENTRY(__ashlti3)
+   cbz x2, 1f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le2f
+   lsl x1, x1, x2
+   lsr x3, x0, x3
+   lsl x2, x0, x2
+   orr x1, x1, x3
+   mov x0, x2
+1:
+   ret
+2:
+   neg w1, w3
+   mov x2, #0
+   lsl x1, x0, x1
+   mov x0, x2
+   ret
+ENDPROC(__ashlti3)
+
+ENTRY(__ashrti3)
+   cbz x2, 3f
+   mov x3, #64
+   sub x3, x3, x2
+   cmp x3, #0
+   b.le4f
+   lsr x0, x0, x2
+   lsl x3, x1, x3
+   asr x2, x1, x2
+   orr x0, x0, x3
+   mov x1, x2
+3:
+   ret
+4:
+   neg w0, w3
+   asr x2, x1, #63
+   asr x0, x1, x0
+   mov x1, x2
+   ret
+ENDPROC(__ashrti3)
-- 
2.14.2



Re: [PATCH v2] arm64: support __int128 on gcc 5+

2017-11-03 Thread Jason A. Donenfeld
On Fri, Nov 3, 2017 at 2:42 PM, Will Deacon  wrote:
> We used to link against libgcc way back when, but that dependency was
> removed in commit d67703a8a69e ("arm64: kill off the libgcc dependency")
> and I'm really not keen to add it back. I also think that there might
> be licensing concerns if you link against it and make use of GCC plugins,
> but IANAL.

Considering many other architectures link to it, that seems strange to me...

>
> Shouldn't we just provide our own implementations of __ashlti3 and
> __ashrti3 instead?

I just coded up an implementation. I'm testing it now and I'll have v3
for you shortly.

Jason


Re: [PATCH] arm64: support __int128 on gcc 5+

2017-11-02 Thread Jason A. Donenfeld
Hi Will,

Nice catch. I posted a v2 that addresses that.

Thanks,
Jason


[PATCH v2] arm64: support __int128 on gcc 5+

2017-11-02 Thread Jason A. Donenfeld
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, bad performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.

The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.

There are still a few helper functions, such as __ashlti3 and __ashrti3,
that require libgcc, which is fine. So, we also link to libgcc for these
functions when needed, which is what several other architectures already
have been doing for a long time.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 arch/arm64/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..70c7c0c1bccb 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS  += $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS+= -mbig-endian
 CHECKFLAGS += -D__AARCH64EB__
@@ -108,7 +110,7 @@ core-$(CONFIG_NET) += arch/arm64/net/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
 core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
-libs-y := arch/arm64/lib/ $(libs-y)
+libs-y := arch/arm64/lib/ $(shell $(CC) $(KBUILD_CFLAGS) 
-print-libgcc-file-name) $(libs-y)
 core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
 
 # Default target when executing plain make
-- 
2.14.2



Re: [PATCH v7] printk: hash addresses printed with %p

2017-10-31 Thread Jason A. Donenfeld
On Mon, Oct 30, 2017 at 9:22 PM, Steven Rostedt  wrote:
> How quickly do you need static_branch_disable() executed? You could
> always pass the work off to a worker thread (that can schedule).
>
> random_ready_callback -> initiates worker thread -> enables the static branch

I had already suggested that much earlier in the thread, but
discounted it in the very same suggestion, because that branch turns
out to be not that expensive anyway, and I think it's more important
that we're able to print meaningful values as soon as we can, rather
than waiting for the scheduler.
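
For context, the worker-thread variant under discussion would look
roughly like this; a sketch only, assuming the random_ready_callback
and static-key APIs of this era, with illustrative names:

#include <linux/jump_label.h>
#include <linux/random.h>
#include <linux/siphash.h>
#include <linux/workqueue.h>

static siphash_key_t ptr_key __read_mostly;
static DEFINE_STATIC_KEY_TRUE(not_filled_random_ptr_key);

static void enable_ptr_key_workfn(struct work_struct *work)
{
	get_random_bytes(&ptr_key, sizeof(ptr_key));
	/* Safe to sleep here, unlike in the random-ready callback. */
	static_branch_disable(&not_filled_random_ptr_key);
}
static DECLARE_WORK(enable_ptr_key_work, enable_ptr_key_workfn);

static void fill_random_ptr_key(struct random_ready_callback *unused)
{
	/* This callback may not sleep, so punt the key flip to a worker. */
	schedule_work(&enable_ptr_key_work);
}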


Re: [PATCH] arm64: support __int128 on gcc 5+

2017-10-31 Thread Jason A. Donenfeld
On Tue, Oct 31, 2017 at 1:17 PM, Will Deacon  wrote:
>
> Sure, I'll wait for your patch that matches the relevant compiler versions.

That's the patch in this thread:
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/539943.html

As mentioned in there, gcc does the right thing for 5 and up:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d


Re: [PATCH] arm64: support __int128 on gcc 5+

2017-10-31 Thread Jason A. Donenfeld
Hi Will,

On Tue, Oct 31, 2017 at 12:51 PM, Will Deacon  wrote:
> Which code in the kernel actually uses 128-bit types directly? I know we
> have some unfortunate occurences in our headers (including uapi) for the
> vector registers, but I thought we generally used asm or copy routines to
> access those.

math64.h provides it, and various things throughout use those
functions. Notably, the scheduler and kvm use the __int128
functions. There's also an elliptic curve implementation that uses it,
which makes a big difference. And soon I'll be adding an
implementation of curve25519 that will make heavy use of these
instructions for significant speedups as well.

Generally, adding this CONFIG_ARCH key is a hint for current and
future bits of code, so that they can use the faster 128-bit
implementations when available. Not providing it when it's there and
available would be silly. This wasn't originally added to the
architecture, because when the Kconfig symbol was added, gcc didn't
have sane support for it on aarch64. But now it does, so let's support
it.

Jason
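
Concretely, what the config symbol buys is along the lines of the
mul_u64_u32_shr() fast path in include/linux/math64.h; paraphrased here
from memory, so treat it as a sketch rather than the exact tree contents:

#ifdef CONFIG_ARCH_SUPPORTS_INT128
/* One mul (plus umulh) and a shift on arm64, instead of a fallback
 * pieced together from 32-bit multiplies. */
static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
{
	return (u64)(((unsigned __int128)a * mul) >> shift);
}
#endif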


[PATCH] arm64: support __int128 on gcc 5+

2017-10-31 Thread Jason A. Donenfeld
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, horrible performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.

The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d

This commit appears to be part of the gcc 5 release.

Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
 arch/arm64/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..1f8a0fec6998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -53,6 +53,8 @@ KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
 
+KBUILD_CFLAGS  += $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
 KBUILD_CPPFLAGS+= -mbig-endian
 CHECKFLAGS += -D__AARCH64EB__
-- 
2.14.2



Re: CONFIG_ARCH_SUPPORTS_INT128 for AArch64

2017-10-31 Thread Jason A. Donenfeld
Hi Mark,

On Tue, Oct 31, 2017 at 11:43 AM, Mark Rutland  wrote:
> Judging by Documentation/kbuild/makefiles.txt, we could do something
> like the below in the arm64 Makefile:
>
> cflags-y += $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)

Okay, in that case, I'll re-research the gcc commit that made int128
work properly on aarch64, confirm the release version, and then submit
a patch doing this.

Jason


CONFIG_ARCH_SUPPORTS_INT128 for AArch64

2017-10-31 Thread Jason A. Donenfeld
Hi folks,

Older versions of gcc are horrible for int128_t on aarch64, emitting a
__multi3 call, which is slow and defeats the purpose. However, more
recent versions of gcc (since 5, IIRC), do the right thing and produce
the proper inlined instructions.

Do we have the infrastructure available to test for this, and
conditionally set CONFIG_ARCH_SUPPORTS_INT128 if we do?

Thanks,
Jason
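
The kind of code at issue is a 64x64->128 multiply; a minimal example,
not taken from the kernel:

typedef unsigned long long u64;

static inline u64 mul_hi64(u64 a, u64 b)
{
	/* gcc < 5 on aarch64 lowers this to a libgcc __multi3 call;
	 * gcc >= 5 emits the mul/umulh pair inline. */
	return (u64)(((unsigned __int128)a * b) >> 64);
}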


Re: [PATCH V8 2/2] printk: hash addresses printed with %p

2017-10-25 Thread Jason A. Donenfeld
On Thu, Oct 26, 2017 at 4:53 AM, Tobin C. Harding  wrote:
> +static bool have_filled_random_ptr_key;

__read_mostly


Re: [PATCH v7] printk: hash addresses printed with %p

2017-10-25 Thread Jason A. Donenfeld
On Thu, Oct 26, 2017 at 12:27 AM, Tobin C. Harding  wrote:
> How good is unlikely()?

It places that branch way at the bottom of the function so that it's
less likely to pollute the icache.

> It doesn't _feel_ right adding a check on every call to printk just to
> check for a condition that was only true for the briefest time when the
> kernel booted. But if unlikely() is good then I guess it doesn't hurt.
>
> I'm leaning towards the option 1, but then all those text books I read
> are telling me to implement the simplest solution first then if we need
> to go faster implement the more complex solution.
>
> This is a pretty airy fairy discussion now, but if you have an opinion
> I'd love to hear it.

I don't think adding a single branch there really matters that much,
considering how many random other branches there are all over the
printk code. However, if you really want to optimize on the little
bits, and sensibly don't want to go with the overcomplex
workqueue-to-statickey thing, you could consider using a plain vanilla
`bool has_gotten_random_ptr_secret` instead of using the atomic_t. The
reason is that there's only ever one single writer, changing from a 0
to a 1. Basically the only thing the atomic_t got you was a
cache flush surrounding the read (and the write), so that assigning
has_gotten_random_ptr_secret=true would take effect _immediately_.
However, since you might not necessarily care about that, going with a bool
instead will save you an expensive cache flush, while potentially
being a microsecond out of date the first time it's used. Seems like
an okay trade off to me. (That kind of cache latency, also, is a few
orders of magnitude better than using a work queue for the statickey
stuff.)
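
A minimal sketch of that plain-bool approach (identifiers here are
illustrative, not taken from the v7 patch): there is exactly one writer,
which flips the flag from false to true once, so a reader that observes
a briefly stale false simply takes the fallback path one more time.

    #include <linux/random.h>
    #include <linux/siphash.h>

    static bool has_gotten_random_ptr_secret;
    static siphash_key_t ptr_key;

    /* Sole writer: runs once, when the randomness pool is ready. */
    static void fill_ptr_key(void)
    {
            get_random_bytes(&ptr_key, sizeof(ptr_key));
            has_gotten_random_ptr_secret = true;
    }

    /* Reader: tolerates seeing a momentarily stale flag. */
    static u64 hash_ptr_value(const void *ptr)
    {
            if (unlikely(!has_gotten_random_ptr_secret))
                    return 0; /* randomness not ready; print nothing real */
            return siphash_1u64((unsigned long)ptr, &ptr_key);
    }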


Re: [PATCH] iscsi-target: Convert timers to use timer_setup()

2017-10-25 Thread Jason A. Donenfeld
On Wed, Oct 25, 2017 at 12:01 PM, Kees Cook  wrote:
> sess->time2retain_timer.expires =
> (get_jiffies_64() + sess->sess_ops->DefaultTime2Retain * HZ);
> add_timer(&sess->time2retain_timer);
> cmd->dataout_timer.expires = (get_jiffies_64() + na->dataout_timeout * HZ);
> add_timer(&cmd->dataout_timer);
> np->np_login_timer.expires = (get_jiffies_64() + TA_LOGIN_TIMEOUT * HZ);
> add_timer(&np->np_login_timer);
> +   timeout.timer.expires = (get_jiffies_64() + TA_LOGIN_TIMEOUT * HZ);
> +   add_timer(&timeout.timer);
> conn->nopin_response_timer.expires =
> (get_jiffies_64() + na->nopin_response_timeout * HZ);
> add_timer(&conn->nopin_response_timer);
> conn->nopin_timer.expires = (get_jiffies_64() + na->nopin_timeout * HZ);
> add_timer(&conn->nopin_timer);
> conn->nopin_timer.expires = (get_jiffies_64() + na->nopin_timeout * HZ);
> add_timer(&conn->nopin_timer);

I assume you're trying to keep these diffs as small as possible, but I
imagine at some point the above can also be combined into a single
mod_timer.

Also, all the goofiness with the flags stuff seems like an antique
replacement for timer_pending:

> conn->nopin_response_timer_flags &= ~ISCSI_TF_STOP;
> conn->nopin_response_timer_flags |= ISCSI_TF_RUNNING;
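
A minimal sketch of both suggestions, using the nopin timer from the
quoted code as the example (conn and na are assumed from the surrounding
context): mod_timer() sets the expiry and (re)arms the timer in a single
call, and timer_pending() answers the question that the ISCSI_TF_RUNNING
bookkeeping tracks by hand.

    /* One call instead of assigning .expires and calling add_timer(). */
    mod_timer(&conn->nopin_timer, jiffies + na->nopin_timeout * HZ);

    /* And instead of testing a hand-rolled ISCSI_TF_RUNNING flag: */
    if (timer_pending(&conn->nopin_timer))
            pr_debug("nopin timer already armed\n");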


Re: [PATCH v7] printk: hash addresses printed with %p

2017-10-24 Thread Jason A. Donenfeld
On Wed, Oct 25, 2017 at 5:49 AM, Tobin C. Harding  wrote:
> static_branch_disable(_ptr_secret) : Doesn't sleep, just atomic read
> and set and maybe a WARN_ONCE.

Are you sure about that? I just looked myself, and though there is a
!HAVE_JUMP_LABEL ifdef that does what you described, there's also a
HAVE_JUMP_LABEL path that takes a mutex, which sleeps:

static_branch_disable
  static_key_disable
    cpus_read_lock
      percpu_down_read
        percpu_down_read_preempt_disable
          might_sleep

> Now for the 'executes from process context' stuff.

Er, sorry, I meant to write non-process context in my original
message, which is generally where you're worried about sleeping.

> If the callback mechanism is utilized (i.e print before randomness is
> ready) then the call back will be executed the next time the randomness
> pool gets added to

So it sounds to me like this might be called in non-process context.
Disaster. I realize the static_key thing was my idea in the original
email, so sorry for leading you astray. But moving to do this in
early_initcall wound up fixing other issues too, so all in all a net
good in going this direction.

Two options: you stick with static_branch, because it's cool and speed
is fun, and work around all of the above with a call to queue_work so
that static_branch_enable is called only from process context.

Or, you give up on static_key, because it's not actually super
necessary, and instead just use an atomic, and reason that using `if
(unlikely(!atomic_read()))` is probably good enough. In this
option, the code would be pretty much the same as v7, except you'd
s/static_branch/atomic_t/, and change the helpers, etc. This is
probably the more reasonable way.
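
A minimal sketch of the first option (names are illustrative, not the v7
code): the random-ready callback may fire in non-process context, so it
only queues work; the work item, which always runs in process context,
is what flips the static key.

    #include <linux/jump_label.h>
    #include <linux/random.h>
    #include <linux/siphash.h>
    #include <linux/workqueue.h>

    static DEFINE_STATIC_KEY_TRUE(not_filled_random_ptr_key);
    static siphash_key_t ptr_key;

    /* Process context: static_branch_disable() may sleep safely here. */
    static void enable_ptr_key_workfn(struct work_struct *work)
    {
            get_random_bytes(&ptr_key, sizeof(ptr_key));
            static_branch_disable(&not_filled_random_ptr_key);
    }
    static DECLARE_WORK(enable_ptr_key_work, enable_ptr_key_workfn);

    /* May be called from any context; defer the sleeping part. */
    static void fill_random_ptr_key(struct random_ready_callback *unused)
    {
            queue_work(system_unbound_wq, &enable_ptr_key_work);
    }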

