from:"Vincent Whitchurch"

[PATCHv2] kmemleak: Add option to print warnings to dmesg

2018-08-27 Thread Vincent Whitchurch

Currently, kmemleak only prints the number of suspected leaks to dmesg
but requires the user to read a debugfs file to get the actual stack
traces of the objects' allocation points.  Add an option to print the
full object information to dmesg too.  This allows easier integration of
kmemleak into automated test systems since those kind of systems
presumably already save kernel logs.

Signed-off-by: Vincent Whitchurch 
---
v2: Print hex dump too.

 lib/Kconfig.debug |  9 +
 mm/kmemleak.c | 37 ++---
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ab1b599202bc..9a3fc905b8bd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -593,6 +593,15 @@ config DEBUG_KMEMLEAK_DEFAULT_OFF
  Say Y here to disable kmemleak by default. It can then be enabled
  on the command line via kmemleak=on.
 
+config DEBUG_KMEMLEAK_WARN
+   bool "Print kmemleak object warnings to log buffer"
+   depends on DEBUG_KMEMLEAK
+   help
+ Say Y here to make kmemleak print information about unreferenced
+ objects (including stacktraces) as warnings to the kernel log buffer.
+ Otherwise this information is only available by reading the kmemleak
+ debugfs file.
+
 config DEBUG_STACK_USAGE
bool "Stack utilization instrumentation"
depends on DEBUG_KERNEL && !IA64
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 9a085d525bbc..22662715a3dc 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -181,6 +181,7 @@ struct kmemleak_object {
 /* flag set to not scan the object */
 #define OBJECT_NO_SCAN (1 << 2)
 
+#define HEX_PREFIX ""
 /* number of bytes to print per line; must be 16 or 32 */
 #define HEX_ROW_SIZE   16
 /* number of bytes to print at a time (1, 2, 4, 8) */
@@ -299,6 +300,25 @@ static void kmemleak_disable(void);
kmemleak_disable(); \
 } while (0)
 
+#define warn_or_seq_printf(seq, fmt, ...)  do {\
+   if (seq)\
+   seq_printf(seq, fmt, ##__VA_ARGS__);\
+   else\
+   pr_warn(fmt, ##__VA_ARGS__);\
+} while (0)
+
+static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
+int rowsize, int groupsize, const void *buf,
+size_t len, bool ascii)
+{
+   if (seq)
+   seq_hex_dump(seq, HEX_PREFIX, prefix_type, rowsize, groupsize,
+buf, len, ascii);
+   else
+   print_hex_dump(KERN_WARNING, pr_fmt(HEX_PREFIX), prefix_type,
+  rowsize, groupsize, buf, len, ascii);
+}
+
 /*
  * Printing of the objects hex dump to the seq file. The number of lines to be
  * printed is limited to HEX_MAX_LINES to prevent seq file spamming. The
@@ -314,10 +334,10 @@ static void hex_dump_object(struct seq_file *seq,
/* limit the number of lines to HEX_MAX_LINES */
len = min_t(size_t, object->size, HEX_MAX_LINES * HEX_ROW_SIZE);
 
-   seq_printf(seq, "  hex dump (first %zu bytes):\n", len);
+   warn_or_seq_printf(seq, "  hex dump (first %zu bytes):\n", len);
kasan_disable_current();
-   seq_hex_dump(seq, "", DUMP_PREFIX_NONE, HEX_ROW_SIZE,
-HEX_GROUP_SIZE, ptr, len, HEX_ASCII);
+   warn_or_seq_hex_dump(seq, DUMP_PREFIX_NONE, HEX_ROW_SIZE,
+HEX_GROUP_SIZE, ptr, len, HEX_ASCII);
kasan_enable_current();
 }
 
@@ -365,17 +385,17 @@ static void print_unreferenced(struct seq_file *seq,
int i;
unsigned int msecs_age = jiffies_to_msecs(jiffies - object->jiffies);
 
-   seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
+   warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
   object->pointer, object->size);
-   seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu (age %d.%03ds)\n",
+   warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu (age 
%d.%03ds)\n",
   object->comm, object->pid, object->jiffies,
   msecs_age / 1000, msecs_age % 1000);
hex_dump_object(seq, object);
-   seq_printf(seq, "  backtrace:\n");
+   warn_or_seq_printf(seq, "  backtrace:\n");
 
for (i = 0; i < object->trace_len; i++) {
void *ptr = (void *)object->trace[i];
-   seq_printf(seq, "[<%p>] %pS\n", ptr, ptr);
+   warn_or_seq_printf(seq, "[<%p>] %pS\n", ptr, ptr);
}
 }
 
@@ -1598,6 +1618,9 @@ static void kmemleak_scan(void)
if (unreferenced_object(object

Re: [PATCHv2] kmemleak: Add option to print warnings to dmesg

2018-08-28 Thread Vincent Whitchurch

On Mon, Aug 27, 2018 at 03:16:41PM -0700, Andrew Morton wrote:
> On Mon, 27 Aug 2018 10:38:21 +0200 Vincent Whitchurch 
>  wrote:
> 
> > Currently, kmemleak only prints the number of suspected leaks to dmesg
> > but requires the user to read a debugfs file to get the actual stack
> > traces of the objects' allocation points.  Add an option to print the
> > full object information to dmesg too.  This allows easier integration of
> > kmemleak into automated test systems since those kind of systems
> > presumably already save kernel logs.
> 
> "presumably" is a bit rubbery.  Are you sure this change is sufficienty
> useful to justify including it?  Do you have use-cases for it?

We (like every one else) have automated test infrastructure which test
our Linux systems.  With this option, running our tests with kmemleak is
as simple as enabling kmemleak and this option; the test infrastructure
knows how to save kernel logs, which will now include kmemleak reports.
Without this option, the test infrastructure needs to be specifically
taught to read out the kmemleak debugfs file.  Removing this need for
special handling makes kmemleak more similar to other kernel debug
options (slab debugging, debug objects, etc).

> 
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -593,6 +593,15 @@ config DEBUG_KMEMLEAK_DEFAULT_OFF
> >   Say Y here to disable kmemleak by default. It can then be enabled
> >   on the command line via kmemleak=on.
> >  
> > +config DEBUG_KMEMLEAK_WARN
> > +   bool "Print kmemleak object warnings to log buffer"
> > +   depends on DEBUG_KMEMLEAK
> > +   help
> > + Say Y here to make kmemleak print information about unreferenced
> > + objects (including stacktraces) as warnings to the kernel log buffer.
> > + Otherwise this information is only available by reading the kmemleak
> > + debugfs file.
> 
> Why add the config option?  Why not simply make the change for all
> configs?

No particular reason other than preserving the current behaviour for
existing users.  I can remove the config option if Catalin is fine with
it.

> 
> >  config DEBUG_STACK_USAGE
> > bool "Stack utilization instrumentation"
> > depends on DEBUG_KERNEL && !IA64
> > diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> > index 9a085d525bbc..22662715a3dc 100644
> > --- a/mm/kmemleak.c
> > +++ b/mm/kmemleak.c
> > @@ -181,6 +181,7 @@ struct kmemleak_object {
> >  /* flag set to not scan the object */
> >  #define OBJECT_NO_SCAN (1 << 2)
> >  
> > +#define HEX_PREFIX ""
> >  /* number of bytes to print per line; must be 16 or 32 */
> >  #define HEX_ROW_SIZE   16
> >  /* number of bytes to print at a time (1, 2, 4, 8) */
> > @@ -299,6 +300,25 @@ static void kmemleak_disable(void);
> > kmemleak_disable(); \
> >  } while (0)
> >  
> > +#define warn_or_seq_printf(seq, fmt, ...)  do {\
> > +   if (seq)\
> > +   seq_printf(seq, fmt, ##__VA_ARGS__);\
> > +   else\
> > +   pr_warn(fmt, ##__VA_ARGS__);\
> > +} while (0)
> > +
> > +static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
> > +int rowsize, int groupsize, const void *buf,
> > +size_t len, bool ascii)
> > +{
> > +   if (seq)
> > +   seq_hex_dump(seq, HEX_PREFIX, prefix_type, rowsize, groupsize,
> > +buf, len, ascii);
> > +   else
> > +   print_hex_dump(KERN_WARNING, pr_fmt(HEX_PREFIX), prefix_type,
> > +  rowsize, groupsize, buf, len, ascii);
> > +}
> 
> This will print to the logs OR to the debugfs file, won't it?

No, the information is always available in the debugfs file, even after
this patch.  The dmesg printing is in addition to that.  The code is
called with and without seq == NULL in different code paths.

[PATCH 1/2] kmemleak: dump all objects for slab usage analysis

2018-08-28 Thread Vincent Whitchurch

In order to be able to analyse the kernel's slab usage, we'd need a list
of allocated objects and their allocation stacks.  Kmemleak already
maintains such a list internally, so we expose it via debugfs file.

This file can be post-processed in userspace and converted to a suitable
format for slab usage analysis.

Signed-off-by: Vincent Whitchurch 
---
 mm/kmemleak.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 17dd883198ae..7bef05c690d6 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1759,6 +1759,34 @@ static int kmemleak_seq_show(struct seq_file *seq, void 
*v)
return 0;
 }
 
+static void kmemleak_print_object(struct seq_file *seq,
+ struct kmemleak_object *object)
+{
+   int i;
+
+   seq_printf(seq, "object 0x%08lx (size %zu):\n",
+  object->pointer, object->size);
+   seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
+  object->comm, object->pid, object->jiffies);
+
+   for (i = 0; i < object->trace_len; i++) {
+   void *ptr = (void *)object->trace[i];
+
+   seq_printf(seq, "[<%p>] %pS\n", ptr, ptr);
+   }
+}
+
+static int kmemleak_all_seq_show(struct seq_file *seq, void *v)
+{
+   struct kmemleak_object *object = v;
+   unsigned long flags;
+
+   spin_lock_irqsave(&object->lock, flags);
+   kmemleak_print_object(seq, object);
+   spin_unlock_irqrestore(&object->lock, flags);
+   return 0;
+}
+
 static const struct seq_operations kmemleak_seq_ops = {
.start = kmemleak_seq_start,
.next  = kmemleak_seq_next,
@@ -1766,11 +1794,23 @@ static const struct seq_operations kmemleak_seq_ops = {
.show  = kmemleak_seq_show,
 };
 
+static const struct seq_operations kmemleak_all_seq_ops = {
+   .start = kmemleak_seq_start,
+   .next  = kmemleak_seq_next,
+   .stop  = kmemleak_seq_stop,
+   .show  = kmemleak_all_seq_show,
+};
+
 static int kmemleak_open(struct inode *inode, struct file *file)
 {
return seq_open(file, &kmemleak_seq_ops);
 }
 
+static int kmemleak_all_open(struct inode *inode, struct file *file)
+{
+   return seq_open(file, &kmemleak_all_seq_ops);
+}
+
 static int dump_str_object_info(const char *str)
 {
unsigned long flags;
@@ -1911,6 +1951,14 @@ static const struct file_operations kmemleak_fops = {
.release= seq_release,
 };
 
+static const struct file_operations kmemleak_all_fops = {
+   .owner  = THIS_MODULE,
+   .open   = kmemleak_all_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 static void __kmemleak_do_cleanup(void)
 {
struct kmemleak_object *object;
@@ -2102,6 +2150,11 @@ static int __init kmemleak_late_init(void)
if (!dentry)
pr_warn("Failed to create the debugfs kmemleak file\n");
 
+   dentry = debugfs_create_file("kmemleak_all", 0400, NULL, NULL,
+&kmemleak_all_fops);
+   if (!dentry)
+   pr_warn("Failed to create the debugfs kmemleak_all file\n");
+
if (kmemleak_error) {
/*
 * Some error occurred and kmemleak was disabled. There is a
-- 
2.11.0

[PATCH 2/2] scripts: add kmemleak2pprof.py for slab usage analysis

2018-08-28 Thread Vincent Whitchurch

Add a script which converts /sys/kernel/debug/kmemleak_all to the pprof
format, which can be used for analysing memory usage.  See
https://github.com/google/pprof.

 $ ./kmemleak2pprof.py kmemleak_all
 $ pprof -text -ignore free_area_init_node -compact_labels -nodecount 10 prof
 Showing nodes accounting for 4.85MB, 34.05% of 14.23MB total
 Dropped 3989 nodes (cum <= 0.07MB)
 Showing top 10 nodes out of 190
   flat  flat%   sum%cum   cum%
 1.39MB  9.78%  9.78% 1.61MB 11.29%  new_inode_pseudo+0x8/0x4c
 0.75MB  5.27% 15.04% 0.75MB  5.27%  alloc_large_system_hash+0x19c/0x2b8
 0.73MB  5.12% 20.17% 0.86MB  6.07%  kernfs_new_node+0x30/0x50
 0.66MB  4.62% 24.79% 0.66MB  4.62%  
__vmalloc_node.constprop.9+0x48/0x50
 0.61MB  4.28% 29.06% 0.61MB  4.28%  d_alloc+0x10/0x78
 0.22MB  1.52% 30.58% 0.22MB  1.52%  alloc_inode+0x1c/0xa4
 0.18MB  1.28% 31.86% 0.20MB  1.42%  _do_fork+0xb0/0x41c
 0.13MB  0.88% 32.74% 0.13MB  0.88%  early_trace_init+0x16c/0x374
 0.09MB  0.66% 33.40% 0.17MB  1.17%  inet_init+0x128/0x24c
 0.09MB  0.65% 34.05% 0.09MB  0.65%  __kernfs_new_node+0x34/0x1a8

Signed-off-by: Vincent Whitchurch 
---
 scripts/kmemleak2pprof.py | 164 ++
 1 file changed, 164 insertions(+)
 create mode 100755 scripts/kmemleak2pprof.py

diff --git a/scripts/kmemleak2pprof.py b/scripts/kmemleak2pprof.py
new file mode 100755
index ..1295d3ca9a9d
--- /dev/null
+++ b/scripts/kmemleak2pprof.py
@@ -0,0 +1,164 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2018 Axis Communications AB
+#
+# Converts /sys/kernel/debug/kmemleak_all to the pprof format, see
+# https://github.com/google/pprof.
+#
+# profile_pb2.py can be generated with the following commands.  protoc is
+# packaged as protobuf-compiler in Debian:
+#
+#  wget 
https://raw.githubusercontent.com/google/pprof/master/proto/profile.proto
+#  protoc -I. --python_out=. profile.proto
+
+import argparse
+
+from collections import defaultdict
+
+import profile_pb2
+
+
+# object 0xee0243b0 (size 464):
+#   comm "swapper/0", pid 0, jiffies 4294937296
+# [<80220673>] alloc_inode+0x13/0x60
+# [<80221cc5>] new_inode_pseudo+0xd/0x38
+# [<802568a3>] proc_setup_thread_self+0x37/0xc4
+# [<8020e8c1>] mount_ns+0x55/0x94
+# [<8024f2e1>] proc_mount+0x45/0x48
+# [<8020ee9b>] mount_fs+0x1f/0x104
+# [<80224785>] vfs_kern_mount.part.3+0x35/0xbc
+# [<80224833>] kern_mount_data+0x17/0x2c
+# [<8024f44b>] pid_ns_prepare_proc+0x13/0x24
+# [<8012ed0d>] alloc_pid+0x309/0x338
+# [<80118e2b>] copy_process.part.5+0xa2b/0x1308
+# [<80119807>] _do_fork+0x77/0x2f0
+# [<80119abf>] kernel_thread+0x23/0x28
+# [<8053517f>] rest_init+0x27/0xb4
+# [<80900afb>] start_kernel+0x369/0x372
+# [<807b>] 0x807b
+class KmemleakAll(object):
+def __init__(self):
+pass
+
+def analyze(self, f):
+allocs = defaultdict(int)
+stack = []
+size = 0
+
+while True:
+line = f.readline()
+if not line:
+break
+
+line = line.strip()
+
+if line.startswith('['):
+# (null) is in the address part so later parsing steps fail.
+# Don't bother fixing it up since it's clearly bogus.
+if '(null)' in line:
+continue
+
+stack.append(line)
+continue
+elif line.startswith('comm'):
+continue
+
+if size:
+allocs[(tuple(stack), size)] += 1
+size = 0
+
+stack = []
+size = int(line.split('(size ')[1].strip('):'))
+
+return sorted(allocs.items(), key=lambda x: x[0][1] * x[1], 
reverse=True)
+
+
+class ProfileWriter(object):
+def __init__(self, allocs):
+self.profile = profile_pb2.Profile()
+self.strings = ['']
+self.allocs = allocs
+self.locations = {}
+self.functions = {}
+
+def stridx(self, s):
+try:
+idx = self.strings.index(s)
+except ValueError:
+idx = len(self.strings)
+self.strings.append(s)
+
+return idx
+
+def get_function_id(self, funcname, filename):
+try:
+return self.functions[(funcname, filename)].id
+except KeyError:
+pass
+
+function = self.profile.function.add()
+function.id = len(self.functions) + 1
+function.name = self.stridx(funcname)
+function.filename = self.stridx(filename)
+
+self.functions[(funcname, filename)] = function
+
+return function.id
+
+def get_location_id(self, addr):
+

Re: [PATCH 2/2] scripts: add kmemleak2pprof.py for slab usage analysis

2018-08-30 Thread Vincent Whitchurch

On Tue, Aug 28, 2018 at 04:28:04PM -0700, Andrew Morton wrote:
> On Tue, 28 Aug 2018 12:39:14 +0200 Vincent Whitchurch 
>  wrote:
> 
> > Add a script which converts /sys/kernel/debug/kmemleak_all to the pprof
> > format, which can be used for analysing memory usage.  See
> > https://github.com/google/pprof.
> 
> Why is this better than /proc/slabinfo?

slabinfo just tells you how much memory is being used in a particular
slab, it doesn't give you a breakdown of who allocated all that memory.
slabinfo can't also tell you how much memory a particular subsystem is
using.

For example, here we can see that tracer_init_tracefs() and its callers
are using ~12% of the total tracked memory:

 $ pprof -top -compact_labels -cum prof 
 Showing nodes accounting for 13418.95kB, 92.07% of 14575.28kB total
 Dropped 4069 nodes (cum <= 72.88kB)
   flat  flat%   sum%cum   cum%
   ...
  0 0% 56.71%  1832.15kB 12.57%  tracer_init_tracefs+0x74/0x1cc

 
And that tracefs' dentrys use 500 KiB and its inodes use 1+ MiB:

 $ pprof -text -compact_labels -focus tracer_init_tracefs -nodecount 2 prof
 Main binary filename not available.
 Showing nodes accounting for 1794.85kB, 12.31% of 14575.28kB total
 Dropped 1912 nodes (cum <= 72.88kB)
 Showing top 2 nodes out of 32
   flat  flat%   sum%cum   cum%
  1294.56kB  8.88%  8.88%  1294.56kB  8.88%  new_inode_pseudo+0x8/0x4c
   500.29kB  3.43% 12.31%   500.29kB  3.43%  d_alloc+0x10/0x78
   ...

> 
> >  $ ./kmemleak2pprof.py kmemleak_all
> >  $ pprof -text -ignore free_area_init_node -compact_labels -nodecount 10 
> > prof
> 
> Are we missing an argument here?  s/prof/kmemleak_all/?

No, the default output filename of this script is called "prof".

Re: [PATCHv2] kmemleak: Add option to print warnings to dmesg

2018-08-30 Thread Vincent Whitchurch

On Tue, Aug 28, 2018 at 11:26:22AM +0100, Catalin Marinas wrote:
> On Tue, Aug 28, 2018 at 12:14:12PM +0200, Vincent Whitchurch wrote:
> > On Mon, Aug 27, 2018 at 03:16:41PM -0700, Andrew Morton wrote:
> > > On Mon, 27 Aug 2018 10:38:21 +0200 Vincent Whitchurch 
> > >  wrote:
> > > > +config DEBUG_KMEMLEAK_WARN
> > > > +   bool "Print kmemleak object warnings to log buffer"
> > > > +   depends on DEBUG_KMEMLEAK
> > > > +   help
> > > > + Say Y here to make kmemleak print information about 
> > > > unreferenced
> > > > + objects (including stacktraces) as warnings to the kernel log 
> > > > buffer.
> > > > + Otherwise this information is only available by reading the 
> > > > kmemleak
> > > > + debugfs file.
> > > 
> > > Why add the config option?  Why not simply make the change for all
> > > configs?
> > 
> > No particular reason other than preserving the current behaviour for
> > existing users.  I can remove the config option if Catalin is fine with
> > it.
> 
> IIRC, in the early kmemleak days, people complained about it being to
> noisy (the false positives rate was also much higher), so the default
> behaviour was changed to monitor (almost) quietly with the details
> available via debugfs. I'd like to keep this default behaviour but we
> could have a "verbose" command via both debugfs and kernel parameter (as
> we do with "off" and "on"). Would this work for you?

Either a config option or a parameter are usable for me.  How about
something like this?  It can be enabled with kmemleak.verbose=1 or "echo
1 > /sys/module/kmemleak/parameters/verbose":

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9a3fc905b8bd..ab1b599202bc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -593,15 +593,6 @@ config DEBUG_KMEMLEAK_DEFAULT_OFF
  Say Y here to disable kmemleak by default. It can then be enabled
  on the command line via kmemleak=on.
 
-config DEBUG_KMEMLEAK_WARN
-   bool "Print kmemleak object warnings to log buffer"
-   depends on DEBUG_KMEMLEAK
-   help
- Say Y here to make kmemleak print information about unreferenced
- objects (including stacktraces) as warnings to the kernel log buffer.
- Otherwise this information is only available by reading the kmemleak
- debugfs file.
-
 config DEBUG_STACK_USAGE
bool "Stack utilization instrumentation"
depends on DEBUG_KERNEL && !IA64
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 22662715a3dc..c91d43738596 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -86,6 +86,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -236,6 +237,9 @@ static int kmemleak_skip_disable;
 /* If there are leaks that can be reported */
 static bool kmemleak_found_leaks;
 
+static bool kmemleak_verbose;
+module_param_named(verbose, kmemleak_verbose, bool, 0600);
+
 /*
  * Early object allocation/freeing logging. Kmemleak is initialized after the
  * kernel allocator. However, both the kernel allocator and kmemleak may
@@ -1618,9 +1622,10 @@ static void kmemleak_scan(void)
if (unreferenced_object(object) &&
!(object->flags & OBJECT_REPORTED)) {
object->flags |= OBJECT_REPORTED;
-#ifdef CONFIG_DEBUG_KMEMLEAK_WARN
-   print_unreferenced(NULL, object);
-#endif
+
+   if (kmemleak_verbose)
+   print_unreferenced(NULL, object);
+
new_leaks++;
}
spin_unlock_irqrestore(&object->lock, flags);

[PATCH] watchdog: Mark watchdog touch functions as notrace

2018-08-21 Thread Vincent Whitchurch

Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.

Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations.  This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.

Fix it by adding the necessary notrace annotations.

Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after 
MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch 
---
 kernel/watchdog.c | 4 ++--
 kernel/watchdog_hld.c | 2 +-
 kernel/workqueue.c| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5470dce212c0..977918d5d350 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -261,7 +261,7 @@ static void __touch_watchdog(void)
  * entering idle state.  This should only be used for scheduler events.
  * Use touch_softlockup_watchdog() for everything else.
  */
-void touch_softlockup_watchdog_sched(void)
+notrace void touch_softlockup_watchdog_sched(void)
 {
/*
 * Preemption can be enabled.  It doesn't matter which CPU's timestamp
@@ -270,7 +270,7 @@ void touch_softlockup_watchdog_sched(void)
raw_cpu_write(watchdog_touch_ts, 0);
 }
 
-void touch_softlockup_watchdog(void)
+notrace void touch_softlockup_watchdog(void)
 {
touch_softlockup_watchdog_sched();
wq_watchdog_touch(raw_smp_processor_id());
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 1f7020d65d0a..71381168dede 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -29,7 +29,7 @@ static struct cpumask dead_events_mask;
 static unsigned long hardlockup_allcpu_dumped;
 static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
-void arch_touch_nmi_watchdog(void)
+notrace void arch_touch_nmi_watchdog(void)
 {
/*
 * Using __raw here because some code paths have
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78b192071ef7..5f78c6e41796 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5559,7 +5559,7 @@ static void wq_watchdog_timer_fn(struct timer_list 
*unused)
mod_timer(&wq_watchdog_timer, jiffies + thresh);
 }
 
-void wq_watchdog_touch(int cpu)
+notrace void wq_watchdog_touch(int cpu)
 {
if (cpu >= 0)
per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
-- 
2.11.0

[PATCH] kmemleak: Add option to print warnings to dmesg

2018-08-24 Thread Vincent Whitchurch

Currently, kmemleak only prints the number of suspected leaks to dmesg
but requires the user to read a debugfs file to get the actual stack
traces of the objects' allocation points.  Add an option to print the
stack trace information (except the hex dumps) to dmesg too.  This
allows easier integration of kmemleak into automated test systems since
those kind of systems presumably already save kernel logs.

Signed-off-by: Vincent Whitchurch 
---
 lib/Kconfig.debug |  9 +
 mm/kmemleak.c | 21 +
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ab1b599202bc..9a3fc905b8bd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -593,6 +593,15 @@ config DEBUG_KMEMLEAK_DEFAULT_OFF
  Say Y here to disable kmemleak by default. It can then be enabled
  on the command line via kmemleak=on.
 
+config DEBUG_KMEMLEAK_WARN
+   bool "Print kmemleak object warnings to log buffer"
+   depends on DEBUG_KMEMLEAK
+   help
+ Say Y here to make kmemleak print information about unreferenced
+ objects (including stacktraces) as warnings to the kernel log buffer.
+ Otherwise this information is only available by reading the kmemleak
+ debugfs file.
+
 config DEBUG_STACK_USAGE
bool "Stack utilization instrumentation"
depends on DEBUG_KERNEL && !IA64
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 9a085d525bbc..61ba47a357fc 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -311,6 +311,9 @@ static void hex_dump_object(struct seq_file *seq,
const u8 *ptr = (const u8 *)object->pointer;
size_t len;
 
+   if (!seq)
+   return;
+
/* limit the number of lines to HEX_MAX_LINES */
len = min_t(size_t, object->size, HEX_MAX_LINES * HEX_ROW_SIZE);
 
@@ -355,6 +358,13 @@ static bool unreferenced_object(struct kmemleak_object 
*object)
   jiffies_last_scan);
 }
 
+#define warn_or_seq_printf(seq, fmt, ...)  do {\
+   if (seq)\
+   seq_printf(seq, fmt, ##__VA_ARGS__);\
+   else\
+   pr_warn(fmt, ##__VA_ARGS__);\
+} while (0)
+
 /*
  * Printing of the unreferenced objects information to the seq file. The
  * print_unreferenced function must be called with the object->lock held.
@@ -365,17 +375,17 @@ static void print_unreferenced(struct seq_file *seq,
int i;
unsigned int msecs_age = jiffies_to_msecs(jiffies - object->jiffies);
 
-   seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
+   warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
   object->pointer, object->size);
-   seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu (age %d.%03ds)\n",
+   warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu (age 
%d.%03ds)\n",
   object->comm, object->pid, object->jiffies,
   msecs_age / 1000, msecs_age % 1000);
hex_dump_object(seq, object);
-   seq_printf(seq, "  backtrace:\n");
+   warn_or_seq_printf(seq, "  backtrace:\n");
 
for (i = 0; i < object->trace_len; i++) {
void *ptr = (void *)object->trace[i];
-   seq_printf(seq, "[<%p>] %pS\n", ptr, ptr);
+   warn_or_seq_printf(seq, "[<%p>] %pS\n", ptr, ptr);
}
 }
 
@@ -1598,6 +1608,9 @@ static void kmemleak_scan(void)
if (unreferenced_object(object) &&
!(object->flags & OBJECT_REPORTED)) {
object->flags |= OBJECT_REPORTED;
+#ifdef CONFIG_DEBUG_KMEMLEAK_WARN
+   print_unreferenced(NULL, object);
+#endif
new_leaks++;
}
spin_unlock_irqrestore(&object->lock, flags);
-- 
2.11.0

[PATCH] kmemleak: Always register debugfs file

2018-08-24 Thread Vincent Whitchurch

If kmemleak built in to the kernel, but is disabled by default, the
debugfs file is never registered.  Because of this, it is not possible
to find out if the kernel is built with kmemleak support by checking for
the presence of this file.  To allow this, always register the file.

After this patch, if the file doesn't exist, kmemleak is not available
in the kernel.  If writing "scan" or any other value than "clear" to
this file results in EBUSY, then kmemleak is available but is disabled
by default and can be activated via the kernel command line.

Signed-off-by: Vincent Whitchurch 
---
 mm/kmemleak.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 9a085d525bbc..17dd883198ae 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -2097,6 +2097,11 @@ static int __init kmemleak_late_init(void)
 
kmemleak_initialized = 1;
 
+   dentry = debugfs_create_file("kmemleak", 0644, NULL, NULL,
+&kmemleak_fops);
+   if (!dentry)
+   pr_warn("Failed to create the debugfs kmemleak file\n");
+
if (kmemleak_error) {
/*
 * Some error occurred and kmemleak was disabled. There is a
@@ -2108,10 +2113,6 @@ static int __init kmemleak_late_init(void)
return -ENOMEM;
}
 
-   dentry = debugfs_create_file("kmemleak", 0644, NULL, NULL,
-&kmemleak_fops);
-   if (!dentry)
-   pr_warn("Failed to create the debugfs kmemleak file\n");
mutex_lock(&scan_mutex);
start_scan_thread();
mutex_unlock(&scan_mutex);
-- 
2.11.0

Re: [PATCH 1/2] module: Overwrite st_size instead of st_info

2018-11-22 Thread Vincent Whitchurch

On Thu, Nov 22, 2018 at 12:01:54PM +, Dave Martin wrote:
> On Mon, Nov 19, 2018 at 05:25:12PM +0100, Vincent Whitchurch wrote:
> > st_info is currently overwritten after relocation and used to store the
> > elf_type().  However, we're going to need it fix kallsyms on ARM's
> > Thumb-2 kernels, so preserve st_info and overwrite the st_size field
> > instead.  st_size is neither used by the module core nor by any
> > architecture.
> > 
> > Signed-off-by: Vincent Whitchurch 
> > ---
> > v4: Split out to separate patch.  Use st_size instead of st_other.
> > v1-v3: See PATCH 2/2
> > 
> >  kernel/module.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 49a405891587..3d86a38b580c 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, const 
> > struct load_info *info)
> >  
> > /* Set types up while we still have access to sections. */
> > for (i = 0; i < mod->kallsyms->num_symtab; i++)
> > -   mod->kallsyms->symtab[i].st_info
> > +   mod->kallsyms->symtab[i].st_size
> > = elf_type(&mod->kallsyms->symtab[i], info);
> >  
> > /* Now populate the cut down core kallsyms for after init. */
> > @@ -4061,7 +4061,7 @@ int module_get_kallsym(unsigned int symnum, unsigned 
> > long *value, char *type,
> > kallsyms = rcu_dereference_sched(mod->kallsyms);
> > if (symnum < kallsyms->num_symtab) {
> > *value = kallsyms->symtab[symnum].st_value;
> > -   *type = kallsyms->symtab[symnum].st_info;
> > +   *type = kallsyms->symtab[symnum].st_size;
> > strlcpy(name, symname(kallsyms, symnum), KSYM_NAME_LEN);
> > strlcpy(module_name, mod->name, MODULE_NAME_LEN);
> > *exported = is_exported(name, *value, mod);
> 
> This is fine if st_size is really unused, but how sure are you of that?
> 
> grepping for st_size throws up some hits that appear ELF-related, but
> I've not investigated them in detail.
> 
> (The fact that struct stat has an identically named field throws up
> a load of false positives too.)

$ git describe --tags
v4.20-rc3-93-g92b419289cee
 
$ rg -m1 '[\.>]st_size'  --iglob '!**/tools/**' --iglob '!**/vdso*' --iglob 
'!**/scripts/**' --iglob '!**/usr/**' --iglob '!**/samples/**' | cat
| kernel/kexec_file.c:  if (sym->st_size != size) {

Symbols in kexec kernel.

| fs/stat.c:tmp.st_size = stat->size;
| Documentation/networking/tls.txt:  sendfile(sock, file, &offset, 
stat.st_size);
| net/9p/client.c:  ret->st_rdev, ret->st_size, ret->st_blksize,
| net/9p/protocol.c:&stbuf->st_rdev, 
&stbuf->st_size,
| fs/9p/vfs_inode_dotl.c:   i_size_write(inode, stat->st_size);
| fs/hostfs/hostfs_user.c:  p->size = buf->st_size;
| arch/powerpc/boot/mktree.c:   nblks = (st.st_size + IMGBLK) / IMGBLK;
| arch/alpha/kernel/osf_sys.c:  tmp.st_size = lstat->size;
| arch/x86/ia32/sys_ia32.c: __put_user(stat->size, &ubuf->st_size) ||

Not Elf_Sym.

| arch/x86/kernel/machine_kexec_64.c:sym->st_size);

Symbols in kexec kernel.

| arch/sparc/boot/piggyback.c:  st4(buffer + 12, s.st_size);
| arch/sparc/kernel/sys_sparc32.c:  err |= put_user(stat->size, 
&statbuf->st_size);
| arch/um/os-Linux/file.c:  .ust_size= src->st_size,/* 
total size, in bytes */
| arch/um/os-Linux/start_up.c:  size = (buf.st_size + UM_KERN_PAGE_SIZE) & 
~(UM_KERN_PAGE_SIZE - 1);
| arch/s390/kernel/compat_linux.c:  tmp.st_size = stat->size;
| arch/arm/kernel/sys_oabi-compat.c:tmp.st_size = stat->size;
| arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:   vmlinux_size = 
(uint64_t)sb.st_size;
| drivers/net/ethernet/marvell/sky2.c:  hw->st_idx = 
RING_NEXT(hw->st_idx, hw->st_size);

Not Elf_Sym.

[PATCH 1/2] module: Overwrite st_size instead of st_info

2018-11-19 Thread Vincent Whitchurch

st_info is currently overwritten after relocation and used to store the
elf_type().  However, we're going to need it fix kallsyms on ARM's
Thumb-2 kernels, so preserve st_info and overwrite the st_size field
instead.  st_size is neither used by the module core nor by any
architecture.

Signed-off-by: Vincent Whitchurch 
---
v4: Split out to separate patch.  Use st_size instead of st_other.
v1-v3: See PATCH 2/2

 kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..3d86a38b580c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
 
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
-   mod->kallsyms->symtab[i].st_info
+   mod->kallsyms->symtab[i].st_size
= elf_type(&mod->kallsyms->symtab[i], info);
 
/* Now populate the cut down core kallsyms for after init. */
@@ -4061,7 +4061,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
kallsyms = rcu_dereference_sched(mod->kallsyms);
if (symnum < kallsyms->num_symtab) {
*value = kallsyms->symtab[symnum].st_value;
-   *type = kallsyms->symtab[symnum].st_info;
+   *type = kallsyms->symtab[symnum].st_size;
strlcpy(name, symname(kallsyms, symnum), KSYM_NAME_LEN);
strlcpy(module_name, mod->name, MODULE_NAME_LEN);
*exported = is_exported(name, *value, mod);
-- 
2.11.0

[PATCH 2/2] ARM: module: Fix function kallsyms on Thumb-2

2018-11-19 Thread Vincent Whitchurch

Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
333: 2d4d36 FUNCGLOBAL DEFAULT1 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 2d4c T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7f802d4d t tun_get_socket  [tun]

Because of this, the symbol+offset of the crashing instruction shown in
oopses is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 2d4c :
 2d4c:   6943ldr r3, [r0, #20]
 2d4e:   4a07ldr r2, [pc, #28]
 2d50:   4293cmp r3, r2

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_xdp+0xa3/0xa4 [tun]
 pc : [<7f802d4c>]

As can be seen, the "PC is at" line reports the wrong symbol name, and
the symbol+offset will point to the wrong source line if it is passed to
gdb.

To solve this, add a way for archs to fixup the reading of these module
kallsyms values, and use that to clear the lowest bit for function
symbols on Thumb-2.

After the fix:

 # cat /proc/kallsyms | grep tun_get_socket
 7f802d4c t tun_get_socket   [tun]

 PC is at tun_get_socket+0x0/0x24 [tun]
 pc : [<7f802d4c>]

Signed-off-by: Vincent Whitchurch 
---
v4: Split out st_value overwrite change.  Add HAVE* macro to avoid function 
call.
v3: Do not overwrite st_value
v2: Fix build warning with !MODULES

 arch/arm/include/asm/module.h | 11 +++
 include/linux/module.h|  7 +++
 kernel/module.c   | 25 ++---
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
index 9e81b7c498d8..e2ccec651e71 100644
--- a/arch/arm/include/asm/module.h
+++ b/arch/arm/include/asm/module.h
@@ -61,4 +61,15 @@ u32 get_module_plt(struct module *mod, unsigned long loc, 
Elf32_Addr val);
MODULE_ARCH_VERMAGIC_ARMTHUMB \
MODULE_ARCH_VERMAGIC_P2V
 
+#ifdef CONFIG_THUMB2_KERNEL
+#define HAVE_ARCH_MODULE_KALLSYMS_SYMBOL_VALUE
+static inline unsigned long module_kallsyms_symbol_value(Elf_Sym *sym)
+{
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   return sym->st_value & ~1;
+
+   return sym->st_value;
+}
+#endif
+
 #endif /* _ASM_ARM_MODULE_H */
diff --git a/include/linux/module.h b/include/linux/module.h
index fce6b4335e36..cfc55f358800 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -486,6 +486,13 @@ struct module {
 #define MODULE_ARCH_INIT {}
 #endif
 
+#ifndef HAVE_ARCH_MODULE_KALLSYMS_SYMBOL_VALUE
+static inline unsigned long module_kallsyms_symbol_value(Elf_Sym *sym)
+{
+   return sym->st_value;
+}
+#endif
+
 extern struct mutex module_mutex;
 
 /* FIXME: It'd be nice to isolate modules during init, too, so they
diff --git a/kernel/module.c b/kernel/module.c
index 3d86a38b580c..a36d7915ed2b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3934,6 +3934,9 @@ static const char *get_ksymbol(struct module *mod,
/* Scan for closest preceding symbol, and next symbol. (ELF
   starts real symbols at 1). */
for (i = 1; i < kallsyms->num_symtab; i++) {
+   unsigned long thisval = 
module_kallsyms_symbol_value(&kallsyms->symtab[i]);
+   unsigned long bestval = 
module_kallsyms_symbol_value(&kallsyms->symtab[best]);
+
if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
continue;
 
@@ -3943,21 +3946,21 @@ static const char *get_ksymbol(struct module *mod,
|| is_arm_mapping_symbol(symname(kallsyms, i)))
continue;
 
-   if (kallsyms->symtab[i].st_value <= addr
-   && kallsyms->symtab[i].st_value > 
kallsyms->symtab[best].st_value)
+   if (thisval <= addr
+   && thisval > bestval)
best = i;
-   if (kallsyms->symtab[i].st_value > addr
-   && kallsyms->symtab[i].st_value < nextval)
-   nextval = kallsyms->symtab[i].st_value;
+   if (thisval > addr
+   && thisval < nextval)
+

[PATCH v3] ARM: module: Fix function kallsyms on Thumb-2

2018-11-13 Thread Vincent Whitchurch

Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
268: 00e144 FUNCGLOBAL DEFAULT2 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 00e0 T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7fcd30e1 t tun_get_socket  [tun]

Because of this, the offset of the crashing instruction shown in oopses
is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 00e0 :
   e0:   b500push{lr}
   e2:   f7ff fffe   bl  0 <__gnu_mcount_nc>
   e6:   4b08ldr r3, [pc, #32]
   e8:   6942ldr r2, [r0, #20]
   ea:   429acmp r2, r3
   ec:   d002beq.n   f4 

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_get_socket+0x7/0x2c [tun]
 pc : [<7fcdb0e8>]

which can result in the incorrect line being reported by gdb if this
symbol+offset is used there.  If the crash is on the first instruction
of a function, the "PC is at" line would also report the symbol name of
the preceding function.

To solve this, introduce a weak module_kallsyms_symbol_value() function
which arches can override to fix up these symbol values, and implement
this for Thumb-2.  We need to move elf_type() to st_other so that the
original value of st_info is preserved.

After the fix:

 $ cat /proc/kallsyms | grep tun_get_socket
 7fcd30e0 t tun_get_socket  [tun]

 PC is at tun_get_socket+0x8/0x2c [tun]
 pc : [<7fcdb0e8>]

Signed-off-by: Vincent Whitchurch 
---
v3: Do not overwrite st_value
v2: Fix build warning with !MODULES

 arch/arm/kernel/module.c | 10 ++
 include/linux/moduleloader.h |  2 ++
 kernel/module.c  | 34 +-
 3 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 3ff571c2c71c..89ab84a83600 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -336,6 +336,16 @@ static const Elf_Shdr *find_mod_section(const Elf32_Ehdr 
*hdr,
 extern void fixup_pv_table(const void *, unsigned long);
 extern void fixup_smp(const void *, unsigned long);
 
+#ifdef CONFIG_THUMB2_KERNEL
+unsigned long module_kallsyms_symbol_value(Elf_Sym *sym)
+{
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   return sym->st_value & ~1;
+
+   return sym->st_value;
+}
+#endif
+
 int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
struct module *mod)
 {
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 31013c2effd3..6395409b01a4 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -86,6 +86,8 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+unsigned long module_kallsyms_symbol_value(Elf_Sym *sym);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..5a588cfbb8f8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
 
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
-   mod->kallsyms->symtab[i].st_info
+   mod->kallsyms->symtab[i].st_other
= elf_type(&mod->kallsyms->symtab[i], info);
 
/* Now populate the cut down core kallsyms for after init. */
@@ -3916,6 +3916,11 @@ static const char *symname(struct mod_kallsyms 
*kallsyms, unsigned int symnum)
return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
 }
 
+unsigned long __weak module_kallsyms_symbol_value(Elf_Sym *sym)
+{
+   return sym->st_value;
+}
+
 static const char *get_ksymbol(struct module *mod,
   unsigned long addr,
   unsigned long *size,
@@ -3934,6 +3939,9 @@ static const char *get_ksymbol(struct module *mod,
/* Scan for closest preceding sy

[PATCH v5 2/2] ARM: module: Fix function kallsyms on Thumb-2

2018-12-04 Thread Vincent Whitchurch

Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
333: 2d4d36 FUNCGLOBAL DEFAULT1 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 2d4c T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7f802d4d t tun_get_socket  [tun]

Because of this, the symbol+offset of the crashing instruction shown in
oopses is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 2d4c :
 2d4c:   6943ldr r3, [r0, #20]
 2d4e:   4a07ldr r2, [pc, #28]
 2d50:   4293cmp r3, r2

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_xdp+0xa3/0xa4 [tun]
 pc : [<7f802d4c>]

As can be seen, the "PC is at" line reports the wrong symbol name, and
the symbol+offset will point to the wrong source line if it is passed to
gdb.

To solve this, add a way for archs to fixup the reading of these module
kallsyms values, and use that to clear the lowest bit for function
symbols on Thumb-2.

After the fix:

 # cat /proc/kallsyms | grep tun_get_socket
 7f802d4c t tun_get_socket   [tun]

 PC is at tun_get_socket+0x0/0x24 [tun]
 pc : [<7f802d4c>]

Signed-off-by: Vincent Whitchurch 
---
v5: Use/move local variables to reduce calls and keep lines short.  Use const 
arg.
v4: Split out st_value overwrite change.  Add HAVE* macro to avoid function 
call.
v3: Do not overwrite st_value
v2: Fix build warning with !MODULES

 arch/arm/include/asm/module.h | 11 +++
 include/linux/module.h|  7 +++
 kernel/module.c   | 45 +++
 3 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
index 9e81b7c498d8..c7bcf0347801 100644
--- a/arch/arm/include/asm/module.h
+++ b/arch/arm/include/asm/module.h
@@ -61,4 +61,15 @@ u32 get_module_plt(struct module *mod, unsigned long loc, 
Elf32_Addr val);
MODULE_ARCH_VERMAGIC_ARMTHUMB \
MODULE_ARCH_VERMAGIC_P2V
 
+#ifdef CONFIG_THUMB2_KERNEL
+#define HAVE_ARCH_MODULE_KALLSYMS_SYMBOL_VALUE
+static inline unsigned long module_kallsyms_symbol_value(const Elf_Sym *sym)
+{
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   return sym->st_value & ~1;
+
+   return sym->st_value;
+}
+#endif
+
 #endif /* _ASM_ARM_MODULE_H */
diff --git a/include/linux/module.h b/include/linux/module.h
index fce6b4335e36..12146257eb5d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -486,6 +486,13 @@ struct module {
 #define MODULE_ARCH_INIT {}
 #endif
 
+#ifndef HAVE_ARCH_MODULE_KALLSYMS_SYMBOL_VALUE
+static inline unsigned long module_kallsyms_symbol_value(const Elf_Sym *sym)
+{
+   return sym->st_value;
+}
+#endif
+
 extern struct mutex module_mutex;
 
 /* FIXME: It'd be nice to isolate modules during init, too, so they
diff --git a/kernel/module.c b/kernel/module.c
index 3d86a38b580c..9364017fdc21 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3922,7 +3922,7 @@ static const char *get_ksymbol(struct module *mod,
   unsigned long *offset)
 {
unsigned int i, best = 0;
-   unsigned long nextval;
+   unsigned long nextval, bestval;
struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
 
/* At worse, next value is at end of module */
@@ -3931,10 +3931,15 @@ static const char *get_ksymbol(struct module *mod,
else
nextval = (unsigned 
long)mod->core_layout.base+mod->core_layout.text_size;
 
+   bestval = module_kallsyms_symbol_value(&kallsyms->symtab[best]);
+
/* Scan for closest preceding symbol, and next symbol. (ELF
   starts real symbols at 1). */
for (i = 1; i < kallsyms->num_symtab; i++) {
-   if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
+   const Elf_Sym *sym = &kallsyms->symtab[i];
+   unsigned long thisval = module_kallsyms_symbol_value(sym);
+
+   if (sym->st_shndx == SHN_UNDEF)
continue;
 
/* We ignore unnamed symbols: they're unin

[PATCH v5 1/2] module: Overwrite st_size instead of st_info

2018-12-04 Thread Vincent Whitchurch

st_info is currently overwritten after relocation and used to store the
elf_type().  However, we're going to need it fix kallsyms on ARM's
Thumb-2 kernels, so preserve st_info and overwrite the st_size field
instead.  st_size is neither used by the module core nor by any
architecture.

Reviewed-by: Dave Martin 
Signed-off-by: Vincent Whitchurch 
---
v5: Add Dave Martin's Reviewed-by
v4: Split out to separate patch.  Use st_size instead of st_other.
v1-v3: See PATCH 2/2

 kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..3d86a38b580c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
 
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
-   mod->kallsyms->symtab[i].st_info
+   mod->kallsyms->symtab[i].st_size
= elf_type(&mod->kallsyms->symtab[i], info);
 
/* Now populate the cut down core kallsyms for after init. */
@@ -4061,7 +4061,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
kallsyms = rcu_dereference_sched(mod->kallsyms);
if (symnum < kallsyms->num_symtab) {
*value = kallsyms->symtab[symnum].st_value;
-   *type = kallsyms->symtab[symnum].st_info;
+   *type = kallsyms->symtab[symnum].st_size;
strlcpy(name, symname(kallsyms, symnum), KSYM_NAME_LEN);
strlcpy(module_name, mod->name, MODULE_NAME_LEN);
*exported = is_exported(name, *value, mod);
-- 
2.11.0

Re: [PATCH v2] ARM: module: Fix function kallsyms on Thumb-2

2018-11-09 Thread Vincent Whitchurch

On Fri, Nov 02, 2018 at 02:53:22PM +0100, Jessica Yu wrote:
> +++ Vincent Whitchurch [01/11/18 16:29 +0100]:
> > On Wed, Oct 31, 2018 at 04:53:41PM +0100, Jessica Yu wrote:
> > > Could this be done in modpost? I'm guessing the answer is no as some
> > > relocations may rely on that bit being set in st_value, right?
> > > Therefore we can only clear the bit _after_ relocations to the module
> > > are applied at runtime, correct?
> > 
> > Yes, that's correct, it needs to be done after the relocations.
> > 
> > > I'm not against having an arch-specific kallsyms fixup function, my
> > > only concern would be if this would interfere with the delayed
> > > relocations livepatching does, if there are plans in the future to
> > > have livepatching on arm (IIRC there was an attempt at a port in
> > > 2016). If there exists some Thumb-2 relocation types that rely on that
> > > lowest bit in st_value being set in order to be applied, and we clear
> > > it permanently from the symtab, then livepatching wouldn't be able to
> > > apply those types of relocations anymore. If this is a legitimate
> > > concern, then perhaps an alternative solution would be to have an
> > > arch-specific kallsyms symbol-value-fixup function for accesses to
> > > sym.st_value, without modifying the module symtab.
> > 
> > I'm not familiar with livepatching, but yes, if it needs to do
> > relocations later I guess we should preserve the original value.
> 
> Yeah, I think the symtab needs to remain unmodified for livepatch to
> be able to do delayed relocations after module load.
> 
> > I gave the alternative solution a go and it seems to work.
> > add_kallsyms() currently overwrites st_info so I had to move the
> > elf_type to the unused st_other field instead to preserve st_info:
> 
> I think that's fine, I think the only usages of st_other I've found
> are during relocations, and the field is big enough for a char, so it
> should be fine to reuse it afterwards.

If livepatch can do relocations later, doesn't that mean that we _can't_
reuse st_other for storing the elf_type?  OTOH, the current code
overwrites st_info, and I see that used from various archs' relocation
functions too, so perhaps I'm missing something.

[PATCH v6 modules-next 1/2] module: Overwrite st_size instead of st_info

2018-12-14 Thread Vincent Whitchurch

st_info is currently overwritten after relocation and used to store the
elf_type().  However, we're going to need it fix kallsyms on ARM's
Thumb-2 kernels, so preserve st_info and overwrite the st_size field
instead.  st_size is neither used by the module core nor by any
architecture.

Reviewed-by: Miroslav Benes 
Reviewed-by: Dave Martin 
Signed-off-by: Vincent Whitchurch 
---
v6: Add Miroslav Benes' Reviewed-by
v5: Add Dave Martin's Reviewed-by
v4: Split out to separate patch.  Use st_size instead of st_other.
v1-v3: See PATCH 2/2

 kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1b5edf78694c..b36ff8a3d562 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2684,7 +2684,7 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
 
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
-   mod->kallsyms->symtab[i].st_info
+   mod->kallsyms->symtab[i].st_size
= elf_type(&mod->kallsyms->symtab[i], info);
 
/* Now populate the cut down core kallsyms for after init. */
@@ -4070,7 +4070,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
kallsyms = rcu_dereference_sched(mod->kallsyms);
if (symnum < kallsyms->num_symtab) {
*value = kallsyms->symtab[symnum].st_value;
-   *type = kallsyms->symtab[symnum].st_info;
+   *type = kallsyms->symtab[symnum].st_size;
strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), 
KSYM_NAME_LEN);
strlcpy(module_name, mod->name, MODULE_NAME_LEN);
*exported = is_exported(name, *value, mod);
-- 
2.11.0

[PATCH v6 modules-next 2/2] ARM: module: Fix function kallsyms on Thumb-2

2018-12-14 Thread Vincent Whitchurch

Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
333: 2d4d36 FUNCGLOBAL DEFAULT1 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 2d4c T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7f802d4d t tun_get_socket  [tun]

Because of this, the symbol+offset of the crashing instruction shown in
oopses is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 2d4c :
 2d4c:   6943ldr r3, [r0, #20]
 2d4e:   4a07ldr r2, [pc, #28]
 2d50:   4293cmp r3, r2

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_xdp+0xa3/0xa4 [tun]
 pc : [<7f802d4c>]

As can be seen, the "PC is at" line reports the wrong symbol name, and
the symbol+offset will point to the wrong source line if it is passed to
gdb.

To solve this, add a way for archs to fixup the reading of these module
kallsyms values, and use that to clear the lowest bit for function
symbols on Thumb-2.

After the fix:

 # cat /proc/kallsyms | grep tun_get_socket
 7f802d4c t tun_get_socket   [tun]

 PC is at tun_get_socket+0x0/0x24 [tun]
 pc : [<7f802d4c>]

Signed-off-by: Vincent Whitchurch 
---
v6: Rename module_kallsyms_symbol_value() -> kallsyms_symbol_value()
v5: Use/move local variables to reduce calls and keep lines short.  Use const 
arg.
v4: Split out st_value overwrite change.  Add HAVE* macro to avoid function 
call.
v3: Do not overwrite st_value
v2: Fix build warning with !MODULES

 arch/arm/include/asm/module.h | 11 +++
 include/linux/module.h|  7 +++
 kernel/module.c   | 43 +++
 3 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
index 9e81b7c498d8..182163b55546 100644
--- a/arch/arm/include/asm/module.h
+++ b/arch/arm/include/asm/module.h
@@ -61,4 +61,15 @@ u32 get_module_plt(struct module *mod, unsigned long loc, 
Elf32_Addr val);
MODULE_ARCH_VERMAGIC_ARMTHUMB \
MODULE_ARCH_VERMAGIC_P2V
 
+#ifdef CONFIG_THUMB2_KERNEL
+#define HAVE_ARCH_KALLSYMS_SYMBOL_VALUE
+static inline unsigned long kallsyms_symbol_value(const Elf_Sym *sym)
+{
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   return sym->st_value & ~1;
+
+   return sym->st_value;
+}
+#endif
+
 #endif /* _ASM_ARM_MODULE_H */
diff --git a/include/linux/module.h b/include/linux/module.h
index fce6b4335e36..c0b4b7840b57 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -486,6 +486,13 @@ struct module {
 #define MODULE_ARCH_INIT {}
 #endif
 
+#ifndef HAVE_ARCH_KALLSYMS_SYMBOL_VALUE
+static inline unsigned long kallsyms_symbol_value(const Elf_Sym *sym)
+{
+   return sym->st_value;
+}
+#endif
+
 extern struct mutex module_mutex;
 
 /* FIXME: It'd be nice to isolate modules during init, too, so they
diff --git a/kernel/module.c b/kernel/module.c
index b36ff8a3d562..164bf201eae4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3928,7 +3928,7 @@ static const char *find_kallsyms_symbol(struct module 
*mod,
unsigned long *offset)
 {
unsigned int i, best = 0;
-   unsigned long nextval;
+   unsigned long nextval, bestval;
struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
 
/* At worse, next value is at end of module */
@@ -3937,10 +3937,15 @@ static const char *find_kallsyms_symbol(struct module 
*mod,
else
nextval = (unsigned 
long)mod->core_layout.base+mod->core_layout.text_size;
 
+   bestval = kallsyms_symbol_value(&kallsyms->symtab[best]);
+
/* Scan for closest preceding symbol, and next symbol. (ELF
   starts real symbols at 1). */
for (i = 1; i < kallsyms->num_symtab; i++) {
-   if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
+   const Elf_Sym *sym = &kallsyms->symtab[i];
+   unsigned long thisval = kallsyms_symbol_value(sym);
+
+   if (sym->st_shndx == SHN_UNDEF)
continue;
 
/* We ignore

Re: [PATCH v5 2/2] ARM: module: Fix function kallsyms on Thumb-2

2018-12-14 Thread Vincent Whitchurch

On Thu, Dec 06, 2018 at 06:29:20PM +0100, Jessica Yu wrote:
> Also, do you mind if I drop the module_ prefix from
> module_kallsyms_symbol_value()? I recently submitted some internal
> module kallsyms cleanups [1] and there we have the newly renamed
> kallsyms_symbol_name(), so I think it'd be nice if we kept the naming
> scheme consistent with just kallsyms_symbol_value(). I could just do
> this myself if you don't want to respin a v6.

I've sent out a v6 with this change.  I also rebased the change on top
of your cleanups in modules-next.

Thanks.

[PATCH] drop_caches: Allow unmapping pages

2019-01-07 Thread Vincent Whitchurch

drop_caches does not drop pages which are currently mapped.  Add an
option to try to unmap and drop even these pages.  This provides a
simple way to obtain a rough estimate of how many file pages are used in
a particular use case: drop everything and check how much gets read
back.

 # cat /proc/meminfo | grep file
 Active(file):  16608 kB
 Inactive(file):23424 kB
 # echo 3 > /proc/sys/vm/drop_caches && cat /proc/meminfo | grep file
 Active(file):  10624 kB
 Inactive(file):15060 kB
 # echo 11 > /proc/sys/vm/drop_caches && cat /proc/meminfo | grep file
 Active(file):240 kB
 Inactive(file): 2344 kB

Signed-off-by: Vincent Whitchurch 
---
 Documentation/sysctl/vm.txt |  4 
 fs/drop_caches.c|  3 ++-
 include/linux/fs.h  | 10 --
 kernel/sysctl.c |  4 ++--
 mm/truncate.c   | 39 -
 5 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 187ce4f599a2..6ea06c2c973b 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -222,6 +222,10 @@ To increase the number of objects freed by this operation, 
the user may run
 number of dirty objects on the system and create more candidates to be
 dropped.
 
+By default, pages which are currently mapped are not dropped from the
+pagecache.  If you want to unmap and drop these pages too, echo 9 or 11 instead
+of 1 or 3 respectively (set bit 4).
+
 This file is not a means to control the growth of the various kernel caches
 (inodes, dentries, pagecache, etc...)  These objects are automatically
 reclaimed by the kernel when memory is needed elsewhere on the system.
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 82377017130f..9faaa1e3a672 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -17,6 +17,7 @@ int sysctl_drop_caches;
 static void drop_pagecache_sb(struct super_block *sb, void *unused)
 {
struct inode *inode, *toput_inode = NULL;
+   bool unmap = sysctl_drop_caches & 8;
 
spin_lock(&sb->s_inode_list_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
@@ -30,7 +31,7 @@ static void drop_pagecache_sb(struct super_block *sb, void 
*unused)
spin_unlock(&inode->i_lock);
spin_unlock(&sb->s_inode_list_lock);
 
-   invalidate_mapping_pages(inode->i_mapping, 0, -1);
+   __invalidate_mapping_pages(inode->i_mapping, 0, -1, unmap);
iput(toput_inode);
toput_inode = inode;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 811c77743dad..503e176654ce 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2675,8 +2675,14 @@ extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *, bool);
 extern int invalidate_partition(struct gendisk *, int);
 #endif
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-   pgoff_t start, pgoff_t end);
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+   pgoff_t start, pgoff_t end, bool unmap);
+
+static inline unsigned long invalidate_mapping_pages(struct address_space 
*mapping,
+pgoff_t start, pgoff_t end)
+{
+   return __invalidate_mapping_pages(mapping, start, end, false);
+}
 
 static inline void invalidate_remote_inode(struct inode *inode)
 {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ba4d9e85feb8..f12c2a8d84fb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -125,7 +125,7 @@ static int __maybe_unused neg_one = -1;
 static int zero;
 static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
-static int __maybe_unused four = 4;
+static int __maybe_unused fifteen = 15;
 static unsigned long one_ul = 1;
 static int one_hundred = 100;
 static int one_thousand = 1000;
@@ -1431,7 +1431,7 @@ static struct ctl_table vm_table[] = {
.mode   = 0644,
.proc_handler   = drop_caches_sysctl_handler,
.extra1 = &one,
-   .extra2 = &four,
+   .extra2 = &fifteen,
},
 #ifdef CONFIG_COMPACTION
{
diff --git a/mm/truncate.c b/mm/truncate.c
index 798e7ccfb030..613b02e02146 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -245,6 +245,22 @@ int generic_error_remove_page(struct address_space 
*mapping, struct page *page)
 }
 EXPORT_SYMBOL(generic_error_remove_page);
 
+static int __invalidate_inode_page(struct page *page, bool unmap)
+{
+   struct address_space *mapping = page_mapping(page);
+   if (!mapping)
+   return 0;
+   if (PageDirty(page) || PageWriteback(page))
+   return 0;
+   if (page_mapped(page)) {
+   if (!unmap)

[PATCH] ARM: module: Fix function kallsyms on Thumb-2

2018-10-29 Thread Vincent Whitchurch

Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
268: 00e144 FUNCGLOBAL DEFAULT2 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 00e0 T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7fcd30e1 t tun_get_socket  [tun]

Because of this, the offset of the crashing instruction shown in oopses
is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 00e0 :
   e0:   b500push{lr}
   e2:   f7ff fffe   bl  0 <__gnu_mcount_nc>
   e6:   4b08ldr r3, [pc, #32]
   e8:   6942ldr r2, [r0, #20]
   ea:   429acmp r2, r3
   ec:   d002beq.n   f4 

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_get_socket+0x7/0x2c [tun]
 pc : [<7fcdb0e8>]

which can result in the incorrect line being reported by gdb if this
symbol+offset is used there.  If the crash is on the first instruction
of a function, the "PC is at" line would also report the symbol name of
the preceding function.

To solve this, fix up these symbols like nm does.  For this, we need a
new hook in the generic module loading code, before the symbols' st_info
is overwritten by add_kallsyms().  After the fix:

 $ cat /proc/kallsyms | grep tun_get_socket
 7fcd30e0 t tun_get_socket  [tun]

 PC is at tun_get_socket+0x8/0x2c [tun]
 pc : [<7fcdb0e8>]

Signed-off-by: Vincent Whitchurch 
---
 arch/arm/kernel/module.c | 14 ++
 include/linux/moduleloader.h |  2 ++
 kernel/module.c  |  6 ++
 3 files changed, 22 insertions(+)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 3ff571c2c71c..771f86318d84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -399,6 +399,20 @@ int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr 
*sechdrs,
return 0;
 }
 
+#if defined(CONFIG_THUMB2_KERNEL) && defined(CONFIG_KALLSYMS)
+void module_fixup_kallsyms(struct mod_kallsyms *kallsyms)
+{
+   int i;
+
+   for (i = 0; i < kallsyms->num_symtab; i++) {
+   Elf_Sym *sym = &kallsyms->symtab[i];
+
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   sym->st_value &= ~1;
+   }
+}
+#endif
+
 void
 module_arch_cleanup(struct module *mod)
 {
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 31013c2effd3..92387dd49b82 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -86,6 +86,8 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+void module_fixup_kallsyms(struct mod_kallsyms *kallsyms);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..ded4f4b49824 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2659,6 +2659,10 @@ static void layout_symtab(struct module *mod, struct 
load_info *info)
mod->init_layout.size = debug_align(mod->init_layout.size);
 }
 
+void __weak module_fixup_kallsyms(struct mod_kallsyms *kallsyms)
+{
+}
+
 /*
  * We use the full symtab and strtab which layout_symtab arranged to
  * be appended to the init section.  Later we switch to the cut-down
@@ -2680,6 +2684,8 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
/* Make sure we get permanent strtab: don't use info->strtab. */
mod->kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
 
+   module_fixup_kallsyms(mod->kallsyms);
+
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
mod->kallsyms->symtab[i].st_info
-- 
2.11.0

[PATCH v2] ARM: module: Fix function kallsyms on Thumb-2

2018-10-31 Thread Vincent Whitchurch

Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
268: 00e144 FUNCGLOBAL DEFAULT2 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 00e0 T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7fcd30e1 t tun_get_socket  [tun]

Because of this, the offset of the crashing instruction shown in oopses
is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 00e0 :
   e0:   b500push{lr}
   e2:   f7ff fffe   bl  0 <__gnu_mcount_nc>
   e6:   4b08ldr r3, [pc, #32]
   e8:   6942ldr r2, [r0, #20]
   ea:   429acmp r2, r3
   ec:   d002beq.n   f4 

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_get_socket+0x7/0x2c [tun]
 pc : [<7fcdb0e8>]

which can result in the incorrect line being reported by gdb if this
symbol+offset is used there.  If the crash is on the first instruction
of a function, the "PC is at" line would also report the symbol name of
the preceding function.

To solve this, fix up these symbols like nm does.  For this, we need a
new hook in the generic module loading code, before the symbols' st_info
is overwritten by add_kallsyms().  After the fix:

 $ cat /proc/kallsyms | grep tun_get_socket
 7fcd30e0 t tun_get_socket  [tun]

 PC is at tun_get_socket+0x8/0x2c [tun]
 pc : [<7fcdb0e8>]

Signed-off-by: Vincent Whitchurch 
---
v2: Fix build warning with !MODULES

 arch/arm/kernel/module.c | 14 ++
 include/linux/moduleloader.h |  3 +++
 kernel/module.c  |  6 ++
 3 files changed, 23 insertions(+)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 3ff571c2c71c..771f86318d84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -399,6 +399,20 @@ int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr 
*sechdrs,
return 0;
 }
 
+#if defined(CONFIG_THUMB2_KERNEL) && defined(CONFIG_KALLSYMS)
+void module_fixup_kallsyms(struct mod_kallsyms *kallsyms)
+{
+   int i;
+
+   for (i = 0; i < kallsyms->num_symtab; i++) {
+   Elf_Sym *sym = &kallsyms->symtab[i];
+
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   sym->st_value &= ~1;
+   }
+}
+#endif
+
 void
 module_arch_cleanup(struct module *mod)
 {
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 31013c2effd3..b1ac5584eaa5 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -86,6 +86,9 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+struct mod_kallsyms;
+void module_fixup_kallsyms(struct mod_kallsyms *kallsyms);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..ded4f4b49824 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2659,6 +2659,10 @@ static void layout_symtab(struct module *mod, struct 
load_info *info)
mod->init_layout.size = debug_align(mod->init_layout.size);
 }
 
+void __weak module_fixup_kallsyms(struct mod_kallsyms *kallsyms)
+{
+}
+
 /*
  * We use the full symtab and strtab which layout_symtab arranged to
  * be appended to the init section.  Later we switch to the cut-down
@@ -2680,6 +2684,8 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
/* Make sure we get permanent strtab: don't use info->strtab. */
mod->kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
 
+   module_fixup_kallsyms(mod->kallsyms);
+
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
mod->kallsyms->symtab[i].st_info
-- 
2.11.0

Re: [PATCH v2] ARM: module: Fix function kallsyms on Thumb-2

2018-11-01 Thread Vincent Whitchurch

On Wed, Oct 31, 2018 at 04:53:41PM +0100, Jessica Yu wrote:
> Could this be done in modpost? I'm guessing the answer is no as some
> relocations may rely on that bit being set in st_value, right?
> Therefore we can only clear the bit _after_ relocations to the module
> are applied at runtime, correct?

Yes, that's correct, it needs to be done after the relocations.

> I'm not against having an arch-specific kallsyms fixup function, my
> only concern would be if this would interfere with the delayed
> relocations livepatching does, if there are plans in the future to
> have livepatching on arm (IIRC there was an attempt at a port in
> 2016). If there exists some Thumb-2 relocation types that rely on that
> lowest bit in st_value being set in order to be applied, and we clear
> it permanently from the symtab, then livepatching wouldn't be able to
> apply those types of relocations anymore. If this is a legitimate
> concern, then perhaps an alternative solution would be to have an
> arch-specific kallsyms symbol-value-fixup function for accesses to
> sym.st_value, without modifying the module symtab.

I'm not familiar with livepatching, but yes, if it needs to do
relocations later I guess we should preserve the original value.

I gave the alternative solution a go and it seems to work.
add_kallsyms() currently overwrites st_info so I had to move the
elf_type to the unused st_other field instead to preserve st_info:

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 3ff571c2c71c..f443d0ccd1a0 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -336,6 +336,16 @@ static const Elf_Shdr *find_mod_section(const Elf32_Ehdr 
*hdr,
 extern void fixup_pv_table(const void *, unsigned long);
 extern void fixup_smp(const void *, unsigned long);
 
+#ifdef CONFIG_THUMB2_KERNEL
+unsigned long symbol_value(Elf_Sym *sym)
+{
+   if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
+   return sym->st_value & ~1;
+
+   return sym->st_value;
+}
+#endif
+
 int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
struct module *mod)
 {
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 31013c2effd3..6bf6118db37f 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -86,6 +86,8 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+unsigned long symbol_value(Elf_Sym *sym);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..871bf4450e9d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, const struct 
load_info *info)
 
/* Set types up while we still have access to sections. */
for (i = 0; i < mod->kallsyms->num_symtab; i++)
-   mod->kallsyms->symtab[i].st_info
+   mod->kallsyms->symtab[i].st_other
= elf_type(&mod->kallsyms->symtab[i], info);
 
/* Now populate the cut down core kallsyms for after init. */
@@ -3916,6 +3916,11 @@ static const char *symname(struct mod_kallsyms 
*kallsyms, unsigned int symnum)
return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
 }
 
+unsigned long __weak symbol_value(Elf_Sym *sym)
+{
+   return sym->st_value;
+}
+
 static const char *get_ksymbol(struct module *mod,
   unsigned long addr,
   unsigned long *size,
@@ -3934,6 +3939,9 @@ static const char *get_ksymbol(struct module *mod,
/* Scan for closest preceding symbol, and next symbol. (ELF
   starts real symbols at 1). */
for (i = 1; i < kallsyms->num_symtab; i++) {
+   unsigned long thisval = symbol_value(&kallsyms->symtab[i]);
+   unsigned long bestval = symbol_value(&kallsyms->symtab[best]);
+
if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
continue;
 
@@ -3943,21 +3951,21 @@ static const char *get_ksymbol(struct module *mod,
|| is_arm_mapping_symbol(symname(kallsyms, i)))
continue;
 
-   if (kallsyms->symtab[i].st_value <= addr
-   && kallsyms->symtab[i].st_value > 
kallsyms->symtab[best].st_value)
+   if (thisval <= addr
+   && thisval > bestval)
best = i;
-   if (kallsyms->symtab[i].st_value > addr
-   && kallsyms->symtab[i].st_value < nextval)
-   nextval = kallsyms->symtab[i].st_value;
+   if (thisval > addr
+   && thisval < nextval)
+   nextval = thisval;
}
 
if (!best)
return NULL;
 
if (size)
-   *size = nextval - kallsyms->symtab[best].st

[PATCH v2] sysctl: Add panic-fatal-signals

2019-01-28 Thread Vincent Whitchurch

Add a sysctl which asks the kernel to panic when any userspace process
receives a fatal signal which would trigger a core dump.  This has
proven to be quite useful when debugging problems seen during testing of
embedded systems:  When combined with kernel core dumps (saved due to
the panic), it allows easier debugging of crashes which have their
origin in system-wide problems such as buggy drivers or other kernel or
hardware-related issues.

The crashing process's core dump can be extracted from the kernel core
dump using tools such as the crash utility's gcore extension.

Signed-off-by: Vincent Whitchurch 
---
v2: Put the sysctl behind a config option

 include/linux/signal.h |  1 +
 init/Kconfig   | 14 ++
 kernel/signal.c|  5 -
 kernel/sysctl.c|  9 +
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index cc7e2c1cd444..109efd1432e9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -10,6 +10,7 @@ struct task_struct;
 
 /* for sysctl */
 extern int print_fatal_signals;
+extern int panic_fatal_signals;
 
 static inline void copy_siginfo(kernel_siginfo_t *to,
const kernel_siginfo_t *from)
diff --git a/init/Kconfig b/init/Kconfig
index d47cb77a220e..875442f6ab53 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1242,6 +1242,20 @@ config SYSCTL_SYSCALL
 
  If unsure say N here.
 
+config SYSCTL_PANIC_FATAL_SIGNALS
+   bool "panic-fatal-signals sysctl" if EXPERT
+   depends on PROC_SYSCTL
+   help
+ If you say Y here, a kernel.panic-fatal-signals sysctl will be
+ offered.  If this sysctl is turned on, the kernel will panic if any
+ userspace process receives a fatal signal which would trigger a core
+ dump.
+
+ When used together with kernel core dumps, this can be useful for
+ debugging some system-wide problems, primarily on embedded systems.
+
+ If unsure, say N.
+
 config FHANDLE
bool "open by fhandle syscalls" if EXPERT
select EXPORTFS
diff --git a/kernel/signal.c b/kernel/signal.c
index e1d7ad8e6ab1..83c6877b0182 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -59,6 +59,7 @@
 static struct kmem_cache *sigqueue_cachep;
 
 int print_fatal_signals __read_mostly;
+int panic_fatal_signals __read_mostly;
 
 static void __user *sig_handler(struct task_struct *t, int sig)
 {
@@ -2497,8 +2498,10 @@ bool get_signal(struct ksignal *ksig)
current->flags |= PF_SIGNALED;
 
if (sig_kernel_coredump(signr)) {
-   if (print_fatal_signals)
+   if (print_fatal_signals || panic_fatal_signals)
print_fatal_signal(ksig->info.si_signo);
+   if (panic_fatal_signals)
+   panic("Fatal signal and panic_fatal_signals=1");
proc_coredump_connector(current);
/*
 * If it was able to dump core, this kills all
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ba4d9e85feb8..48023bad5b08 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -557,6 +557,15 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
+#ifdef CONFIG_SYSCTL_PANIC_FATAL_SIGNALS
+   {
+   .procname   = "panic-fatal-signals",
+   .data   = &panic_fatal_signals,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
+#endif
 #ifdef CONFIG_SPARC
{
.procname   = "reboot-cmd",
-- 
2.20.0

[PATCH] tty: Add NULL TTY driver

2019-04-03 Thread Vincent Whitchurch

If no console driver is enabled (or if a non-present driver is selected
with something like console=null in an attempt to disable the console),
opening /dev/console errors out, and init scripts and other userspace
code that relies on the existence of a console will fail.  Symlinking
/dev/null to /dev/console does not solve the problem since /dev/null
does not behave like a real TTY.

To just provide a dummy console to userspace when no console driver is
available or desired, add a ttynull driver which simply discards all
writes.  It can be chosen on the command line in the standard way, i.e.
with console=ttynull.

Signed-off-by: Vincent Whitchurch 
---
 drivers/tty/Kconfig   |  14 ++
 drivers/tty/Makefile  |   1 +
 drivers/tty/ttynull.c | 109 ++
 3 files changed, 124 insertions(+)
 create mode 100644 drivers/tty/ttynull.c

diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index e0a04bfc873e..d862f442f389 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -376,6 +376,20 @@ config PPC_EARLY_DEBUG_EHV_BC_HANDLE
  there simply will be no early console output.  This is true also
  if you don't boot under a hypervisor at all.
 
+config NULL_TTY
+   tristate "NULL TTY driver"
+   help
+ Say Y here if you want a NULL TTY which simply discards messages.
+
+ This is useful to allow userspace applications which expect a console
+ device to work without modifications even when no console is
+ available or desired.
+
+ In order to use this driver, you should redirect the console to this
+ TTY, or boot the kernel with console=ttynull.
+
+ If unsure, say N.
+
 config GOLDFISH_TTY
tristate "Goldfish TTY Driver"
depends on GOLDFISH
diff --git a/drivers/tty/Makefile b/drivers/tty/Makefile
index c72cafdf32b4..020b1cd9294f 100644
--- a/drivers/tty/Makefile
+++ b/drivers/tty/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_ISI) += isicom.o
 obj-$(CONFIG_MOXA_INTELLIO)+= moxa.o
 obj-$(CONFIG_MOXA_SMARTIO) += mxser.o
 obj-$(CONFIG_NOZOMI)   += nozomi.o
+obj-$(CONFIG_NULL_TTY) += ttynull.o
 obj-$(CONFIG_ROCKETPORT)   += rocket.o
 obj-$(CONFIG_SYNCLINK_GT)  += synclink_gt.o
 obj-$(CONFIG_SYNCLINKMP)   += synclinkmp.o
diff --git a/drivers/tty/ttynull.c b/drivers/tty/ttynull.c
new file mode 100644
index ..17f05b7eb6d3
--- /dev/null
+++ b/drivers/tty/ttynull.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Axis Communications AB
+ *
+ * Based on ttyprintk.c:
+ *  Copyright (C) 2010 Samo Pogacnik
+ */
+
+#include 
+#include 
+#include 
+
+static const struct tty_port_operations ttynull_port_ops;
+static struct tty_driver *ttynull_driver;
+static struct tty_port ttynull_port;
+
+static int ttynull_open(struct tty_struct *tty, struct file *filp)
+{
+   return tty_port_open(&ttynull_port, tty, filp);
+}
+
+static void ttynull_close(struct tty_struct *tty, struct file *filp)
+{
+   tty_port_close(&ttynull_port, tty, filp);
+}
+
+static void ttynull_hangup(struct tty_struct *tty)
+{
+   tty_port_hangup(&ttynull_port);
+}
+
+static int ttynull_write(struct tty_struct *tty, const unsigned char *buf,
+int count)
+{
+   return count;
+}
+
+static int ttynull_write_room(struct tty_struct *tty)
+{
+   return 65536;
+}
+
+static const struct tty_operations ttynull_ops = {
+   .open = ttynull_open,
+   .close = ttynull_close,
+   .hangup = ttynull_hangup,
+   .write = ttynull_write,
+   .write_room = ttynull_write_room,
+};
+
+static struct tty_driver *ttynull_device(struct console *c, int *index)
+{
+   *index = 0;
+   return ttynull_driver;
+}
+
+static struct console ttynull_console = {
+   .name = "ttynull",
+   .device = ttynull_device,
+};
+
+static int __init ttynull_init(void)
+{
+   struct tty_driver *driver;
+   int ret;
+
+   driver = tty_alloc_driver(1,
+   TTY_DRIVER_RESET_TERMIOS |
+   TTY_DRIVER_REAL_RAW |
+   TTY_DRIVER_UNNUMBERED_NODE);
+   if (IS_ERR(driver))
+   return PTR_ERR(driver);
+
+   tty_port_init(&ttynull_port);
+   ttynull_port.ops = &ttynull_port_ops;
+
+   driver->driver_name = "ttynull";
+   driver->name = "ttynull";
+   driver->type = TTY_DRIVER_TYPE_CONSOLE;
+   driver->init_termios = tty_std_termios;
+   driver->init_termios.c_oflag = OPOST | OCRNL | ONOCR | ONLRET;
+   tty_set_operations(driver, &ttynull_ops);
+   tty_port_link_device(&ttynull_port, driver, 0);
+
+   ret = tty_register_driver(driver);
+   if (ret < 0) {
+   put_tty_driver(driver);
+   tty_port_destroy(&ttynull_port);
+   return ret;
+   }
+
+   ttynull_driver = driver;
+

Re: [PATCH] tty: Add NULL TTY driver

2019-04-03 Thread Vincent Whitchurch

On Wed, Apr 03, 2019 at 03:12:13PM +0200, Greg KH wrote:
> On Wed, Apr 03, 2019 at 01:33:27PM +0200, Vincent Whitchurch wrote:
> > If no console driver is enabled (or if a non-present driver is selected
> > with something like console=null in an attempt to disable the console),
> > opening /dev/console errors out, and init scripts and other userspace
> > code that relies on the existence of a console will fail.  Symlinking
> > /dev/null to /dev/console does not solve the problem since /dev/null
> > does not behave like a real TTY.
> > 
> > To just provide a dummy console to userspace when no console driver is
> > available or desired, add a ttynull driver which simply discards all
> > writes.  It can be chosen on the command line in the standard way, i.e.
> > with console=ttynull.
> 
> If they have a broken system that sets "console=null", why would they
> know to fix it to be "console=ttynull"?
> 
> I'm all for adding new functionality, but to provide kernel code because
> userspace just isn't configured properly, that feels really wrong to me.

Especially on embedded systems, it would be convenient to have a simple
way to disable the console (both for kernel and userspace) on a system
which normally uses it, to free up the UART for other things.

In my case some early init script in userspace was actually tring to
handle the lack of a console in this way:

 [ ! -w /dev/console ] || exec 2>/dev/console

but the problem is that /dev/console always exists but fails on write,
so the obvious way of handling missing devices doesn't work.  AFAICS the
only way to get rid of /dev/console would be to disable the TTY layer
entirely in the kernel or have early userspace delete the file from
devtmpfs.

> Now if this were to be the "default" if nothing is set up at all, that
> might make a bit more sense, but as-is, this doesn't seem very useful.

Making it the default now would break users, wouldn't it?  Since IIRC if
no console is selected, the first registered console will automatically
be used by default.

Re: [PATCH] tty: Add NULL TTY driver

2019-04-11 Thread Vincent Whitchurch

On Fri, Apr 05, 2019 at 02:32:41PM +0200, Enrico Weigelt, metux IT consult 
wrote:
> On 05.04.19 11:00, Vincent Whitchurch wrote:
> > On Fri, Apr 05, 2019 at 10:39:43AM +0200, Enrico Weigelt, metux IT consult 
> > wrote:
> >> On 03.04.19 16:11, Vincent Whitchurch wrote:
> >>
> >>> Especially on embedded systems, it would be convenient to have a simple
> >>> way to disable the console (both for kernel and userspace) on a system
> >>> which normally uses it, to free up the UART for other things.
> >>
> >> Just symlinking to /dev/null does not work ?
> > 
> > No, /dev/null does not support the TTY ioctls.
> 
> hmm, wo (which programs) do you need, that really need them ?

I think it was systemd's debug-shell which complained about the ioctls.

> >> OTOH, if you're introducing a dummy console, wouldn't a ringbuffer that,
> >> can be read out later, a better option ?
> > 
> > There is already a ttyprintk driver in mainline to send these messages
> > to the printk ring buffer if one is actually intrested in what is
> > written to the console.  There's no option to enable it via console= in
> > mainline but I have a patch for that too.
> 
> Great. IMHO, that would be the better way.

Like Krzysztof explained, sometimes you just want to discard these
messages.

[PATCH] perf bench mem: Always memset source before memcpy

2020-08-10 Thread Vincent Whitchurch

For memcpy, the source pages are memset to zero only when --cycles is
used.  This leads to wildly different results with or without --cycles,
since all sources pages are likely to be mapped to the same zero page
without explicit writes.

Before this fix:

$ export cmd="./perf stat -e LLC-loads -- ./perf bench \
  mem memcpy -s 1024MB -l 100 -f default"
$ $cmd

 2,935,826  LLC-loads
   3.821677452 seconds time elapsed

$ $cmd --cycles

   217,533,436  LLC-loads
   8.616725985 seconds time elapsed

After this fix:

$ $cmd

   214,459,686  LLC-loads
   8.674301124 seconds time elapsed

$ $cmd --cycles

   214,758,651  LLC-loads
   8.644480006 seconds time elapsed

Fixes: 47b5757bac03c3387c ("perf bench mem: Move boilerplate memory allocation 
to the infrastructure")
Signed-off-by: Vincent Whitchurch 
---
 tools/perf/bench/mem-functions.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/tools/perf/bench/mem-functions.c b/tools/perf/bench/mem-functions.c
index 9235b76501be..19d45c377ac1 100644
--- a/tools/perf/bench/mem-functions.c
+++ b/tools/perf/bench/mem-functions.c
@@ -223,12 +223,8 @@ static int bench_mem_common(int argc, const char **argv, 
struct bench_mem_info *
return 0;
 }
 
-static u64 do_memcpy_cycles(const struct function *r, size_t size, void *src, 
void *dst)
+static void memcpy_prefault(memcpy_t fn, size_t size, void *src, void *dst)
 {
-   u64 cycle_start = 0ULL, cycle_end = 0ULL;
-   memcpy_t fn = r->fn.memcpy;
-   int i;
-
/* Make sure to always prefault zero pages even if MMAP_THRESH is 
crossed: */
memset(src, 0, size);
 
@@ -237,6 +233,15 @@ static u64 do_memcpy_cycles(const struct function *r, 
size_t size, void *src, vo
 * to not measure page fault overhead:
 */
fn(dst, src, size);
+}
+
+static u64 do_memcpy_cycles(const struct function *r, size_t size, void *src, 
void *dst)
+{
+   u64 cycle_start = 0ULL, cycle_end = 0ULL;
+   memcpy_t fn = r->fn.memcpy;
+   int i;
+
+   memcpy_prefault(fn, size, src, dst);
 
cycle_start = get_cycles();
for (i = 0; i < nr_loops; ++i)
@@ -252,11 +257,7 @@ static double do_memcpy_gettimeofday(const struct function 
*r, size_t size, void
memcpy_t fn = r->fn.memcpy;
int i;
 
-   /*
-* We prefault the freshly allocated memory range here,
-* to not measure page fault overhead:
-*/
-   fn(dst, src, size);
+   memcpy_prefault(fn, size, src, dst);
 
BUG_ON(gettimeofday(&tv_start, NULL));
for (i = 0; i < nr_loops; ++i)
-- 
2.25.1

[PATCH v3] cifs: Silently ignore unknown oplock break handle

2021-03-19 Thread Vincent Whitchurch

Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1.  The debug message which is printed for
these unknown handles may also be misleading, so fix that too.

The SMB2 lease break path is not affected by this patch.

Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".

 CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
 : 424d53fe 0040  0012  .SMB@...
 0010: 0001     
 0020:      
 0030:      

Signed-off-by: Vincent Whitchurch 
---

Notes:
v3:
- Change debug print to Tom Talpey's suggestion

v2:
- Drop change to lease break
- Rewrite commit message

 fs/cifs/smb2misc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 60d4bd1eae2b..76cd05b8d53b 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -754,8 +754,8 @@ smb2_is_valid_oplock_break(char *buffer, struct 
TCP_Server_Info *server)
}
}
spin_unlock(&cifs_tcp_ses_lock);
-   cifs_dbg(FYI, "Can not process oplock break for non-existent 
connection\n");
-   return false;
+   cifs_dbg(FYI, "No file id matched, oplock break ignored\n");
+   return true;
 }
 
 void
-- 
2.28.0

[PATCH] CIFS: Prevent error log on spurious oplock break

2021-03-05 Thread Vincent Whitchurch

The SMB1 version of ->is_oplock_break() returns true even if the FileId
is not found, as long as the oplock break notification message structure
itself appears to be valid.  A true return value makes
cifs_demultiplex_thread() to not print an error message for such
packets.

However, the SMB2 version returns false in such cases, leading to an
error "No task to wake, unknown frame received!" followed by a hexdump
of the packet header being printed by cifs_demultiplex_thread().

Note that before commit fa9c2362497fbd64788063288d ("CIFS: Fix SMB2
oplock break processing"), SMB2 also returned true for the case where a
connection was found but the FileId was not, but it's not clear to me if
that commit really intended to change the behaviour of the error prints.

Change the behaviour of SMB2 to be the same as SMB1 and avoid the error
messages for these packets which we ignore as per the spec.

Signed-off-by: Vincent Whitchurch 
---
 fs/cifs/smb2misc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 60d4bd1eae2b..3ea3bda64083 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -679,7 +679,7 @@ smb2_is_valid_lease_break(char *buffer)
}
spin_unlock(&cifs_tcp_ses_lock);
cifs_dbg(FYI, "Can not process lease break - no lease matched\n");
-   return false;
+   return true;
 }
 
 bool
@@ -755,7 +755,7 @@ smb2_is_valid_oplock_break(char *buffer, struct 
TCP_Server_Info *server)
}
spin_unlock(&cifs_tcp_ses_lock);
cifs_dbg(FYI, "Can not process oplock break for non-existent 
connection\n");
-   return false;
+   return true;
 }
 
 void
-- 
2.28.0

[PATCH v2] cifs: Silently ignore unknown oplock break handle

2021-03-16 Thread Vincent Whitchurch

Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1.  The SMB2 lease break path is not
affected by this patch.

Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".

 CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
 : 424d53fe 0040  0012  .SMB@...
 0010: 0001     
 0020:      
 0030:      

Signed-off-by: Vincent Whitchurch 
---

Notes:
v2:
- Drop change to lease break
- Rewrite commit message

 fs/cifs/smb2misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 60d4bd1eae2b..4d8576e202e3 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -755,7 +755,7 @@ smb2_is_valid_oplock_break(char *buffer, struct 
TCP_Server_Info *server)
}
spin_unlock(&cifs_tcp_ses_lock);
cifs_dbg(FYI, "Can not process oplock break for non-existent 
connection\n");
-   return false;
+   return true;
 }
 
 void
-- 
2.28.0

Re: [PATCH 1/9] regulator: Update DA9121 dt-bindings

2020-11-20 Thread Vincent Whitchurch

On Fri, Nov 20, 2020 at 01:14:50PM +0100, Adam Ward wrote:
> Update bindings for the Dialog Semiconductor DA9121 voltage regulator to add 
> device variants.
> 
> Signed-off-by: Adam Ward 
> ---
>  .../devicetree/bindings/regulator/dlg,da9121.yaml  | 177 
> +++--
>  MAINTAINERS|   2 +
>  .../dt-bindings/regulator/dlg,da9121-regulator.h   |  22 +++
>  3 files changed, 185 insertions(+), 16 deletions(-)
>  create mode 100644 include/dt-bindings/regulator/dlg,da9121-regulator.h
> 
> diff --git a/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml 
> b/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml
> index cde0d82..1bd177d 100644
> --- a/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml
> +++ b/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml
> @@ -8,40 +8,185 @@ title: Dialog Semiconductor DA9121 voltage regulator
>  
>  maintainers:
>- Vincent Whitchurch 
> +  - Adam Ward 

I'm quite happy to have myself removed from the list instead.  You are
in a much better position to maintain the bindings for these chips.

> -  buck1:
> -description:
> -  Initial data for the Buck1 regulator.
> -$ref: "regulator.yaml#"
> +  interrupt-parent:
> +maxItems: 1
> +description: Specifies the reference to the interrupt controller.
> +
> +  interrupts:
> +maxItems: 1
> +description: IRQ line information.
> +
> +  dlg,irq-polling-delay-passive:
> +maxItems: 1
> +description: |
> +  Specify the polling period, measured in milliseconds, between 
> interrupt status
> +  update checks. Range 1000-1 ms.
> +
> +  regulators:
>  type: object
> +$ref: regulator.yaml#
> +description: |
> +  This node defines the settings for the BUCK. The content of the
> +  sub-node is defined by the standard binding for regulators; see 
> regulator.yaml.
> +  The DA9121 regulator is bound using their names listed below
> +  buck1 - BUCK1
> +  buck2 - BUCK2   //DA9122, DA9220, DA9131, DA9132 only

This move to a sub-node means that older devicetrees won't work. I
assume that's fine since the driver is only in linux-next at the moment,
but perhaps it's worth mentioning this in the commit message?

> +
> +patternProperties:
> +  "^buck([0-1])$":
> +type: object
> +$ref: regulator.yaml#
> +
> +properties:
> +  regulator-mode:
> +maxItems: 1
> +description: Defined in 
> include/dt-bindings/regulator/dlg,da9121-regulator.h
> +
> +  regulator-initial-mode:
> +maxItems: 1
> +description: Defined in 
> include/dt-bindings/regulator/dlg,da9121-regulator.h
>  
> -unevaluatedProperties: false

I think my s/unevaluatedProperties/additionalProperties/ fix is already
in linux-next, so this looks like it needs rebasing.

Re: [PATCH 1/9] regulator: Update DA9121 dt-bindings

2020-11-25 Thread Vincent Whitchurch

On Fri, Nov 20, 2020 at 02:47:42PM +0100, Vincent Whitchurch wrote:
> On Fri, Nov 20, 2020 at 01:14:50PM +0100, Adam Ward wrote:
> > -  buck1:
> > -description:
> > -  Initial data for the Buck1 regulator.
> > -$ref: "regulator.yaml#"
> > +  interrupt-parent:
> > +maxItems: 1
> > +description: Specifies the reference to the interrupt controller.
> > +
> > +  interrupts:
> > +maxItems: 1
> > +description: IRQ line information.
> > +
> > +  dlg,irq-polling-delay-passive:
> > +maxItems: 1
> > +description: |
> > +  Specify the polling period, measured in milliseconds, between 
> > interrupt status
> > +  update checks. Range 1000-1 ms.
> > +
> > +  regulators:
> >  type: object
> > +$ref: regulator.yaml#
> > +description: |
> > +  This node defines the settings for the BUCK. The content of the
> > +  sub-node is defined by the standard binding for regulators; see 
> > regulator.yaml.
> > +  The DA9121 regulator is bound using their names listed below
> > +  buck1 - BUCK1
> > +  buck2 - BUCK2   //DA9122, DA9220, DA9131, DA9132 only
> 
> This move to a sub-node means that older devicetrees won't work. I
> assume that's fine since the driver is only in linux-next at the moment,
> but perhaps it's worth mentioning this in the commit message?

Actually, perhaps I'm missing something, but I don't quite see why this
move to a sub-node is needed.  There is some flexibility in the
regulator framework for this as I noted earlier
(https://lore.kernel.org/lkml/20201102154848.tm5nsydaukyd7...@axis.com/).
For the case of an MFD it certainly makes sense to have a "regulators"
sub-node but for these chips it seems rather redundant.

Also, perhaps you could split this patch into logical pieces too as Mark
has suggested for some of the other patches in this series?

Re: [PATCH] CIFS: Prevent error log on spurious oplock break

2021-03-12 Thread Vincent Whitchurch

On Tue, Mar 09, 2021 at 04:29:14PM +0100, Steve French wrote:
> On Tue, Mar 9, 2021, 07:42 Vincent Whitchurch via samba-technical 
> mailto:samba-techni...@lists.samba.org>> 
> wrote:
>> Thank you for the suggestions.  In my case, I've only received some
>> reports of this error being emitted very rarely (couple of times a month
>> in our stability tests).  Right now it looks like the problem may only
>> be with a particular NAS, and we're looking into triggering oplock
>> breaks more often and catching the problem with some more logging.
>
> I lean toward reducing or skipping the logging of the 'normsl' (or at
> least possible) race between close and oplock break.
> 
> I see this eg spamming the log running xfstest 524
> 
> Can you repro it as well running that?

I haven't run xfstests, but we figured out how to easily trigger the
error in a normal use case in our application.  I can now easily get the
errors to spam the logs with a small program which writes to a file from
one thread in a loop and opens, reads, and closes the same file in
another thread in a loop.  This is against a Samba server configured
with "smb2 leases = no".

Logs show that the oplock break FileId is not found because of the race
between close and oplock break which you mentioned, and in some cases
because of another race between open and oplock break (the open was not
completed since it was waiting on the response to GetInfo).

If this is unavoidable, I think it really would be nice to at least
reduce the severity since it's scary-looking and so easy to trigger.

How about something like the below?  It prints an info message for the
first unhandled oplock breaks once.

(I'm not sure if the lease key path should be handled differently. If
 the concerns about removing the message were primarily for that path,
 perhaps my original patch but with the change to
 smb2_is_valid_lease_break() dropped could be acceptable?)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 3de3c5908a72..849c3721f8a2 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -256,7 +256,7 @@ struct smb_version_operations {
void (*dump_share_caps)(struct seq_file *, struct cifs_tcon *);
/* verify the message */
int (*check_message)(char *, unsigned int, struct TCP_Server_Info *);
-   bool (*is_oplock_break)(char *, struct TCP_Server_Info *);
+   int (*is_oplock_break)(char *, struct TCP_Server_Info *);
int (*handle_cancelled_mid)(char *, struct TCP_Server_Info *);
void (*downgrade_oplock)(struct TCP_Server_Info *server,
 struct cifsInodeInfo *cinode, __u32 oplock,
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 75ce6f742b8d..2714b6cdf70a 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -135,7 +135,7 @@ extern int SendReceiveBlockingLock(const unsigned int xid,
int *bytes_returned);
 extern int cifs_reconnect(struct TCP_Server_Info *server);
 extern int checkSMB(char *buf, unsigned int len, struct TCP_Server_Info *srvr);
-extern bool is_valid_oplock_break(char *, struct TCP_Server_Info *);
+extern int is_valid_oplock_break(char *, struct TCP_Server_Info *);
 extern bool backup_cred(struct cifs_sb_info *);
 extern bool is_size_safe_to_change(struct cifsInodeInfo *, __u64 eof);
 extern void cifs_update_eof(struct cifsInodeInfo *cifsi, loff_t offset,
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 112692300fb6..5dc58f0c99b0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1009,6 +1009,8 @@ cifs_demultiplex_thread(void *p)
server->lstrp = jiffies;
 
for (i = 0; i < num_mids; i++) {
+   int oplockret = -EINVAL;
+
if (mids[i] != NULL) {
mids[i]->resp_buf_size = server->pdu_size;
 
@@ -1020,17 +1022,24 @@ cifs_demultiplex_thread(void *p)
mids[i]->callback(mids[i]);
 
cifs_mid_q_entry_release(mids[i]);
-   } else if (server->ops->is_oplock_break &&
-  server->ops->is_oplock_break(bufs[i],
-   server)) {
-   smb2_add_credits_from_hdr(bufs[i], server);
+   continue;
+   }
+
+   if (server->ops->is_oplock_break)
+   oplockret = 
server->ops->is_oplock_break(bufs[i], server);
+
+   smb2_add_credits_from_hdr(bufs[i], server);
+
+   if (oplockret == 0) {
cifs_dbg(FYI, "Received oplock break\n");
+   } else if (oplockret == -ENOENT) {
+

Re: [PATCH] CIFS: Prevent error log on spurious oplock break

2021-03-09 Thread Vincent Whitchurch

On Tue, Mar 09, 2021 at 01:05:11AM +0100, ronnie sahlberg wrote:
> On Sun, Mar 7, 2021 at 8:52 PM Shyam Prasad N via samba-technical
>  wrote:
> > The reason for rejecting the request maybe a number of things like:
> > corrupted request, stale request (for some old session), or for a
> > wrong handle.
> > I don't think we should treat any of these cases as a success.
> 
> I agree with Shyam here.
> We shouldn't change the return value to pretend success just to
> suppress a warning.

Thank you all for your comments.  I see that everyone agrees that the
error print is useful for SMB2, so I will drop this patch.

> However, if it is common to trigger with false positives we might want
> to something to prevent it from
> spamming the logs.
> These messages could be useful if we encounter bugs in our leasing
> code, or bugs in server
> lease code, so we should't throw them away completely. But if false
> positives are common ...
> 
> Some thoughts I and Stever brainstormed about could be to change the code in 
> the
> demiltiplex thread where we currently dump the packets that were "invalid"
> to maybe:
> *  log once as VFS and then log any future ones as FYI
> * log once as VFS and then only make the others available via dynamic
> trace points
> * rate limit it so we only log it once every n minutes?  (this is overkill?)

Thank you for the suggestions.  In my case, I've only received some
reports of this error being emitted very rarely (couple of times a month
in our stability tests).  Right now it looks like the problem may only
be with a particular NAS, and we're looking into triggering oplock
breaks more often and catching the problem with some more logging.

Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC

2019-01-21 Thread Vincent Whitchurch

On Fri, Jan 18, 2019 at 04:49:16PM -0700, Stephen Warren wrote:
> On 1/16/19 9:32 AM, Vincent Whitchurch wrote:
> > The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements 
> > a
> > generic framework to use virtio between two Linux systems, given shared 
> > memory
> > and a couple of interrupts.  It does not actually require the Intel MIC
> > hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> > buildable on more systems and adds a loopback driver to test it without 
> > special
> > hardware.
> > 
> > Note that I don't have access to Intel MIC hardware so some testing of the
> > patchset (especially the patch "vop: Use consistent DMA") on that platform
> > would be appreciated, to ensure that the series does not break anything 
> > there.
> 
> So a while ago I took a look at running virtio over PCIe. I found virtio
> basically had two parts:
> 
> 1) The protocol used to enumerate which virtio devices exist, and perhaps
> configure them.
> 
> 2) The ring buffer protocol that actually transfers the data.
> 
> I recall that data transfer was purely based on simple shared memory and
> interrupts, and hence could run over PCIe (e.g. via the PCIe endpoint
> subsystem in the kernel) without issue.
> 
> However, the enumeration/configuration protocol requires the host to be able
> to do all kinds of strange things that can't possibly be emulated over PCIe;
> IIRC the configuration data contains "registers" that when written select
> the data other "registers" access. When the virtio device is exposed by a
> hypervisor, and all the accesses are emulated synchronously through a trap,
> this is easy enough to implement. However, if the two ends of this
> configuration parsing are on different ends of a PCIe bus, there's no way
> this can work.

Correct, and that's why the MIC "Virtio-over-PCIe framework" does not
try to implement the standard "Virtio Over PCI Bus".  (Yes, it's
confusing.)

> Are you thinking of doing something different for enumeration/configuration,
> and just using the virtio ring buffer protocol over PCIe?

The mic/vop code already does this.  See
Documentation/mic/mic_overview.txt for some information.

> I did post asking about this quite a while back, but IIRC I didn't receive
> much of a response. Yes, here it is:
> 
> > https://lists.linuxfoundation.org/pipermail/virtualization/2018-March/037276.html
> "virtio over SW-defined/CPU-driven PCIe endpoint"

I came to essentialy the same conclusions before I found the MIC code.

(Your "aside" in that email about virtio doing PCIe reads instead of
 writes is not solved by the MIC code, since that is how the standard
 virtio devices/drivers work.)

[PATCH] mic: vop: Fix broken virtqueues

2019-01-29 Thread Vincent Whitchurch

VOP is broken in mainline since commit 1ce9e6055fa0a9043 ("virtio_ring:
introduce packed ring support"); attempting to use the virtqueues leads
to various kernel crashes.  I'm testing it with my not-yet-merged
loopback patches, but even the in-tree MIC hardware cannot work.

The problem is not in the referenced commit per se, but is due to the
following hack in vop_find_vq() which depends on the layout of private
structures in other source files, which that commit happened to change:

  /*
   * To reassign the used ring here we are directly accessing
   * struct vring_virtqueue which is a private data structure
   * in virtio_ring.c. At the minimum, a BUILD_BUG_ON() in
   * vring_new_virtqueue() would ensure that
   *  (&vq->vring == (struct vring *) (&vq->vq + 1));
   */
  vr = (struct vring *)(vq + 1);
  vr->used = used;

Fix vop by using __vring_new_virtqueue() to create the needed vring
layout from the start, instead of attempting to patch in the used ring
later.  __vring_new_virtqueue() was added way back in commit
2a2d1382fe9dcc ("virtio: Add improved queue allocation API") in order to
address mic's usecase, according to the commit message.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c | 60 +++--
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index d2b9782eee87..fef45bf6d519 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -284,6 +284,26 @@ static void vop_del_vqs(struct virtio_device *dev)
vop_del_vq(vq, idx++);
 }
 
+static struct virtqueue *vop_new_virtqueue(unsigned int index,
+ unsigned int num,
+ struct virtio_device *vdev,
+ bool context,
+ void *pages,
+ bool (*notify)(struct virtqueue *vq),
+ void (*callback)(struct virtqueue *vq),
+ const char *name,
+ void *used)
+{
+   bool weak_barriers = false;
+   struct vring vring;
+
+   vring_init(&vring, num, pages, MIC_VIRTIO_RING_ALIGN);
+   vring.used = used;
+
+   return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
+notify, callback, name);
+}
+
 /*
  * This routine will assign vring's allocated in host/io memory. Code in
  * virtio_ring.c however continues to access this io memory as if it were local
@@ -303,7 +323,6 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
struct _mic_vring_info __iomem *info;
void *used;
int vr_size, _vr_size, err, magic;
-   struct vring *vr;
u8 type = ioread8(&vdev->desc->type);
 
if (index >= ioread8(&vdev->desc->num_vq))
@@ -322,17 +341,7 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
return ERR_PTR(-ENOMEM);
vdev->vr[index] = va;
memset_io(va, 0x0, _vr_size);
-   vq = vring_new_virtqueue(
-   index,
-   le16_to_cpu(config.num), MIC_VIRTIO_RING_ALIGN,
-   dev,
-   false,
-   ctx,
-   (void __force *)va, vop_notify, callback, name);
-   if (!vq) {
-   err = -ENOMEM;
-   goto unmap;
-   }
+
info = va + _vr_size;
magic = ioread32(&info->magic);
 
@@ -341,7 +350,6 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
goto unmap;
}
 
-   /* Allocate and reassign used ring now */
vdev->used_size[index] = PAGE_ALIGN(sizeof(__u16) * 3 +
 sizeof(struct vring_used_elem) *
 le16_to_cpu(config.num));
@@ -351,8 +359,17 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
err = -ENOMEM;
dev_err(_vop_dev(vdev), "%s %d err %d\n",
__func__, __LINE__, err);
-   goto del_vq;
+   goto unmap;
+   }
+
+   vq = vop_new_virtqueue(index, le16_to_cpu(config.num), dev, ctx,
+  (void __force *)va, vop_notify, callback,
+  name, used);
+   if (!vq) {
+   err = -ENOMEM;
+   goto free_used;
}
+
vdev->used[index] = dma_map_single(&vpdev->dev, used,
vdev->used_size[index],
DMA_BIDIRECTIONAL);
@@ -360,26 +377,17 @@ static struct virt

[PATCH] sysctl: Add panic-fatal-signals

2019-01-16 Thread Vincent Whitchurch

Add a sysctl which asks the kernel to panic when any userspace process
receives a fatal signal which would trigger a core dump.  This has
proven to be quite useful when debugging problems seen during testing of
embedded systems:  When combined with kernel core dumps (saved due to
the panic), it allows easier debugging of crashes which have their
origin in system-wide problems such as buggy drivers or other kernel or
hardware-related issues.

The crashing process's core dump can be extracted from the kernel core
dump using tools such as the crash utility's gcore extension.

Signed-off-by: Vincent Whitchurch 
---
 include/linux/signal.h | 1 +
 kernel/signal.c| 5 -
 kernel/sysctl.c| 7 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index cc7e2c1cd444..109efd1432e9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -10,6 +10,7 @@ struct task_struct;
 
 /* for sysctl */
 extern int print_fatal_signals;
+extern int panic_fatal_signals;
 
 static inline void copy_siginfo(kernel_siginfo_t *to,
const kernel_siginfo_t *from)
diff --git a/kernel/signal.c b/kernel/signal.c
index e1d7ad8e6ab1..83c6877b0182 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -59,6 +59,7 @@
 static struct kmem_cache *sigqueue_cachep;
 
 int print_fatal_signals __read_mostly;
+int panic_fatal_signals __read_mostly;
 
 static void __user *sig_handler(struct task_struct *t, int sig)
 {
@@ -2497,8 +2498,10 @@ bool get_signal(struct ksignal *ksig)
current->flags |= PF_SIGNALED;
 
if (sig_kernel_coredump(signr)) {
-   if (print_fatal_signals)
+   if (print_fatal_signals || panic_fatal_signals)
print_fatal_signal(ksig->info.si_signo);
+   if (panic_fatal_signals)
+   panic("Fatal signal and panic_fatal_signals=1");
proc_coredump_connector(current);
/*
 * If it was able to dump core, this kills all
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ba4d9e85feb8..ad1ac6ec2004 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -557,6 +557,13 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
+   {
+   .procname   = "panic-fatal-signals",
+   .data   = &panic_fatal_signals,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
 #ifdef CONFIG_SPARC
{
.procname   = "reboot-cmd",
-- 
2.20.0

[PATCH 8/8] vop: Add loopback

2019-01-16 Thread Vincent Whitchurch

Add a loopback driver to allow testing and evaluation of the VOP
framework without special hardware.  The host and the guest will run
under the same kernel.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/Kconfig|  10 +
 drivers/misc/mic/vop/Makefile   |   2 +
 drivers/misc/mic/vop/vop_loopback.c | 390 
 3 files changed, 402 insertions(+)
 create mode 100644 drivers/misc/mic/vop/vop_loopback.c

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 319ee3c90a26..b181cf20d40f 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -149,6 +149,16 @@ config VOP
  OS and tools for MIC to use with this driver are available from
  <http://software.intel.com/en-us/mic-developer>.
 
+config VOP_LOOPBACK
+   tristate "VOP loopback driver"
+   depends on VOP
+   help
+ This enables a loopback driver to test and evaluate the VOP
+ infrastructure without actual PCIe hardware.  The host and the guest
+ sides run under the same kernel.
+
+ If unsure, say N.
+
 if VOP
 source "drivers/vhost/Kconfig.vringh"
 endif
diff --git a/drivers/misc/mic/vop/Makefile b/drivers/misc/mic/vop/Makefile
index 78819c8999f1..a6ead25c4418 100644
--- a/drivers/misc/mic/vop/Makefile
+++ b/drivers/misc/mic/vop/Makefile
@@ -7,3 +7,5 @@ obj-m := vop.o
 vop-objs += vop_main.o
 vop-objs += vop_debugfs.o
 vop-objs += vop_vringh.o
+
+obj-$(CONFIG_VOP_LOOPBACK) += vop_loopback.o
diff --git a/drivers/misc/mic/vop/vop_loopback.c 
b/drivers/misc/mic/vop/vop_loopback.c
new file mode 100644
index ..6e3d24f166cf
--- /dev/null
+++ b/drivers/misc/mic/vop/vop_loopback.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Axis Communications AB
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../bus/vop_bus.h"
+
+struct mic_irq {
+   struct list_head list;
+   int irq;
+   irqreturn_t (*func)(int irq, void *data);
+   void *data;
+};
+
+struct vop_loopback_end {
+   struct vop_loopback *loopback;
+   const char *name;
+   struct vop_device *vop;
+   struct list_head irqs;
+   struct mutex mutex;
+   struct work_struct work;
+};
+
+struct vop_loopback {
+   struct device *dev;
+   void *dp;
+   struct vop_loopback_end host;
+   struct vop_loopback_end guest;
+};
+
+static inline struct vop_loopback *vop_to_loopback(struct device *dev)
+{
+   return dev_get_drvdata(dev->parent);
+}
+
+static dma_addr_t
+vop_loopback_dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+   return page_to_phys(page) + offset;
+}
+
+static void vop_loopback_dma_unmap_page(struct device *dev, dma_addr_t 
dma_addr,
+   size_t size, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+}
+
+static int vop_loopback_dma_mmap(struct device *dev, struct vm_area_struct 
*vma,
+void *cpu_addr, dma_addr_t dma_addr, size_t size,
+unsigned long attrs)
+{
+   return remap_pfn_range(vma, vma->vm_start,
+  PHYS_PFN(dma_addr), size,
+  vma->vm_page_prot);
+}
+
+static void *vop_loopback_dma_alloc(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp,
+   unsigned long attrs)
+{
+   void *p = (void *) __get_free_pages(gfp, get_order(size));
+
+   if (p)
+   *handle = virt_to_phys(p);
+
+   return p;
+}
+
+static void vop_loopback_dma_free(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t handle,
+ unsigned long attrs)
+{
+   free_pages((unsigned long) (uintptr_t) cpu_addr, get_order(size));
+}
+
+static const struct dma_map_ops vop_loopback_dma_ops = {
+   .map_page = vop_loopback_dma_map_page,
+   .unmap_page = vop_loopback_dma_unmap_page,
+   .mmap = vop_loopback_dma_mmap,
+   .alloc = vop_loopback_dma_alloc,
+   .free = vop_loopback_dma_free,
+};
+
+
+static void vop_loopback_ack_interrupt(struct vop_device *vop, int num)
+{
+}
+
+static int vop_loopback_next_db(struct vop_device *vop)
+{
+   return 0;
+}
+
+static void *vop_loopback_get_dp(struct vop_device *vop)
+{
+   struct vop_loopback *loopback = vop_to_loopback(&vop->dev);
+
+   return loopback->dp;
+}
+
+static dma_addr_t vop_loopback_get_dp_dma(struct vop_device *vop)
+{
+   struct vop_loopback *loopback = vop_to_loopback(&vop->dev);
+
+   return virt_to_phys(loopback->dp);
+}
+
+static void __iomem *vop_loopback_get_remote_dp

[PATCH 3/8] vop: Add definition of readq/writeq if missing

2019-01-16 Thread Vincent Whitchurch

Include  so that readq/writeq are
replaced by two readl/writel on systems that do not support them.  The
values read/written are pointers which will be 32-bit on 32-bit systems
so the non-atomicity should not matter.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 5a366cf68d95..26b23f2cf94c 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vop_main.h"
 
-- 
2.20.0

[PATCH 7/8] vop: Use consistent DMA

2019-01-16 Thread Vincent Whitchurch

The vop code maps buffers using the streaming DMA API but never syncs
them so it doesn't work on systems without cache coherence.  The vrings
want consistent mappings, and not streaming mappings so use that API to
allocate and map buffers.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/bus/vop_bus.h|   2 +
 drivers/misc/mic/host/mic_boot.c  |  46 +
 drivers/misc/mic/vop/vop_main.c   |  23 ++-
 drivers/misc/mic/vop/vop_vringh.c | 111 --
 4 files changed, 114 insertions(+), 68 deletions(-)

diff --git a/drivers/misc/mic/bus/vop_bus.h b/drivers/misc/mic/bus/vop_bus.h
index fff7a865d721..a2db11bea277 100644
--- a/drivers/misc/mic/bus/vop_bus.h
+++ b/drivers/misc/mic/bus/vop_bus.h
@@ -86,6 +86,7 @@ struct vop_driver {
  * node to add/remove/configure virtio devices.
  * @get_dp: Get access to the virtio device page used by the self
  *  node to add/remove/configure virtio devices.
+ * @get_dp_dma: Get DMA address of the virtio device page.
  * @send_intr: Send an interrupt to the peer node on a specified doorbell.
  * @ioremap: Map a buffer with the specified DMA address and length.
  * @iounmap: Unmap a buffer previously mapped.
@@ -103,6 +104,7 @@ struct vop_hw_ops {
void (*ack_interrupt)(struct vop_device *vpdev, int num);
void __iomem * (*get_remote_dp)(struct vop_device *vpdev);
void * (*get_dp)(struct vop_device *vpdev);
+   dma_addr_t (*get_dp_dma)(struct vop_device *vpdev);
void (*send_intr)(struct vop_device *vpdev, int db);
void __iomem * (*ioremap)(struct vop_device *vpdev,
  dma_addr_t pa, size_t len);
diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c
index 6479435ac96b..1dcd25917eca 100644
--- a/drivers/misc/mic/host/mic_boot.c
+++ b/drivers/misc/mic/host/mic_boot.c
@@ -55,9 +55,47 @@ static void _mic_dma_unmap_page(struct device *dev, 
dma_addr_t dma_addr,
mic_unmap_single(mdev, dma_addr, size);
 }
 
+static int _mic_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+void *cpu_addr, dma_addr_t dma_addr, size_t size,
+unsigned long attrs)
+{
+   return remap_pfn_range(vma, vma->vm_start,
+  virt_to_pfn(cpu_addr), size,
+  vma->vm_page_prot);
+}
+
+static void *_mic_dma_alloc(struct device *dev, size_t size, dma_addr_t 
*handle,
+   gfp_t gfp, unsigned long attrs)
+{
+   void *cpu_addr = (void *) __get_free_pages(gfp, get_order(size));
+   dma_addr_t dma_addr;
+
+   if (!cpu_addr)
+   return NULL;
+
+   *handle = dma_map_single(dev, p, size, attrs);
+   if (dma_mapping_error(dev, *handle)) {
+   free_pages((unsigned long) (uintptr_t) cpu_addr,
+  get_order(size));
+   return NULL;
+   }
+
+   return cpu_addr;
+}
+
+static void _mic_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t handle, unsigned long attrs)
+{
+   dma_unmap_single(dev, cpu_addr, handle, attrs);
+   free_pages((unsigned long) (uintptr_t) cpu_addr, get_order(size));
+}
+
 static const struct dma_map_ops _mic_dma_ops = {
.map_page = _mic_dma_map_page,
.unmap_page = _mic_dma_unmap_page,
+   .mmap = _mic_dma_mmap,
+   .alloc = _mic_dma_alloc,
+   .free = _mic_dma_free,
 };
 
 static struct mic_irq *
@@ -100,6 +138,13 @@ static void *__mic_get_dp(struct vop_device *vpdev)
return mdev->dp;
 }
 
+static void *__mic_get_dp_dma(struct vop_device *vpdev)
+{
+   struct mic_device *mdev = vpdev_to_mdev(&vpdev->dev);
+
+   return mdev->dp_dma_addr;
+}
+
 static void __iomem *__mic_get_remote_dp(struct vop_device *vpdev)
 {
return NULL;
@@ -131,6 +176,7 @@ static struct vop_hw_ops vop_hw_ops = {
.ack_interrupt = __mic_ack_interrupt,
.next_db = __mic_next_db,
.get_dp = __mic_get_dp,
+   .get_dp_dma = __mic_get_dp_dma,
.get_remote_dp = __mic_get_remote_dp,
.send_intr = __mic_send_intr,
.ioremap = __mic_ioremap,
diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index dd49169ef53d..6d4a4e8993bb 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -264,9 +264,8 @@ static void vop_del_vq(struct virtqueue *vq, int n)
struct vring *vr = (struct vring *)(vq + 1);
struct vop_device *vpdev = vdev->vpdev;
 
-   dma_unmap_single(&vpdev->dev, vdev->used[n],
-vdev->used_size[n], DMA_BIDIRECTIONAL);
-   free_pages((unsigned long)vr->used, get_order(vdev->used_size[n]));
+   dma_free_wc(&vpdev->dev, vdev->used_size[n], vr->used, vdev->used[n]);
+
vring_del_virtqueue(vq);
vpdev->hw_ops->iounmap(vpdev, vdev->vr[n]);

[PATCH 1/8] vop: Use %z for size_t

2019-01-16 Thread Vincent Whitchurch

Fixes these kind of errors on 32-bit:

 drivers/misc/mic/vop/vop_vringh.c:590:3:
 error: format '%lx' expects argument of type 'long unsigned int',
 but argument 7 has type 'size_t {aka unsigned int}' [-Werror=format=]

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_vringh.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index cbc8ebcff5cf..4267fa08b483 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -587,7 +587,7 @@ static int vop_virtio_copy_to_user(struct vop_vdev *vdev, 
void __user *ubuf,
 err:
vpdev->hw_ops->iounmap(vpdev, dbuf);
dev_dbg(vop_dev(vdev),
-   "%s: ubuf %p dbuf %p len 0x%lx vr_idx 0x%x\n",
+   "%s: ubuf %p dbuf %p len 0x%zx vr_idx 0x%x\n",
__func__, ubuf, dbuf, len, vr_idx);
return err;
 }
@@ -670,7 +670,7 @@ static int vop_virtio_copy_from_user(struct vop_vdev *vdev, 
void __user *ubuf,
 err:
vpdev->hw_ops->iounmap(vpdev, dbuf);
dev_dbg(vop_dev(vdev),
-   "%s: ubuf %p dbuf %p len 0x%lx vr_idx 0x%x\n",
+   "%s: ubuf %p dbuf %p len 0x%zx vr_idx 0x%x\n",
__func__, ubuf, dbuf, len, vr_idx);
return err;
 }
-- 
2.20.0

[PATCH 5/8] vop: vringh: Do not crash if no DMA channel

2019-01-16 Thread Vincent Whitchurch

Fallback gracefully if no DMA channel is provided instead of
dereferencing NULL pointers.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_vringh.c | 32 +++
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index ccdfbb652123..18d6ecff1834 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -531,12 +531,12 @@ static int vop_virtio_copy_to_user(struct vop_vdev *vdev, 
void __user *ubuf,
void __iomem *dbuf = vpdev->hw_ops->ioremap(vpdev, daddr, len);
struct vop_vringh *vvr = &vdev->vvr[vr_idx];
struct vop_info *vi = dev_get_drvdata(&vpdev->dev);
-   size_t dma_alignment = 1 << vi->dma_ch->device->copy_align;
-   bool x200 = is_dma_copy_aligned(vi->dma_ch->device, 1, 1, 1);
+   size_t dma_alignment;
+   bool x200;
size_t dma_offset, partlen;
int err;
 
-   if (!VOP_USE_DMA) {
+   if (!VOP_USE_DMA || !vi->dma_ch) {
if (copy_to_user(ubuf, (void __force *)dbuf, len)) {
err = -EFAULT;
dev_err(vop_dev(vdev), "%s %d err %d\n",
@@ -548,6 +548,9 @@ static int vop_virtio_copy_to_user(struct vop_vdev *vdev, 
void __user *ubuf,
goto err;
}
 
+   dma_alignment = 1 << vi->dma_ch->device->copy_align;
+   x200 = is_dma_copy_aligned(vi->dma_ch->device, 1, 1, 1);
+
dma_offset = daddr - round_down(daddr, dma_alignment);
daddr -= dma_offset;
len += dma_offset;
@@ -606,18 +609,23 @@ static int vop_virtio_copy_from_user(struct vop_vdev 
*vdev, void __user *ubuf,
void __iomem *dbuf = vpdev->hw_ops->ioremap(vpdev, daddr, len);
struct vop_vringh *vvr = &vdev->vvr[vr_idx];
struct vop_info *vi = dev_get_drvdata(&vdev->vpdev->dev);
-   size_t dma_alignment = 1 << vi->dma_ch->device->copy_align;
-   bool x200 = is_dma_copy_aligned(vi->dma_ch->device, 1, 1, 1);
+   size_t dma_alignment;
+   bool x200;
size_t partlen;
-   bool dma = VOP_USE_DMA;
+   bool dma = VOP_USE_DMA && vi->dma_ch;
int err = 0;
 
-   if (daddr & (dma_alignment - 1)) {
-   vdev->tx_dst_unaligned += len;
-   dma = false;
-   } else if (ALIGN(len, dma_alignment) > dlen) {
-   vdev->tx_len_unaligned += len;
-   dma = false;
+   if (dma) {
+   dma_alignment = 1 << vi->dma_ch->device->copy_align;
+   x200 = is_dma_copy_aligned(vi->dma_ch->device, 1, 1, 1);
+
+   if (daddr & (dma_alignment - 1)) {
+   vdev->tx_dst_unaligned += len;
+   dma = false;
+   } else if (ALIGN(len, dma_alignment) > dlen) {
+   vdev->tx_len_unaligned += len;
+   dma = false;
+   }
}
 
if (!dma)
-- 
2.20.0

[PATCH 6/8] vop: Fix handling of >32 feature bits

2019-01-16 Thread Vincent Whitchurch

This is needed, for example, for VIRTIO_F_IOMMU_PLATFORM.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 26b23f2cf94c..dd49169ef53d 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -117,7 +117,7 @@ _vop_total_desc_size(struct mic_device_desc __iomem *desc)
 static u64 vop_get_features(struct virtio_device *vdev)
 {
unsigned int i, bits;
-   u32 features = 0;
+   u64 features = 0;
struct mic_device_desc __iomem *desc = to_vopvdev(vdev)->desc;
u8 __iomem *in_features = _vop_vq_features(desc);
int feature_len = ioread8(&desc->feature_len);
@@ -125,7 +125,7 @@ static u64 vop_get_features(struct virtio_device *vdev)
bits = min_t(unsigned, feature_len, sizeof(vdev->features)) * 8;
for (i = 0; i < bits; i++)
if (ioread8(&in_features[i / 8]) & (BIT(i % 8)))
-   features |= BIT(i);
+   features |= BIT_ULL(i);
 
return features;
 }
-- 
2.20.0

[PATCH 4/8] vop: Allow building on more systems

2019-01-16 Thread Vincent Whitchurch

VOP_BUS does not actually depend on 32-bit X86 or PCI.  The code uses
archdata.dma_ops so it can be built on ARM too.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 227cc7443671..319ee3c90a26 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -38,7 +38,7 @@ comment "VOP Bus Driver"
 
 config VOP_BUS
tristate "VOP Bus Driver"
-   depends on 64BIT && PCI && X86 && X86_DEV_DMA_OPS
+   depends on ARM || (X86 && X86_DEV_DMA_OPS)
help
  This option is selected by any driver which registers a
  device or driver on the VOP Bus, such as CONFIG_INTEL_MIC_HOST
@@ -132,7 +132,7 @@ comment "VOP Driver"
 
 config VOP
tristate "VOP Driver"
-   depends on 64BIT && PCI && X86 && VOP_BUS
+   depends on VOP_BUS
select VHOST_RING
select VIRTIO
help
-- 
2.20.0

[PATCH 2/8] vop: Cast pointers to uintptr_t

2019-01-16 Thread Vincent Whitchurch

Fix these on 32-bit:

 vop_vringh.c:711:13: error: cast from pointer to integer of different
 size [-Werror=pointer-to-int-cast]

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c   | 8 
 drivers/misc/mic/vop/vop_vringh.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 6b212c8b78e7..5a366cf68d95 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -497,7 +497,7 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
vdev->desc = d;
vdev->dc = (void __iomem *)d + _vop_aligned_desc_size(d);
vdev->dnode = dnode;
-   vdev->vdev.priv = (void *)(u64)dnode;
+   vdev->vdev.priv = (void *)(uintptr_t)dnode;
init_completion(&vdev->reset_done);
 
vdev->h2c_vdev_db = vpdev->hw_ops->next_db(vpdev);
@@ -519,7 +519,7 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
offset, type);
goto free_irq;
}
-   writeq((u64)vdev, &vdev->dc->vdev);
+   writeq((uintptr_t)vdev, &vdev->dc->vdev);
dev_dbg(_vop_dev(vdev), "%s: registered vop device %u type %u vdev 
%p\n",
__func__, offset, type, vdev);
 
@@ -552,7 +552,7 @@ static void _vop_handle_config_change(struct 
mic_device_desc __iomem *d,
 {
struct mic_device_ctrl __iomem *dc
= (void __iomem *)d + _vop_aligned_desc_size(d);
-   struct _vop_vdev *vdev = (struct _vop_vdev *)readq(&dc->vdev);
+   struct _vop_vdev *vdev = (struct _vop_vdev 
*)(uintptr_t)readq(&dc->vdev);
 
if (ioread8(&dc->config_change) != MIC_VIRTIO_PARAM_CONFIG_CHANGED)
return;
@@ -571,7 +571,7 @@ static int _vop_remove_device(struct mic_device_desc 
__iomem *d,
 {
struct mic_device_ctrl __iomem *dc
= (void __iomem *)d + _vop_aligned_desc_size(d);
-   struct _vop_vdev *vdev = (struct _vop_vdev *)readq(&dc->vdev);
+   struct _vop_vdev *vdev = (struct _vop_vdev 
*)(uintptr_t)readq(&dc->vdev);
u8 status;
int ret = -1;
 
diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index 4267fa08b483..ccdfbb652123 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -708,12 +708,12 @@ static int vop_vringh_copy(struct vop_vdev *vdev, struct 
vringh_kiov *iov,
partlen = min(kiov->iov_len, len);
if (read)
ret = vop_virtio_copy_to_user(vdev, ubuf, partlen,
- (u64)kiov->iov_base,
+ (uintptr_t)kiov->iov_base,
  kiov->iov_len,
  vr_idx);
else
ret = vop_virtio_copy_from_user(vdev, ubuf, partlen,
-   (u64)kiov->iov_base,
+   
(uintptr_t)kiov->iov_base,
kiov->iov_len,
vr_idx);
if (ret) {
-- 
2.20.0

[PATCH 0/8] Virtio-over-PCIe on non-MIC

2019-01-16 Thread Vincent Whitchurch

The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements a
generic framework to use virtio between two Linux systems, given shared memory
and a couple of interrupts.  It does not actually require the Intel MIC
hardware, x86-64, or even PCIe for that matter.  This patch series makes it
buildable on more systems and adds a loopback driver to test it without special
hardware.

Note that I don't have access to Intel MIC hardware so some testing of the
patchset (especially the patch "vop: Use consistent DMA") on that platform
would be appreciated, to ensure that the series does not break anything there.

Vincent Whitchurch (8):
  vop: Use %z for size_t
  vop: Cast pointers to uintptr_t
  vop: Add definition of readq/writeq if missing
  vop: Allow building on more systems
  vop: vringh: Do not crash if no DMA channel
  vop: Fix handling of >32 feature bits
  vop: Use consistent DMA
  vop: Add loopback

 drivers/misc/mic/Kconfig|  14 +-
 drivers/misc/mic/bus/vop_bus.h  |   2 +
 drivers/misc/mic/host/mic_boot.c|  46 
 drivers/misc/mic/vop/Makefile   |   2 +
 drivers/misc/mic/vop/vop_loopback.c | 390 
 drivers/misc/mic/vop/vop_main.c |  36 +--
 drivers/misc/mic/vop/vop_vringh.c   | 151 ++-
 7 files changed, 549 insertions(+), 92 deletions(-)
 create mode 100644 drivers/misc/mic/vop/vop_loopback.c

-- 
2.20.0

Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC

2019-01-17 Thread Vincent Whitchurch

On Wed, Jan 16, 2019 at 06:07:53PM +0100, Arnd Bergmann wrote:
> On Wed, Jan 16, 2019 at 5:33 PM Vincent Whitchurch 
>  wrote:
> > The Virtio-over-PCIe framework living under drivers/misc/mic/vop implements 
> > a
> > generic framework to use virtio between two Linux systems, given shared 
> > memory
> > and a couple of interrupts.  It does not actually require the Intel MIC
> > hardware, x86-64, or even PCIe for that matter.  This patch series makes it
> > buildable on more systems and adds a loopback driver to test it without 
> > special
> > hardware.
> >
> > Note that I don't have access to Intel MIC hardware so some testing of the
> > patchset (especially the patch "vop: Use consistent DMA") on that platform
> > would be appreciated, to ensure that the series does not break anything 
> > there.
> 
> I think we need to take a step back though and discuss what combinations
> we actually do want to support. I have not actually read the whole mic/vop
> driver, so I don't know if this would be a good fit as a generic interface --
> it may or may not be, and any other input would be helpful.

The MIC driver as a whole is uninteresting as a generic interface since
it is quite tied to the Intel hardware.  The VOP parts though are
logically separated and have no relation to that hardware, even if the
ioctls are called MIC_VIRTIO_*.

The samples/mic/mpssd/mpssd.c code handles both the boot of the MIC
(sysfs) and the VOP parts (ioctls).

> Aside from that, I should note that we have two related subsystems
> in the kernel: the PCIe endpoint subsystem maintained by Kishon and
> Lorenzo, and the NTB subsystem maintained by Jon, Dave and Allen.
> 
> In order to properly support virtio over PCIe, I would hope we can come
> up with a user space interface that looks the same way for configuring
> virtio drivers in mic, pcie-endpoint and ntb, if at all possible. Have
> you looked at those two subsystems?

pcie-endpoint is a generic framework that allows Linux to act as an
endpoint and set up the BARs, etc.  mic appears to have Intel
MIC-specific code for this (pre-dating pcie-endpoint) but this is
separate from the vop code.  pcie-endpoint and vop do not have
overlapping functionality and can be used together.

I'm not familiar with NTB, but from a quick look it seems to be tied to
special hardware, and I don't see any virtio-related code there.  A vop
backend for NTB-backend would presumably work to allow virtio
functionality there.

Re: [PATCH] sysctl: Add panic-fatal-signals

2019-01-17 Thread Vincent Whitchurch

On Wed, Jan 16, 2019 at 09:06:18AM -0800, Kees Cook wrote:
> On Wed, Jan 16, 2019 at 4:54 AM Vincent Whitchurch
>  wrote:
> > Add a sysctl which asks the kernel to panic when any userspace process
> > receives a fatal signal which would trigger a core dump.  This has
> > proven to be quite useful when debugging problems seen during testing of
> > embedded systems:  When combined with kernel core dumps (saved due to
> > the panic), it allows easier debugging of crashes which have their
> > origin in system-wide problems such as buggy drivers or other kernel or
> > hardware-related issues.
> >
> > The crashing process's core dump can be extracted from the kernel core
> > dump using tools such as the crash utility's gcore extension.
> 
> Whoa. That's intense. :)

"Insane" is the word I used myself in an earlier draft of the commit
message. :)

> If you have a use-case for it, then I guess I
> can't complain. Given that this is an instant DoS for a system that
> isn't expecting it, I do wonder if it should live behind a CONFIG or
> something to make it available only when really wanted?

Sure, I can put it behind a "depends on EXPERT" config option.  Distros
and the like should have no reason to provide access to such a sysctl.

Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC

2019-01-17 Thread Vincent Whitchurch

On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> Correct, and again we have to see if this is a good interface. The NTB
> and PCIe-endpoint interfaces have a number of differences and a
> number of similarities. In particular they should both be usable with
> virtio-style drivers, but the underlying hardware differs mainly in how
> it is probed by the system: an NTB is seen as a PCI device attached
> to two host bridges, while and endpoint is typically a platform_device
> on one side, but a pci_dev on the other side.
> 
> Can you describe how you expect a VOP device over NTB or
> PCIe-endpoint would get created, configured and used?

Assuming PCIe-endpoint:

On the RC, a vop-host-backend driver (PCI driver) sets up some shared
memory area which the RC and the endpoint can use to communicate the
location of the MIC device descriptors and other information such as the
MSI address.  It implements vop callbacks to allow the vop framework to
obtain the address of the MIC descriptors and send/receive interrupts
to/from the guest.

On the endpoint, the PCIe endpoint driver sets up (hardcoded) BARs and
memory regions as required to allow the endpoint and the root complex to
access each other's memory.

On the endpoint, the vop-guest-backend, via the shared memory set up by
the vop-host-backend, obtains the address of the MIC device page and the
MSI address, and a method to receive vop interrupts from the host.  This
information is used to implement the vop callbacks allowing the vop
framework to access to the MIC device page and send/receive interrupts
from/to the host.

vop (despite its name) doesn't care about PCIe.  The vop-guest-backend
doesn't actually need to talk to the PCIe endpoint driver.  The
vop-guest-backend can be probed via any means, such as via a device tree
on the endpoint.

On the RC, userspace opens the vop device and adds the virtio devices,
which end up in the MIC device page set up by the vop-host-backend.

On the endpoint, when the vop framework (via the vop-guest-backend) sees
these devices, it registers devices on the virtio bus and the virtio
drivers are probed.

On the RC, userspace implements the device end of the virtio
communication in userspace, using the MIC_VIRTIO_COPY_DESC ioctl.  I
also have patches to support vhost.

> Is there always one master side that is responsible for creating
> virtio devices on it, with the slave side automatically attaching to
> them, or can either side create virtio devices?

Only the master can create virtio devices.  The virtio drivers run on
the slave.

> Is there any limit on
> the number of virtio devices or queues within a VOP device?

The virtio device information (mic_device_desc) is put into the MIC
device page whose size is limited by the ABI header in
include/uapi/linux/mic_ioctl.h (MIC_DP_SIZE, 4096 bytes).  So the number
of devices is limited by the limit of the number of device descriptors
that can fit in that size.  There is also a per-device limit on the
number of vrings (MIC_VRING_ENTRIES) and vring entries
(MIC_VRING_ENTRIES) in the ABI header.

Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC

2019-01-17 Thread Vincent Whitchurch

On Thu, Jan 17, 2019 at 07:21:42AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 17, 2019 at 04:19:06PM +0100, Vincent Whitchurch wrote:
> > On the RC, a vop-host-backend driver (PCI driver) sets up some shared
> > memory area which the RC and the endpoint can use to communicate the
> > location of the MIC device descriptors and other information such as the
> > MSI address.  It implements vop callbacks to allow the vop framework to
> > obtain the address of the MIC descriptors and send/receive interrupts
> > to/from the guest.
> 
> Why would we require any work on the RC / host side?  A properly
> setup software controlled virtio device should just show up as a
> normal PCIe device, and the virtio-pci device should bind to it.

If I understand you correctly, I think you're talking about the RC
running the virtio drivers and the endpoint implementing the virtio
device?  This vop stuff is used for the other way around: the virtio
device is implement on the RC and the endpoint runs the virtio drivers.

Re: [PATCH 0/8] Virtio-over-PCIe on non-MIC

2019-01-17 Thread Vincent Whitchurch

On Thu, Jan 17, 2019 at 04:53:25PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 17, 2019 at 4:19 PM Vincent Whitchurch
>  wrote:
> > On Thu, Jan 17, 2019 at 01:39:27PM +0100, Arnd Bergmann wrote:
> > > Can you describe how you expect a VOP device over NTB or
> > > PCIe-endpoint would get created, configured and used?
> >
> > Assuming PCIe-endpoint:
> >
> > On the RC, a vop-host-backend driver (PCI driver) sets up some shared
> > memory area which the RC and the endpoint can use to communicate the
> > location of the MIC device descriptors and other information such as the
> > MSI address.  It implements vop callbacks to allow the vop framework to
> > obtain the address of the MIC descriptors and send/receive interrupts
> > to/from the guest.
> >
> > On the endpoint, the PCIe endpoint driver sets up (hardcoded) BARs and
> > memory regions as required to allow the endpoint and the root complex to
> > access each other's memory.
> >
> > On the endpoint, the vop-guest-backend, via the shared memory set up by
> > the vop-host-backend, obtains the address of the MIC device page and the
> > MSI address, and a method to receive vop interrupts from the host.  This
> > information is used to implement the vop callbacks allowing the vop
> > framework to access to the MIC device page and send/receive interrupts
> > from/to the host.
> 
> Ok, this seems fine so far. So the vop-host-backend is a regular PCI
> driver that implements the VOP protocol from the host side, and it
> can talk to either a MIC, or another guest-backend written for the PCI-EP
> framework to implement the same protocol, right?

Yes, but just to clarify:  the placement of the device page and the way
to communicate the location of the device page address and any other
information needed by the guest-backend are hardware-specific so there
is no generic vop-host-backend implementation which can talk to both a
MIC and to something else.

> > vop (despite its name) doesn't care about PCIe.  The vop-guest-backend
> > doesn't actually need to talk to the PCIe endpoint driver.  The
> > vop-guest-backend can be probed via any means, such as via a device tree
> > on the endpoint.
> >
> > On the RC, userspace opens the vop device and adds the virtio devices,
> > which end up in the MIC device page set up by the vop-host-backend.
> >
> > On the endpoint, when the vop framework (via the vop-guest-backend) sees
> > these devices, it registers devices on the virtio bus and the virtio
> > drivers are probed.
> 
> Ah, so the direction is fixed, and it's the opposite of what Christoph
> and I were expecting. This is probably something we need to discuss
> a bit. From what I understand, there is no technical requirement why
> it has to be this direction, right?

I don't think the vop framework itself has any such requirement.

The MIC uses it in this way (see Documentation/mic/mic_overview.txt) and
it also makes sense (to me, at least) if one wants to treat the endpoint
like one would treat a virtualized guest.

> What I mean is that the same vop framework could work with
> a PCI-EP driver implementing the vop-host-backend and
> a PCI driver implementing the vop-guest-backend? In order
> to do this, the PCI-EP configuration would need to pick whether
> it wants the EP to be the vop host or guest, but having more
> flexibility in it (letting each side add virtio devices) would be
> harder to do.

Correct, this is my understanding also.

> > On the RC, userspace implements the device end of the virtio
> > communication in userspace, using the MIC_VIRTIO_COPY_DESC ioctl.  I
> > also have patches to support vhost.
> 
> This is a part I don't understand yet. Does this mean that the
> normal operation is between a user space process on the vop-host
> talking to the kernel on the vop-guest?

Yes.  For example, the guest mounts a 9p filesystem with virtio-9p and
the 9p server is implemented in a userspace process on the host.  This
is again similar to virtualization.

> I'm a bit worried about the ioctl interface here, as this combines the
> configuration side with the actual data transfer, and that seems
> a bit inflexible.
>
> > > Is there always one master side that is responsible for creating
> > > virtio devices on it, with the slave side automatically attaching to
> > > them, or can either side create virtio devices?
> >
> > Only the master can create virtio devices.  The virtio drivers run on
> > the slave.
> 
> Ok.
> 
> > > Is there any limit on
> > > the number of virtio devices or queues within a VOP device?
> >
> > The virtio device infor

[PATCH] mic: vop: Fix crash on remove

2019-02-01 Thread Vincent Whitchurch

The remove path contains a hack which depends on internal structures in
other source files, similar to the one which was recently removed from
the registration path.  Since commit 1ce9e6055fa0 ("virtio_ring:
introduce packed ring support"), this leads to a crash when vop devices
are removed.

The structure in question is only examined to get the virtual address of
the allocated used page.  Store that pointer locally instead to fix the
crash.

Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index eb8d0937abda..c4517703dc1d 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -48,7 +48,8 @@
  * @dc: Virtio device control
  * @vpdev: VOP device which is the parent for this virtio device
  * @vr: Buffer for accessing the VRING
- * @used: Buffer for used
+ * @used_virt: Virtual address of used ring
+ * @used: DMA address of used ring
  * @used_size: Size of the used buffer
  * @reset_done: Track whether VOP reset is complete
  * @virtio_cookie: Cookie returned upon requesting a interrupt
@@ -62,6 +63,7 @@ struct _vop_vdev {
struct mic_device_ctrl __iomem *dc;
struct vop_device *vpdev;
void __iomem *vr[VOP_MAX_VRINGS];
+   void *used_virt[VOP_MAX_VRINGS];
dma_addr_t used[VOP_MAX_VRINGS];
int used_size[VOP_MAX_VRINGS];
struct completion reset_done;
@@ -261,12 +263,12 @@ static bool vop_notify(struct virtqueue *vq)
 static void vop_del_vq(struct virtqueue *vq, int n)
 {
struct _vop_vdev *vdev = to_vopvdev(vq->vdev);
-   struct vring *vr = (struct vring *)(vq + 1);
struct vop_device *vpdev = vdev->vpdev;
 
dma_unmap_single(&vpdev->dev, vdev->used[n],
 vdev->used_size[n], DMA_BIDIRECTIONAL);
-   free_pages((unsigned long)vr->used, get_order(vdev->used_size[n]));
+   free_pages((unsigned long)vdev->used_virt[n],
+  get_order(vdev->used_size[n]));
vring_del_virtqueue(vq);
vpdev->hw_ops->unmap(vpdev, vdev->vr[n]);
vdev->vr[n] = NULL;
@@ -355,6 +357,7 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
 le16_to_cpu(config.num));
used = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(vdev->used_size[index]));
+   vdev->used_virt[index] = used;
if (!used) {
err = -ENOMEM;
dev_err(_vop_dev(vdev), "%s %d err %d\n",
-- 
2.20.0

[PATCH] mic: vop: Fix use-after-free on remove

2019-02-01 Thread Vincent Whitchurch

KASAN detects a use-after-free when vop devices are removed.

This problem was introduced by commit 0063e8bbd2b62d136 ("virtio_vop:
don't kfree device on register failure").  That patch moved the freeing
of the struct _vop_vdev to the release function, but failed to ensure
that vop holds a reference to the device when it doesn't want it to go
away.  A kfree() was replaced with a put_device() in the unregistration
path, but the last reference to the device is already dropped in
unregister_virtio_device() so the struct is freed before vop is done
with it.

Fix it by holding a reference until cleanup is done.  This is similar to
the fix in virtio_pci in commit 2989be09a8a9d6 ("virtio_pci: fix use
after free on release").

 ==
 BUG: KASAN: use-after-free in vop_scan_devices+0xc6c/0xe50 [vop]
 Read of size 8 at addr 88800da18580 by task kworker/0:1/12

 CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc4+ #53
 Workqueue: events vop_hotplug_devices [vop]
 Call Trace:
  dump_stack+0x74/0xbb
  print_address_description+0x5d/0x2b0
  ? vop_scan_devices+0xc6c/0xe50 [vop]
  kasan_report+0x152/0x1aa
  ? vop_scan_devices+0xc6c/0xe50 [vop]
  ? vop_scan_devices+0xc6c/0xe50 [vop]
  vop_scan_devices+0xc6c/0xe50 [vop]
  ? vop_loopback_free_irq+0x160/0x160 [vop_loopback]
  process_one_work+0x7c0/0x14b0
  ? pwq_dec_nr_in_flight+0x2d0/0x2d0
  ? do_raw_spin_lock+0x120/0x280
  worker_thread+0x8f/0xbf0
  ? __kthread_parkme+0x78/0xf0
  ? process_one_work+0x14b0/0x14b0
  kthread+0x2ae/0x3a0
  ? kthread_park+0x120/0x120
  ret_from_fork+0x3a/0x50

 Allocated by task 12:
  kmem_cache_alloc_trace+0x13a/0x2a0
  vop_scan_devices+0x473/0xe50 [vop]
  process_one_work+0x7c0/0x14b0
  worker_thread+0x8f/0xbf0
  kthread+0x2ae/0x3a0
  ret_from_fork+0x3a/0x50

 Freed by task 12:
  kfree+0x104/0x310
  device_release+0x73/0x1d0
  kobject_put+0x14f/0x420
  unregister_virtio_device+0x32/0x50
  vop_scan_devices+0x19d/0xe50 [vop]
  process_one_work+0x7c0/0x14b0
  worker_thread+0x8f/0xbf0
  kthread+0x2ae/0x3a0
  ret_from_fork+0x3a/0x50

 The buggy address belongs to the object at 88800da18008
  which belongs to the cache kmalloc-2k of size 2048
 The buggy address is located 1400 bytes inside of
  2048-byte region [88800da18008, 88800da18808)
 The buggy address belongs to the page:
 page:ea368600 count:1 mapcount:0 mapping:88801440dbc0 index:0x0 
compound_mapcount: 0
 flags: 0x40010200(slab|head)
 raw: 40010200 ea378608 ea37a008 88801440dbc0
 raw:  000d000d 0001 
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  88800da18480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  88800da18500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >88800da18580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
  88800da18600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  88800da18680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==

Fixes: 0063e8bbd2b62d136 ("virtio_vop: don't kfree device on register failure")
Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 6606e704b832..c4517703dc1d 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -598,6 +598,8 @@ static int _vop_remove_device(struct mic_device_desc 
__iomem *d,
int ret = -1;
 
if (ioread8(&dc->config_change) == MIC_VIRTIO_PARAM_DEV_REMOVE) {
+   struct device *dev = get_device(&vdev->vdev.dev);
+
dev_dbg(&vpdev->dev,
"%s %d config_change %d type %d vdev %p\n",
__func__, __LINE__,
@@ -609,7 +611,7 @@ static int _vop_remove_device(struct mic_device_desc 
__iomem *d,
iowrite8(-1, &dc->h2c_vdev_db);
if (status & VIRTIO_CONFIG_S_DRIVER_OK)
wait_for_completion(&vdev->reset_done);
-   put_device(&vdev->vdev.dev);
+   put_device(dev);
iowrite8(1, &dc->guest_ack);
dev_dbg(&vpdev->dev, "%s %d guest_ack %d\n",
__func__, __LINE__, ioread8(&dc->guest_ack));
-- 
2.20.0

Re: [PATCH v2] mic: vop: Fix broken virtqueues

2019-02-01 Thread Vincent Whitchurch

On Wed, Jan 30, 2019 at 08:40:23PM +0100, Greg KH wrote:
> This patch is already in my tree, can you just send a "fixup" patch
> instead of me  having to revert the old one and then adding this one
> again?

I've now posted the fixup patch separately as "mic: vop: Fix crash on
remove".  Thanks.

Re: [PATCH] tty: Add NULL TTY driver

2019-04-05 Thread Vincent Whitchurch

On Fri, Apr 05, 2019 at 10:39:43AM +0200, Enrico Weigelt, metux IT consult 
wrote:
> On 03.04.19 16:11, Vincent Whitchurch wrote:
> 
> > Especially on embedded systems, it would be convenient to have a simple
> > way to disable the console (both for kernel and userspace) on a system
> > which normally uses it, to free up the UART for other things.
> 
> Just symlinking to /dev/null does not work ?

No, /dev/null does not support the TTY ioctls.

> OTOH, if you're introducing a dummy console, wouldn't a ringbuffer that,
> can be read out later, a better option ?

There is already a ttyprintk driver in mainline to send these messages
to the printk ring buffer if one is actually intrested in what is
written to the console.  There's no option to enable it via console= in
mainline but I have a patch for that too.

[PATCH v2] printk: Do not lose last line in kmsg buffer dump

2019-07-11 Thread Vincent Whitchurch

kmsg_dump_get_buffer() is supposed to select all the youngest log
messages which fit into the provided buffer.  It determines the correct
start index by using msg_print_text() with a NULL buffer to calculate
the size of each entry.  However, when performing the actual writes,
msg_print_text() only writes the entry to the buffer if the written len
is lesser than the size of the buffer.  So if the lengths of the
selected youngest log messages happen to precisely fill up the provided
buffer, the last log message is not included.

We don't want to modify msg_print_text() to fill up the buffer and start
returning a length which is equal to the size of the buffer, since
callers of its other users, such as kmsg_dump_get_line(), depend upon
the current behaviour.

Instead, fix kmsg_dump_get_buffer() to compensate for this.

For example, with the following two final prints:

[6.427502] A
[6.427769] 12345

A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
patch:

 : 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[6.522197
 0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] A.
 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

After this patch:

 : 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38  <0>[6.456678
 0010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] 12345.
 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Signed-off-by: Vincent Whitchurch 
---
v2: Move fix to kmsg_dump_get_buffer()

 kernel/printk/printk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1888f6a3b694..424abf802f02 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3274,7 +3274,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
bool syslog,
/* move first record forward until length fits into the buffer */
seq = dumper->cur_seq;
idx = dumper->cur_idx;
-   while (l > size && seq < dumper->next_seq) {
+   while (l >= size && seq < dumper->next_seq) {
struct printk_log *msg = log_from_idx(idx);
 
l -= msg_print_text(msg, true, time, NULL, 0);
-- 
2.20.0

Re: [PATCH v2] printk: Do not lose last line in kmsg buffer dump

2019-07-12 Thread Vincent Whitchurch

On Fri, Jul 12, 2019 at 10:09:04AM +0200, Petr Mladek wrote:
> The patch looks like a hack using a hole that the next cycle
> does not longer check the number of really stored characters.
> 
> What would happen when msg_print_text() starts adding
> the trailing '\0' as suggested by
> https://lkml.kernel.org/r/20190710121049.rwhk7fknfzn3c...@pathway.suse.cz

I did have a look at that possibility, but I didn't see how that could
work without potentially affecting userspace users of the syslog ABI.
AFAICS the suggested change in msg_print_text() can be done in one of
three ways:

 (1) msg_print_text() adds the '\0' and includes this length both when
 it estimates the size (NULL buffer) and when it actually prints:

 If we do this:
 - kmsg_dump_get_line_nolock() would have to subtract 1 from the len
   since its callers expected that len is always smaller than the
   size of the buffer.
 - The buffers given to use via the syslog interface will now include
   a '\0', potentially affecting userspace applications which use
   this ABI.
 
 (2) msg_print_text() adds the '\0', and includes this in the length
 only when estimating the size, and not when it actually prints.

 If we do this:
 - SYSLOG_ACTION_SIZE_UNREAD tries uses the size estimate to give
   userspace a count of how many characters are present in the
   buffer, and now this count will start differing from the actual
   count that can be read, potentially affecting userspace
   applications.

 (3) msg_print_text() adds the '\0', and does not include this length
 in the result at all.

 If we do this:
 - The original kmsg dump issue is not solved, since the last line
   is still lost.

> BTW: What is the motivation for this fix? Is a bug report
> or just some research of possible buffer overflows?

The fix is not attempting to fix a buffer overflow, theoretical or
otherwise.

It's a fix for a bug in functionality which has been observed on our
systems:  We use pstore to save the kernel log when the kernel crashes,
and sometimes the log in the pstore misses the last line, and since the
last line usual says why we're panicing so it's rather important not to
miss.

> The commit message pretends that the problem is bigger than
> it really is. It is about one byte and not one line.

I'm not quite sure I follow.  The current code does fail to include the
*entire* last line.

The memcpy on line #1294 is never executed for the last line because we
stop the loop because of the check on line #1289:

  1270  static size_t msg_print_text(const struct printk_log *msg, bool syslog, 
char *buf, size_t size)
  1271  {
  1272  const char *text = log_text(msg);
  1273  size_t text_size = msg->text_len;
  1274  size_t len = 0;
  1275  
  1276  do {
  1277  const char *next = memchr(text, '\n', text_size);
  1278  size_t text_len;
  1279  
  1280  if (next) {
  1281  text_len = next - text;
  1282  next++;
  1283  text_size -= next - text;
  1284  } else {
  1285  text_len = text_size;
  1286  }
  1287  
  1288  if (buf) {
  1289  if (print_prefix(msg, syslog, NULL) +
  1290  text_len + 1 > size - len)
  1291  break;
  1292  
  1293  len += print_prefix(msg, syslog, buf + len);
  1294  memcpy(buf + len, text, text_len);
  1295  len += text_len;
  1296  buf[len++] = '\n';
  1297  } else {
  1298  /* SYSLOG_ACTION_* buffer size only calculation 
*/
  1299  len += print_prefix(msg, syslog, NULL);
  1300  len += text_len;
  1301  len++;
  1302  }

[PATCH] crypto: cryptd - Fix skcipher instance memory leak

2019-07-02 Thread Vincent Whitchurch

cryptd_skcipher_free() fails to free the struct skcipher_instance
allocated in cryptd_create_skcipher(), leading to a memory leak.  This
is detected by kmemleak on bootup on ARM64 platforms:

 unreferenced object 0x80003377b180 (size 1024):
   comm "cryptomgr_probe", pid 822, jiffies 4294894830 (age 52.760s)
   backtrace:
 kmem_cache_alloc_trace+0x270/0x2d0
 cryptd_create+0x990/0x124c
 cryptomgr_probe+0x5c/0x1e8
 kthread+0x258/0x318
 ret_from_fork+0x10/0x1c

Signed-off-by: Vincent Whitchurch 
---
 crypto/cryptd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index 1ce1bf6d3bab..5f76c6e222c6 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -388,6 +388,7 @@ static void cryptd_skcipher_free(struct skcipher_instance 
*inst)
struct skcipherd_instance_ctx *ctx = skcipher_instance_ctx(inst);
 
crypto_drop_skcipher(&ctx->spawn);
+   kfree(inst);
 }
 
 static int cryptd_create_skcipher(struct crypto_template *tmpl,
-- 
2.20.0

[PATCH] printk: Do not lose last line in kmsg dump

2019-07-09 Thread Vincent Whitchurch

kmsg_dump_get_buffer() is supposed to select the youngest log messages
which fit into the provided buffer.  When that function determines the
correct start index, by looping and calling msg_print_text() with a NULL
buffer, msg_print_text() calculates a length which would allow the
youngest log messages to completely fill the provided buffer.

However, when doing the actual printing, an off-by-one error in
msg_print_text() leads to that function allowing the provided buffer to
only be filled to (size - 1).

So if the lengths of the selected youngest log messages happen to
precisely fill up the provided buffer, the last log message is not
included.

For example, with the following two final prints:

[6.427502] A
[6.427769] 12345

A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
patch:

 : 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[6.522197
 0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] A.
 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

After this patch:

 : 3c 30 3e 5b 20 20 20 20 36 2e 34 32 37 35 30 32  <0>[6.427502
 0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] A.
 0020: 3c 30 3e 5b 20 20 20 20 36 2e 34 32 37 37 36 39  <0>[6.427769
 0030: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] 12345.

Note that this bug only affects the kmsg dump code.  msg_print_text() is
also used from the syslog code and console_unlock() but this bug does
trigger there since the buffers used there are never filled up
completely (since they are only used to print individual lines, and
their size is always LOG_LINE_MAX + PREFIX_MAX, and PREFIX_MAX has a
value which is larger than the largest possible prefix).

Signed-off-by: Vincent Whitchurch 
---
I posted this patch two years ago and received no replies.  This problem is
still present in mainline.  https://lore.kernel.org/patchwork/patch/781106/

 kernel/printk/printk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1888f6a3b694..7679d779d5cc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1318,7 +1318,7 @@ static size_t msg_print_text(const struct printk_log 
*msg, bool syslog,
}
 
if (buf) {
-   if (prefix_len + text_len + 1 >= size - len)
+   if (prefix_len + text_len + 1 > size - len)
break;
 
memcpy(buf + len, prefix, prefix_len);
-- 
2.20.0

Re: [PATCH] printk: Do not lose last line in kmsg dump

2019-07-10 Thread Vincent Whitchurch

On Tue, Jul 09, 2019 at 04:29:39PM +0200, Petr Mladek wrote:
> On Tue 2019-07-09 19:12:30, Sergey Senozhatsky wrote:
> > On (07/09/19 10:10), Vincent Whitchurch wrote:
> > > A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
> > > patch:
> > > 
> > >  : 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[
> > > 6.522197
> > >  0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] 
> > > A.
> > >  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> > > 
> > >  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> > > 
> > > 
> > > After this patch:
> > > 
> > >  : 3c 30 3e 5b 20 20 20 20 36 2e 34 32 37 35 30 32  <0>[
> > > 6.427502
> > >  0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] 
> > > A.
> > >  0020: 3c 30 3e 5b 20 20 20 20 36 2e 34 32 37 37 36 39  <0>[
> > > 6.427769
> > >  0030: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] 
> > > 12345.
> > 
> > [..]
> > 
> > > @@ -1318,7 +1318,7 @@ static size_t msg_print_text(const struct 
> > > printk_log *msg, bool syslog,
> > >   }
> > >  
> > >   if (buf) {
> > > - if (prefix_len + text_len + 1 >= size - len)
> > > + if (prefix_len + text_len + 1 > size - len)
> > >   break;
> > 
> > So with this patch the last byte of the buffer is 0xA. It's a bit
> > uncomfortable that `len', which we return from msg_print_text(),
> > now points one byte beyond the buffer:
> > 
> > buf[len++] = '\n';
> > return len;
> > 
> > This is not very common. Not sure what usually happens to kmsg_dump
> > buffers, but anyone who'd do a rather innocent
> > 
> > kmsg_dump(buf, &len);
> > buf[len] = 0x00;
> > 
> > will write to something which is not a kmsg buffer (in some cases).
> 
> I have the same worries.
> 
> On the other hand. The function does not store the trailing '\0'
> into the buffer itself. The callers would need to add it themself.
> It is their responsibility to avoid a buffer overflow.
> 
> I have checked several users and it seems that nobody adds or
> needs the trailing '\0'.
> 
> It is ugly to do not use the entire buffer just because theoretical
> buggy users.

All the callers of kmsg_dump_get_buffer() seem to be fine but these two
users of kmsg_dump_get_line(), which also calls msg_print_text(), have
exactly the code which you and Sergey worry about:

arch/powerpc/xmon/xmon.c
2836:   while (kmsg_dump_get_line_nolock(&dumper, false, buf, sizeof(buf), 
&len)) {
2837-   buf[len] = '\0';

arch/um/kernel/kmsg_dump.c
29: while (kmsg_dump_get_line(dumper, true, line, sizeof(line), &len)) {
30- line[len] = '\0';

I guess we should fix these first and leave this patch as is?

[PATCH v2 char-misc-next 5/7] mic: vop: Fix init race with shared interrupts

2019-02-22 Thread Vincent Whitchurch

If a vop virtio device's interrupt is shared with others, there is a
window where the interrupt can hit and the ->vqs list can be attempted
to be traversed before register_virtio_device() has initialized it,
leading to a NULL pointer dereference in vop_virtio_intr_handler().

Fix this by keeping a local list of virtqueues in this driver and using
that instead of the list inside the struct virtio_device, similar to how
virtio-pci handles this.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c | 42 ++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index e37b2c2152a2..6764feea6f55 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -40,6 +40,11 @@
 
 #define VOP_MAX_VRINGS 4
 
+struct vop_vq {
+   struct virtqueue *vq;
+   struct list_head list;
+};
+
 /*
  * _vop_vdev - Allocated per virtio device instance injected by the peer.
  *
@@ -68,6 +73,9 @@ struct _vop_vdev {
int used_size[VOP_MAX_VRINGS];
struct completion reset_done;
struct mic_irq *virtio_cookie;
+   struct vop_vq *vqs;
+   struct list_head virtqueues;
+   spinlock_t virtqueues_lock;
int c2h_vdev_db;
int h2c_vdev_db;
int dnode;
@@ -264,6 +272,12 @@ static void vop_del_vq(struct virtqueue *vq, int n)
 {
struct _vop_vdev *vdev = to_vopvdev(vq->vdev);
struct vop_device *vpdev = vdev->vpdev;
+   struct vop_vq *vopvq = &vdev->vqs[n];
+   unsigned long flags;
+
+   spin_lock_irqsave(&vdev->virtqueues_lock, flags);
+   list_del(&vopvq->list);
+   spin_unlock_irqrestore(&vdev->virtqueues_lock, flags);
 
dma_unmap_single(&vpdev->dev, vdev->used[n],
 vdev->used_size[n], DMA_BIDIRECTIONAL);
@@ -284,6 +298,9 @@ static void vop_del_vqs(struct virtio_device *dev)
 
list_for_each_entry_safe(vq, n, &dev->vqs, list)
vop_del_vq(vq, idx++);
+
+   kfree(vdev->vqs);
+   vdev->vqs = NULL;
 }
 
 static struct virtqueue *vop_new_virtqueue(unsigned int index,
@@ -411,7 +428,13 @@ static int vop_find_vqs(struct virtio_device *dev, 
unsigned nvqs,
if (nvqs > ioread8(&vdev->desc->num_vq))
return -ENOENT;
 
+   vdev->vqs = kcalloc(nvqs, sizeof(*vdev->vqs), GFP_KERNEL);
+   if (!vdev->vqs)
+   return -ENOMEM;
+
for (i = 0; i < nvqs; ++i) {
+   unsigned long flags;
+
if (!names[i]) {
vqs[i] = NULL;
continue;
@@ -425,6 +448,12 @@ static int vop_find_vqs(struct virtio_device *dev, 
unsigned nvqs,
err = PTR_ERR(vqs[i]);
goto error;
}
+
+   vdev->vqs[i].vq = vqs[i];
+
+   spin_lock_irqsave(&vdev->virtqueues_lock, flags);
+   list_add(&vdev->vqs[i].list, &vdev->virtqueues);
+   spin_unlock_irqrestore(&vdev->virtqueues_lock, flags);
}
 
iowrite8(1, &dc->used_address_updated);
@@ -468,13 +497,17 @@ static struct virtio_config_ops vop_vq_config_ops = {
 
 static irqreturn_t vop_virtio_intr_handler(int irq, void *data)
 {
+   unsigned long flags;
struct _vop_vdev *vdev = data;
struct vop_device *vpdev = vdev->vpdev;
-   struct virtqueue *vq;
+   struct vop_vq *vopvq;
 
vpdev->hw_ops->ack_interrupt(vpdev, vdev->h2c_vdev_db);
-   list_for_each_entry(vq, &vdev->vdev.vqs, list)
-   vring_interrupt(0, vq);
+
+   spin_lock_irqsave(&vdev->virtqueues_lock, flags);
+   list_for_each_entry(vopvq, &vdev->virtqueues, list)
+   vring_interrupt(0, vopvq->vq);
+   spin_unlock_irqrestore(&vdev->virtqueues_lock, flags);
 
return IRQ_HANDLED;
 }
@@ -516,6 +549,9 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
vdev->vdev.priv = (void *)(unsigned long)dnode;
init_completion(&vdev->reset_done);
 
+   INIT_LIST_HEAD(&vdev->virtqueues);
+   spin_lock_init(&vdev->virtqueues_lock);
+
vdev->h2c_vdev_db = vpdev->hw_ops->next_db(vpdev);
vdev->virtio_cookie = vpdev->hw_ops->request_irq(vpdev,
vop_virtio_intr_handler, "virtio intr",
-- 
2.20.0

[PATCH v2 char-misc-next 3/7] mic: vop: Allow building on more systems

2019-02-22 Thread Vincent Whitchurch

VOP_BUS does not actually depend on x86-64 or PCI or X86_DEV_DMA_OPS.

The dependency on X86_DEV_DMA_OPS has been unnecessary since commit
5657933dbb6e25fe ("treewide: Move dma_ops from struct dev_archdata into
struct device").

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 227cc7443671..242dcee14689 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -38,7 +38,6 @@ comment "VOP Bus Driver"
 
 config VOP_BUS
tristate "VOP Bus Driver"
-   depends on 64BIT && PCI && X86 && X86_DEV_DMA_OPS
help
  This option is selected by any driver which registers a
  device or driver on the VOP Bus, such as CONFIG_INTEL_MIC_HOST
@@ -132,7 +131,7 @@ comment "VOP Driver"
 
 config VOP
tristate "VOP Driver"
-   depends on 64BIT && PCI && X86 && VOP_BUS
+   depends on VOP_BUS
select VHOST_RING
select VIRTIO
help
-- 
2.20.0

[PATCH v2 char-misc-next 4/7] vop: Add loopback driver

2019-02-22 Thread Vincent Whitchurch

Add a loopback driver to allow testing and evaluation of the VOP
framework without special hardware.  The host and the guest will run
under the same kernel.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/Kconfig|  10 +
 drivers/misc/mic/vop/Makefile   |   2 +
 drivers/misc/mic/vop/vop_loopback.c | 382 
 3 files changed, 394 insertions(+)
 create mode 100644 drivers/misc/mic/vop/vop_loopback.c

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 242dcee14689..2e2f745afb3a 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -148,6 +148,16 @@ config VOP
  OS and tools for MIC to use with this driver are available from
  <http://software.intel.com/en-us/mic-developer>.
 
+config VOP_LOOPBACK
+   tristate "VOP loopback driver"
+   depends on VOP
+   help
+ This enables a loopback driver to test and evaluate the VOP
+ infrastructure without actual PCIe hardware.  The host and the guest
+ sides run under the same kernel.
+
+ If unsure, say N.
+
 if VOP
 source "drivers/vhost/Kconfig.vringh"
 endif
diff --git a/drivers/misc/mic/vop/Makefile b/drivers/misc/mic/vop/Makefile
index 78819c8999f1..a6ead25c4418 100644
--- a/drivers/misc/mic/vop/Makefile
+++ b/drivers/misc/mic/vop/Makefile
@@ -7,3 +7,5 @@ obj-m := vop.o
 vop-objs += vop_main.o
 vop-objs += vop_debugfs.o
 vop-objs += vop_vringh.o
+
+obj-$(CONFIG_VOP_LOOPBACK) += vop_loopback.o
diff --git a/drivers/misc/mic/vop/vop_loopback.c 
b/drivers/misc/mic/vop/vop_loopback.c
new file mode 100644
index ..76d787db3d3e
--- /dev/null
+++ b/drivers/misc/mic/vop/vop_loopback.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Axis Communications AB
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../bus/vop_bus.h"
+
+struct mic_irq {
+   struct list_head list;
+   int irq;
+   irqreturn_t (*func)(int irq, void *data);
+   void *data;
+};
+
+struct vop_loopback_end {
+   struct vop_loopback *loopback;
+   const char *name;
+   struct vop_device *vop;
+   struct list_head irqs;
+   struct mutex mutex;
+   struct work_struct work;
+};
+
+struct vop_loopback {
+   struct device *dev;
+   void *dp;
+   struct vop_loopback_end host;
+   struct vop_loopback_end guest;
+};
+
+static inline struct vop_loopback *vop_to_loopback(struct device *dev)
+{
+   return dev_get_drvdata(dev->parent);
+}
+
+static dma_addr_t
+vop_loopback_dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+   return page_to_phys(page) + offset;
+}
+
+static void vop_loopback_dma_unmap_page(struct device *dev, dma_addr_t 
dma_addr,
+   size_t size, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+}
+
+static int vop_loopback_dma_mmap(struct device *dev, struct vm_area_struct 
*vma,
+void *cpu_addr, dma_addr_t dma_addr, size_t size,
+unsigned long attrs)
+{
+   return remap_pfn_range(vma, vma->vm_start,
+  PHYS_PFN(dma_addr), size,
+  vma->vm_page_prot);
+}
+
+static void *vop_loopback_dma_alloc(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp,
+   unsigned long attrs)
+{
+   void *p = (void *) __get_free_pages(gfp, get_order(size));
+
+   if (p)
+   *handle = virt_to_phys(p);
+
+   return p;
+}
+
+static void vop_loopback_dma_free(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t handle,
+ unsigned long attrs)
+{
+   free_pages((unsigned long) (uintptr_t) cpu_addr, get_order(size));
+}
+
+static const struct dma_map_ops vop_loopback_dma_ops = {
+   .map_page = vop_loopback_dma_map_page,
+   .unmap_page = vop_loopback_dma_unmap_page,
+   .mmap = vop_loopback_dma_mmap,
+   .alloc = vop_loopback_dma_alloc,
+   .free = vop_loopback_dma_free,
+};
+
+
+static void vop_loopback_ack_interrupt(struct vop_device *vop, int num)
+{
+}
+
+static int vop_loopback_next_db(struct vop_device *vop)
+{
+   return 0;
+}
+
+static void *vop_loopback_get_dp(struct vop_device *vop)
+{
+   struct vop_loopback *loopback = vop_to_loopback(&vop->dev);
+
+   return loopback->dp;
+}
+
+static void __iomem *vop_loopback_get_remote_dp(struct vop_device *vop)
+{
+   struct vop_loopback *loopback = vop_to_loopback(&vop->dev);
+
+   return (void __iomem *) loopback->dp;
+}
+
+static struct mic_irq *
+vop_loopback_reques

[PATCH v2 char-misc-next 4/7] mic: vop: Add loopback driver

2019-02-22 Thread Vincent Whitchurch

Add a loopback driver to allow testing and evaluation of the VOP
framework without special hardware.  The host and the guest will run
under the same kernel.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/Kconfig|  10 +
 drivers/misc/mic/vop/Makefile   |   2 +
 drivers/misc/mic/vop/vop_loopback.c | 382 
 3 files changed, 394 insertions(+)
 create mode 100644 drivers/misc/mic/vop/vop_loopback.c

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 242dcee14689..2e2f745afb3a 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -148,6 +148,16 @@ config VOP
  OS and tools for MIC to use with this driver are available from
  <http://software.intel.com/en-us/mic-developer>.
 
+config VOP_LOOPBACK
+   tristate "VOP loopback driver"
+   depends on VOP
+   help
+ This enables a loopback driver to test and evaluate the VOP
+ infrastructure without actual PCIe hardware.  The host and the guest
+ sides run under the same kernel.
+
+ If unsure, say N.
+
 if VOP
 source "drivers/vhost/Kconfig.vringh"
 endif
diff --git a/drivers/misc/mic/vop/Makefile b/drivers/misc/mic/vop/Makefile
index 78819c8999f1..a6ead25c4418 100644
--- a/drivers/misc/mic/vop/Makefile
+++ b/drivers/misc/mic/vop/Makefile
@@ -7,3 +7,5 @@ obj-m := vop.o
 vop-objs += vop_main.o
 vop-objs += vop_debugfs.o
 vop-objs += vop_vringh.o
+
+obj-$(CONFIG_VOP_LOOPBACK) += vop_loopback.o
diff --git a/drivers/misc/mic/vop/vop_loopback.c 
b/drivers/misc/mic/vop/vop_loopback.c
new file mode 100644
index ..76d787db3d3e
--- /dev/null
+++ b/drivers/misc/mic/vop/vop_loopback.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Axis Communications AB
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../bus/vop_bus.h"
+
+struct mic_irq {
+   struct list_head list;
+   int irq;
+   irqreturn_t (*func)(int irq, void *data);
+   void *data;
+};
+
+struct vop_loopback_end {
+   struct vop_loopback *loopback;
+   const char *name;
+   struct vop_device *vop;
+   struct list_head irqs;
+   struct mutex mutex;
+   struct work_struct work;
+};
+
+struct vop_loopback {
+   struct device *dev;
+   void *dp;
+   struct vop_loopback_end host;
+   struct vop_loopback_end guest;
+};
+
+static inline struct vop_loopback *vop_to_loopback(struct device *dev)
+{
+   return dev_get_drvdata(dev->parent);
+}
+
+static dma_addr_t
+vop_loopback_dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+   return page_to_phys(page) + offset;
+}
+
+static void vop_loopback_dma_unmap_page(struct device *dev, dma_addr_t 
dma_addr,
+   size_t size, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+}
+
+static int vop_loopback_dma_mmap(struct device *dev, struct vm_area_struct 
*vma,
+void *cpu_addr, dma_addr_t dma_addr, size_t size,
+unsigned long attrs)
+{
+   return remap_pfn_range(vma, vma->vm_start,
+  PHYS_PFN(dma_addr), size,
+  vma->vm_page_prot);
+}
+
+static void *vop_loopback_dma_alloc(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp,
+   unsigned long attrs)
+{
+   void *p = (void *) __get_free_pages(gfp, get_order(size));
+
+   if (p)
+   *handle = virt_to_phys(p);
+
+   return p;
+}
+
+static void vop_loopback_dma_free(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t handle,
+ unsigned long attrs)
+{
+   free_pages((unsigned long) (uintptr_t) cpu_addr, get_order(size));
+}
+
+static const struct dma_map_ops vop_loopback_dma_ops = {
+   .map_page = vop_loopback_dma_map_page,
+   .unmap_page = vop_loopback_dma_unmap_page,
+   .mmap = vop_loopback_dma_mmap,
+   .alloc = vop_loopback_dma_alloc,
+   .free = vop_loopback_dma_free,
+};
+
+
+static void vop_loopback_ack_interrupt(struct vop_device *vop, int num)
+{
+}
+
+static int vop_loopback_next_db(struct vop_device *vop)
+{
+   return 0;
+}
+
+static void *vop_loopback_get_dp(struct vop_device *vop)
+{
+   struct vop_loopback *loopback = vop_to_loopback(&vop->dev);
+
+   return loopback->dp;
+}
+
+static void __iomem *vop_loopback_get_remote_dp(struct vop_device *vop)
+{
+   struct vop_loopback *loopback = vop_to_loopback(&vop->dev);
+
+   return (void __iomem *) loopback->dp;
+}
+
+static struct mic_irq *
+vop_loopback_reques

[PATCH v2 char-misc-next 1/7] vop: Cast pointers to unsigned long

2019-02-22 Thread Vincent Whitchurch

Fix these on 32-bit:

 vop_vringh.c:711:13: error: cast from pointer to integer of different
 size [-Werror=pointer-to-int-cast]

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c   | 13 +
 drivers/misc/mic/vop/vop_vringh.c |  5 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 3f96e19e57eb..263e0c3fa7c6 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -514,7 +514,7 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
vdev->desc = d;
vdev->dc = (void __iomem *)d + _vop_aligned_desc_size(d);
vdev->dnode = dnode;
-   vdev->vdev.priv = (void *)(u64)dnode;
+   vdev->vdev.priv = (void *)(unsigned long)dnode;
init_completion(&vdev->reset_done);
 
vdev->h2c_vdev_db = vpdev->hw_ops->next_db(vpdev);
@@ -536,7 +536,7 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
offset, type);
goto free_irq;
}
-   writeq((u64)vdev, &vdev->dc->vdev);
+   writeq((unsigned long)vdev, &vdev->dc->vdev);
dev_dbg(_vop_dev(vdev), "%s: registered vop device %u type %u vdev 
%p\n",
__func__, offset, type, vdev);
 
@@ -563,13 +563,18 @@ static int vop_match_desc(struct device *dev, void *data)
return vdev->desc == (void __iomem *)data;
 }
 
+static struct _vop_vdev *vop_dc_to_vdev(struct mic_device_ctrl *dc)
+{
+   return (struct _vop_vdev *)(unsigned long)readq(&dc->vdev);
+}
+
 static void _vop_handle_config_change(struct mic_device_desc __iomem *d,
  unsigned int offset,
  struct vop_device *vpdev)
 {
struct mic_device_ctrl __iomem *dc
= (void __iomem *)d + _vop_aligned_desc_size(d);
-   struct _vop_vdev *vdev = (struct _vop_vdev *)readq(&dc->vdev);
+   struct _vop_vdev *vdev = vop_dc_to_vdev(dc);
 
if (ioread8(&dc->config_change) != MIC_VIRTIO_PARAM_CONFIG_CHANGED)
return;
@@ -588,7 +593,7 @@ static int _vop_remove_device(struct mic_device_desc 
__iomem *d,
 {
struct mic_device_ctrl __iomem *dc
= (void __iomem *)d + _vop_aligned_desc_size(d);
-   struct _vop_vdev *vdev = (struct _vop_vdev *)readq(&dc->vdev);
+   struct _vop_vdev *vdev = vop_dc_to_vdev(dc);
u8 status;
int ret = -1;
 
diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index 0bac8bce933c..73d9fe6bc18d 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -712,16 +712,17 @@ static int vop_vringh_copy(struct vop_vdev *vdev, struct 
vringh_kiov *iov,
 
while (len && iov->i < iov->used) {
struct kvec *kiov = &iov->iov[iov->i];
+   unsigned long daddr = (unsigned long)kiov->iov_base;
 
partlen = min(kiov->iov_len, len);
if (read)
ret = vop_virtio_copy_to_user(vdev, ubuf, partlen,
- (u64)kiov->iov_base,
+ daddr,
  kiov->iov_len,
  vr_idx);
else
ret = vop_virtio_copy_from_user(vdev, ubuf, partlen,
-   (u64)kiov->iov_base,
+   daddr,
kiov->iov_len,
vr_idx);
if (ret) {
-- 
2.20.0

[PATCH v2 char-misc-next 3/7] vop: Allow building on more systems

2019-02-22 Thread Vincent Whitchurch

VOP_BUS does not actually depend on x86-64 or PCI or X86_DEV_DMA_OPS.

The dependency on X86_DEV_DMA_OPS has been unnecessary since commit
5657933dbb6e25fe ("treewide: Move dma_ops from struct dev_archdata into
struct device").

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 227cc7443671..242dcee14689 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -38,7 +38,6 @@ comment "VOP Bus Driver"
 
 config VOP_BUS
tristate "VOP Bus Driver"
-   depends on 64BIT && PCI && X86 && X86_DEV_DMA_OPS
help
  This option is selected by any driver which registers a
  device or driver on the VOP Bus, such as CONFIG_INTEL_MIC_HOST
@@ -132,7 +131,7 @@ comment "VOP Driver"
 
 config VOP
tristate "VOP Driver"
-   depends on 64BIT && PCI && X86 && VOP_BUS
+   depends on VOP_BUS
select VHOST_RING
select VIRTIO
help
-- 
2.20.0

[PATCH v2 char-misc-next 6/7] samples: mic: Split out vop code from mpssd

2019-02-22 Thread Vincent Whitchurch

mpssd.c contains both code to boot the MIC hardware and code to serve
virtio devices over vop.  Split the vop-related code to a separate file
so that it can be used by other hardware and vop-loopback.

Except for changes to change_virtblk_backend() and the introduction of
serve_virtio(), the rest is just movement of unmodified code between
files.

Signed-off-by: Vincent Whitchurch 
---
 samples/mic/mpssd/Makefile |4 +-
 samples/mic/mpssd/mpssd.c  | 1317 +-
 samples/mic/mpssd/mpssd.h  |1 +
 samples/mic/mpssd/vop.c| 1357 
 4 files changed, 1364 insertions(+), 1315 deletions(-)
 create mode 100644 samples/mic/mpssd/vop.c

diff --git a/samples/mic/mpssd/Makefile b/samples/mic/mpssd/Makefile
index a7a6e0c70424..bc94054dbe2b 100644
--- a/samples/mic/mpssd/Makefile
+++ b/samples/mic/mpssd/Makefile
@@ -14,8 +14,8 @@ CFLAGS += -DDEBUG=$(DEBUG)
 endif
 
 all: $(PROGS)
-mpssd: mpssd.c sysfs.c
-   $(CC) $(CFLAGS) mpssd.c sysfs.c -o mpssd -lpthread
+mpssd: mpssd.c sysfs.c vop.o
+   $(CC) $(CFLAGS) $^ -o $@ -lpthread
 
 install:
install mpssd /usr/sbin/mpssd
diff --git a/samples/mic/mpssd/mpssd.c b/samples/mic/mpssd/mpssd.c
index f42ce551bb48..5e4233c1ff93 100644
--- a/samples/mic/mpssd/mpssd.c
+++ b/samples/mic/mpssd/mpssd.c
@@ -31,1292 +31,18 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include "mpssd.h"
 #include 
 #include 
 #include 
 
-static void *init_mic(void *arg);
-
 static FILE *logfp;
-static struct mic_info mic_list;
-
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-
-#define min_t(type, x, y) ({   \
-   type __min1 = (x);  \
-   type __min2 = (y);  \
-   __min1 < __min2 ? __min1 : __min2; })
-
-/* align addr on a size boundary - adjust address up/down if needed */
-#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
-#define _ALIGN_UP(addr, size)_ALIGN_DOWN(addr + size - 1, size)
-
-/* align addr on a size boundary - adjust address up if needed */
-#define _ALIGN(addr, size) _ALIGN_UP(addr, size)
-
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)_ALIGN(addr, PAGE_SIZE)
-
-#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
-
-#define GSO_ENABLED1
-#define MAX_GSO_SIZE   (64 * 1024)
-#define ETH_H_LEN  14
-#define MAX_NET_PKT_SIZE   (_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
-#define MIC_DEVICE_PAGE_END0x1000
-
-#ifndef VIRTIO_NET_HDR_F_DATA_VALID
-#define VIRTIO_NET_HDR_F_DATA_VALID2   /* Csum is valid */
-#endif
 
-static struct {
-   struct mic_device_desc dd;
-   struct mic_vqconfig vqconfig[2];
-   __u32 host_features, guest_acknowledgements;
-   struct virtio_console_config cons_config;
-} virtcons_dev_page = {
-   .dd = {
-   .type = VIRTIO_ID_CONSOLE,
-   .num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
-   .feature_len = sizeof(virtcons_dev_page.host_features),
-   .config_len = sizeof(virtcons_dev_page.cons_config),
-   },
-   .vqconfig[0] = {
-   .num = htole16(MIC_VRING_ENTRIES),
-   },
-   .vqconfig[1] = {
-   .num = htole16(MIC_VRING_ENTRIES),
-   },
-};
-
-static struct {
-   struct mic_device_desc dd;
-   struct mic_vqconfig vqconfig[2];
-   __u32 host_features, guest_acknowledgements;
-   struct virtio_net_config net_config;
-} virtnet_dev_page = {
-   .dd = {
-   .type = VIRTIO_ID_NET,
-   .num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
-   .feature_len = sizeof(virtnet_dev_page.host_features),
-   .config_len = sizeof(virtnet_dev_page.net_config),
-   },
-   .vqconfig[0] = {
-   .num = htole16(MIC_VRING_ENTRIES),
-   },
-   .vqconfig[1] = {
-   .num = htole16(MIC_VRING_ENTRIES),
-   },
-#if GSO_ENABLED
-   .host_features = htole32(
-   1 << VIRTIO_NET_F_CSUM |
-   1 << VIRTIO_NET_F_GSO |
-   1 << VIRTIO_NET_F_GUEST_TSO4 |
-   1 << VIRTIO_NET_F_GUEST_TSO6 |
-   1 << VIRTIO_NET_F_GUEST_ECN),
-#else
-   .host_features = 0,
-#endif
-};
-
-static const char *mic_config_dir = "/etc/mpss";
-static const char *virtblk_backend = "VIRTBLK_BACKEND";
-static struct {
-   struct mic_device_desc dd;
-   struct mic_vqconfig vqconfig[1];
-   __u32 host_features, guest_acknowledgements;
-   struct virtio_blk_config blk_config;
-} virtblk_dev_page = {
-   .dd = {
-   .type = VIRTIO_ID_BLOCK,
-   .num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
-   .feature_len = sizeof(virtblk_dev_page.host_feat

[PATCH v2 char-misc-next 2/7] mic: Rename ioremap pointer to remap

2019-02-22 Thread Vincent Whitchurch

Some architectures (like MIPS) implement ioremap as a macro, and this
leads to conflicts with the ioremap function pointer in various mic
structures.

 drivers/misc/mic/vop/vop_vringh.c:
   In function 'vop_virtio_init_post':
 drivers/misc/mic/vop/vop_vringh.c:86:13:
   error: macro "ioremap" passed 3 arguments, but takes just 2

Rename ioremap to remap to fix this.  Likewise for iounmap.

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/bus/scif_bus.h|  8 
 drivers/misc/mic/bus/vop_bus.h |  8 
 drivers/misc/mic/card/mic_device.c |  8 
 drivers/misc/mic/host/mic_boot.c   |  8 
 drivers/misc/mic/scif/scif_map.h   |  4 ++--
 drivers/misc/mic/vop/vop_main.c|  7 +++
 drivers/misc/mic/vop/vop_vringh.c  | 10 +-
 7 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/drivers/misc/mic/bus/scif_bus.h b/drivers/misc/mic/bus/scif_bus.h
index ff59568219ad..377a4f38cd7e 100644
--- a/drivers/misc/mic/bus/scif_bus.h
+++ b/drivers/misc/mic/bus/scif_bus.h
@@ -88,8 +88,8 @@ struct scif_driver {
  * @send_intr: Send an interrupt to the remote node on a specified doorbell.
  * @send_p2p_intr: Send an interrupt to the peer node on a specified doorbell
  * which is specifically targeted for a peer to peer node.
- * @ioremap: Map a buffer with the specified physical address and length.
- * @iounmap: Unmap a buffer previously mapped.
+ * @remap: Map a buffer with the specified physical address and length.
+ * @unmap: Unmap a buffer previously mapped.
  */
 struct scif_hw_ops {
int (*next_db)(struct scif_hw_dev *sdev);
@@ -104,9 +104,9 @@ struct scif_hw_ops {
void (*send_intr)(struct scif_hw_dev *sdev, int db);
void (*send_p2p_intr)(struct scif_hw_dev *sdev, int db,
  struct mic_mw *mw);
-   void __iomem * (*ioremap)(struct scif_hw_dev *sdev,
+   void __iomem * (*remap)(struct scif_hw_dev *sdev,
  phys_addr_t pa, size_t len);
-   void (*iounmap)(struct scif_hw_dev *sdev, void __iomem *va);
+   void (*unmap)(struct scif_hw_dev *sdev, void __iomem *va);
 };
 
 int scif_register_driver(struct scif_driver *driver);
diff --git a/drivers/misc/mic/bus/vop_bus.h b/drivers/misc/mic/bus/vop_bus.h
index fff7a865d721..cf5f3fae573c 100644
--- a/drivers/misc/mic/bus/vop_bus.h
+++ b/drivers/misc/mic/bus/vop_bus.h
@@ -87,8 +87,8 @@ struct vop_driver {
  * @get_dp: Get access to the virtio device page used by the self
  *  node to add/remove/configure virtio devices.
  * @send_intr: Send an interrupt to the peer node on a specified doorbell.
- * @ioremap: Map a buffer with the specified DMA address and length.
- * @iounmap: Unmap a buffer previously mapped.
+ * @remap: Map a buffer with the specified DMA address and length.
+ * @unmap: Unmap a buffer previously mapped.
  * @dma_filter: The DMA filter function to use for obtaining access to
  * a DMA channel on the peer node.
  */
@@ -104,9 +104,9 @@ struct vop_hw_ops {
void __iomem * (*get_remote_dp)(struct vop_device *vpdev);
void * (*get_dp)(struct vop_device *vpdev);
void (*send_intr)(struct vop_device *vpdev, int db);
-   void __iomem * (*ioremap)(struct vop_device *vpdev,
+   void __iomem * (*remap)(struct vop_device *vpdev,
  dma_addr_t pa, size_t len);
-   void (*iounmap)(struct vop_device *vpdev, void __iomem *va);
+   void (*unmap)(struct vop_device *vpdev, void __iomem *va);
 };
 
 struct vop_device *
diff --git a/drivers/misc/mic/card/mic_device.c 
b/drivers/misc/mic/card/mic_device.c
index e749af48f736..dcd07ef29801 100644
--- a/drivers/misc/mic/card/mic_device.c
+++ b/drivers/misc/mic/card/mic_device.c
@@ -245,8 +245,8 @@ static struct scif_hw_ops scif_hw_ops = {
.next_db = ___mic_next_db,
.send_intr = ___mic_send_intr,
.send_p2p_intr = ___mic_send_p2p_intr,
-   .ioremap = ___mic_ioremap,
-   .iounmap = ___mic_iounmap,
+   .remap = ___mic_ioremap,
+   .unmap = ___mic_iounmap,
 };
 
 static inline struct mic_driver *vpdev_to_mdrv(struct vop_device *vpdev)
@@ -316,8 +316,8 @@ static struct vop_hw_ops vop_hw_ops = {
.next_db = __mic_next_db,
.get_remote_dp = __mic_get_remote_dp,
.send_intr = __mic_send_intr,
-   .ioremap = __mic_ioremap,
-   .iounmap = __mic_iounmap,
+   .remap = __mic_ioremap,
+   .unmap = __mic_iounmap,
 };
 
 static int mic_request_dma_chans(struct mic_driver *mdrv)
diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c
index 6479435ac96b..079c36f0ce6e 100644
--- a/drivers/misc/mic/host/mic_boot.c
+++ b/drivers/misc/mic/host/mic_boot.c
@@ -133,8 +133,8 @@ static struct vop_hw_ops vop_hw_ops = {
.get_dp = __mic_get_dp,
.get_remote_dp = __mic_get_remote_dp,
.send_intr = __mic_send_intr,
-   .ioremap = __mic_ioremap,
-   .iounmap = __mic_iou

[PATCH v2 char-misc-next 1/7] mic: vop: Cast pointers to unsigned long

2019-02-22 Thread Vincent Whitchurch

Fix these on 32-bit:

 vop_vringh.c:711:13: error: cast from pointer to integer of different
 size [-Werror=pointer-to-int-cast]

Signed-off-by: Vincent Whitchurch 
---
 drivers/misc/mic/vop/vop_main.c   | 13 +
 drivers/misc/mic/vop/vop_vringh.c |  5 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 3f96e19e57eb..263e0c3fa7c6 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -514,7 +514,7 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
vdev->desc = d;
vdev->dc = (void __iomem *)d + _vop_aligned_desc_size(d);
vdev->dnode = dnode;
-   vdev->vdev.priv = (void *)(u64)dnode;
+   vdev->vdev.priv = (void *)(unsigned long)dnode;
init_completion(&vdev->reset_done);
 
vdev->h2c_vdev_db = vpdev->hw_ops->next_db(vpdev);
@@ -536,7 +536,7 @@ static int _vop_add_device(struct mic_device_desc __iomem 
*d,
offset, type);
goto free_irq;
}
-   writeq((u64)vdev, &vdev->dc->vdev);
+   writeq((unsigned long)vdev, &vdev->dc->vdev);
dev_dbg(_vop_dev(vdev), "%s: registered vop device %u type %u vdev 
%p\n",
__func__, offset, type, vdev);
 
@@ -563,13 +563,18 @@ static int vop_match_desc(struct device *dev, void *data)
return vdev->desc == (void __iomem *)data;
 }
 
+static struct _vop_vdev *vop_dc_to_vdev(struct mic_device_ctrl *dc)
+{
+   return (struct _vop_vdev *)(unsigned long)readq(&dc->vdev);
+}
+
 static void _vop_handle_config_change(struct mic_device_desc __iomem *d,
  unsigned int offset,
  struct vop_device *vpdev)
 {
struct mic_device_ctrl __iomem *dc
= (void __iomem *)d + _vop_aligned_desc_size(d);
-   struct _vop_vdev *vdev = (struct _vop_vdev *)readq(&dc->vdev);
+   struct _vop_vdev *vdev = vop_dc_to_vdev(dc);
 
if (ioread8(&dc->config_change) != MIC_VIRTIO_PARAM_CONFIG_CHANGED)
return;
@@ -588,7 +593,7 @@ static int _vop_remove_device(struct mic_device_desc 
__iomem *d,
 {
struct mic_device_ctrl __iomem *dc
= (void __iomem *)d + _vop_aligned_desc_size(d);
-   struct _vop_vdev *vdev = (struct _vop_vdev *)readq(&dc->vdev);
+   struct _vop_vdev *vdev = vop_dc_to_vdev(dc);
u8 status;
int ret = -1;
 
diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index 0bac8bce933c..73d9fe6bc18d 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -712,16 +712,17 @@ static int vop_vringh_copy(struct vop_vdev *vdev, struct 
vringh_kiov *iov,
 
while (len && iov->i < iov->used) {
struct kvec *kiov = &iov->iov[iov->i];
+   unsigned long daddr = (unsigned long)kiov->iov_base;
 
partlen = min(kiov->iov_len, len);
if (read)
ret = vop_virtio_copy_to_user(vdev, ubuf, partlen,
- (u64)kiov->iov_base,
+ daddr,
  kiov->iov_len,
  vr_idx);
else
ret = vop_virtio_copy_from_user(vdev, ubuf, partlen,
-   (u64)kiov->iov_base,
+   daddr,
kiov->iov_len,
vr_idx);
if (ret) {
-- 
2.20.0

[PATCH v2 char-misc-next 0/7] Virtio-over-PCIe on non-MIC

2019-02-22 Thread Vincent Whitchurch

This allows the drivers/misc/mic/vop/ code to be used on other architectures
and adds a loopback driver to be able to use/test/evaluate it without special
hardware.

See v1 for more info:
https://lore.kernel.org/lkml/20190116163253.23780-1-vincent.whitchu...@axis.com/

v2:
- Rebase on top of the patches which have merged
- Use unsigned long instead of uintptr_t
- Drop "Use consistent DMA" for now since it's not needed for loopback
- Add race fix
- Build on all architectures, not just ARM and x86
- Add sample userspace code

Vincent Whitchurch (7):
  mic: vop: Cast pointers to unsigned long
  mic: Rename ioremap pointer to remap
  mic: vop: Allow building on more systems
  mic: vop: Add loopback driver
  mic: vop: Fix init race with shared interrupts
  samples: mic: Split out vop code from mpssd
  samples: mic: Add sample VOP userspace

 drivers/misc/mic/Kconfig|   13 +-
 drivers/misc/mic/bus/scif_bus.h |8 +-
 drivers/misc/mic/bus/vop_bus.h  |8 +-
 drivers/misc/mic/card/mic_device.c  |8 +-
 drivers/misc/mic/host/mic_boot.c|8 +-
 drivers/misc/mic/scif/scif_map.h|4 +-
 drivers/misc/mic/vop/Makefile   |2 +
 drivers/misc/mic/vop/vop_loopback.c |  382 
 drivers/misc/mic/vop/vop_main.c |   62 +-
 drivers/misc/mic/vop/vop_vringh.c   |   15 +-
 samples/mic/mpssd/.gitignore|1 +
 samples/mic/mpssd/Makefile  |9 +-
 samples/mic/mpssd/mpssd.c   | 1317 +-
 samples/mic/mpssd/mpssd.h   |1 +
 samples/mic/mpssd/vop.c | 1357 +++
 samples/mic/mpssd/vopd.c|   25 +
 16 files changed, 1866 insertions(+), 1354 deletions(-)
 create mode 100644 drivers/misc/mic/vop/vop_loopback.c
 create mode 100644 samples/mic/mpssd/vop.c
 create mode 100644 samples/mic/mpssd/vopd.c

-- 
2.20.0

[PATCH v2 char-misc-next 7/7] samples: mic: Add sample VOP userspace

2019-02-22 Thread Vincent Whitchurch

Add a sample userspace program which can be used with the VOP framework
with vop-loopback and non-MIC hardware.

For example, the following commands can be used to test networking over
vop-loopback.  (IPv6 is used with zone indices to force traffic through
the interfaces instead of short-circuit delivery, since both interfaces
are on the same system.)

 # modprobe vop_loopback
 # ./vopd &

 Setup host interface:
 # ip a a dev mic0 scope link fe80::0
 # ip link set dev mic0 up

 Setup guest interface:
 # ip a a dev eth1 scope link fe80::1
 # ip link set dev eth1 up

 Ping from host to guest:
 # ping6 fe80::1%mic0

 Ping from guest to host:
 # ping6 fe80::0%eth1

Signed-off-by: Vincent Whitchurch 
---
 samples/mic/mpssd/.gitignore |  1 +
 samples/mic/mpssd/Makefile   |  5 -
 samples/mic/mpssd/vopd.c | 25 +
 3 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 samples/mic/mpssd/vopd.c

diff --git a/samples/mic/mpssd/.gitignore b/samples/mic/mpssd/.gitignore
index 8b7c72f07c92..34c001135abb 100644
--- a/samples/mic/mpssd/.gitignore
+++ b/samples/mic/mpssd/.gitignore
@@ -1 +1,2 @@
 mpssd
+vopd
diff --git a/samples/mic/mpssd/Makefile b/samples/mic/mpssd/Makefile
index bc94054dbe2b..b94b18de5a43 100644
--- a/samples/mic/mpssd/Makefile
+++ b/samples/mic/mpssd/Makefile
@@ -5,7 +5,7 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e 
s/x86_64/x86/)
 
 ifeq ($(ARCH),x86)
 
-PROGS := mpssd
+PROGS := mpssd vopd
 CC = $(CROSS_COMPILE)gcc
 CFLAGS := -I../../../usr/include -I../../../tools/include
 
@@ -17,6 +17,9 @@ all: $(PROGS)
 mpssd: mpssd.c sysfs.c vop.o
$(CC) $(CFLAGS) $^ -o $@ -lpthread
 
+vopd: vopd.c vop.o
+   $(CC) $(CFLAGS) $^ -o $@ -lpthread
+
 install:
install mpssd /usr/sbin/mpssd
install micctrl /usr/sbin/micctrl
diff --git a/samples/mic/mpssd/vopd.c b/samples/mic/mpssd/vopd.c
new file mode 100644
index ..48c1877fed28
--- /dev/null
+++ b/samples/mic/mpssd/vopd.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+
+#include "mpssd.h"
+
+void mpsslog(char *format, ...)
+{
+   va_list args;
+
+   va_start(args, format);
+   vprintf(format, args);
+   va_end(args);
+}
+
+int main(int argc, char *argv[])
+{
+   struct mic_info themic = {
+   .name = "mic",
+   };
+
+   serve_virtio(&themic, &themic);
+
+   return 0;
+}
-- 
2.20.0

[PATCH] Documentation/sysctl/vm.txt: Fix drop_caches bit number

2019-01-11 Thread Vincent Whitchurch

Bits are usually numbered starting from zero, so 4 should be bit 2, not
bit 3.

Suggested-by: Matthew Wilcox 
Signed-off-by: Vincent Whitchurch 
---
 Documentation/sysctl/vm.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 187ce4f599a2..6af24cdb25cc 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -237,7 +237,7 @@ used:
cat (1234): drop_caches: 3
 
 These are informational only.  They do not mean that anything is wrong
-with your system.  To disable them, echo 4 (bit 3) into drop_caches.
+with your system.  To disable them, echo 4 (bit 2) into drop_caches.
 
 ==
 
-- 
2.20.0

[PATCH v3] kmemleak: add module param to print warnings to dmesg

2018-09-03 Thread Vincent Whitchurch

Currently, kmemleak only prints the number of suspected leaks to dmesg
but requires the user to read a debugfs file to get the actual stack
traces of the objects' allocation points.  Add a module option to print
the full object information to dmesg too.  It can be enabled with
kmemleak.verbose=1 on the kernel command line, or "echo 1 >
/sys/module/kmemleak/parameters/verbose":

This allows easier integration of kmemleak into test systems:  We have
automated test infrastructure to test our Linux systems.  With this
option, running our tests with kmemleak is as simple as enabling
kmemleak and passing this command line option; the test infrastructure
knows how to save kernel logs, which will now include kmemleak reports.
Without this option, the test infrastructure needs to be specifically
taught to read out the kmemleak debugfs file.  Removing this need for
special handling makes kmemleak more similar to other kernel debug
options (slab debugging, debug objects, etc).

Signed-off-by: Vincent Whitchurch 
---
v3: Expand use case description.  Replace config option with module parameter.

 mm/kmemleak.c | 42 +++---
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 9a085d525bbc..c91d43738596 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -86,6 +86,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -181,6 +182,7 @@ struct kmemleak_object {
 /* flag set to not scan the object */
 #define OBJECT_NO_SCAN (1 << 2)
 
+#define HEX_PREFIX ""
 /* number of bytes to print per line; must be 16 or 32 */
 #define HEX_ROW_SIZE   16
 /* number of bytes to print at a time (1, 2, 4, 8) */
@@ -235,6 +237,9 @@ static int kmemleak_skip_disable;
 /* If there are leaks that can be reported */
 static bool kmemleak_found_leaks;
 
+static bool kmemleak_verbose;
+module_param_named(verbose, kmemleak_verbose, bool, 0600);
+
 /*
  * Early object allocation/freeing logging. Kmemleak is initialized after the
  * kernel allocator. However, both the kernel allocator and kmemleak may
@@ -299,6 +304,25 @@ static void kmemleak_disable(void);
kmemleak_disable(); \
 } while (0)
 
+#define warn_or_seq_printf(seq, fmt, ...)  do {\
+   if (seq)\
+   seq_printf(seq, fmt, ##__VA_ARGS__);\
+   else\
+   pr_warn(fmt, ##__VA_ARGS__);\
+} while (0)
+
+static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
+int rowsize, int groupsize, const void *buf,
+size_t len, bool ascii)
+{
+   if (seq)
+   seq_hex_dump(seq, HEX_PREFIX, prefix_type, rowsize, groupsize,
+buf, len, ascii);
+   else
+   print_hex_dump(KERN_WARNING, pr_fmt(HEX_PREFIX), prefix_type,
+  rowsize, groupsize, buf, len, ascii);
+}
+
 /*
  * Printing of the objects hex dump to the seq file. The number of lines to be
  * printed is limited to HEX_MAX_LINES to prevent seq file spamming. The
@@ -314,10 +338,10 @@ static void hex_dump_object(struct seq_file *seq,
/* limit the number of lines to HEX_MAX_LINES */
len = min_t(size_t, object->size, HEX_MAX_LINES * HEX_ROW_SIZE);
 
-   seq_printf(seq, "  hex dump (first %zu bytes):\n", len);
+   warn_or_seq_printf(seq, "  hex dump (first %zu bytes):\n", len);
kasan_disable_current();
-   seq_hex_dump(seq, "", DUMP_PREFIX_NONE, HEX_ROW_SIZE,
-HEX_GROUP_SIZE, ptr, len, HEX_ASCII);
+   warn_or_seq_hex_dump(seq, DUMP_PREFIX_NONE, HEX_ROW_SIZE,
+HEX_GROUP_SIZE, ptr, len, HEX_ASCII);
kasan_enable_current();
 }
 
@@ -365,17 +389,17 @@ static void print_unreferenced(struct seq_file *seq,
int i;
unsigned int msecs_age = jiffies_to_msecs(jiffies - object->jiffies);
 
-   seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
+   warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
   object->pointer, object->size);
-   seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu (age %d.%03ds)\n",
+   warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu (age 
%d.%03ds)\n",
   object->comm, object->pid, object->jiffies,
   msecs_age / 1000, msecs_age % 1000);
hex_dump_object(seq, object);
-   seq_printf(seq, "  backtrace:\n");
+   warn_or_seq_printf(seq, "  backtrace:\n");
 
for (i = 0; i < object->trace_len; i++) {
void *ptr = (void *)object->trace[i];
-   seq_printf(s

Re: [PATCH v2] dynamic debug: allow printing to trace event

2020-08-25 Thread Vincent Whitchurch

On Fri, Aug 14, 2020 at 11:30:34PM +0200, Jason Baron wrote:
> On 8/14/20 1:15 PM, Steven Rostedt wrote:
> > On Fri, 14 Aug 2020 15:31:51 +0200
> > Vincent Whitchurch  wrote:
> >> index aa9ff9e1c0b3..f599ed21ecc5 100644
> >> --- a/include/linux/dynamic_debug.h
> >> +++ b/include/linux/dynamic_debug.h
> >> @@ -27,13 +27,16 @@ struct _ddebug {
> >> * writes commands to /dynamic_debug/control
> >> */
> >>  #define _DPRINTK_FLAGS_NONE   0
> >> -#define _DPRINTK_FLAGS_PRINT  (1<<0) /* printk() a message using the 
> >> format */
> >> +#define _DPRINTK_FLAGS_PRINTK (1<<0) /* printk() a message using the 
> >> format */
> > 
> > The above looks like a cleanup unrelated to this patch, and probably
> > should be on its own.
> 
> I read it as we used to have this one thing called 'print', which really meant
> printk, but now that we also have the ability to output to the trace buffer,
> what does 'print' mean now? So I read it as being part of this change.

Yes, that's what was intended, but I think it makes sense to split it
out as Steven suggested so I've done that now (and also renamed the
combined flag to the less ambiguous _DPRINTK_FLAGS_ENABLE).

>
> > 
> >>  #define _DPRINTK_FLAGS_INCL_MODNAME   (1<<1)
> >>  #define _DPRINTK_FLAGS_INCL_FUNCNAME  (1<<2)
> >>  #define _DPRINTK_FLAGS_INCL_LINENO(1<<3)
> >>  #define _DPRINTK_FLAGS_INCL_TID   (1<<4)
> >> +#define _DPRINTK_FLAGS_TRACE  (1<<5)  
> >> +#define _DPRINTK_FLAGS_PRINT  (_DPRINTK_FLAGS_PRINTK | \
> >> +   _DPRINTK_FLAGS_TRACE)
> 
> 
> Is _DPRINTK_FLAGS_PRINT actually used anywhere? Looks to me like
> it can be removed.

It's used from DYNAMIC_DEBUG_BRANCH() as well as from
lib/dynamic_debug.c to check if the location is enabled.

> This is a feature I've wanted for dynamic debug for a while. Thanks for
> implementing it!
> 
> Dynamic can be enabled on the command line in order to print things early
> in boot (I think ftrace can as well), I want to make sure that there are
> no ordering issues here? And things wouldn't blow up if we enable printing
> to the ftrace buffer early on via dyanmic debug?

I tried enabling all dynamic debug locations and tracing via the command
line and that worked fine:

 dyndbg="file * +x" trace_event=printk:*

[PATCH v3 1/2] dynamic debug: split enable and printk flags

2020-08-25 Thread Vincent Whitchurch

We're going to add support to allow dynamic debug locations to print to
a trace event, and that will make _DPRINTK_FLAGS_PRINT a bit ambiguous.
Will it mean printk(), or printing to the trace event, or any of the
two?

To make it clearer, split _DPRINTK_FLAGS_PRINT into two:

 (1) _DPRINTK_FLAGS_PRINTK for turning on the printk() and
 (2) _DPRINTK_FLAGS_ENABLE when checking if the dynamic debug location
 is enabled.

_DPRINTK_FLAGS_ENABLE is currently just an alias of
_DPRINTK_FLAGS_PRINTK but will later also include a new flag for
enabling printing to the trace event.

Signed-off-by: Vincent Whitchurch 
---
 include/linux/dynamic_debug.h | 9 +
 lib/dynamic_debug.c   | 8 
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index aa9ff9e1c0b3..738421898aac 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -27,13 +27,14 @@ struct _ddebug {
 * writes commands to /dynamic_debug/control
 */
 #define _DPRINTK_FLAGS_NONE0
-#define _DPRINTK_FLAGS_PRINT   (1<<0) /* printk() a message using the format */
+#define _DPRINTK_FLAGS_PRINTK  (1<<0) /* printk() a message using the format */
 #define _DPRINTK_FLAGS_INCL_MODNAME(1<<1)
 #define _DPRINTK_FLAGS_INCL_FUNCNAME   (1<<2)
 #define _DPRINTK_FLAGS_INCL_LINENO (1<<3)
 #define _DPRINTK_FLAGS_INCL_TID(1<<4)
+#define _DPRINTK_FLAGS_ENABLE  _DPRINTK_FLAGS_PRINTK
 #if defined DEBUG
-#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINT
+#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK
 #else
 #define _DPRINTK_FLAGS_DEFAULT 0
 #endif
@@ -111,10 +112,10 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
 
 #ifdef DEBUG
 #define DYNAMIC_DEBUG_BRANCH(descriptor) \
-   likely(descriptor.flags & _DPRINTK_FLAGS_PRINT)
+   likely(descriptor.flags & _DPRINTK_FLAGS_ENABLE)
 #else
 #define DYNAMIC_DEBUG_BRANCH(descriptor) \
-   unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT)
+   unlikely(descriptor.flags & _DPRINTK_FLAGS_ENABLE)
 #endif
 
 #endif
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 1d012e597cc3..88af85cb5222 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -84,7 +84,7 @@ static inline const char *trim_prefix(const char *path)
 }
 
 static struct { unsigned flag:8; char opt_char; } opt_array[] = {
-   { _DPRINTK_FLAGS_PRINT, 'p' },
+   { _DPRINTK_FLAGS_PRINTK, 'p' },
{ _DPRINTK_FLAGS_INCL_MODNAME, 'm' },
{ _DPRINTK_FLAGS_INCL_FUNCNAME, 'f' },
{ _DPRINTK_FLAGS_INCL_LINENO, 'l' },
@@ -206,10 +206,10 @@ static int ddebug_change(const struct ddebug_query *query,
if (newflags == dp->flags)
continue;
 #ifdef CONFIG_JUMP_LABEL
-   if (dp->flags & _DPRINTK_FLAGS_PRINT) {
-   if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
+   if (dp->flags & _DPRINTK_FLAGS_ENABLE) {
+   if (!(modifiers->flags & _DPRINTK_FLAGS_ENABLE))

static_branch_disable(&dp->key.dd_key_true);
-   } else if (modifiers->flags & _DPRINTK_FLAGS_PRINT)
+   } else if (modifiers->flags & _DPRINTK_FLAGS_ENABLE)
static_branch_enable(&dp->key.dd_key_true);
 #endif
dp->flags = newflags;
-- 
2.28.0

[PATCH v3 0/2] Dynamic debug trace support

2020-08-25 Thread Vincent Whitchurch

v3:
 - Split flag rename to a separate patch
 - Rename event to printk:dyndbg

v2:
 - Remove stack buffer and use code similar to __ftrace_trace_stack()
 - Use an event with the same class as printk:console

Vincent Whitchurch (2):
  dynamic debug: split enable and printk flags
  dynamic debug: allow printing to trace event

 .../admin-guide/dynamic-debug-howto.rst   |   1 +
 include/linux/dynamic_debug.h |  11 +-
 include/trace/events/printk.h |  12 +-
 lib/dynamic_debug.c   | 161 ++
 4 files changed, 151 insertions(+), 34 deletions(-)

Range-diff:
-:   > 1:  2564b3dbbb04 dynamic debug: split enable and printk flags
1:  7bd3fb553503 ! 2:  90291c35d751 dynamic debug: allow printing to trace event
@@ Commit message
 debug do it.
 
 Add an "x" flag to make the dynamic debug call site print to a new
-printk:dynamic trace event.  The trace event can be emitted instead of
-or in addition to the printk().
+printk:dyndbg trace event.  The trace event can be emitted instead of 
or
+in addition to the printk().
 
 The print buffer is statically allocated and managed using code 
borrowed
 from __ftrace_trace_stack() and is limited to 256 bytes (four of these
@@ Documentation/admin-guide/dynamic-debug-howto.rst: of the characters::
  The flags are::
  
penables the pr_debug() callsite.
-+  xenables trace to the printk:dynamic event
++  xenables trace to the printk:dyndbg event
fInclude the function name in the printed message
lInclude line number in the printed message
mInclude module name in the printed message
 
  ## include/linux/dynamic_debug.h ##
 @@ include/linux/dynamic_debug.h: struct _ddebug {
-* writes commands to /dynamic_debug/control
-*/
- #define _DPRINTK_FLAGS_NONE   0
--#define _DPRINTK_FLAGS_PRINT  (1<<0) /* printk() a message using the 
format */
-+#define _DPRINTK_FLAGS_PRINTK (1<<0) /* printk() a message using the 
format */
- #define _DPRINTK_FLAGS_INCL_MODNAME   (1<<1)
  #define _DPRINTK_FLAGS_INCL_FUNCNAME  (1<<2)
  #define _DPRINTK_FLAGS_INCL_LINENO(1<<3)
  #define _DPRINTK_FLAGS_INCL_TID   (1<<4)
+-#define _DPRINTK_FLAGS_ENABLE _DPRINTK_FLAGS_PRINTK
 +#define _DPRINTK_FLAGS_TRACE  (1<<5)
-+#define _DPRINTK_FLAGS_PRINT  (_DPRINTK_FLAGS_PRINTK | \
++#define _DPRINTK_FLAGS_ENABLE (_DPRINTK_FLAGS_PRINTK | \
 +   _DPRINTK_FLAGS_TRACE)
  #if defined DEBUG
--#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINT
-+#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK
+ #define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK
  #else
- #define _DPRINTK_FLAGS_DEFAULT 0
- #endif
 
  ## include/trace/events/printk.h ##
 @@
@@ include/trace/events/printk.h: TRACE_EVENT(console,
 +  TP_ARGS(text, len)
 +);
 +
-+DEFINE_EVENT(printk, dynamic,
++DEFINE_EVENT(printk, dyndbg,
 +  TP_PROTO(const char *text, size_t len),
 +  TP_ARGS(text, len)
 +);
@@ lib/dynamic_debug.c
  
  #include 
  
-@@ lib/dynamic_debug.c: static inline const char *trim_prefix(const char 
*path)
- }
- 
- static struct { unsigned flag:8; char opt_char; } opt_array[] = {
--  { _DPRINTK_FLAGS_PRINT, 'p' },
-+  { _DPRINTK_FLAGS_PRINTK, 'p' },
-   { _DPRINTK_FLAGS_INCL_MODNAME, 'm' },
+@@ lib/dynamic_debug.c: static struct { unsigned flag:8; char opt_char; } 
opt_array[] = {
{ _DPRINTK_FLAGS_INCL_FUNCNAME, 'f' },
{ _DPRINTK_FLAGS_INCL_LINENO, 'l' },
{ _DPRINTK_FLAGS_INCL_TID, 't' },
@@ lib/dynamic_debug.c: static char *dynamic_emit_prefix(const struct 
_ddebug *desc
 +  buf = this_cpu_ptr(dynamic_trace_bufs.bufs) + bufidx;
 +
 +  len = vscnprintf(buf->buf, sizeof(buf->buf), fmt, args);
-+  trace_dynamic(buf->buf, len);
++  trace_dyndbg(buf->buf, len);
 +
 +out:
 +  /* As above. */
-- 
2.28.0

[PATCH v3 2/2] dynamic debug: allow printing to trace event

2020-08-25 Thread Vincent Whitchurch

When debugging device drivers, I've found it very useful to be able to
redirect existing pr_debug()/dev_dbg() prints to the trace buffer
instead of dmesg.  Among the many advantages of the trace buffer is that
it can be dynamically resized, allows these prints to combined with
other trace events, and doesn't fill up system logs.

Since dynamic debug already has hooks in these call sites, getting these
prints into the ftrace buffer is straightforward if we have dynamic
debug do it.

Add an "x" flag to make the dynamic debug call site print to a new
printk:dyndbg trace event.  The trace event can be emitted instead of or
in addition to the printk().

The print buffer is statically allocated and managed using code borrowed
from __ftrace_trace_stack() and is limited to 256 bytes (four of these
are allocated per CPU to handle the various contexts); anything larger
will be truncated.

Signed-off-by: Vincent Whitchurch 
---
 .../admin-guide/dynamic-debug-howto.rst   |   1 +
 include/linux/dynamic_debug.h |   4 +-
 include/trace/events/printk.h |  12 +-
 lib/dynamic_debug.c   | 153 +++---
 4 files changed, 143 insertions(+), 27 deletions(-)

diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst 
b/Documentation/admin-guide/dynamic-debug-howto.rst
index e5a8def45f3f..e717054ae142 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -229,6 +229,7 @@ of the characters::
 The flags are::
 
   penables the pr_debug() callsite.
+  xenables trace to the printk:dyndbg event
   fInclude the function name in the printed message
   lInclude line number in the printed message
   mInclude module name in the printed message
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 738421898aac..74aee3f922d7 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -32,7 +32,9 @@ struct _ddebug {
 #define _DPRINTK_FLAGS_INCL_FUNCNAME   (1<<2)
 #define _DPRINTK_FLAGS_INCL_LINENO (1<<3)
 #define _DPRINTK_FLAGS_INCL_TID(1<<4)
-#define _DPRINTK_FLAGS_ENABLE  _DPRINTK_FLAGS_PRINTK
+#define _DPRINTK_FLAGS_TRACE   (1<<5)
+#define _DPRINTK_FLAGS_ENABLE  (_DPRINTK_FLAGS_PRINTK | \
+_DPRINTK_FLAGS_TRACE)
 #if defined DEBUG
 #define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK
 #else
diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
index 13d405b2fd8b..1f78bd237a91 100644
--- a/include/trace/events/printk.h
+++ b/include/trace/events/printk.h
@@ -7,7 +7,7 @@
 
 #include 
 
-TRACE_EVENT(console,
+DECLARE_EVENT_CLASS(printk,
TP_PROTO(const char *text, size_t len),
 
TP_ARGS(text, len),
@@ -31,6 +31,16 @@ TRACE_EVENT(console,
 
TP_printk("%s", __get_str(msg))
 );
+
+DEFINE_EVENT(printk, console,
+   TP_PROTO(const char *text, size_t len),
+   TP_ARGS(text, len)
+);
+
+DEFINE_EVENT(printk, dyndbg,
+   TP_PROTO(const char *text, size_t len),
+   TP_ARGS(text, len)
+);
 #endif /* _TRACE_PRINTK_H */
 
 /* This part must be outside protection */
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 88af85cb5222..3319da2e235c 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -89,6 +90,7 @@ static struct { unsigned flag:8; char opt_char; } opt_array[] 
= {
{ _DPRINTK_FLAGS_INCL_FUNCNAME, 'f' },
{ _DPRINTK_FLAGS_INCL_LINENO, 'l' },
{ _DPRINTK_FLAGS_INCL_TID, 't' },
+   { _DPRINTK_FLAGS_TRACE, 'x' },
{ _DPRINTK_FLAGS_NONE, '_' },
 };
 
@@ -600,6 +602,96 @@ static char *dynamic_emit_prefix(const struct _ddebug 
*desc, char *buf)
return buf;
 }
 
+/*
+ * This code is heavily based on __ftrace_trace_stack().
+ *
+ * Allow 4 levels of nesting: normal, softirq, irq, NMI.
+ */
+#define DYNAMIC_TRACE_NESTING  4
+
+struct dynamic_trace_buf {
+   char buf[256];
+};
+
+struct dynamic_trace_bufs {
+   struct dynamic_trace_buf bufs[DYNAMIC_TRACE_NESTING];
+};
+
+static DEFINE_PER_CPU(struct dynamic_trace_bufs, dynamic_trace_bufs);
+static DEFINE_PER_CPU(int, dynamic_trace_reserve);
+
+static void dynamic_trace(const char *fmt, va_list args)
+{
+   struct dynamic_trace_buf *buf;
+   int bufidx;
+   int len;
+
+   preempt_disable_notrace();
+
+   bufidx = __this_cpu_inc_return(dynamic_trace_reserve) - 1;
+
+   if (WARN_ON_ONCE(bufidx > DYNAMIC_TRACE_NESTING))
+   goto out;
+
+   /* For the same reasons as in __ftrace_trace_stack(). */
+   barrier();
+
+   buf = this_cpu_ptr(dynamic_trace_bufs.bufs) + bufidx;
+
+   len = vscnprintf(buf->buf, sizeof(buf->buf), fmt, args);
+   trace_dyndb

Re: [PATCH v2] dynamic debug: allow printing to trace event

2020-08-25 Thread Vincent Whitchurch

On Fri, Aug 14, 2020 at 07:15:31PM +0200, Steven Rostedt wrote:
> On Fri, 14 Aug 2020 15:31:51 +0200
> Vincent Whitchurch  wrote:
> > index aa9ff9e1c0b3..f599ed21ecc5 100644
> > --- a/include/linux/dynamic_debug.h
> > +++ b/include/linux/dynamic_debug.h
> > @@ -27,13 +27,16 @@ struct _ddebug {
> >  * writes commands to /dynamic_debug/control
> >  */
> >  #define _DPRINTK_FLAGS_NONE0
> > -#define _DPRINTK_FLAGS_PRINT   (1<<0) /* printk() a message using the 
> > format */
> > +#define _DPRINTK_FLAGS_PRINTK  (1<<0) /* printk() a message using the 
> > format */
> 
> The above looks like a cleanup unrelated to this patch, and probably
> should be on its own.

I've moved this and the other renaming hunk out to a separate patch.

> >  #define _DPRINTK_FLAGS_INCL_MODNAME(1<<1)
> >  #define _DPRINTK_FLAGS_INCL_FUNCNAME   (1<<2)
> >  #define _DPRINTK_FLAGS_INCL_LINENO (1<<3)
> >  #define _DPRINTK_FLAGS_INCL_TID(1<<4)
> > +#define _DPRINTK_FLAGS_TRACE   (1<<5)  
> > +#define _DPRINTK_FLAGS_PRINT   (_DPRINTK_FLAGS_PRINTK | \
> > +_DPRINTK_FLAGS_TRACE)
> >  #if defined DEBUG
> > -#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINT
> > +#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK
> >  #else
> >  #define _DPRINTK_FLAGS_DEFAULT 0
> >  #endif
> > diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
> > index 13d405b2fd8b..6c89121a1669 100644
> > --- a/include/trace/events/printk.h
> > +++ b/include/trace/events/printk.h
> > @@ -7,7 +7,7 @@
> >  
> >  #include 
> >  
> > -TRACE_EVENT(console,
> > +DECLARE_EVENT_CLASS(printk,
> > TP_PROTO(const char *text, size_t len),
> >  
> > TP_ARGS(text, len),
> > @@ -31,6 +31,16 @@ TRACE_EVENT(console,
> >  
> > TP_printk("%s", __get_str(msg))
> >  );
> > +
> > +DEFINE_EVENT(printk, console,
> > +   TP_PROTO(const char *text, size_t len),
> > +   TP_ARGS(text, len)
> > +);
> > +
> > +DEFINE_EVENT(printk, dynamic,
> 
> Can we call this "dynamic_printk" or "printk_dynamic", as
> trace_dynamic() is too generic.

I went for "dyndbg" (printk:dyndbg / trace_dyndbg()) which is the
documented name of the infrastructure.

[PATCH v2 2/2] ARM: module: fix handling of unwind init sections

2020-05-11 Thread Vincent Whitchurch

Unwind information for init sections is placed in .ARM.exidx.init.text
and .ARM.extab.init.text.  The module core doesn't know that these are
init sections so they are allocated along with the core sections, and if
the core and init sections get allocated in different memory regions
(which is possible with CONFIG_ARM_MODULE_PLTS=y) and they can't reach
each other, relocation fails:

  final section addresses:
...
0x7f80 .init.text
..
0xcbb54078 .ARM.exidx.init.text
..

 section 16 reloc 0 sym '': relocation 42 out of range (0xcbb54078 ->
 0x7f80)

Fix this by informing the module core that these sections are init
sections, and by removing the init unwind tables before the module core
frees the init sections.

Signed-off-by: Vincent Whitchurch 
---
v2: No changes.

 arch/arm/kernel/module.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index deef17f34bd2..af0a8500a24e 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -55,6 +55,13 @@ void *module_alloc(unsigned long size)
 }
 #endif
 
+bool module_init_section(const char *name)
+{
+   return strstarts(name, ".init") ||
+   strstarts(name, ".ARM.extab.init") ||
+   strstarts(name, ".ARM.exidx.init");
+}
+
 bool module_exit_section(const char *name)
 {
return strstarts(name, ".exit") ||
@@ -409,8 +416,17 @@ module_arch_cleanup(struct module *mod)
 #ifdef CONFIG_ARM_UNWIND
int i;
 
-   for (i = 0; i < ARM_SEC_MAX; i++)
-   if (mod->arch.unwind[i])
-   unwind_table_del(mod->arch.unwind[i]);
+   for (i = 0; i < ARM_SEC_MAX; i++) {
+   unwind_table_del(mod->arch.unwind[i]);
+   mod->arch.unwind[i] = NULL;
+   }
+#endif
+}
+
+void __weak module_arch_freeing_init(struct module *mod)
+{
+#ifdef CONFIG_ARM_UNWIND
+   unwind_table_del(mod->arch.unwind[ARM_SEC_INIT]);
+   mod->arch.unwind[ARM_SEC_INIT] = NULL;
 #endif
 }
-- 
2.25.1

[PATCH v2 1/2] module: allow arch overrides for .init section names

2020-05-11 Thread Vincent Whitchurch

ARM stores unwind information for .init.text in sections named
.ARM.extab.init.text and .ARM.exidx.init.text.  Since those aren't
currently recognized as init sections, they're allocated along with the
core section, and relocation fails if the core and the init section are
allocated from different regions and can't reach other.

  final section addresses:
...
0x7f80 .init.text
..
0xcbb54078 .ARM.exidx.init.text
..

 section 16 reloc 0 sym '': relocation 42 out of range (0xcbb54078 ->
 0x7f80)

Allow architectures to override the section name so that ARM can fix
this.

Signed-off-by: Vincent Whitchurch 
---
v2: Add comment and move module_init_section() next to module_exit_section().

 include/linux/moduleloader.h | 5 +
 kernel/module.c  | 9 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index ca92aea8a6bd..4fa67a8b2265 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -29,6 +29,11 @@ void *module_alloc(unsigned long size);
 /* Free memory returned from module_alloc. */
 void module_memfree(void *module_region);
 
+/* Determines if the section name is an init section (that is only used during
+ * module loading).
+ */
+bool module_init_section(const char *name);
+
 /* Determines if the section name is an exit section (that is only used during
  * module unloading)
  */
diff --git a/kernel/module.c b/kernel/module.c
index 33569a01d6e1..84d0c455fb44 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2400,7 +2400,7 @@ static void layout_sections(struct module *mod, struct 
load_info *info)
if ((s->sh_flags & masks[m][0]) != masks[m][0]
|| (s->sh_flags & masks[m][1])
|| s->sh_entsize != ~0UL
-   || strstarts(sname, ".init"))
+   || module_init_section(sname))
continue;
s->sh_entsize = get_offset(mod, &mod->core_layout.size, 
s, i);
pr_debug("\t%s\n", sname);
@@ -2433,7 +2433,7 @@ static void layout_sections(struct module *mod, struct 
load_info *info)
if ((s->sh_flags & masks[m][0]) != masks[m][0]
|| (s->sh_flags & masks[m][1])
|| s->sh_entsize != ~0UL
-   || !strstarts(sname, ".init"))
+   || !module_init_section(sname))
continue;
s->sh_entsize = (get_offset(mod, 
&mod->init_layout.size, s, i)
 | INIT_OFFSET_MASK);
@@ -2768,6 +2768,11 @@ void * __weak module_alloc(unsigned long size)
return vmalloc_exec(size);
 }
 
+bool __weak module_init_section(const char *name)
+{
+   return strstarts(name, ".init");
+}
+
 bool __weak module_exit_section(const char *name)
 {
return strstarts(name, ".exit");
-- 
2.25.1

Re: [PATCH v2 1/2] module: allow arch overrides for .init section names

2020-05-14 Thread Vincent Whitchurch

On Mon, May 11, 2020 at 05:45:00PM +0200, Jessica Yu wrote:
> +++ Vincent Whitchurch [11/05/20 13:48 +0200]:
> >ARM stores unwind information for .init.text in sections named
> >.ARM.extab.init.text and .ARM.exidx.init.text.  Since those aren't
> >currently recognized as init sections, they're allocated along with the
> >core section, and relocation fails if the core and the init section are
> >allocated from different regions and can't reach other.
> >
> >  final section addresses:
> >...
> >0x7f80 .init.text
> >..
> >0xcbb54078 .ARM.exidx.init.text
> >..
> >
> > section 16 reloc 0 sym '': relocation 42 out of range (0xcbb54078 ->
> > 0x7f80)
> >
> >Allow architectures to override the section name so that ARM can fix
> >this.
> >
> >Signed-off-by: Vincent Whitchurch 
> >---
> >v2: Add comment and move module_init_section() next to module_exit_section().
> 
> Thanks, this patch looks fine to me. You could add my:
> 
>Acked-by: Jessica Yu 
> 
> Alternatively, I can take this through modules-next if the second
> patch gets a review and ack from an ARM maintainer.

Thanks!  I've put the patches in Russell's patch system.

Re: [PATCH v3 2/2] dynamic debug: allow printing to trace event

2020-09-02 Thread Vincent Whitchurch

On Wed, Aug 26, 2020 at 09:53:57PM +0200, Joe Perches wrote:
> On Wed, 2020-08-26 at 15:32 -0400, Steven Rostedt wrote:
> > On Tue, 25 Aug 2020 08:53:25 -0700
> > Joe Perches  wrote:
> > 
> > > > The print buffer is statically allocated and managed using code borrowed
> > > > from __ftrace_trace_stack() and is limited to 256 bytes (four of these
> > > > are allocated per CPU to handle the various contexts); anything larger
> > > > will be truncated.  
> > > 
> > > There is an effort to avoid using trace_printk and the like
> > > so perhaps this feature should have the same compile time
> > > guard.
> > 
> > No, this is fine for being in a production kernel. Basically, these are
> > simply debug printk()s that can also be put into the trace buffer. The
> > key difference to trace_printk() is that they are an event that needs
> > to be enabled to write into the buffer.
> 
> It just seems like a backdoor way to convert various pr_debug functions
> (dev_dbg/netdev_dbg, et al) into tracing.
> 
> What's the real value?  Timing data?  Something else?

I mentioned my use case for this in the commit message and why it works
much better than printk() for that, please let me know if it is unclear:

 When debugging device drivers, I've found it very useful to be able to
 redirect existing pr_debug()/dev_dbg() prints to the trace buffer
 instead of dmesg.  Among the many advantages of the trace buffer is that
 it can be dynamically resized, allows these prints to combined with
 other trace events, and doesn't fill up system logs.

This is my only use case for this, and I've used it very very often
during the years I've been carrying this patch locally.

[PATCH] regulator: pwm: Fix machine constraints application

2020-09-02 Thread Vincent Whitchurch

If the zero duty cycle doesn't correspond to any voltage in the voltage
table, the PWM regulator returns an -EINVAL from get_voltage_sel() which
results in the core erroring out with a "failed to get the current
voltage" and ending up not applying the machine constraints.

Instead, return -ENOTRECOVERABLE which makes the core set the voltage
since it's at an unknown value.

For example, with this device tree:

fooregulator {
compatible = "pwm-regulator";
pwms = <&foopwm 0 10>;
regulator-min-microvolt = <225>;
regulator-max-microvolt = <225>;
regulator-name = "fooregulator";
regulator-always-on;
regulator-boot-on;
voltage-table = <225 30>;
};

Before this patch:

  fooregulator: failed to get the current voltage(-22)

After this patch:

  fooregulator: Setting 225-225uV
  fooregulator: 2250 mV

Signed-off-by: Vincent Whitchurch 
---
 drivers/regulator/pwm-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/pwm-regulator.c 
b/drivers/regulator/pwm-regulator.c
index 3234b118b53e..990bd50771d8 100644
--- a/drivers/regulator/pwm-regulator.c
+++ b/drivers/regulator/pwm-regulator.c
@@ -279,7 +279,7 @@ static int pwm_regulator_init_table(struct platform_device 
*pdev,
return ret;
}
 
-   drvdata->state  = -EINVAL;
+   drvdata->state  = -ENOTRECOVERABLE;
drvdata->duty_cycle_table   = duty_cycle_table;
drvdata->desc.ops = &pwm_regulator_voltage_table_ops;
drvdata->desc.n_voltages= length / sizeof(*duty_cycle_table);
-- 
2.28.0

[PATCH v2] dynamic debug: allow printing to trace event

2020-08-14 Thread Vincent Whitchurch

When debugging device drivers, I've found it very useful to be able to
redirect existing pr_debug()/dev_dbg() prints to the trace buffer
instead of dmesg.  Among the many advantages of the trace buffer is that
it can be dynamically resized, allows these prints to combined with
other trace events, and doesn't fill up system logs.

Since dynamic debug already has hooks in these call sites, getting these
prints into the ftrace buffer is straightforward if we have dynamic
debug do it.

Add an "x" flag to make the dynamic debug call site print to a new
printk:dynamic trace event.  The trace event can be emitted instead of
or in addition to the printk().

The print buffer is statically allocated and managed using code borrowed
from __ftrace_trace_stack() and is limited to 256 bytes (four of these
are allocated per CPU to handle the various contexts); anything larger
will be truncated.

Signed-off-by: Vincent Whitchurch 
---
v2:
 - Remove stack buffer and use code similar to __ftrace_trace_stack()
 - Use an event with the same class as printk:console

 .../admin-guide/dynamic-debug-howto.rst   |   1 +
 include/linux/dynamic_debug.h |   7 +-
 include/trace/events/printk.h |  12 +-
 lib/dynamic_debug.c   | 155 +++---
 4 files changed, 146 insertions(+), 29 deletions(-)

diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst 
b/Documentation/admin-guide/dynamic-debug-howto.rst
index e5a8def45f3f..d2ebee464db7 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -229,6 +229,7 @@ of the characters::
 The flags are::
 
   penables the pr_debug() callsite.
+  xenables trace to the printk:dynamic event
   fInclude the function name in the printed message
   lInclude line number in the printed message
   mInclude module name in the printed message
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index aa9ff9e1c0b3..f599ed21ecc5 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -27,13 +27,16 @@ struct _ddebug {
 * writes commands to /dynamic_debug/control
 */
 #define _DPRINTK_FLAGS_NONE0
-#define _DPRINTK_FLAGS_PRINT   (1<<0) /* printk() a message using the format */
+#define _DPRINTK_FLAGS_PRINTK  (1<<0) /* printk() a message using the format */
 #define _DPRINTK_FLAGS_INCL_MODNAME(1<<1)
 #define _DPRINTK_FLAGS_INCL_FUNCNAME   (1<<2)
 #define _DPRINTK_FLAGS_INCL_LINENO (1<<3)
 #define _DPRINTK_FLAGS_INCL_TID(1<<4)
+#define _DPRINTK_FLAGS_TRACE   (1<<5)
+#define _DPRINTK_FLAGS_PRINT   (_DPRINTK_FLAGS_PRINTK | \
+_DPRINTK_FLAGS_TRACE)
 #if defined DEBUG
-#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINT
+#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK
 #else
 #define _DPRINTK_FLAGS_DEFAULT 0
 #endif
diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
index 13d405b2fd8b..6c89121a1669 100644
--- a/include/trace/events/printk.h
+++ b/include/trace/events/printk.h
@@ -7,7 +7,7 @@
 
 #include 
 
-TRACE_EVENT(console,
+DECLARE_EVENT_CLASS(printk,
TP_PROTO(const char *text, size_t len),
 
TP_ARGS(text, len),
@@ -31,6 +31,16 @@ TRACE_EVENT(console,
 
TP_printk("%s", __get_str(msg))
 );
+
+DEFINE_EVENT(printk, console,
+   TP_PROTO(const char *text, size_t len),
+   TP_ARGS(text, len)
+);
+
+DEFINE_EVENT(printk, dynamic,
+   TP_PROTO(const char *text, size_t len),
+   TP_ARGS(text, len)
+);
 #endif /* _TRACE_PRINTK_H */
 
 /* This part must be outside protection */
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 1d012e597cc3..76fc3e33fe41 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -84,11 +85,12 @@ static inline const char *trim_prefix(const char *path)
 }
 
 static struct { unsigned flag:8; char opt_char; } opt_array[] = {
-   { _DPRINTK_FLAGS_PRINT, 'p' },
+   { _DPRINTK_FLAGS_PRINTK, 'p' },
{ _DPRINTK_FLAGS_INCL_MODNAME, 'm' },
{ _DPRINTK_FLAGS_INCL_FUNCNAME, 'f' },
{ _DPRINTK_FLAGS_INCL_LINENO, 'l' },
{ _DPRINTK_FLAGS_INCL_TID, 't' },
+   { _DPRINTK_FLAGS_TRACE, 'x' },
{ _DPRINTK_FLAGS_NONE, '_' },
 };
 
@@ -600,6 +602,96 @@ static char *dynamic_emit_prefix(const struct _ddebug 
*desc, char *buf)
return buf;
 }
 
+/*
+ * This code is heavily based on __ftrace_trace_stack().
+ *
+ * Allow 4 levels of nesting: normal, softirq, irq, NMI.
+ */
+#define DYNAMIC_TRACE_NESTING  4
+
+struct dynamic_trace_buf {
+   char buf[256];
+};
+
+struct dynamic_trace_bufs {
+   struct dynamic_trace_buf bufs[DYNAMIC_TRACE_N

Re: [PATCH] dynamic debug: allow printing to trace event

2020-08-14 Thread Vincent Whitchurch

On Thu, Jul 23, 2020 at 05:26:44PM +0200, Steven Rostedt wrote:
> On Thu, 23 Jul 2020 12:57:35 +0200
> Vincent Whitchurch  wrote:
> 
> > Would it be acceptable to just use a fixed size for the event?  At least
> > for my own debugging use cases it's preferable to just have to increase
> > the trace buffer size in case it's insufficient, rather than to have to
> > restort to one-off debugging code.
> 
> There's two other options.
> 
> Option 1, is to allocate 256 bytes times 4 (in case of interruption,
> where you have a separate buffer for every context - normal, softirq,
> irq, nmi), and use it like I do for stack traces in the latest kernel
> (see __ftrace_stack_trace() in kernel/trace/trace.c)
> 
> Option 2, would be to use trace_array_vprintk(), but you need to create
> your own instance to do so to keep from messing with the top level buffer.

Thanks for the suggestions, I've sent out a v2 implementing option 1:

https://lore.kernel.org/lkml/20200814133151.7759-1-vincent.whitchu...@axis.com/

[PATCH v2] gpio: mockup: Allow probing from device tree

2020-10-27 Thread Vincent Whitchurch

Allow the mockup driver to be probed via the device tree without any
module parameters, allowing it to be used to configure and test higher
level drivers like the leds-gpio driver and corresponding userspace
before actual hardware is available.

Signed-off-by: Vincent Whitchurch 
---
v2: Remove most of the added code, since the latest driver doesn't need it.
Drop DT binding document, since Rob Herring was OK with not documenting
this:

https://lore.kernel.org/linux-devicetree/5baa1ae6.1c69fb81.847f2.3...@mx.google.com/

 drivers/gpio/gpio-mockup.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c
index 67ed4f238d43..c93892a6936a 100644
--- a/drivers/gpio/gpio-mockup.c
+++ b/drivers/gpio/gpio-mockup.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -460,9 +461,18 @@ static int gpio_mockup_probe(struct platform_device *pdev)
return 0;
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id gpio_mockup_of_match[] = {
+   { .compatible = "gpio-mockup", },
+   {},
+};
+MODULE_DEVICE_TABLE(of, gpio_mockup_of_match);
+#endif
+
 static struct platform_driver gpio_mockup_driver = {
.driver = {
.name = "gpio-mockup",
+   .of_match_table = of_match_ptr(gpio_mockup_of_match),
},
.probe = gpio_mockup_probe,
 };
@@ -556,8 +566,7 @@ static int __init gpio_mockup_init(void)
 {
int i, num_chips, err;
 
-   if ((gpio_mockup_num_ranges < 2) ||
-   (gpio_mockup_num_ranges % 2) ||
+   if ((gpio_mockup_num_ranges % 2) ||
(gpio_mockup_num_ranges > GPIO_MOCKUP_MAX_RANGES))
return -EINVAL;
 
-- 
2.28.0

Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used ring

2020-10-27 Thread Vincent Whitchurch

On Tue, Oct 27, 2020 at 08:05:43AM +0100, Sherry Sun wrote:
> Can you help test the patch about removing the codes of reassign used
> ring, and comment on the impact for Intel MIC platform?  Thanks for
> any help.

I don't have access to MIC hardware myself, either.

But this patch is quite certainly going to break it since guests using a
kernel without the patch will not work with hosts with a kernel with the
patch.

[PATCH] of: Fix reserved-memory overlap detection

2020-10-20 Thread Vincent Whitchurch

The reserved-memory overlap detection code fails to detect overlaps if
either of the regions starts at address 0x0.  For some reason the code
explicitly checks for and ignores such regions, but this check looks
invalid.  Remove the check and fix this detection.

For example, no overlap is currently reported for this case:

foo@0 {
reg = <0x 0x2000>;
};

bar@1000 {
reg = <0x1000 0x1000>;
};

but it is after this patch:

 OF: reserved mem: OVERLAP DETECTED!
 foo@0 (0x--0x2000) overlaps with bar@1000 (0x1000--0x2000)

Signed-off-by: Vincent Whitchurch 
---
 drivers/of/of_reserved_mem.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 46b9371c8a33..1c5259e3e81f 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -217,8 +217,7 @@ static void __init __rmem_check_for_overlap(void)
 
this = &reserved_mem[i];
next = &reserved_mem[i + 1];
-   if (!(this->base && next->base))
-   continue;
+
if (this->base + this->size > next->base) {
phys_addr_t this_end, next_end;
 
-- 
2.28.0

Re: [PATCH] of: Fix reserved-memory overlap detection

2020-10-20 Thread Vincent Whitchurch

On Tue, Oct 20, 2020 at 03:00:14PM +0200, Rob Herring wrote:
> On Tue, Oct 20, 2020 at 2:36 AM Vincent Whitchurch
>  wrote:
> >
> > The reserved-memory overlap detection code fails to detect overlaps if
> > either of the regions starts at address 0x0.  For some reason the code
> > explicitly checks for and ignores such regions, but this check looks
> > invalid.  Remove the check and fix this detection.
> 
> Wouldn't 'base' be 0 for nodes that have a 'size' and no address? The
> base in those cases isn't set until later when
> __reserved_mem_alloc_size() is called.

Ah, yes, I guess that's why the check was there.  I see that those
entries have both a zero address and a zero size, so this seems to work:

diff --git a/arch/arm/boot/dts/vexpress-v2p-ca9.dts 
b/arch/arm/boot/dts/vexpress-v2p-ca9.dts
index 623246f37448..6627e71c7283 100644
--- a/arch/arm/boot/dts/vexpress-v2p-ca9.dts
+++ b/arch/arm/boot/dts/vexpress-v2p-ca9.dts
@@ -81,6 +81,18 @@ vram: vram@4c00 {
reg = <0x4c00 0x0080>;
no-map;
};
+
+   foo@0 {
+   reg = <0x0 0x2000>;
+   };
+
+   bar@1000 {
+   reg = <0x0 0x1000>;
+   };
+
+   baz {
+   size = <0x1000>;
+   };
};
 
clcd@1002 {
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 46b9371c8a33..fea9433d942a 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -200,6 +200,16 @@ static int __init __rmem_cmp(const void *a, const void *b)
if (ra->base > rb->base)
return 1;
 
+   /*
+* Put the dynamic allocations (address == 0, size == 0) before static
+* allocations at address 0x0 so that overlap detection works
+* correctly.
+*/
+   if (ra->size < rb->size)
+   return -1;
+   if (ra->size > rb->size)
+   return 1;
+
return 0;
 }
 
@@ -212,13 +222,19 @@ static void __init __rmem_check_for_overlap(void)
 
sort(reserved_mem, reserved_mem_count, sizeof(reserved_mem[0]),
 __rmem_cmp, NULL);
+
+   for (i = 0; i < reserved_mem_count - 1; i++) {
+   struct reserved_mem *this = &reserved_mem[i];
+
+   pr_info("i %d base %x size %x\n", i, this->base, this->size);
+   }
+
for (i = 0; i < reserved_mem_count - 1; i++) {
struct reserved_mem *this, *next;
 
this = &reserved_mem[i];
next = &reserved_mem[i + 1];
-   if (!(this->base && next->base))
-   continue;
+
if (this->base + this->size > next->base) {
phys_addr_t this_end, next_end;

[PATCH v2] of: Fix reserved-memory overlap detection

2020-10-21 Thread Vincent Whitchurch

The reserved-memory overlap detection code fails to detect overlaps if
either of the regions starts at address 0x0.  The code explicitly checks
for and ignores such regions, apparently in order to ignore dynamically
allocated regions which have an address of 0x0 at this point.  These
dynamically allocated regions also have a size of 0x0 at this point, so
fix this by removing the check and sorting the dynamically allocated
regions ahead of any static regions at address 0x0.

For example, there are two overlaps in this case but they are not
currently reported:

foo@0 {
reg = <0x0 0x2000>;
};

bar@0 {
reg = <0x0 0x1000>;
};

baz@1000 {
reg = <0x1000 0x1000>;
};

quux {
size = <0x1000>;
};

but they are after this patch:

 OF: reserved mem: OVERLAP DETECTED!
 bar@0 (0x--0x1000) overlaps with foo@0 (0x--0x2000)
 OF: reserved mem: OVERLAP DETECTED!
 foo@0 (0x--0x2000) overlaps with baz@1000 (0x1000--0x00002000)

Signed-off-by: Vincent Whitchurch 
---
v2: Fix handling of dynamically allocated regions.

 drivers/of/of_reserved_mem.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 46b9371..6530b8b 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -200,6 +200,16 @@ static int __init __rmem_cmp(const void *a, const void *b)
if (ra->base > rb->base)
return 1;
 
+   /*
+* Put the dynamic allocations (address == 0, size == 0) before static
+* allocations at address 0x0 so that overlap detection works
+* correctly.
+*/
+   if (ra->size < rb->size)
+   return -1;
+   if (ra->size > rb->size)
+   return 1;
+
return 0;
 }
 
@@ -217,8 +227,7 @@ static void __init __rmem_check_for_overlap(void)
 
this = &reserved_mem[i];
next = &reserved_mem[i + 1];
-   if (!(this->base && next->base))
-   continue;
+
if (this->base + this->size > next->base) {
phys_addr_t this_end, next_end;
 

base-commit: 270315b8235e3d10c2e360cff56c2f9e0915a252
-- 
git-series 0.9.1

Re: [PATCH] of: Fix reserved-memory overlap detection

2020-10-21 Thread Vincent Whitchurch

On Tue, Oct 20, 2020 at 04:17:27PM +0200, Rob Herring wrote:
> On Tue, Oct 20, 2020 at 8:46 AM Vincent Whitchurch
>  wrote:
> > On Tue, Oct 20, 2020 at 03:00:14PM +0200, Rob Herring wrote:
> > > On Tue, Oct 20, 2020 at 2:36 AM Vincent Whitchurch
> > >  wrote:
> > > >
> > > > The reserved-memory overlap detection code fails to detect overlaps if
> > > > either of the regions starts at address 0x0.  For some reason the code
> > > > explicitly checks for and ignores such regions, but this check looks
> > > > invalid.  Remove the check and fix this detection.
> > >
> > > Wouldn't 'base' be 0 for nodes that have a 'size' and no address? The
> > > base in those cases isn't set until later when
> > > __reserved_mem_alloc_size() is called.
> >
> > Ah, yes, I guess that's why the check was there.  I see that those
> > entries have both a zero address and a zero size, so this seems to work:
> 
> Yes, I think it should work.

Thanks, I've tested it a bit more and sent it out as a v2 now.

> > diff --git a/arch/arm/boot/dts/vexpress-v2p-ca9.dts 
> > b/arch/arm/boot/dts/vexpress-v2p-ca9.dts
> > index 623246f37448..6627e71c7283 100644
> > --- a/arch/arm/boot/dts/vexpress-v2p-ca9.dts
> > +++ b/arch/arm/boot/dts/vexpress-v2p-ca9.dts
> > @@ -81,6 +81,18 @@ vram: vram@4c00 {
> > reg = <0x4c00 0x0080>;
> > no-map;
> > };
> > +
> > +   foo@0 {
> > +   reg = <0x0 0x2000>;
> > +   };
> > +
> > +   bar@1000 {
> > +   reg = <0x0 0x1000>;
> 
> 0x1000 base?

I've corrected this in the example in the commit message for v2.

Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used ring

2020-10-28 Thread Vincent Whitchurch

On Wed, Oct 28, 2020 at 02:47:49AM +0100, Sherry Sun wrote:
> > Subject: Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used
> > ring
> > 
> > On Tue, Oct 27, 2020 at 08:05:43AM +0100, Sherry Sun wrote:
> > > Can you help test the patch about removing the codes of reassign used
> > > ring, and comment on the impact for Intel MIC platform?  Thanks for
> > > any help.
> > 
> > I don't have access to MIC hardware myself, either.
> > 
> > But this patch is quite certainly going to break it since guests using a 
> > kernel
> > without the patch will not work with hosts with a kernel with the patch.
> 
> Thanks for your reply.
> This patch can be used by both guests and hosts.
> I have tested it on imx8qm platform(both guest and host are ARM64 
> architecture), and it works well.
> So I guess Intel MIC won't meet big problems when both guest and host
> apply this patch. But it is best if it can be tested.

The guest and host are different systems and it should be possible to
upgrade the host kernel without being forced to upgrade the guest
kernel, and vice-versa.  This patch breaks that.

If MIC hardware has no users and the drivers are being deleted then of
course backward compatibility doesn't matter.

Re: [PATCH V5 0/2] Change vring space from nomal memory to dma coherent memory

2020-10-28 Thread Vincent Whitchurch

On Wed, Oct 28, 2020 at 06:58:36AM +0100, Greg KH wrote:
> Have you all seen:
>   
> https://lore.kernel.org/r/8c1443136563de34699d2c084df478181c205db4.1603854416.git.sudeep.d...@intel.com

No, that link doesn't work and I can't find that email from Sudeep in
any of the archives:

 https://lore.kernel.org/lkml/?q=f%3Asudeep.dutt%40intel.com

Sudeep, perhaps you could try to resend the patch?  Thank you.

Re: [PATCH v2] gpio: mockup: Allow probing from device tree

2020-10-28 Thread Vincent Whitchurch

On Wed, Oct 28, 2020 at 12:43:22PM +0100, Andy Shevchenko wrote:
> On Wed, Oct 28, 2020 at 10:00 AM Vincent Whitchurch
>  wrote:
> > Allow the mockup driver to be probed via the device tree without any
> > module parameters, allowing it to be used to configure and test higher
> > level drivers like the leds-gpio driver and corresponding userspace
> > before actual hardware is available.
> 
> You have to officially announce a DT binding for that.

Rob Herring has earlier said that was not required for this driver.  As
I mentioned a little further down in the email you are replying to:

| Drop DT binding document, since Rob Herring was OK with not documenting
| this:
| 
https://lore.kernel.org/linux-devicetree/5baa1ae6.1c69fb81.847f2.3...@mx.google.com/

Re: [PATCH v2] gpio: mockup: Allow probing from device tree

2020-10-28 Thread Vincent Whitchurch

On Tue, Oct 27, 2020 at 07:12:13PM +0100, Bartosz Golaszewski wrote:
> On Tue, Oct 27, 2020 at 2:54 PM Vincent Whitchurch
>  wrote:
> > diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c
> > index 67ed4f238d43..c93892a6936a 100644
> > --- a/drivers/gpio/gpio-mockup.c
> > +++ b/drivers/gpio/gpio-mockup.c
> > @@ -13,6 +13,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> 
> Please keep the includes ordered alphabetically.

Thanks, fixed in v3.

> >  #include 
> >  #include 
> >  #include 
> > @@ -460,9 +461,18 @@ static int gpio_mockup_probe(struct platform_device 
> > *pdev)
> > return 0;
> >  }
> >
> > +#ifdef CONFIG_OF
> > +static const struct of_device_id gpio_mockup_of_match[] = {
> > +   { .compatible = "gpio-mockup", },
> > +   {},
> > +};
> > +MODULE_DEVICE_TABLE(of, gpio_mockup_of_match);
> > +#endif
> 
> You don't need this ifdef - of_match_ptr() will evaluate to NULL if
> CONFIG_OF is disabled and the compiler will optimize this struct out.

The compiler can't optimise out the struct in the case of a module build
since there is a reference from the MODULE_DEVICE_TABLE:

 $ grep CONFIG_OF .config
 # CONFIG_OF is not set
 $ nm drivers/gpio/gpio-mockup.ko  | grep of_
  r gpio_mockup_of_match
  R __mod_of__gpio_mockup_of_match_device_table

But these few wasted bytes don't matter so I removed the CONFIG_OF
anyway as you suggested.

[PATCH v3] gpio: mockup: Allow probing from device tree

2020-10-28 Thread Vincent Whitchurch

Allow the mockup driver to be probed via the device tree without any
module parameters, allowing it to be used to configure and test higher
level drivers like the leds-gpio driver and corresponding userspace
before actual hardware is available.

Signed-off-by: Vincent Whitchurch 
---

Notes:
v3:
- Keep includes sorted alphabetically
- Drop CONFIG_OF ifdefs

v2:
- Remove most of the added code, since the latest driver doesn't need it.
- Drop DT binding document, since Rob Herring was OK with not documenting 
this:
  
https://lore.kernel.org/linux-devicetree/5baa1ae6.1c69fb81.847f2.3...@mx.google.com/

 drivers/gpio/gpio-mockup.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c
index 67ed4f238d43..ca87c590ef3f 100644
--- a/drivers/gpio/gpio-mockup.c
+++ b/drivers/gpio/gpio-mockup.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -460,9 +461,16 @@ static int gpio_mockup_probe(struct platform_device *pdev)
return 0;
 }
 
+static const struct of_device_id gpio_mockup_of_match[] = {
+   { .compatible = "gpio-mockup", },
+   {},
+};
+MODULE_DEVICE_TABLE(of, gpio_mockup_of_match);
+
 static struct platform_driver gpio_mockup_driver = {
.driver = {
.name = "gpio-mockup",
+   .of_match_table = of_match_ptr(gpio_mockup_of_match),
},
.probe = gpio_mockup_probe,
 };
@@ -556,8 +564,7 @@ static int __init gpio_mockup_init(void)
 {
int i, num_chips, err;
 
-   if ((gpio_mockup_num_ranges < 2) ||
-   (gpio_mockup_num_ranges % 2) ||
+   if ((gpio_mockup_num_ranges % 2) ||
(gpio_mockup_num_ranges > GPIO_MOCKUP_MAX_RANGES))
return -EINVAL;
 
-- 
2.28.0

Re: [PATCH v2] gpio: mockup: Allow probing from device tree

2020-10-29 Thread Vincent Whitchurch

On Wed, Oct 28, 2020 at 09:25:32PM +0100, Andy Shevchenko wrote:
> On Wed, Oct 28, 2020 at 8:41 PM Bartosz Golaszewski  wrote:
> > On Tue, Oct 27, 2020 at 2:54 PM Vincent Whitchurch
> > > +#ifdef CONFIG_OF
> > > +static const struct of_device_id gpio_mockup_of_match[] = {
> > > +   { .compatible = "gpio-mockup", },
> > > +   {},
> > > +};
> > > +MODULE_DEVICE_TABLE(of, gpio_mockup_of_match);
> > > +#endif
> >
> > You don't need this ifdef - of_match_ptr() will evaluate to NULL if
> > CONFIG_OF is disabled and the compiler will optimize this struct out.
> 
> It's not so. If you drop ugly ifdeffery (and I vote for that, see also
> above) the of_match_ptr() must be dropped as well.  Otherwise the
> compiler will issue the warning. So it is either all or none.

Yes, you're right.  I actually tested a !OF build before but it turns
out that the warning is disabled by default and only enabled with W=1.
I'll fix this and change the header in the next version.  Thank you.

[PATCH v4] gpio: mockup: Allow probing from device tree

2020-10-29 Thread Vincent Whitchurch

Allow the mockup driver to be probed via the device tree without any
module parameters, allowing it to be used to configure and test higher
level drivers like the leds-gpio driver and corresponding userspace
before actual hardware is available.

Signed-off-by: Vincent Whitchurch 
---

Notes:
v4:
- Remove of_match_ptr() to fix unused variable warning with W=1
- Include linux/mod_devicetable.h instead of linux/of.h

v3:
- Keep includes sorted alphabetically
- Drop CONFIG_OF ifdefs

v2:
- Remove most of the added code, since the latest driver doesn't need it.
- Drop DT binding document, since Rob Herring was OK with not documenting 
this:
  
https://lore.kernel.org/linux-devicetree/5baa1ae6.1c69fb81.847f2.3...@mx.google.com/

Range-diff against v3:
1:  1e9b8f36676d ! 1:  4e8fdcfe1a47 gpio: mockup: Allow probing from device tree
@@ Commit message
 
 
  ## Notes ##
+v4:
+- Remove of_match_ptr() to fix unused variable warning with W=1
+- Include linux/mod_devicetable.h instead of linux/of.h
+
 v3:
 - Keep includes sorted alphabetically
 - Drop CONFIG_OF ifdefs
@@ Notes
 
  ## drivers/gpio/gpio-mockup.c ##
 @@
+ #include 
  #include 
  #include 
++#include 
  #include 
-+#include 
  #include 
  #include 
- #include 
 @@ drivers/gpio/gpio-mockup.c: static int gpio_mockup_probe(struct 
platform_device *pdev)
return 0;
  }
@@ drivers/gpio/gpio-mockup.c: static int gpio_mockup_probe(struct 
platform_device
  static struct platform_driver gpio_mockup_driver = {
.driver = {
.name = "gpio-mockup",
-+  .of_match_table = of_match_ptr(gpio_mockup_of_match),
++  .of_match_table = gpio_mockup_of_match,
},
.probe = gpio_mockup_probe,
  };

 drivers/gpio/gpio-mockup.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c
index 67ed4f238d43..28b757d34046 100644
--- a/drivers/gpio/gpio-mockup.c
+++ b/drivers/gpio/gpio-mockup.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -460,9 +461,16 @@ static int gpio_mockup_probe(struct platform_device *pdev)
return 0;
 }
 
+static const struct of_device_id gpio_mockup_of_match[] = {
+   { .compatible = "gpio-mockup", },
+   {},
+};
+MODULE_DEVICE_TABLE(of, gpio_mockup_of_match);
+
 static struct platform_driver gpio_mockup_driver = {
.driver = {
.name = "gpio-mockup",
+   .of_match_table = gpio_mockup_of_match,
},
.probe = gpio_mockup_probe,
 };
@@ -556,8 +564,7 @@ static int __init gpio_mockup_init(void)
 {
int i, num_chips, err;
 
-   if ((gpio_mockup_num_ranges < 2) ||
-   (gpio_mockup_num_ranges % 2) ||
+   if ((gpio_mockup_num_ranges % 2) ||
(gpio_mockup_num_ranges > GPIO_MOCKUP_MAX_RANGES))
return -EINVAL;
 
-- 
2.28.0

Re: [PATCH V3 2/4] misc: vop: do not allocate and reassign the used ring

2020-10-29 Thread Vincent Whitchurch

On Wed, Oct 28, 2020 at 04:50:36PM +0100, Arnd Bergmann wrote:
> I think we should try to do something on top of the PCIe endpoint subsystem
> to make it work across arbitrary combinations of host and device
> implementations,
> and provide a superset of what the MIC driver, (out-of-tree) Bluefield 
> endpoint
> driver, and the NTB subsystem as well as a couple of others used to do,
> each of them tunneling block/network/serial/... over a PCIe link of some
> sort, usually with virtio.

VOP is not PCIe-specific (as demonstrated by the vop-loopback patches I
posted a while ago [1]), and it would be a shame for a replacement to be
tied to the PCIe endpoint subsystem.  There are many SOCs out there
which have multiple Linux-capable processors without cache-coherency
between them.  VOP is (or should I say was since I guess it's being
deleted) the closest we have in mainline to easily get generic virtio
(and not just rpmsg) running between these kind of Linux instances.  If
a new replacement framework were to be PCIe-exclusive then we'd have to
invent one more framework for non-PCIe links to do pretty much the same
thing.

[1] 
https://lore.kernel.org/lkml/20190403104746.16063-1-vincent.whitchu...@axis.com/

1 2 >

1 - 100 of 124 matches

Mail list logo