Re: [PATCH v2] HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges

2018-11-19 Thread David Herrmann
Hey

On Mon, Nov 19, 2018 at 1:52 PM Jiri Kosina  wrote:
>
>
> [ David added to CC ]
>
> On Wed, 14 Nov 2018, Eric Biggers wrote:
>
> > From: Eric Biggers 
> >
> > When a UHID_CREATE command is written to the uhid char device, a
> > copy_from_user() is done from a user pointer embedded in the command.
> > When the address limit is KERNEL_DS, e.g. as is the case during
> > sys_sendfile(), this can read from kernel memory.  Alternatively,
> > information can be leaked from a setuid binary that is tricked to write
> > to the file descriptor.  Therefore, forbid UHID_CREATE in these cases.
> >
> > No other commands in uhid_char_write() are affected by this bug and
> > UHID_CREATE is marked as "obsolete", so apply the restriction to
> > UHID_CREATE only rather than to uhid_char_write() entirely.
> >
> > Thanks to Dmitry Vyukov for adding uhid definitions to syzkaller and to
> > Jann Horn for commit 9da3f2b740544 ("x86/fault: BUG() when uaccess
> > helpers fault on kernel addresses"), allowing this bug to be found.
> >
> > Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com
> > Fixes: d365c6cfd337 ("HID: uhid: add UHID_CREATE and UHID_DESTROY events")
> > Cc:  # v3.6+
> > Cc: Jann Horn 
> > Cc: Andy Lutomirski 
> > Signed-off-by: Eric Biggers 
>
> Thanks for the patch. I however believe the fix below is more generic, and
> would prefer taking that one in case noone sees any major flaw in that
> I've overlooked. Thanks.

As Andy rightly pointed out, the credentials check is actually needed.
The scenario here is using a uhid-fd as stdout when executing a
setuid-program. This will possibly end up reading arbitrary memory
from the setuid program and use it as input for the hid-descriptor.

To my knowledge, this is a rather small attack surface. UHID is
usually a privileged interface, which in itself already gives you
ridiculous privileges. Furthermore, it only allows read-access if you
happen to be able to craft the message the setuid program writes
(which must be a valid user-space pointer, pointing to a valid hid
descriptor).
But people have been creative in the past, and they will find a way to
use this. So I do think Eric's patch here is the way to go.

Lastly, this only guards UHID_CREATE, which is already a deprecated
interface for several years. I don't think we can drop it any time
soon, but at least the other uhid interfaces should be safe.

Thanks
David


Re: [PATCH v2] HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges

2018-11-15 Thread David Herrmann
Hi

On Wed, Nov 14, 2018 at 10:55 PM Eric Biggers  wrote:
> From: Eric Biggers 
>
> When a UHID_CREATE command is written to the uhid char device, a
> copy_from_user() is done from a user pointer embedded in the command.
> When the address limit is KERNEL_DS, e.g. as is the case during
> sys_sendfile(), this can read from kernel memory.  Alternatively,
> information can be leaked from a setuid binary that is tricked to write
> to the file descriptor.  Therefore, forbid UHID_CREATE in these cases.
>
> No other commands in uhid_char_write() are affected by this bug and
> UHID_CREATE is marked as "obsolete", so apply the restriction to
> UHID_CREATE only rather than to uhid_char_write() entirely.
>
> Thanks to Dmitry Vyukov for adding uhid definitions to syzkaller and to
> Jann Horn for commit 9da3f2b740544 ("x86/fault: BUG() when uaccess
> helpers fault on kernel addresses"), allowing this bug to be found.
>
> Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com
> Fixes: d365c6cfd337 ("HID: uhid: add UHID_CREATE and UHID_DESTROY events")
> Cc:  # v3.6+
> Cc: Jann Horn 
> Cc: Andy Lutomirski 
> Signed-off-by: Eric Biggers 
> ---
>  drivers/hid/uhid.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c
> index 3c55073136064..051639c09f728 100644
> --- a/drivers/hid/uhid.c
> +++ b/drivers/hid/uhid.c
> @@ -12,6 +12,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -722,6 +723,17 @@ static ssize_t uhid_char_write(struct file *file, const 
> char __user *buffer,
>
> switch (uhid->input_buf.type) {
> case UHID_CREATE:
> +   /*
> +* 'struct uhid_create_req' contains a __user pointer which is
> +* copied from, so it's unsafe to allow this with elevated
> +* privileges (e.g. from a setuid binary) or via 
> kernel_write().
> +*/
> +   if (file->f_cred != current_cred() || uaccess_kernel()) {

I think `uaccess_kernel()` would be enough. UHID does not check any
credentials. If you believe this should be there nevertheless, feel
free to keep it. Either way:

Acked-by: David Herrmann 

Thanks
David

> +   pr_err_once("UHID_CREATE from different security 
> context by process %d (%s), this is not allowed.\n",
> +   task_tgid_vnr(current), current->comm);
> +   ret = -EACCES;
> +   goto unlock;
> +   }
> ret = uhid_dev_create(uhid, &uhid->input_buf);
> break;
> case UHID_CREATE2:
> --
> 2.19.1.930.g4563a0d9d0-goog
>


Re: [PATCH v2] HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges

2018-11-15 Thread David Herrmann
Hey

On Thu, Nov 15, 2018 at 9:14 AM Benjamin Tissoires
 wrote:
>
> On Thu, Nov 15, 2018 at 12:20 AM Dmitry Torokhov  wrote:
> > > I think it's best not to make
> > > assumptions about how the interface will be used and to be consistent 
> > > with how
> > > other ->write() methods in the kernel handle the misfeature where a __user
> > > pointer in the write() or read() payload is dereferenced.
> >
> > I can see that you might want to check credentials, etc, if interface
> > can be accessed by unprivileged process, however is it a big no no for
> > uhid/userio/uinput devices.
>
> Yep, any sane distribution would restrict the permissions of
> uhid/userio/uinput to only be accessed by root. If that ever changes,
> there is already an issue with the system and it was compromised
> either by a terribly dizzy sysadmin.

UHID is safe to be used by a non-root user. This does not imply that
you should open up access to the world, but you are free to have a
dedicated group or user with access to uhid. I agree that in most
common desktop-scenarios you should not grant world-access to it,
though.

> >
> > > Temporarily switching
> > > to USER_DS would only avoid one of the two problems.
> >
> > So because of the above there is only one problem. If your system
> > opened access to uhid to random processes you have much bigger
> > problems than exposing some data from a suid binary. You can spam "rm
> > -rf .; rm -rf /" though uhid if there is interactive session
> > somewhere.
> >
> > >
> > > Do you think the proposed restrictions would actually break anything?
> >
> > It would break if someone uses UHID_CREATE with sendpage. I do not
> > know if anyone does. If we were certain there are no users we'd simply
> > removed UHID_CREATE altogether.
>
> AFAICT, there are 2 users of uhid:
> - bluez for BLE devices (using HID over GATT)
> - hid-replay for debugging.

There are several more (eg., android bt-broadcom), and UHID_CREATE is
actively used. Dropping support for it will break these use-cases.

Thanks
David


Re: [PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"

2018-11-15 Thread David Herrmann
Hi

On Thu, Nov 15, 2018 at 12:09 AM Kees Cook  wrote:
> On Wed, Nov 14, 2018 at 9:40 AM, Laura Abbott  wrote:
[...]
> > Can we switch to strscpy instead? This will quiet gcc and avoid the
> > issues with strlcpy.
>
> Yes please: it looks like these strings are expected to be NUL
> terminated, so strscpy() without the "- 1" and min() logic would be
> the correct solution here.

"the correct solution"? To my knowledge the original code was correct
as well. Am I missing something?

>If @hid is already zero, then this would
> just be:
>
>strscpy(hid->name, ev->u.create2.name, sizeof(hid->name));
>strscpy(hid->phys, ev->u.create2.phys, sizeof(hid->phys));
>strscpy(hid->uniq, ev->u.create2.uniq, sizeof(hid->uniq));
>
> If they are NOT NUL terminated, then keep using strncpy() but mark the
> fields in the struct with the __nonstring attribute.

They are supposed to be NUL terminated, but for compatibility reasons
we allow them to be not. So I don't think your proposal is safe.

Thanks
David


[PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"

2018-11-14 Thread David Herrmann
This reverts commit 336fd4f5f25157e9e8bd50e898a1bbcd99eaea46.

Please note that `strlcpy()` does *NOT* do what you think it does.
strlcpy() *ALWAYS* reads the full input string, regardless of the
'length' parameter. That is, if the input is not zero-terminated,
strlcpy() will *READ* beyond input boundaries. It does this, because it
always returns the size it *would* copy if the target was big enough,
not the truncated size it actually copied.

The original code was perfectly fine. The hid device is
zero-initialized and the strncpy() functions copied up to n-1
characters. The result is always zero-terminated this way.

This is the third time someone tried to replace strncpy with strlcpy in
this function, and gets it wrong. I now added a comment that should at
least make people reconsider.

Signed-off-by: David Herrmann 
---
 drivers/hid/uhid.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c
index fefedc0b4dc6..0dfdd0ac7120 100644
--- a/drivers/hid/uhid.c
+++ b/drivers/hid/uhid.c
@@ -496,12 +496,13 @@ static int uhid_dev_create2(struct uhid_device *uhid,
goto err_free;
}
 
-   len = min(sizeof(hid->name), sizeof(ev->u.create2.name));
-   strlcpy(hid->name, ev->u.create2.name, len);
-   len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys));
-   strlcpy(hid->phys, ev->u.create2.phys, len);
-   len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq));
-   strlcpy(hid->uniq, ev->u.create2.uniq, len);
+   /* @hid is zero-initialized, strncpy() is correct, strlcpy() not */
+   len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
+   strncpy(hid->name, ev->u.create2.name, len);
+   len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1;
+   strncpy(hid->phys, ev->u.create2.phys, len);
+   len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1;
+   strncpy(hid->uniq, ev->u.create2.uniq, len);
 
hid->ll_driver = &uhid_hid_driver;
hid->bus = ev->u.create2.bus;
-- 
2.19.1



[PATCH] HID: uhid: prevent splice(2)

2018-11-14 Thread David Herrmann
The kernel has a default implementation of splice(2) for writing from a
pipe into an arbitrary file. This behavior can be overriden by
providing an f_op.splice_write() callback.

Unfortunately, the default implementation of splice_write() takes page
by page from the source pipe, calls kmap() and passes the mapped page
as kernel-address to f_op.write(). Thus, it uses standard write(2) to
implement splice(2). However, since the page is kernel-mapped, they
have to `set_fs(get_ds())`. This is mostly fine, but UHID takes
command-streams through write(2), and thus it might interpret the data
taken as pointers. If called with KERNEL_DS, you can trick UHID to
allow kernel-space pointers as well.

As a simple fix, prevent splice(2) on UHID. It is unsecure, but it is
also non-functional. We need a linear mapping of the input in UHID, so
chunked input from splice(2) makes no sense, anyway.

Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com
Signed-off-by: David Herrmann 
---
 drivers/hid/uhid.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c
index 3c5507313606..fefedc0b4dc6 100644
--- a/drivers/hid/uhid.c
+++ b/drivers/hid/uhid.c
@@ -753,6 +753,15 @@ static ssize_t uhid_char_write(struct file *file, const 
char __user *buffer,
return ret ? ret : count;
 }
 
+static ssize_t uhid_char_splice_write(struct pipe_inode_info *pipe,
+ struct file *out,
+ loff_t *ppos,
+ size_t len,
+ unsigned int flags)
+{
+   return -EOPNOTSUPP;
+}
+
 static __poll_t uhid_char_poll(struct file *file, poll_table *wait)
 {
struct uhid_device *uhid = file->private_data;
@@ -771,6 +780,7 @@ static const struct file_operations uhid_fops = {
.release= uhid_char_release,
.read   = uhid_char_read,
.write  = uhid_char_write,
+   .splice_write   = uhid_char_splice_write,
.poll   = uhid_char_poll,
.llseek = no_llseek,
 };
-- 
2.19.1



Re: BUG: GPF in non-whitelisted uaccess (non-canonical address?)

2018-11-14 Thread David Herrmann
Hey

On Wed, Nov 14, 2018 at 1:25 AM syzbot
 wrote:
> syzbot has found a reproducer for the following crash on:
>
> HEAD commit:ccda4af0f4b9 Linux 4.20-rc2
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13b4e77b40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4a0a89f12ca9b0f5
> dashboard link: https://syzkaller.appspot.com/bug?extid=72473edc9bf4eb1c6556
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1646a22540
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=108a653340
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com
>
[...]
> BUG: GPF in non-whitelisted uaccess (non-canonical address?)

This uses sendpage(2) to feed data from a file into a uhid chardev.
The default behavior of the kernel is to create a temporary pipe, then
splice from the file into the pipe, and then splice again from the
pipe into uhid.

The kernel provides default implementations for splicing between files
and any other file. The default implementation of `.splice_write()`
uses kmap() to map the page from the pipe and then uses the
__kernel_write() (which uses .f_op->write()) to push the data into the
target file. The problem is, __kernel_write() sets the address-space
to KERNEL_DS `set_fs(get_ds())`, thus granting the UHID request access
to kernel memory.

I see several ways to fix that, the most simple solution is to simply
prevent splice/sendpage on uhid (by setting f_op.splice_write to a
dummy). Alternatively, we can implement a proper splice helper that
takes the page directly, rather than through the __kernel_write()
default implementation.

Thanks
David


Re: [PATCH] HID: uhid: fix a missing-check bug

2018-05-06 Thread David Herrmann
Hey

On Sat, May 5, 2018 at 6:17 AM, Wenwen Wang  wrote:
> In uhid_event_from_user(), if it is in_compat_syscall(), the 'type' of the
> event is first fetched from the 'buffer' in userspace and checked. If the
> 'type' is UHID_CREATE, it is a messed up request with compat pointer, which
> could be more than 256 bytes, so it is better allocated from the heap, as
> mentioned in the comment. Its fields are then prepared one by one instead
> of using a whole copy. For all other cases, the event object is copied
> directly from user space. In other words, based on the 'type', the memory
> size and structure of the event object vary.
>
> Given that the 'buffer' resides in userspace, a malicious userspace process
> can race to change the 'type' between the two copies, which will cause
> inconsistency issues, potentially security issues. Plus, various operations
> such as uhid_dev_destroy() and uhid_dev_input() are performed based on
> 'type' in function uhid_char_write(). If 'type' is modified by user, there
> could be some issues such as uninitialized uses.
>
> To fix this problem, we need to recheck the type after the second fetch to
> make sure it is not UHID_CREATE.
>
> Signed-off-by: Wenwen Wang 
> ---
>  drivers/hid/uhid.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c
> index 3c55073..0220385 100644
> --- a/drivers/hid/uhid.c
> +++ b/drivers/hid/uhid.c
> @@ -447,11 +447,17 @@ static int uhid_event_from_user(const char __user 
> *buffer, size_t len,
> event->u.create.country = compat->country;
>
> kfree(compat);
> -   return 0;
> +   } else {
> +   if (copy_from_user(event, buffer,
> +   min(len, sizeof(*event
> +   return -EFAULT;
> +   if (event->type == UHID_CREATE)
> +   return -EINVAL;

You are correct, the race against malicious/buggy user-space exists.
However, it is harmless. Lets assume user-space rewrites the input,
all that happens is that we end up with garbage in the kernel.
However, input-validation is done *after* this function, hence
everything will work just fine. If user-space rewrites input-buffers,
they better be aware that this might mean random garbage is pushed
into the kernel.

IOW, this race simply allows user-space to circumvent the "compat
conversion". But if user-space wants that... let them. No point in
protecting them from doing something stupid.

Thanks
David

> }
> -   /* All others can be copied directly */
> +   return 0;
> }
>
> +   /* Others can be copied directly */
> if (copy_from_user(event, buffer, min(len, sizeof(*event
> return -EFAULT;
>
> --
> 2.7.4
>


[PATCH v2 0/4] Introduce LSM-hook for socketpair(2)

2018-05-04 Thread David Herrmann
Hi

This is v2 of the socketpair(2) LSM hook introduction. Changes
since v1 are:

  - Added ACKs from previous series.

  - Moved the hook into generic socketpair(2) handling. The hook is now
called security_socket_socketpair(), just like the other hooks on
the socket layer.
There is no AF_UNIX specific code, anymore.

  - Added SMACK support.

  - Still *NO* AppArmor support, since upstream AppArmor lacks
SO_PEERSEC support and requires downstream patches (in particular,
the apparmor_unix_stream_connect() function is mostly a stub and
was never synced with Ubuntu downstream modifications).

Old cover letter follows (only trivial changes).

Thanks
David


This series adds a new LSM hook for the socketpair(2) syscall. The idea
is to allow SO_PEERSEC to be called on AF_UNIX sockets created via
socketpair(2), and return the same information as if you emulated
socketpair(2) via a temporary listener socket. Right now SO_PEERSEC
will return the unlabeled credentials for a socketpair, rather than the
actual credentials of the creating process.

A simple call to:

socketpair(AF_UNIX, SOCK_STREAM, 0, out);

can be emulated via a temporary listener socket bound to a unique,
random name in the abstract namespace. By connecting to this listener
socket, accept(2) will return the second part of the pair. If
SO_PEERSEC is queried on these, the correct credentials of the creating
process are returned. A simple comparison between the behavior of
SO_PEERSEC on socketpair(2) and an emulated socketpair is included in
the dbus-broker test-suite [1].

This patch series tries to close this gap and makes both behave the
same. A new LSM-hook is added which allows LSMs to cache the correct
peer information on newly created socket-pairs.

Apart from fixing this behavioral difference, the dbus-broker project
actually needs to query the credentials of socketpairs, and currently
must resort to racy procfs(2) queries to get the LSM credentials of its
controller socket. Several parts of the dbus-broker project allow you
to pass in a socket during execve(2), which will be used by the child
process to accept control-commands from its parent. One natural way to
create this communication channel is to use socketpair(2). However,
right now SO_PEERSEC does not return any useful information, hence, the
child-process would need other means to retrieve this information. By
avoiding socketpair(2) and using the hacky-emulated version, this is not
an issue.

There was a previous discussion on this matter [2] roughly a year ago.
Back then there was the suspicion that proper SO_PEERSEC would confuse
applications. However, we could not find any evidence backing this
suspicion. Furthermore, we now actually see the contrary. Lack of
SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials.
Hence, we propose to implement full SO_PEERSEC for socketpairs.

Thanks
David

[1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c
[2] https://www.spinics.net/lists/selinux/msg22674.html

David Herrmann (3):
  security: add hook for socketpair()
  net: hook socketpair() into LSM
  selinux: provide socketpair callback

Tom Gundersen (1):
  smack: provide socketpair callback

 include/linux/lsm_hooks.h  |  7 +++
 include/linux/security.h   |  7 +++
 net/socket.c   |  7 +++
 security/security.c|  6 ++
 security/selinux/hooks.c   | 13 +
 security/smack/smack_lsm.c | 22 ++
 6 files changed, 62 insertions(+)

-- 
2.17.0



[PATCH v2 3/4] selinux: provide socketpair callback

2018-05-04 Thread David Herrmann
Make sure to implement the new socketpair callback so the SO_PEERSEC
call on socketpair(2)s will return correct information.

Acked-by: Serge Hallyn 
Acked-by: Stephen Smalley 
Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 security/selinux/hooks.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4cafe6a19167..02ebd1585eaf 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4569,6 +4569,18 @@ static int selinux_socket_post_create(struct socket 
*sock, int family,
return err;
 }
 
+static int selinux_socket_socketpair(struct socket *socka,
+struct socket *sockb)
+{
+   struct sk_security_struct *sksec_a = socka->sk->sk_security;
+   struct sk_security_struct *sksec_b = sockb->sk->sk_security;
+
+   sksec_a->peer_sid = sksec_b->sid;
+   sksec_b->peer_sid = sksec_a->sid;
+
+   return 0;
+}
+
 /* Range of port numbers used to automatically bind.
Need to determine whether we should perform a name_bind
permission check between the socket and the port number. */
@@ -6999,6 +7011,7 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
 
LSM_HOOK_INIT(socket_create, selinux_socket_create),
LSM_HOOK_INIT(socket_post_create, selinux_socket_post_create),
+   LSM_HOOK_INIT(socket_socketpair, selinux_socket_socketpair),
LSM_HOOK_INIT(socket_bind, selinux_socket_bind),
LSM_HOOK_INIT(socket_connect, selinux_socket_connect),
LSM_HOOK_INIT(socket_listen, selinux_socket_listen),
-- 
2.17.0



[PATCH v2 4/4] smack: provide socketpair callback

2018-05-04 Thread David Herrmann
From: Tom Gundersen 

Make sure to implement the new socketpair callback so the SO_PEERSEC
call on socketpair(2)s will return correct information.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 security/smack/smack_lsm.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 0b414836bebd..dcb976f98df2 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2842,6 +2842,27 @@ static int smack_socket_post_create(struct socket *sock, 
int family,
return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
 }
 
+/**
+ * smack_socket_socketpair - create socket pair
+ * @socka: one socket
+ * @sockb: another socket
+ *
+ * Cross reference the peer labels for SO_PEERSEC
+ *
+ * Returns 0 on success, and error code otherwise
+ */
+static int smack_socket_socketpair(struct socket *socka,
+  struct socket *sockb)
+{
+   struct socket_smack *asp = socka->sk->sk_security;
+   struct socket_smack *bsp = sockb->sk->sk_security;
+
+   asp->smk_packet = bsp->smk_out;
+   bsp->smk_packet = asp->smk_out;
+
+   return 0;
+}
+
 #ifdef SMACK_IPV6_PORT_LABELING
 /**
  * smack_socket_bind - record port binding information.
@@ -4724,6 +4745,7 @@ static struct security_hook_list smack_hooks[] 
__lsm_ro_after_init = {
LSM_HOOK_INIT(unix_may_send, smack_unix_may_send),
 
LSM_HOOK_INIT(socket_post_create, smack_socket_post_create),
+   LSM_HOOK_INIT(socket_socketpair, smack_socket_socketpair),
 #ifdef SMACK_IPV6_PORT_LABELING
LSM_HOOK_INIT(socket_bind, smack_socket_bind),
 #endif
-- 
2.17.0



[PATCH v2 1/4] security: add hook for socketpair()

2018-05-04 Thread David Herrmann
Right now the LSM labels for socketpairs are always uninitialized,
since there is no security hook for the socketpair() syscall. This
patch adds the required hooks so LSMs can properly label socketpairs.
This allows SO_PEERSEC to return useful information on those sockets.

Note that the behavior of socketpair() can be emulated by creating a
listener socket, connecting to it, and then discarding the initial
listener socket. With this workaround, SO_PEERSEC would return the
caller's security context. However, with socketpair(), the uninitialized
context is returned unconditionally. This is unexpected and makes
socketpair() less useful in situations where the security context is
crucial to the application.

With the new socketpair-hook this disparity can be solved by making
socketpair() return the expected security context.

Acked-by: Serge Hallyn 
Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 include/linux/lsm_hooks.h | 7 +++
 include/linux/security.h  | 7 +++
 security/security.c   | 6 ++
 3 files changed, 20 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9d0b286f3dba..8f1131c8dd54 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -757,6 +757,11 @@
  * @type contains the requested communications type.
  * @protocol contains the requested protocol.
  * @kern set to 1 if a kernel socket.
+ * @socket_socketpair:
+ * Check permissions before creating a fresh pair of sockets.
+ * @socka contains the first socket structure.
+ * @sockb contains the second socket structure.
+ * Return 0 if permission is granted and the connection was established.
  * @socket_bind:
  * Check permission before socket protocol layer bind operation is
  * performed and the socket @sock is bound to the address specified in the
@@ -1656,6 +1661,7 @@ union security_list_options {
int (*socket_create)(int family, int type, int protocol, int kern);
int (*socket_post_create)(struct socket *sock, int family, int type,
int protocol, int kern);
+   int (*socket_socketpair)(struct socket *socka, struct socket *sockb);
int (*socket_bind)(struct socket *sock, struct sockaddr *address,
int addrlen);
int (*socket_connect)(struct socket *sock, struct sockaddr *address,
@@ -1922,6 +1928,7 @@ struct security_hook_heads {
struct hlist_head unix_may_send;
struct hlist_head socket_create;
struct hlist_head socket_post_create;
+   struct hlist_head socket_socketpair;
struct hlist_head socket_bind;
struct hlist_head socket_connect;
struct hlist_head socket_listen;
diff --git a/include/linux/security.h b/include/linux/security.h
index 200920f521a1..4ff3ba457e56 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1191,6 +1191,7 @@ int security_unix_may_send(struct socket *sock,  struct 
socket *other);
 int security_socket_create(int family, int type, int protocol, int kern);
 int security_socket_post_create(struct socket *sock, int family,
int type, int protocol, int kern);
+int security_socket_socketpair(struct socket *socka, struct socket *sockb);
 int security_socket_bind(struct socket *sock, struct sockaddr *address, int 
addrlen);
 int security_socket_connect(struct socket *sock, struct sockaddr *address, int 
addrlen);
 int security_socket_listen(struct socket *sock, int backlog);
@@ -1262,6 +1263,12 @@ static inline int security_socket_post_create(struct 
socket *sock,
return 0;
 }
 
+static inline int security_socket_socketpair(struct socket *socka,
+struct socket *sockb)
+{
+   return 0;
+}
+
 static inline int security_socket_bind(struct socket *sock,
   struct sockaddr *address,
   int addrlen)
diff --git a/security/security.c b/security/security.c
index 7bc2fde023a7..68f46d849abe 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1358,6 +1358,12 @@ int security_socket_post_create(struct socket *sock, int 
family,
protocol, kern);
 }
 
+int security_socket_socketpair(struct socket *socka, struct socket *sockb)
+{
+   return call_int_hook(socket_socketpair, 0, socka, sockb);
+}
+EXPORT_SYMBOL(security_socket_socketpair);
+
 int security_socket_bind(struct socket *sock, struct sockaddr *address, int 
addrlen)
 {
return call_int_hook(socket_bind, 0, sock, address, addrlen);
-- 
2.17.0



[PATCH v2 2/4] net: hook socketpair() into LSM

2018-05-04 Thread David Herrmann
Use the newly created LSM-hook for socketpair(). The default hook
return-value is 0, so behavior stays the same unless LSMs start using
this hook.

Acked-by: Serge Hallyn 
Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 net/socket.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78..667a7b397134 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1420,6 +1420,13 @@ int __sys_socketpair(int family, int type, int protocol, 
int __user *usockvec)
goto out;
}
 
+   err = security_socket_socketpair(sock1, sock2);
+   if (unlikely(err)) {
+   sock_release(sock2);
+   sock_release(sock1);
+   goto out;
+   }
+
err = sock1->ops->socketpair(sock1, sock2);
if (unlikely(err < 0)) {
sock_release(sock2);
-- 
2.17.0



Re: [PATCH 0/3] Introduce LSM-hook for socketpair(2)

2018-05-04 Thread David Herrmann
Hey

On Wed, Apr 25, 2018 at 9:02 PM, James Morris  wrote:
> On Wed, 25 Apr 2018, Paul Moore wrote:
>
>> On Wed, Apr 25, 2018 at 2:44 PM, James Morris  wrote:
>> > On Mon, 23 Apr 2018, David Herrmann wrote:
>> >> This patch series tries to close this gap and makes both behave the
>> >> same. A new LSM-hook is added which allows LSMs to cache the correct
>> >> peer information on newly created socket-pairs.
>> >
>> > Looks okay to me.
>> >
>> > Once it's respun with the Smack backend and maybe the hook name change,
>> > I'll merge this unless DaveM wants it to go in via his networking tree.
>>
>> Note my objection to the hook placement in patch 2/3; I think we
>> should move the hook out of the AF_UNIX layer and up into the socket
>> layer.
>
> I vote for this as it maintains the intended abstraction of the socket
> API.

Sounds good, I changed it. I will send v2 shortly.

Thanks
David


[PATCH 1/3] security: add hook for socketpair(AF_UNIX, ...)

2018-04-23 Thread David Herrmann
Right now the LSM labels for socketpairs are always uninitialized,
since there is no security hook for the socketpair() syscall. This
patch adds the required hooks so LSMs can properly label socketpairs.
This allows SO_PEERSEC to return useful information on those sockets.

Note that the behavior of socketpair() can be emulated by creating a
listener socket, connecting to it, and then discarding the initial
listener socket. With this workaround, SO_PEERSEC would return the
caller's security context. However, with socketpair(), the uninitialized
context is returned unconditionally. This is unexpected and makes
socketpair() less useful in situations where the security context is
crucial to the application.

With the new socketpair-hook this disparity can be solved by making
socketpair() return the expected security context.

Signed-off-by: David Herrmann 
---
 include/linux/lsm_hooks.h | 8 
 include/linux/security.h  | 7 +++
 security/security.c   | 6 ++
 3 files changed, 21 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9d0b286f3dba..2a23c75c1541 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -717,6 +717,12 @@
  * @other contains the peer sock structure.
  * @newsk contains the new sock structure.
  * Return 0 if permission is granted.
+ * @unix_stream_socketpair:
+ * Check permissions before establishing a Unix domain stream connection
+ * for a fresh pair of sockets.
+ * @socka contains the first sock structure.
+ * @sockb contains the second sock structure.
+ * Return 0 if permission is granted and the connection was established.
  * @unix_may_send:
  * Check permissions before connecting or sending datagrams from @sock to
  * @other.
@@ -1651,6 +1657,7 @@ union security_list_options {
 #ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect)(struct sock *sock, struct sock *other,
struct sock *newsk);
+   int (*unix_stream_socketpair)(struct sock *socka, struct sock *sockb);
int (*unix_may_send)(struct socket *sock, struct socket *other);
 
int (*socket_create)(int family, int type, int protocol, int kern);
@@ -1919,6 +1926,7 @@ struct security_hook_heads {
struct hlist_head inode_getsecctx;
 #ifdef CONFIG_SECURITY_NETWORK
struct hlist_head unix_stream_connect;
+   struct hlist_head unix_stream_socketpair;
struct hlist_head unix_may_send;
struct hlist_head socket_create;
struct hlist_head socket_post_create;
diff --git a/include/linux/security.h b/include/linux/security.h
index 200920f521a1..be275deeda10 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1187,6 +1187,7 @@ static inline int security_inode_getsecctx(struct inode 
*inode, void **ctx, u32
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct 
sock *newsk);
+int security_unix_stream_socketpair(struct sock *socka, struct sock *sockb);
 int security_unix_may_send(struct socket *sock,  struct socket *other);
 int security_socket_create(int family, int type, int protocol, int kern);
 int security_socket_post_create(struct socket *sock, int family,
@@ -1242,6 +1243,12 @@ static inline int security_unix_stream_connect(struct 
sock *sock,
return 0;
 }
 
+static inline int security_unix_stream_socketpair(struct sock *socka,
+ struct sock *sockb)
+{
+   return 0;
+}
+
 static inline int security_unix_may_send(struct socket *sock,
 struct socket *other)
 {
diff --git a/security/security.c b/security/security.c
index 7bc2fde023a7..3dfd374e84e5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1340,6 +1340,12 @@ int security_unix_stream_connect(struct sock *sock, 
struct sock *other, struct s
 }
 EXPORT_SYMBOL(security_unix_stream_connect);
 
+int security_unix_stream_socketpair(struct sock *socka, struct sock *sockb)
+{
+   return call_int_hook(unix_stream_socketpair, 0, socka, sockb);
+}
+EXPORT_SYMBOL(security_unix_stream_socketpair);
+
 int security_unix_may_send(struct socket *sock,  struct socket *other)
 {
return call_int_hook(unix_may_send, 0, sock, other);
-- 
2.17.0



[PATCH 2/3] net/unix: hook unix_socketpair() into LSM

2018-04-23 Thread David Herrmann
Use the newly created LSM-hook for unix_socketpair(). The default hook
return-value is 0, so behavior stays the same unless LSMs start using
this hook.

Signed-off-by: David Herrmann 
---
 net/unix/af_unix.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 68bb70a62afe..bc9705ace9b1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1371,6 +1371,11 @@ static int unix_stream_connect(struct socket *sock, 
struct sockaddr *uaddr,
 static int unix_socketpair(struct socket *socka, struct socket *sockb)
 {
struct sock *ska = socka->sk, *skb = sockb->sk;
+   int err;
+
+   err = security_unix_stream_socketpair(ska, skb);
+   if (err)
+   return err;
 
/* Join our sockets back to back */
sock_hold(ska);
-- 
2.17.0



[PATCH 3/3] selinux: provide unix_stream_socketpair callback

2018-04-23 Thread David Herrmann
Make sure to implement the new unix_stream_socketpair callback so the
SO_PEERSEC call on socketpair(2)s will return correct information.

Signed-off-by: David Herrmann 
---
 security/selinux/hooks.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4cafe6a19167..828881d9a41d 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4905,6 +4905,18 @@ static int selinux_socket_unix_stream_connect(struct 
sock *sock,
return 0;
 }
 
+static int selinux_socket_unix_stream_socketpair(struct sock *socka,
+struct sock *sockb)
+{
+   struct sk_security_struct *sksec_a = socka->sk_security;
+   struct sk_security_struct *sksec_b = sockb->sk_security;
+
+   sksec_a->peer_sid = sksec_b->sid;
+   sksec_b->peer_sid = sksec_a->sid;
+
+   return 0;
+}
+
 static int selinux_socket_unix_may_send(struct socket *sock,
struct socket *other)
 {
@@ -6995,6 +7007,8 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
LSM_HOOK_INIT(inode_getsecctx, selinux_inode_getsecctx),
 
LSM_HOOK_INIT(unix_stream_connect, selinux_socket_unix_stream_connect),
+   LSM_HOOK_INIT(unix_stream_socketpair,
+   selinux_socket_unix_stream_socketpair),
LSM_HOOK_INIT(unix_may_send, selinux_socket_unix_may_send),
 
LSM_HOOK_INIT(socket_create, selinux_socket_create),
-- 
2.17.0



[PATCH 0/3] Introduce LSM-hook for socketpair(2)

2018-04-23 Thread David Herrmann
Hi

This series adds a new LSM hook for the socketpair(2) syscall. The idea
is to allow SO_PEERSEC to be called on AF_UNIX sockets created via
socketpair(2), and return the same information as if you emulated
socketpair(2) via a temporary listener socket. Right now SO_PEERSEC
will return the unlabeled credentials for a socketpair, rather than the
actual credentials of the creating process.

A simple call to:

socketpair(AF_UNIX, SOCK_STREAM, 0, out);

can be emulated via a temporary listener socket bound to a unique,
random name in the abstract namespace. By connecting to this listener
socket, accept(2) will return the second part of the pair. If
SO_PEERSEC is queried on these, the correct credentials of the creating
process are returned. A simple comparison between the behavior of
SO_PEERSEC on socketpair(2) and an emulated socketpair is included in
the dbus-broker test-suite [1].

This patch series tries to close this gap and makes both behave the
same. A new LSM-hook is added which allows LSMs to cache the correct
peer information on newly created socket-pairs.

Apart from fixing this behavioral difference, the dbus-broker project
actually needs to query the credentials of socketpairs, and currently
must resort to racy procfs(2) queries to get the LSM credentials of its
controller socket. Several parts of the dbus-broker project allow you
to pass in a socket during execve(2), which will be used by the child
process to accept control-commands from its parent. One natural way to
create this communication channel is to use socketpair(2). However,
right now SO_PEERSEC does not return any useful information, hence, the
child-process would need other means to retrieve this information. By
avoiding socketpair(2) and using the hacky-emulated version, this is not
an issue.

There was a previous discussion on this matter [2] roughly a year ago.
Back then there was the suspicion that proper SO_PEERSEC would confuse
applications. However, we could not find any evidence backing this
suspicion. Furthermore, we now actually see the contrary. Lack of
SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials.
Hence, we propose to implement full SO_PEERSEC for socketpairs.

This series only adds SELinux backends, since that is what we need for
RHEL. I will gladly extend the other LSMs if needed.

Thanks
David

[1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c
[2] https://www.spinics.net/lists/selinux/msg22674.html

David Herrmann (3):
  security: add hook for socketpair(AF_UNIX, ...)
  net/unix: hook unix_socketpair() into LSM
  selinux: provide unix_stream_socketpair callback

 include/linux/lsm_hooks.h |  8 
 include/linux/security.h  |  7 +++
 net/unix/af_unix.c|  5 +
 security/security.c   |  6 ++
 security/selinux/hooks.c  | 14 ++
 5 files changed, 40 insertions(+)

-- 
2.17.0



Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

2018-04-19 Thread David Herrmann
Hi

On Fri, Apr 20, 2018 at 7:44 AM, Sergey Senozhatsky
 wrote:
> On (04/20/18 06:37), David Herrmann wrote:
>>
>> I get lots of timer-errors on Arch-Linux booting current master, after
>> a suspend/resume cycle. Just a selection of errors I see on resume:
>
> Hello David,
> Any chance you can revert the patches in question and test? I'm running
> ARCH (4.17.0-rc1-dbg-00042-gaa03ddd9c434) and suspend/resume cycle does
> not trigger any errors. Except for this one
>
> kernel: do_IRQ: 0.55 No irq handler for vector

I can easily reproduce it by sleeping for >5min, so the systemd
watchdog timers are triggered. The patches don't revert cleanly, so I
didn't look into booting without them, yet. I will try just linking
the monotonic clock to the monotonic_active clock later.

Also, doesn't this hunk in 72199320d49d need a 'break;':

diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index b258bee13b02..6259dbc0191a 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -73,6 +73,8 @@ int do_clock_gettime(clockid_t which_clock, struct
timespec64 *tp)
case CLOCK_BOOTTIME:
get_monotonic_boottime64(tp);
break;
+   case CLOCK_MONOTONIC_ACTIVE:
+   ktime_get_active_ts64(tp);
default:
return -EINVAL;
}


Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

2018-04-19 Thread David Herrmann
Hey

On Tue, Mar 13, 2018 at 7:11 PM, John Stultz  wrote:
> On Mon, Mar 12, 2018 at 11:36 PM, Ingo Molnar  wrote:
>> Ok, I have edited all the changelogs accordingly (and also flipped around the
>> 'clock MONOTONIC' language to the more readable 'the MONOTONIC clock' 
>> variant),
>> the resulting titles are (in order):
>>
>>  72199320d49d: timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
>>  d6ed449afdb3: timekeeping: Make the MONOTONIC clock behave like the 
>> BOOTTIME clock
>>  f2d6fdbfd238: Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
>>  d6c7270e913d: timekeeping: Remove boot time specific code
>>  7250a4047aa6: posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
>>  127bfa5f4342: hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
>>  92af4dcb4e1c: tracing: Unify the "boot" and "mono" tracing clocks
>>
>> I'll push these out after testing.
>
> I'm still anxious about userspace effects given how much I've seen the
> current behavior documented, and wouldn't pushed for this myself (I'm
> a worrier), but at least I'm not seeing any failures in initial
> testing w/ kselftest so far.

I get lots of timer-errors on Arch-Linux booting current master, after
a suspend/resume cycle. Just a selection of errors I see on resume:

systemd[1]: systemd-journald.service: Main process exited,
code=dumped, status=6/ABRT
rtkit-daemon[742]: The canary thread is apparently starving. Taking action.
systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
kernel: e1000e :00:1f.6: Failed to restore TIMINCA clock rate delta: -22

Lots of crashes with SIGABRT due to these.

I did not bisect it, but it sounds related to me. Also, user-space
uses CLOCK_MONOTONIC for watchdog timers. That is, a process is
required to respond to a watchdog-request in a given MONOTONIC
time-frame. If this jumps during suspend/resume, watchdogs will fire
immediately. I don't see how this can work with the new MONOTONIC
behavior?

Thanks
David


Re: [PATCH] platform/x86: thinkpad_acpi: suppress warning about palm detection

2018-01-12 Thread David Herrmann
Hi Andy

On Fri, Jan 12, 2018 at 2:50 PM, Andy Shevchenko
 wrote:
> On Fri, Jan 12, 2018 at 1:04 PM, David Herrmann  wrote:
>> This patch prevents the thinkpad_acpi driver from warning about 2 event
>> codes returned for keyboard palm-detection. No behavioral changes,
>> other than suppressing the warning in the kernel log. The events are
>> still forwarded via acpi-netlink channels.
>>
>> We could, optionally, decide to forward the event through a
>> input-switch on the tpacpi input device. However, so far no suitable
>> input-code exists, and no similar drivers report such events. Hence,
>> leave it an acpi event for now.
>>
>> Note that the event-codes are named based on empirical studies. On the
>> ThinkPad X1 5th Gen the sensor can be found underneath the arrow key.
>>
>
>> Cc: Matthew Thode 
>
> Shouldn't be Suggested-by or even Signed-off-by?

The patch is different (Matthew originally suppressed the ACPI event),
so I did not copy the sign-off. Please add Suggested-by, if that is an
acceptable tag.

>> Signed-off-by: David Herrmann 
>
>
>> +   case TP_HKEY_EV_PALM_DETECTED:
>> +   case TP_HKEY_EV_PALM_UNDETECTED:
>
>> +   /* palm detected hovering the keyboard, forward to user-space
>> +* via netlink for consumption */
>
> Comment style is
> /*
>  * Multi line comment.
>  * This is an example.
>  */

All other 6 comments in this function follow the style I used here, so
I tried to be consistent. But feel free to amend this change.

Thanks a lot!
David


[PATCH] platform/x86: thinkpad_acpi: suppress warning about palm detection

2018-01-12 Thread David Herrmann
This patch prevents the thinkpad_acpi driver from warning about 2 event
codes returned for keyboard palm-detection. No behavioral changes,
other than suppressing the warning in the kernel log. The events are
still forwarded via acpi-netlink channels.

We could, optionally, decide to forward the event through a
input-switch on the tpacpi input device. However, so far no suitable
input-code exists, and no similar drivers report such events. Hence,
leave it an acpi event for now.

Note that the event-codes are named based on empirical studies. On the
ThinkPad X1 5th Gen the sensor can be found underneath the arrow key.

Cc: Matthew Thode 
Signed-off-by: David Herrmann 
---
 drivers/platform/x86/thinkpad_acpi.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/platform/x86/thinkpad_acpi.c 
b/drivers/platform/x86/thinkpad_acpi.c
index 117be48ff4de..128f91af716e 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -214,6 +214,10 @@ enum tpacpi_hkey_event_t {
/* AC-related events */
TP_HKEY_EV_AC_CHANGED   = 0x6040, /* AC status changed */
 
+   /* Further user-interface events */
+   TP_HKEY_EV_PALM_DETECTED= 0x60b0, /* palm hoveres keyboard */
+   TP_HKEY_EV_PALM_UNDETECTED  = 0x60b1, /* palm removed */
+
/* Misc */
TP_HKEY_EV_RFKILL_CHANGED   = 0x7000, /* rfkill switch changed */
 };
@@ -4079,6 +4083,12 @@ static bool hotkey_notify_6xxx(const u32 hkey,
*send_acpi_ev = false;
break;
 
+   case TP_HKEY_EV_PALM_DETECTED:
+   case TP_HKEY_EV_PALM_UNDETECTED:
+   /* palm detected hovering the keyboard, forward to user-space
+* via netlink for consumption */
+   return true;
+
default:
pr_warn("unknown possible thermal alarm or keyboard event 
received\n");
known = false;
-- 
2.15.1



Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-05 Thread David Herrmann
Hi

On Sat, Nov 4, 2017 at 12:31 AM, Mike Kravetz  wrote:
> On 11/03/2017 10:56 AM, Mike Kravetz wrote:
>> On 11/03/2017 10:41 AM, David Herrmann wrote:
>>> Hi
>>>
>>> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz  
>>> wrote:
>>>> On 11/03/2017 10:03 AM, David Herrmann wrote:
>>>>> Hi
>>>>>
>>>>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau
>>>>>  wrote:
>>>>>> Implements memfd sealing, similar to shmem:
>>>>>> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
>>>>>>   memfd_add_seals(). write() doesn't exist for hugetlbfs.
>>>>>> - SHRINK: added similar check as shmem_setattr()
>>>>>> - GROW: added similar check as shmem_setattr() & shmem_fallocate()
>>>>>>
>>>>>> Except write() operation that doesn't exist with hugetlbfs, that
>>>>>> should make sealing as close as it can be to shmem support.
>>>>>
>>>>> SEAL, SHRINK, and GROW look fine to me.
>>>>>
>>>>> Regarding WRITE
>>>>
>>>> The commit message may not be clear.  However, hugetlbfs does not support
>>>> the write system call (or aio).  The only way to modify contents of a
>>>> hugetlbfs file is via mmap or hole punch/truncate.  So, we do not really
>>>> need to worry about those special (a)io cases for hugetlbfs.
>>>
>>> This is not about the write(2) syscall. Please consider this scenario
>>> about shmem:
>>>
>>> You create a memfd via memfd_create() and map it writable. You now
>>> call another kernel syscall that takes as input _any mapped page
>>> range_. You pass your mapped memfd-addresses to it. Those syscalls
>>> tend to use get_user_pages() to pin arbitrary user-mapped pages, as
>>> such this also affects shmem. In this case, those pages might stay
>>> mapped even if you munmap() your memfd!
>>>
>>> One example of this is using AIO-read() on any other file that
>>> supports it, passing your mapped memfd as buffer to _read into_. The
>>> operations supported on the memfd are irrelevant here.
>>> The selftests contain a FUSE-based test for this, since FUSE allows
>>> user-space to GUP pages for an arbitrary amount of time.
>>>
>>> The original fix for this is:
>>>
>>> commit 05f65b5c70909ef686f865f0a85406d74d75f70f
>>> Author: David Herrmann 
>>> Date:   Fri Aug 8 14:25:36 2014 -0700
>>>
>>> shm: wait for pins to be released when sealing
>>>
>>> Please have a look at this. Your patches use shmem_add_seals() almost
>>> unchanged, and as such you call into shmem_wait_for_pins() on
>>> hugetlbfs. I would really like to see an explicit ACK that this works
>>> on hugetlbfs.
>>
>> Thanks for the explanation.  I missed that in your first reply.  I'll
>> look into this for hugetlbfs.
>
> I reviewed the routines in the above commit and did not see anything that
> would prevent them from working properly with hugetlbfs.  I modified the
> fuse test to use hugetlbfs based mapping.  I also instrumented the above
> routines and verified that tags were set/checked/cleared as designed for
> hugetlb pages.  So, that is an ACK on working with hugetlbfs.
>
> This does bring up the point that the fuse seals test should also be
> modified to work with hugetlbfs as part of this series.

Perfect! Looks all good to me then!

Thanks
David


Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-03 Thread David Herrmann
Hi

On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz  wrote:
> On 11/03/2017 10:03 AM, David Herrmann wrote:
>> Hi
>>
>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau
>>  wrote:
>>> Implements memfd sealing, similar to shmem:
>>> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
>>>   memfd_add_seals(). write() doesn't exist for hugetlbfs.
>>> - SHRINK: added similar check as shmem_setattr()
>>> - GROW: added similar check as shmem_setattr() & shmem_fallocate()
>>>
>>> Except write() operation that doesn't exist with hugetlbfs, that
>>> should make sealing as close as it can be to shmem support.
>>
>> SEAL, SHRINK, and GROW look fine to me.
>>
>> Regarding WRITE
>
> The commit message may not be clear.  However, hugetlbfs does not support
> the write system call (or aio).  The only way to modify contents of a
> hugetlbfs file is via mmap or hole punch/truncate.  So, we do not really
> need to worry about those special (a)io cases for hugetlbfs.

This is not about the write(2) syscall. Please consider this scenario
about shmem:

You create a memfd via memfd_create() and map it writable. You now
call another kernel syscall that takes as input _any mapped page
range_. You pass your mapped memfd-addresses to it. Those syscalls
tend to use get_user_pages() to pin arbitrary user-mapped pages, as
such this also affects shmem. In this case, those pages might stay
mapped even if you munmap() your memfd!

One example of this is using AIO-read() on any other file that
supports it, passing your mapped memfd as buffer to _read into_. The
operations supported on the memfd are irrelevant here.
The selftests contain a FUSE-based test for this, since FUSE allows
user-space to GUP pages for an arbitrary amount of time.

The original fix for this is:

commit 05f65b5c70909ef686f865f0a85406d74d75f70f
Author: David Herrmann 
Date:   Fri Aug 8 14:25:36 2014 -0700

shm: wait for pins to be released when sealing

Please have a look at this. Your patches use shmem_add_seals() almost
unchanged, and as such you call into shmem_wait_for_pins() on
hugetlbfs. I would really like to see an explicit ACK that this works
on hugetlbfs.

Thanks
David


Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-03 Thread David Herrmann
Hi

On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau
 wrote:
> Implements memfd sealing, similar to shmem:
> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
>   memfd_add_seals(). write() doesn't exist for hugetlbfs.
> - SHRINK: added similar check as shmem_setattr()
> - GROW: added similar check as shmem_setattr() & shmem_fallocate()
>
> Except write() operation that doesn't exist with hugetlbfs, that
> should make sealing as close as it can be to shmem support.

SEAL, SHRINK, and GROW look fine to me.

Regarding WRITE you need to make sure there are no page references
left around. For instance, on shmem any process might trigger the
kernel to GUP mapped shmem pages for asynchronous IO, then unmap the
file and request F_SEAL_WRITE. In this case the seal must be rejected
*iff* the pages are still pinned. shmem does this by requiring the
page-refcounts to be 0. Preferably there would be some better
infrastructure that tells us whether someone operates on those pages,
but this does not exist right now. See shmem_wait_for_pins() for
details.

I have little knowledge on how hugetlbs integrate with the page-cache
and radix-tree, hence I'd prefer if someone can explicitly ACK that
shmem_wait_for_pins() is suitable for hugetlbfs.

Otherwise, this series looks good to me (minus the #ifdef mess..).

Thanks
David


Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins

2017-11-03 Thread David Herrmann
Hi

On Fri, Nov 3, 2017 at 9:24 AM, Michal Hocko  wrote:
> On Fri 03-11-17 00:46:18, Hugh Dickins wrote:
>> On Thu, 2 Nov 2017, Michal Hocko wrote:
>> > From: Michal Hocko 
>> >
>> > syzkaller has reported the following lockdep splat
>> > ==
>> > WARNING: possible circular locking dependency detected
>> > 4.13.0-next-20170911+ #19 Not tainted
>> > --
>> > syz-executor5/6914 is trying to acquire lock:
>> >   (cpu_hotplug_lock.rw_sem){}, at: [] 
>> > get_online_cpus  include/linux/cpu.h:126 [inline]
>> >   (cpu_hotplug_lock.rw_sem){}, at: [] 
>> > lru_add_drain_all+0xe/0x20 mm/swap.c:729
>> >
>> > but task is already holding lock:
>> >   (&sb->s_type->i_mutex_key#9){}, at: [] inode_lock 
>> > include/linux/fs.h:712 [inline]
>> >   (&sb->s_type->i_mutex_key#9){}, at: [] 
>> > shmem_add_seals+0x197/0x1060 mm/shmem.c:2768
>> >
>> > more details [1] and dependencies explained [2]. The problem seems to be
>> > the usage of lru_add_drain_all from shmem_wait_for_pins. While the lock
>> > dependency is subtle as hell and we might want to make lru_add_drain_all
>> > less dependent on the hotplug locks the usage of lru_add_drain_all seems
>> > dubious here. The whole function cares only about radix tree tags, page
>> > count and page mapcount. None of those are touched from the draining
>> > context. So it doesn't make much sense to drain pcp caches. Moreover
>> > this looks like a wrong thing to do because it basically induces
>> > unpredictable latency to the call because draining is not for free
>> > (especially on larger machines with many cpus).
>> >
>> > Let's simply drop the call to lru_add_drain_all to address both issues.
>> >
>> > [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d...@google.com
>> > [2] 
>> > http://lkml.kernel.org/r/http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muo...@hirez.programming.kicks-ass.net
>> >
>> > Cc: David Herrmann 
>> > Cc: Hugh Dickins 
>> > Signed-off-by: Michal Hocko 
>>
>> NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
>> to go away, and using lru_add_drain_all() in the usual way, to lower
>> the refcount of pages temporarily pinned in a pagevec somewhere.  Page
>> count is touched by draining pagevecs: I'm surprised to see you say
>> that it isn't - or have pagevec page references been eliminated by
>> a recent commit that I missed?
>
> I must be missing something here. __pagevec_lru_add_fn merely about
> moving the page into the appropriate LRU list, pagevec_move_tail only
> rotates, lru_deactivate_file_fn moves from active to inactive LRUs,
> lru_lazyfree_fn moves from anon to file LRUs and activate_page_drain
> just moves to the active list. None of those operations touch the page
> count AFAICS. So I would agree that some pages might be pinned outside
> of the LRU (lru_add_pvec) and thus unreclaimable but does this really
> matter. Or what else I am missing?

Yes, we need to make sure those page-pins are dropped.
shmem_wait_for_pins() literally just waits for all those to be
cleared, since there is no way to tell whether a page is still
inflight for some pending async WRITE operation. Hence, if the
pagevecs keep pinning those pages, we must fail the shmem-seal
operation, as we cannot guarantee there are no further WRITEs to this
file. The refcount is our only way to tell.

I think the caller could just call lru_add_drain_all() between
mapping_deny_writable() and shmem_wait_for_pins(), releasing the
inode-lock in between. But that means we drain it even if
shmem_tag_pins() does not find anything (presumably the common case).
It would also have weird interactions with parallel inode-operations,
in case the seal-operation fails and is reverted. Not sure I like
that.

Thanks
David


Re: [PATCH] hid/uhid: fix a double-fetch bug when copying event from user

2017-09-22 Thread David Herrmann
Hey

On Wed, Sep 20, 2017 at 12:12 AM, Dmitry Torokhov
 wrote:
> On Tue, Sep 19, 2017 at 2:54 PM, Meng Xu  wrote:
>>
>>
>> On 09/19/2017 05:31 PM, Dmitry Torokhov wrote:
>>>
>>> On Mon, Sep 18, 2017 at 10:21 PM, Meng Xu  wrote:

 When in_compat_syscall(), a user could make type != UHID_CREATE when
 get_user(type, buffer) [first fetch] and later make event->type ==
 UHID_CREATE in copy_from_user(event, buffer, ...) [second fetch].

 By doing so, an attacker might circumvent the specific logic to handle
 the type == UHID_CREATE case and later cause undefined behaviors.

 This patch enforces that event->type is overriden to the type value
 copied in the first fetch and thus, mitigate this race condition attack.
>>>
>>> I do not believe this mitigates anything, as copy_form_user() is not
>>> an atomic operation and we can have 2nd thread "scrambling" the memory
>>> while 1st does the ioctl.
>>
>> Yes, what you described is the root cause of this double-fetch situation
>> and that is why I am proposing this patch.
>>>
>>>
>>> We also should not expect that we always get consistent data from
>>> userspace and the rest of the driver should be able to cope with
>>> (reject) such data.
>>
>> Yes, that is also true and we should never assume to get consistent
>> data from userspace. That is why in this case, we can have user
>> started with UHID_INPUT just to skip the large chunk of code in
>> if (type == UHID_CREATE) {} and then replace it with UHID_CREATE
>> and take the function uhid_dev_create(uhid, &uhid->input_buf).
>> This is exactly what this patch tries to mitigate.
>
> OK, so the driver should be able to withstand equivalent of "dd
> if=/dev/random of=/dev/uhid". The point of special handling of
> UHID_CREATE in uhid_event_from_user() is to convert the layout of the
> UHID event from 32 to 64 bit layout because we made a mistake and have
> a pointer data there. We do not really care about contents. We do not
> care it if changes. We do not care of the pointer is valid. If there
> was garbage, or there will be garbage after conversion, it does not
> care one bit. uhid_dev_create() has to validate the input and reject
> invalid input.

I agree.

With current code, worst case scenario is someone shortcutting the
compat-conversion by setting UHID_CREATE after uhid_event_from_user()
copied it. However, this does no harm, as Dmitry explained. If
user-space wants to shortcut the conversion, let them do so...

Thanks
David


[PATCH] net/unix: drop obsolete fd-recursion limits

2017-07-17 Thread David Herrmann
All unix sockets now account inflight FDs to the respective sender.
This was introduced in:

commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
Author: willy tarreau 
Date:   Sun Jan 10 07:54:56 2016 +0100

unix: properly account for FDs passed over unix sockets

and further refined in:

commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6
Author: Hannes Frederic Sowa 
Date:   Wed Feb 3 02:11:03 2016 +0100

unix: correctly track in-flight fds in sending process user_struct

Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.

Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.

Lastly, the current `recursion_level' suffers a TOCTOU race, since it
checks and inherits depths only at queue time. If we consider `A<-B' to
mean `queue-B-on-A', the following sequence circumvents the recursion
level easily:

A<-B
   B<-C
  C<-D
 ...
   Y<-Z

resulting in:

A<-B<-C<-...<-Z

With all of this in mind, lets drop the recursion limit. It has no
additional security value, anymore. On the contrary, it randomly
confuses message brokers that try to forward file-descriptors, since
any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
maliciously modifies the FD while inflight.

Cc: Alban Crequy 
Cc: Simon McVittie 
Signed-off-by: David Herrmann 
---
 include/net/af_unix.h |  1 -
 net/unix/af_unix.c| 24 +---
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 678e4d6fa317..3b3194b2fc65 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -58,7 +58,6 @@ struct unix_sock {
struct list_headlink;
atomic_long_t   inflight;
spinlock_t  lock;
-   unsigned char   recursion_level;
unsigned long   gc_flags;
 #define UNIX_GC_CANDIDATE  0
 #define UNIX_GC_MAYBE_CYCLE1
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b52a380d710..5c53f22d62e8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1528,26 +1528,13 @@ static inline bool too_many_unix_fds(struct task_struct 
*p)
return false;
 }
 
-#define MAX_RECURSION_LEVEL 4
-
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
int i;
-   unsigned char max_level = 0;
 
if (too_many_unix_fds(current))
return -ETOOMANYREFS;
 
-   for (i = scm->fp->count - 1; i >= 0; i--) {
-   struct sock *sk = unix_get_socket(scm->fp->fp[i]);
-
-   if (sk)
-   max_level = max(max_level,
-   unix_sk(sk)->recursion_level);
-   }
-   if (unlikely(max_level > MAX_RECURSION_LEVEL))
-   return -ETOOMANYREFS;
-
/*
 * Need to duplicate file references for the sake of garbage
 * collection.  Otherwise a socket in the fps might become a
@@ -1559,7 +1546,7 @@ static int unix_attach_fds(struct scm_cookie *scm, struct 
sk_buff *skb)
 
for (i = scm->fp->count - 1; i >= 0; i--)
unix_inflight(scm->fp->user, scm->fp->fp[i]);
-   return max_level;
+   return 0;
 }
 
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool 
send_fds)
@@ -1649,7 +1636,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct 
msghdr *msg,
struct sk_buff *skb;
long timeo;
struct scm_cookie scm;
-   int max_level;
int data_len = 0;
int sk_locked;
 
@@ -1701,7 +1687,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct 
msghdr *msg,
err = unix_scm_to_skb(&scm, skb, true);
if (err < 0)
goto out_free;
-   max_level = err + 1;
 
skb_put(skb, len - data_len);
skb->data_len = data_len;
@@ -1819,8 +1804,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct 
msghdr *msg,
__net_timestamp(skb);
maybe_add_creds(skb, sock, other);
skb_queue_tail(&other->sk_receive_queue, skb);
-   if (max_level > unix_sk(other)->recursion_level)
-   unix_sk(other)->recursion_level = max_level;
unix_state_unlock(other);
other->sk_data_ready(other);
sock_put(other);
@@ -1855,7 +1838,6 @@ static int unix_stream_sendmsg(struct socket *sock, 
struct msghdr *msg,
int sent = 0;
struct scm_cookie scm;
bool fds_sent = false;
-   int max_level;
int data_len;
 
wait_for_unix_gc();
@@ -1905,7 +1887,6 @@ static int unix_stream_sendmsg(struc

Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex

2017-06-19 Thread David Herrmann
Hi

On Wed, Jun 14, 2017 at 1:58 PM, Arnd Bergmann  wrote:
> Does that mean that we can have a concurrent hid_device_remove()
> and hid_device_probe() on the same device, using different
> drivers and actually still need the driver_lock for that? I would assume
> that the driver core handles that part at least.

Nope. Only one device can be probed per physical device. Driver core
locking is sufficient.

>> Also note that hid_input_report() might be called from interrupt
>> context, hence it can never call into kref_put() or similar (unless we
>> want to guarantee that unbinding can run in interrupt context).
>
> I was thinking that we could do most of the unbinding in
> hid_device_remove() and only do a small part (the final kfree
> at the minimum) in the release() callback that gets called from
> kref_put(), but I guess that also isn't easy to retrofit.

Not a big fan of putting such restrictions on unbinding. Also unlikely
to retrofit now. But I also think it is not needed.

>> If we really want to get rid of the semaphore, a spinlock might do
>> fine as well. Then again, some hid device drivers might expect their
>> transport driver to *not* run in irq context, and thus break under a
>> spinlock. So if you want to fix this, we need to audit the hid device
>> drivers.
>
> Do you mean the hdrv->report or hdrv->raw_event might not work
> in atomic context, or the probe/release callbacks might not work
> there?

Correct. I assume that there are hid-device-drivers that use raw_event
(or other) callbacks, that assume that they're *not* in atomic
context.

For instance, the bluetooth ll-drivers call hid_input_report() from a
workqueue. Hence, any device-driver running on bluetooth might have
put kmalloc(GFP_KERNEL) calls into its callbacks without ever noticing
that this is a bad idea. This is obviously not correct, since the
device driver might very well be probed on USB and then fault. But...
yeah... I wouldn't bet on all drivers to be correct in that regard.

> If it's only the former, we could do something like the (untested,
> needs rebasing etc) patch below, which only holds the spinlock
> during hid_input_report(). We test the io_started flag under the
> lock, which makes the flag sufficiently atomic, and the release
> function will wait for any concurrent hid_input_report() to complete
> before resetting the flag.
>
> For reference only, do not apply.
> Signed-off-by: Arnd Bergmann 

I like the patch. It should be sufficient for what we want. If it
breaks things, lets fix those device drivers then.

Thanks
David

> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
> index 5f87dbe28336..300c65bd40a1 100644
> --- a/drivers/hid/hid-core.c
> +++ b/drivers/hid/hid-core.c
> @@ -1532,8 +1532,11 @@ int hid_input_report(struct hid_device *hid,
> int type, u8 *data, int size, int i
> if (!hid)
> return -ENODEV;
>
> -   if (down_trylock(&hid->driver_input_lock))
> -   return -EBUSY;
> +   spin_lock(&hid->driver_input_lock);
> +   if (!hid->io_started) {
> +   ret = -EBUSY;
> +   goto unlock;
> +   }
>
> if (!hid->driver) {
> ret = -ENODEV;
> @@ -1568,7 +1571,7 @@ int hid_input_report(struct hid_device *hid, int
> type, u8 *data, int size, int i
> ret = hid_report_raw_event(hid, type, data, size, interrupt);
>
>  unlock:
> -   up(&hid->driver_input_lock);
> +   spin_unlock(&hid->driver_input_lock);
> return ret;
>  }
>  EXPORT_SYMBOL_GPL(hid_input_report);
> @@ -2317,11 +2320,6 @@ static int hid_device_probe(struct device *dev)
>
> if (down_interruptible(&hdev->driver_lock))
> return -EINTR;
> -   if (down_interruptible(&hdev->driver_input_lock)) {
> -   ret = -EINTR;
> -   goto unlock_driver_lock;
> -   }
> -   hdev->io_started = false;
>
> if (!hdev->driver) {
> id = hid_match_device(hdev, hdrv);
> @@ -2345,7 +2343,7 @@ static int hid_device_probe(struct device *dev)
> }
>  unlock:
> if (!hdev->io_started)
> -   up(&hdev->driver_input_lock);
> +   hid_device_io_start(hdev);
>  unlock_driver_lock:
> up(&hdev->driver_lock);
> return ret;
> @@ -2363,7 +2361,7 @@ static int hid_device_remove(struct device *dev)
> ret = -EINTR;
> goto unlock_driver_lock;
> }
> -   hdev->io_started = false;
> +   hid_device_io_stop(hdev);
>
> hdrv = hdev->driver;
> if (hdrv) {
> @@ -2375,8 +2373,11 @@ static int hid_device_remove(struct device *dev)
> hdev->driver = NULL;
> }
>
> -   if (!hdev->io_started)
> -   up(&hdev->driver_input_lock);
> +   if (!hdev->io_started) {
> +   dev_warn(dev, "hdrv->remove left io active\n");
> +   hid_device_io_stop(hdev);
> +   }
> +
>  unlock_driver_lock:
> up(&hdev->driver_lock);
>  

Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex

2017-06-14 Thread David Herrmann
Hey

On Wed, Jun 14, 2017 at 9:20 AM, Arnd Bergmann  wrote:
> On Wed, Jun 14, 2017 at 7:22 AM, Binoy Jayan  wrote:
>> Hi,
>>
>> On 14 June 2017 at 01:55, Arnd Bergmann  wrote:
>>
 The mutex code clearly states mutex_trylock() must not be used in
 interrupt context (see kernel/locking/mutex.c), hence we used a
 semaphore here. Unless the mutex code is changed to allow this, we
 cannot switch away from semaphores.
>>>
>>> Right, that makes a lot of sense. I don't think changing the mutex
>>> code is an option here, but I wonder if we can replace the semaphore
>>> with something simpler anyway.
>>>
>>> From what I can tell, it currently does two things:
>>>
>>> 1. it acts as a simple flag to prevent  hid_input_report from derefencing
>>> the hid->driver pointer during initialization and exit. I think this 
>>> could
>>> be done equally well using a simple atomic set_bit()/test_bit() or 
>>> similar.
>>>
>>> 2. it prevents the hid->driver pointer from becoming invalid while an
>>> asynchronous hid_input_report() is in progress. This actually seems to
>>> be a reference counting problem rather than a locking problem.
>>> I don't immediately see how to better address it, or how exactly this
>>> could go wrong in practice, but I would naively expect that either
>>> hdev->driver->remove() needs to wait for the last user of hdev->driver
>>> to complete, or we need kref_get/kref_put in hid_input_report()
>>> to trigger the actual release function.

The HID design is explained in detail in
./Documentation/hid/hid-transport.txt, in case you want some
background information. The problem here is that the low-level
transport driver has a lifetime that is independent of the hid
device-driver. So the transport driver needs to be able to tell the
HID layer about coming/going devices, as well as incoming traffic. At
the same time, the HID layer can bind upper-layer hid device drivers
*anytime* (since it is exposed via the driver core interfaces in /sys
to bind drivers).

The locking architecture is very similar to 's_active' on
super-blocks, or 'active' on kernfs-nodes. However, the big difference
here is that drivers can be rebound. Hence, we're not limited to just
one driver lifetime, which is why we went with a semaphore instead.

Also note that hid_input_report() might be called from interrupt
context, hence it can never call into kref_put() or similar (unless we
want to guarantee that unbinding can run in interrupt context).

If we really want to get rid of the semaphore, a spinlock might do
fine as well. Then again, some hid device drivers might expect their
transport driver to *not* run in irq context, and thus break under a
spinlock. So if you want to fix this, we need to audit the hid device
drivers.

David


Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex

2017-06-13 Thread David Herrmann
Hi

On Tue, Jun 13, 2017 at 11:56 AM, Benjamin Tissoires
 wrote:
>> > -   struct semaphore driver_lock;   /* 
>> > protects the current driver, except during input */
>> > +   struct mutex driver_lock;   /* 
>> > protects the current driver, except during input */
>> > struct semaphore driver_input_lock; /* 
>> > protects the current driver */
>
> Unless I am mistaken, this one could also be converted to a mutex (in a
> separate patch, of course).

The mutex code clearly states mutex_trylock() must not be used in
interrupt context (see kernel/locking/mutex.c), hence we used a
semaphore here. Unless the mutex code is changed to allow this, we
cannot switch away from semaphores.

Otherwise, this patch (given Benjamin's comments are addressed) looks good:

Reviewed-by: David Herrmann 

Thanks
David


Re: [PATCH v7] Input: psxpad-spi - Add PlayStation 1/2 joypads via SPI interface Driver

2017-05-03 Thread David Herrmann
Hey

On Mon, May 1, 2017 at 9:47 PM, Dmitry Torokhov
 wrote:
> On Sun, Apr 30, 2017 at 02:11:03PM +0900, Tomohiro Yoshidomi wrote:
>> PlayStation 1/2 joypads can be connected directly to the SPI interface.
>>
>> Signed-off-by: Tomohiro Yoshidomi 
>
> Benjamin, David, I think this driver now matches the gamepad spec that
> we have in Drivers/input/gamepad.rst. Could you give it an one-over, as
> far as mappings go, otherwise I think it can be applied and merged with
> my 2nd pull request.

Looks good to me:

Acked-by: David Herrmann 

The FF code could be select'ed rather than conditional. Everything
else looks fine. I didn't check the SPI internals though. Never
touched SPI, sorry.

Thanks
David


Re: Re: [PATCH] Input: evdev: fix queueing of SYN_DROPPED event for EVIOCG[type] IOCTL case

2017-02-23 Thread David Herrmann
Hi

On Tue, Jan 31, 2017 at 5:09 PM, Aniroop Mathur  wrote:
> On Sun, Jan 22, 2017 at 6:45 PM, David Herrmann  wrote:
>> On Thu, Nov 24, 2016 at 9:11 PM, Aniroop Mathur  wrote:
>>> continue;
>>> } else if (head != i) {
>>> /* move entry to fill the gap */
>>> @@ -151,6 +155,7 @@ static void __evdev_flush_queue(struct evdev_client 
>>> *client, unsigned int type)
>>> }
>>>
>>> client->head = head;
>>> +   return drop_count;
>>>  }
>>>
>>>  static void __evdev_queue_syn_dropped(struct evdev_client *client)
>>> @@ -920,6 +925,7 @@ static int evdev_handle_get_val(struct evdev_client 
>>> *client,
>>> int ret;
>>> unsigned long *mem;
>>> size_t len;
>>> +   unsigned int drop_count = 0;
>>>
>>> len = BITS_TO_LONGS(maxbit) * sizeof(unsigned long);
>>> mem = kmalloc(len, GFP_KERNEL);
>>> @@ -933,12 +939,12 @@ static int evdev_handle_get_val(struct evdev_client 
>>> *client,
>>>
>>> spin_unlock(&dev->event_lock);
>>>
>>> -   __evdev_flush_queue(client, type);
>>> +   drop_count = __evdev_flush_queue(client, type);
>>>
>>> spin_unlock_irq(&client->buffer_lock);
>>>
>>> ret = bits_to_user(mem, maxbit, maxlen, p, compat);
>>> -   if (ret < 0)
>>> +   if (ret < 0 && drop_count > 0)
>>> evdev_queue_syn_dropped(client);
>>
>> I don't see the point. If bits_to_user() fails, you get EFAULT.
>> User-space cannot assume anything is still valid if they get EFAULT.
>> This is not like ENOMEM or other errors that you can recover from.
>> EFAULT means _programming_ error, not runtime exception.
>>
>> IOW, EFAULT is special nearly everywhere in the kernel. Usually,
>> whenever a syscall returns an error, you can rely on it to not have
>> modified state. EFAULT is an exception on most paths, since it might
>> occur when copying over results, and it is overly expensive to handle
>> EFAULT gracefully there (you'd have to copy _results_ to user-space,
>> before making them visible to the system).
>>
>> Long story short: I don't see the point in this patch. This path is
>> *never* triggered, except if your client is buggy. And even if you
>> trigger it, placing SYN_DROPPED in the queue does no harm at all.
>>
>> Care to elaborate why exactly you want this modification?
>> David
>>
>
> Sure, I will elaborate for you.
> Basically, the bug is that if the last event dropped in the queue is
> EV_SYN/SYN_REPORT and next event inserted is EV_SYN/SYN_DROPPED, then the
> userspace client will ignore the next complete event packet as per rule
> defined in the document although the client should not ignore the events
> until EV_SYN/SYN_REPORT because the events for this case are not partial
> events but the full packet indeed. So we need to make sure whenever this
> happens we immediately insert another EV_SYN/SYN_REPORT after EV_SYN/DROPPED
> event so that client do not ignore the next full packet.
> I already fixed this bug and you can see the patch here (not submitted yet)
> https://patchwork.kernel.org/patch/9362233/
>
> For this patch, we had no problem with the case of kernel buffer overrun and
> also had no problem for the case of clock change request, but only had problem
> for the case of EVIOCG ioctl call which I have already explained in this patch
> description.
> In short, if we insert SYN_DROPPED event wrongly then client will ignore
> events until SYN_REPORT event which we do not want to happen. So that is
> why I want this modification in order to have correct insertion of
> SYN_DROPPED event and hence go ahead with another patch I mentioned above.

Fair enough. A SYN_DROPPED should be followed by a SYN_REPORT. The
normal insertion path guarantees that (since it keeps the last event
alive), the other 2 fake SYN_DROPPED insertions don't. But...

> Next, you have also mentioned that this path is never triggered which I am not
> sure of. However, if this path is never triggered then it is best to delete it
> to avoid such confusion but I dont think thats a good idea. And if this path
> can be triggered rarely (even once) which I believe it can like in case of 
> buggy
> client you mentioned or in case of bit flip or for any possible reason, then
> we need to make this modification.

...you seem to misunderstand when this code-path is triggered. This is
an EFAULT handler. So it is 

Re: [PATCH] Input: evdev: fix queueing of SYN_DROPPED event for EVIOCG[type] IOCTL case

2017-01-22 Thread David Herrmann
Hi

On Thu, Nov 24, 2016 at 9:11 PM, Aniroop Mathur  wrote:
> Currently, when EVIOCG[type] ioctl call is issued and bits_to_user fails,
> then SYN_DROPPED event is inserted in the event queue always.
>
> However, it is not compulsory that some events are flushed out on every
> EVIOCG[type] ioctl call like in case of empty event queue and in case when
> EVIOCG[type] ioctl is issued for say A type of events but event queue does
> not have any A type of events but some other type of events.
>
> Therefore, insert SYN_DROPPED event only when some events have been flushed
> out from event queue plus bits_to_user fails.
>
> Signed-off-by: Aniroop Mathur 
> ---
>  drivers/input/evdev.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
> index e9ae3d5..f8b295e 100644
> --- a/drivers/input/evdev.c
> +++ b/drivers/input/evdev.c
> @@ -108,9 +108,11 @@ static bool __evdev_is_filtered(struct evdev_client 
> *client,
>  }
>
>  /* flush queued events of type @type, caller must hold client->buffer_lock */
> -static void __evdev_flush_queue(struct evdev_client *client, unsigned int 
> type)
> +static unsigned int __evdev_flush_queue(struct evdev_client *client,
> +   unsigned int type)
>  {
> unsigned int i, head, num;
> +   unsigned int drop_count = 0;
> unsigned int mask = client->bufsize - 1;
> bool is_report;
> struct input_event *ev;
> @@ -129,9 +131,11 @@ static void __evdev_flush_queue(struct evdev_client 
> *client, unsigned int type)
>
> if (ev->type == type) {
> /* drop matched entry */
> +   drop_count++;
> continue;
> } else if (is_report && !num) {
> /* drop empty SYN_REPORT groups */
> +   drop_count++;

Dropping an empty report-group should not be considered in
`drop_count` here. Empty report groups carry no information and should
not affect client's state. You should only increment `drop_count` in
the first block.

> continue;
> } else if (head != i) {
> /* move entry to fill the gap */
> @@ -151,6 +155,7 @@ static void __evdev_flush_queue(struct evdev_client 
> *client, unsigned int type)
> }
>
> client->head = head;
> +   return drop_count;
>  }
>
>  static void __evdev_queue_syn_dropped(struct evdev_client *client)
> @@ -920,6 +925,7 @@ static int evdev_handle_get_val(struct evdev_client 
> *client,
> int ret;
> unsigned long *mem;
> size_t len;
> +   unsigned int drop_count = 0;
>
> len = BITS_TO_LONGS(maxbit) * sizeof(unsigned long);
> mem = kmalloc(len, GFP_KERNEL);
> @@ -933,12 +939,12 @@ static int evdev_handle_get_val(struct evdev_client 
> *client,
>
> spin_unlock(&dev->event_lock);
>
> -   __evdev_flush_queue(client, type);
> +   drop_count = __evdev_flush_queue(client, type);
>
> spin_unlock_irq(&client->buffer_lock);
>
> ret = bits_to_user(mem, maxbit, maxlen, p, compat);
> -   if (ret < 0)
> +   if (ret < 0 && drop_count > 0)
> evdev_queue_syn_dropped(client);

I don't see the point. If bits_to_user() fails, you get EFAULT.
User-space cannot assume anything is still valid if they get EFAULT.
This is not like ENOMEM or other errors that you can recover from.
EFAULT means _programming_ error, not runtime exception.

IOW, EFAULT is special nearly everywhere in the kernel. Usually,
whenever a syscall returns an error, you can rely on it to not have
modified state. EFAULT is an exception on most paths, since it might
occur when copying over results, and it is overly expensive to handle
EFAULT gracefully there (you'd have to copy _results_ to user-space,
before making them visible to the system).

Long story short: I don't see the point in this patch. This path is
*never* triggered, except if your client is buggy. And even if you
trigger it, placing SYN_DROPPED in the queue does no harm at all.

Care to elaborate why exactly you want this modification?
David

>
> kfree(mem);
> --
> 2.6.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-input" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] input: Add disable sysfs entry for every input device

2017-01-02 Thread David Herrmann
Hi

On Sun, Dec 25, 2016 at 11:04 AM, Pali Rohár  wrote:
> This patch allows user to disable events from any input device so events
> would not be delivered to userspace.
>
> Currently there is no way to disable particular input device by kernel.
> User for different reasons would need it for integrated PS/2 keyboard or
> touchpad in notebook or touchscreen on mobile device to prevent sending
> events. E.g. mobile phone in pocket or broken integrated PS/2 keyboard.
>
> This is just a RFC patch, not tested yet. Original post about motivation
> about this patch is there: https://lkml.org/lkml/2014/11/29/92
>
> Signed-off-by: Pali Rohár 
> ---
>  drivers/input/input.c |   35 +++
>  include/linux/input.h |4 
>  2 files changed, 39 insertions(+)

Don't open the device, if you don't want events from it. Really.

I assume the reason behind this is that you don't know how to make
your user-space ignore devices. But with this patch in place, you now
end up with user-space trying to use the device, but not getting any
events. Anyone trying to debug this will go nuts because the setup
looks like the device is used, but ends up being muted.

I strongly recommend improving your user-space code to do what you
want it to do (meaning, make your user-space be configurable, if you
need this).

Btw., as a workaround, you can always disable input-drivers that you
don't want. Or you can run EVIOCGRAB on a device to get the same
effect as your patch.

Thanks
David

> diff --git a/drivers/input/input.c b/drivers/input/input.c
> index d95c34e..9f0da7e 100644
> --- a/drivers/input/input.c
> +++ b/drivers/input/input.c
> @@ -430,6 +430,9 @@ void input_event(struct input_dev *dev,
>  {
> unsigned long flags;
>
> +   if (unlikely(dev->disabled))
> +   return;
> +
> if (is_event_supported(type, dev->evbit, EV_MAX)) {
>
> spin_lock_irqsave(&dev->event_lock, flags);
> @@ -457,6 +460,9 @@ void input_inject_event(struct input_handle *handle,
> struct input_handle *grab;
> unsigned long flags;
>
> +   if (unlikely(dev->disabled))
> +   return;
> +
> if (is_event_supported(type, dev->evbit, EV_MAX)) {
> spin_lock_irqsave(&dev->event_lock, flags);
>
> @@ -1389,12 +1395,41 @@ static ssize_t input_dev_show_properties(struct 
> device *dev,
>  }
>  static DEVICE_ATTR(properties, S_IRUGO, input_dev_show_properties, NULL);
>
> +static ssize_t input_dev_show_disable(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> +   struct input_dev *input_dev = to_input_dev(dev);
> +
> +   return snprintf(buf, PAGE_SIZE, "%d\n", input_dev->disabled ? 1 : 0);
> +}
> +static ssize_t input_dev_store_disable(struct device *dev,
> +  struct device_attribute *attr,
> +  const char *buf, size_t count)
> +{
> +   struct input_dev *input_dev = to_input_dev(dev);
> +   int disable;
> +   int ret;
> +
> +   ret = kstrtoint(buf, 0, &disable);
> +   if (ret)
> +   return ret;
> +
> +   if (disable != 0 && disable != 1)
> +   return -EINVAL;
> +
> +   input_dev->disabled = disable;
> +   return count;
> +}
> +static DEVICE_ATTR(disable, S_IRUGO | S_IWUSR, input_dev_show_disable, 
> input_dev_store_disable);
> +
>  static struct attribute *input_dev_attrs[] = {
> &dev_attr_name.attr,
> &dev_attr_phys.attr,
> &dev_attr_uniq.attr,
> &dev_attr_modalias.attr,
> &dev_attr_properties.attr,
> +   &dev_arrr_disable.attr,
> NULL
>  };
>
> diff --git a/include/linux/input.h b/include/linux/input.h
> index a65e3b2..e390b56 100644
> --- a/include/linux/input.h
> +++ b/include/linux/input.h
> @@ -117,6 +117,8 @@ struct input_value {
>   * @vals: array of values queued in the current frame
>   * @devres_managed: indicates that devices is managed with devres framework
>   * and needs not be explicitly unregistered or freed.
> + * @disabled: indicates that device is in disabled state and kernel drop
> + * all events from it
>   */
>  struct input_dev {
> const char *name;
> @@ -187,6 +189,8 @@ struct input_dev {
> struct input_value *vals;
>
> bool devres_managed;
> +
> +   bool disabled;
>  };
>  #define to_input_dev(d) container_of(d, struct input_dev, dev)
>
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-input" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drm: add fourcc codes for 16bit R and GR

2017-01-02 Thread David Herrmann
Hi

On Mon, Jan 2, 2017 at 11:41 AM, Rainer Hochecker  wrote:
> From: Rainer Hochecker 
>
> Add fourcc codes for 16bit planes. Required by mesa for
> eglCreateImageKHR to access P010 surfaces created by vaapi.
>
> Signed-off-by: Rainer Hochecker 
> ---
>  include/uapi/drm/drm_fourcc.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index a5890bf..e6ab638 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -41,10 +41,16 @@ extern "C" {
>  /* 8 bpp Red */
>  #define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
>
> +/* 16 bpp Red */
> +#define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R */
> +
>  /* 16 bpp RG */
>  #define DRM_FORMAT_RG88fourcc_code('R', 'G', '8', '8') /* 
> [15:0] R:G 8:8 little endian */
>  #define DRM_FORMAT_GR88fourcc_code('G', 'R', '8', '8') /* 
> [15:0] G:R 8:8 little endian */
>
> +/* 32 bpp GR */
> +#define DRM_FORMAT_GR16fourcc_code('G', 'R', '1', '6') /* 
> [31:0] G:R 16:16 little endian */
> +

Shouldn't it be 'G', 'R', '3', '2'?

Also, please put dri-devel on CC.

Thanks
David

>  /* 8 bpp RGB */
>  #define DRM_FORMAT_RGB332  fourcc_code('R', 'G', 'B', '8') /* [7:0] 
> R:G:B 3:3:2 */
>  #define DRM_FORMAT_BGR233  fourcc_code('B', 'G', 'R', '8') /* [7:0] 
> B:G:R 2:3:3 */
> --
> 2.9.3
>


Re: [RFC PATCH 0/3] staging: remove fbdev drivers

2016-12-09 Thread David Herrmann
Hey

On Fri, Dec 9, 2016 at 2:33 PM, Daniel Vetter  wrote:
>> > So it is possible, only reason vram dumb buffers look worse is that there's
>> > only 3 and no one cares about them, vs about 20 and a very active community
>> > of contributors (also for core drm improvements) for the other case.
>>
>> Well, we could move offb to drm while at it I suppose that would be another
>> one (offb is the "dumb driver based on pre-programmed output by firmware).
>
> One of the still in-flight drm drivers is the simpledrm thing meant for
> all kinds of firmware drivers like efifb and similar things on arm for
> pre-programmed output set up by firmware. I.e. no modeset support and
> otherwise a lot of fake to make it work as drm driver, but the idea that
> it's good enough until your real drm driver takes over.

The x86 platform device fixups for SimpleDRM went in some weeks ago,
so maybe I should resend the patches. The driver could easily do
'offb'-like devices as well. Trivial to add.

Anyway, Benjamin is right, we always do shadow buffering for trivial
drivers. Even in SimpleDRM I blit the shadow buffer on page-flip or
dirty-ioctl. Reason is that we cannot easily expose the real
framebuffer in DRM via FB-objects. But I also never saw a use-case for
it, since all trivial devices I worked with were only either used as
fallback or nobody cared for performance.

The generic DRM API is designed for dynamic FB allocation. If your
hardware does not allow you to change the scanout source, you will
have a hard time trying to expose the static buffers via the dynamic
FB-object API. Furthermore, all DRM user-space expects dynamic FB
management to work, preferably without a ridiculously low memory
limit. That's also why I never bothered changing the drivers.

Despite all of this I still see no reason why a driver could not
expose the static, real frambuffers via private ioctls. You can get
all your fancy acceleration that way. Then fix user-space to use this
API. If enough drivers end up with something similar, move it into the
core. Just like we always do in DRM.

Thanks
David


Re: drm: GPF in drm_getcap

2016-11-26 Thread David Herrmann
Hi

On Sat, Nov 26, 2016 at 7:07 PM, Dmitry Vyukov  wrote:
> grep "card0" dmesg:
> [5.298617] device: 'card0': device_add
> [5.298946] PM: Adding info for No Bus:card0
> [6.436178] device: 'card0': device_add
> [6.436488] PM: Adding info for No Bus:card0
>
>
> # ls -l /dev/dri/card0
> crw-rw---T 1 root video 226, 0 Nov 26 18:05 /dev/dri/card0
>
> # ls -lt /sys/class/drm/card0/device/
> ls: cannot access /sys/class/drm/card0/device/: No such file or directory
>
> # ls -lt /sys/class/drm/card0/device/driver
> ls: cannot access /sys/class/drm/card0/device/driver: No such file or 
> directory

Looks like vgem. Something like this should help:

https://gist.github.com/dvdhrm/1bcdf4f3485aa1614a0198a7b90515e2

I wonder whether it would be more appropriate to return -ENOTSUPP rather than 0.

Thanks
David


Re: drm: GPF in drm_getcap

2016-11-26 Thread David Herrmann
Hi

On Sat, Nov 26, 2016 at 6:50 PM, Dmitry Vyukov  wrote:
> On Sat, Nov 26, 2016 at 6:35 PM, David Herrmann  wrote:
>> Hi
>>
>> On Sat, Nov 26, 2016 at 6:17 PM, Dmitry Vyukov  wrote:
>>> On Fri, Sep 9, 2016 at 1:56 PM, Dmitry Vyukov  wrote:
>>>> Hello,
>>>>
>>>> The following program triggers GPF in drm_getcap:
>>>>
>>>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include 
>>>>
>>>> int main()
>>>> {
>>>>   int fd = open("/dev/dri/card0", O_RDONLY);
>>>>   uint64_t data[2] = {0x11, 0x80};
>>>>   ioctl(fd, 0xc010640cul /*DRM_IOCTL_GET_CAP*/, data);
>>>>   return 0;
>>>> }
>>>>
>>>>
>>>> general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
>>>> Modules linked in:
>>>> CPU: 1 PID: 5745 Comm: syz-executor Not tainted 4.8.0-rc5-next-20160905+ 
>>>> #14
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
>>>> 01/01/2011
>>>> task: 8800310dc540 task.stack: 88003cbc
>>>> RIP: 0010:[]  []
>>>> drm_getcap+0x34b/0x4f0 drivers/gpu/drm/drm_ioctl.c:260
>>>> RSP: 0018:88003cbc7c28  EFLAGS: 00010202
>>>> RAX: 0058 RBX: 88003cbc7cf8 RCX: c90001db
>>>> RDX: 005d RSI: 88003cbc7cf8 RDI: 02c0
>>>> RBP: 88003cbc7c50 R08: ed0007978fa1 R09: ed0007978fa0
>>>> R10: 88003cbc7d07 R11: ed0007978fa1 R12: fff0
>>>> R13: dc00 R14: 88003bcc6850 R15: fff2
>>>> FS:  7fcbf4e03700() GS:88003ed0() 
>>>> knlGS:
>>>> CS:  0010 DS:  ES:  CR0: 80050033
>>>> CR2: 006dce00 CR3: 66135000 CR4: 06e0
>>>> DR0: 001e DR1: 001e DR2: 
>>>> DR3:  DR6: 0ff0 DR7: 0600
>>>> Stack:
>>>>  88003c26db00 88003cbc7cf8 875a3000 88cf0ee0
>>>>  fff2 88003cbc7dc0 834cb57c e200
>>>>  1101 875a1ba0 882ae930 0010
>>>> Call Trace:
>>>>  [] drm_ioctl+0x54c/0xaf0 drivers/gpu/drm/drm_ioctl.c:728
>>>>  [< inline >] vfs_ioctl fs/ioctl.c:43
>>>>  [] do_vfs_ioctl+0x18c/0x1080 fs/ioctl.c:675
>>>>  [< inline >] SYSC_ioctl fs/ioctl.c:690
>>>>  [] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:681
>>>>  [] entry_SYSCALL_64_fastpath+0x23/0xc1
>>>> Code: 3c 28 00 0f 85 88 01 00 00 49 8b 44 24 10 49 39 c6 4c 8d 60 f0
>>>> 74 82 e8 64 19 10 fe 49 8d bc 24 d0 02 00 00 48 89 f8 48 c1 e8 03 <42>
>>>> 80 3c 28 00 0f 85 6f 01 00 00 4d 8b bc 24 d0 02 00 00 49 8d
>>>> RIP  [] drm_getcap+0x34b/0x4f0 
>>>> drivers/gpu/drm/drm_ioctl.c:260
>>>>  RSP 
>>>> ---[ end trace c6e1afa8cd73b880 ]---
>>>>
>>>>
>>>> On commit 4affa544adb8077403893e62b9e327fcf87de6f7 (Sep 8) of linux-next.
>>>
>>> ping
>>>
>>> Still happens on 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>>
>> I suspect this is because we run drm_for_each_crtc() in
>> drm_getcap(DRM_PAGE_FLIP_TARGET) on a legacy driver (meaning
>> mode_config is not initialized). @danvet, how about always
>> initializing mode_config to 0/empty/dummy?
>>
>> Dmitry, what driver do you run this on? And is CONFIG_DRM_LEGACY enabled?
>
>
> CONFIG_DRM_LEGACY is enabled.
>
> How can I understand what driver is used?
> This happens inside of qemu. This is the device:
> crw-rw---T 1 root video 226, 0 Nov 26 17:45 /dev/dri/card0

Usually by looking into `dmesg` and grepping for 'card0', or by inspecting:

/sys/class/drm/card0/device/

or more importantly looking at the symlink:

/sys/class/drm/card0/device/driver

Thanks
David


Re: drm: GPF in drm_getcap

2016-11-26 Thread David Herrmann
Hi

On Sat, Nov 26, 2016 at 6:17 PM, Dmitry Vyukov  wrote:
> On Fri, Sep 9, 2016 at 1:56 PM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> The following program triggers GPF in drm_getcap:
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> int main()
>> {
>>   int fd = open("/dev/dri/card0", O_RDONLY);
>>   uint64_t data[2] = {0x11, 0x80};
>>   ioctl(fd, 0xc010640cul /*DRM_IOCTL_GET_CAP*/, data);
>>   return 0;
>> }
>>
>>
>> general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
>> Modules linked in:
>> CPU: 1 PID: 5745 Comm: syz-executor Not tainted 4.8.0-rc5-next-20160905+ #14
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 8800310dc540 task.stack: 88003cbc
>> RIP: 0010:[]  []
>> drm_getcap+0x34b/0x4f0 drivers/gpu/drm/drm_ioctl.c:260
>> RSP: 0018:88003cbc7c28  EFLAGS: 00010202
>> RAX: 0058 RBX: 88003cbc7cf8 RCX: c90001db
>> RDX: 005d RSI: 88003cbc7cf8 RDI: 02c0
>> RBP: 88003cbc7c50 R08: ed0007978fa1 R09: ed0007978fa0
>> R10: 88003cbc7d07 R11: ed0007978fa1 R12: fff0
>> R13: dc00 R14: 88003bcc6850 R15: fff2
>> FS:  7fcbf4e03700() GS:88003ed0() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 006dce00 CR3: 66135000 CR4: 06e0
>> DR0: 001e DR1: 001e DR2: 
>> DR3:  DR6: 0ff0 DR7: 0600
>> Stack:
>>  88003c26db00 88003cbc7cf8 875a3000 88cf0ee0
>>  fff2 88003cbc7dc0 834cb57c e200
>>  1101 875a1ba0 882ae930 0010
>> Call Trace:
>>  [] drm_ioctl+0x54c/0xaf0 drivers/gpu/drm/drm_ioctl.c:728
>>  [< inline >] vfs_ioctl fs/ioctl.c:43
>>  [] do_vfs_ioctl+0x18c/0x1080 fs/ioctl.c:675
>>  [< inline >] SYSC_ioctl fs/ioctl.c:690
>>  [] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:681
>>  [] entry_SYSCALL_64_fastpath+0x23/0xc1
>> Code: 3c 28 00 0f 85 88 01 00 00 49 8b 44 24 10 49 39 c6 4c 8d 60 f0
>> 74 82 e8 64 19 10 fe 49 8d bc 24 d0 02 00 00 48 89 f8 48 c1 e8 03 <42>
>> 80 3c 28 00 0f 85 6f 01 00 00 4d 8b bc 24 d0 02 00 00 49 8d
>> RIP  [] drm_getcap+0x34b/0x4f0 
>> drivers/gpu/drm/drm_ioctl.c:260
>>  RSP 
>> ---[ end trace c6e1afa8cd73b880 ]---
>>
>>
>> On commit 4affa544adb8077403893e62b9e327fcf87de6f7 (Sep 8) of linux-next.
>
> ping
>
> Still happens on 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

I suspect this is because we run drm_for_each_crtc() in
drm_getcap(DRM_PAGE_FLIP_TARGET) on a legacy driver (meaning
mode_config is not initialized). @danvet, how about always
initializing mode_config to 0/empty/dummy?

Dmitry, what driver do you run this on? And is CONFIG_DRM_LEGACY enabled?

Thanks
David


[tip:x86/urgent] x86/sysfb: Fix lfb_size calculation

2016-11-16 Thread tip-bot for David Herrmann
Commit-ID:  f96acec8c8020807429d21324547f4b904c37177
Gitweb: http://git.kernel.org/tip/f96acec8c8020807429d21324547f4b904c37177
Author: David Herrmann 
AuthorDate: Tue, 15 Nov 2016 13:01:58 +0100
Committer:  Ingo Molnar 
CommitDate: Wed, 16 Nov 2016 09:38:23 +0100

x86/sysfb: Fix lfb_size calculation

The screen_info.lfb_size field is shifted by 16 bits *only* in case of
VBE. This has historical reasons since VBE advertised it similarly.
However, in case of EFI framebuffers, the size is no longer shifted. Fix
the x86 simple-framebuffer setup code to use the correct size in the
non-VBE case.

While at it, avoid variable abbreviations and rename 'len' to 'length',
and use the correct types matching the screen_info definition.

Signed-off-by: David Herrmann 
Acked-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Cc: Tom Gundersen 
Link: http://lkml.kernel.org/r/20161115120158.15388-3-dh.herrm...@gmail.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/sysfb_simplefb.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c
index 35b8641..85195d4 100644
--- a/arch/x86/kernel/sysfb_simplefb.c
+++ b/arch/x86/kernel/sysfb_simplefb.c
@@ -66,8 +66,8 @@ __init int create_simplefb(const struct screen_info *si,
 {
struct platform_device *pd;
struct resource res;
-   unsigned long len;
-   u64 base;
+   u64 base, size;
+   u32 length;
 
/*
 * If the 64BIT_BASE capability is set, ext_lfb_base will contain the
@@ -82,11 +82,20 @@ __init int create_simplefb(const struct screen_info *si,
return -EINVAL;
}
 
-   /* don't use lfb_size as it may contain the whole VMEM instead of only
-* the part that is occupied by the framebuffer */
-   len = mode->height * mode->stride;
-   len = PAGE_ALIGN(len);
-   if (len > (u64)si->lfb_size << 16) {
+   /*
+* Don't use lfb_size as IORESOURCE size, since it may contain the
+* entire VMEM, and thus require huge mappings. Use just the part we
+* need, that is, the part where the framebuffer is located. But verify
+* that it does not exceed the advertised VMEM.
+* Note that in case of VBE, the lfb_size is shifted by 16 bits for
+* historical reasons.
+*/
+   size = si->lfb_size;
+   if (si->orig_video_isVGA == VIDEO_TYPE_VLFB)
+   size <<= 16;
+   length = mode->height * mode->stride;
+   length = PAGE_ALIGN(length);
+   if (length > size) {
printk(KERN_WARNING "sysfb: VRAM smaller than advertised\n");
return -EINVAL;
}
@@ -96,7 +105,7 @@ __init int create_simplefb(const struct screen_info *si,
res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
res.name = simplefb_resname;
res.start = base;
-   res.end = res.start + len - 1;
+   res.end = res.start + length - 1;
if (res.end <= res.start)
return -EINVAL;
 


[tip:x86/urgent] x86/sysfb: Add support for 64bit EFI lfb_base

2016-11-16 Thread tip-bot for David Herrmann
Commit-ID:  9164b4ceb7b492a77c7fe770a4b9d1375c9cd45a
Gitweb: http://git.kernel.org/tip/9164b4ceb7b492a77c7fe770a4b9d1375c9cd45a
Author: David Herrmann 
AuthorDate: Tue, 15 Nov 2016 13:01:57 +0100
Committer:  Ingo Molnar 
CommitDate: Wed, 16 Nov 2016 09:38:22 +0100

x86/sysfb: Add support for 64bit EFI lfb_base

The screen_info object was extended to support 64-bit lfb_base addresses
in:

  ae2ee627dc87 ("efifb: Add support for 64-bit frame buffer addresses")

However, the x86 simple-framebuffer setup code never made use of it. Fix
it to properly assemble and verify the lfb_base before advertising
simple-framebuffer devices.

In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the
screen_info->ext_lfb_base field will contain the upper 32bit of the
actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as
does not overflow the physical address type.

Signed-off-by: David Herrmann 
Acked-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Cc: Tom Gundersen 
Link: http://lkml.kernel.org/r/20161115120158.15388-2-dh.herrm...@gmail.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/sysfb_simplefb.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c
index 764a29f..35b8641 100644
--- a/arch/x86/kernel/sysfb_simplefb.c
+++ b/arch/x86/kernel/sysfb_simplefb.c
@@ -67,6 +67,20 @@ __init int create_simplefb(const struct screen_info *si,
struct platform_device *pd;
struct resource res;
unsigned long len;
+   u64 base;
+
+   /*
+* If the 64BIT_BASE capability is set, ext_lfb_base will contain the
+* upper half of the base address. Assemble the address, then make sure
+* it is valid and we can actually access it.
+*/
+   base = si->lfb_base;
+   if (si->capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+   base |= (u64)si->ext_lfb_base << 32;
+   if (!base || (u64)(resource_size_t)base != base) {
+   printk(KERN_DEBUG "sysfb: inaccessible VRAM base\n");
+   return -EINVAL;
+   }
 
/* don't use lfb_size as it may contain the whole VMEM instead of only
 * the part that is occupied by the framebuffer */
@@ -81,8 +95,8 @@ __init int create_simplefb(const struct screen_info *si,
memset(&res, 0, sizeof(res));
res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
res.name = simplefb_resname;
-   res.start = si->lfb_base;
-   res.end = si->lfb_base + len - 1;
+   res.start = base;
+   res.end = res.start + len - 1;
if (res.end <= res.start)
return -EINVAL;
 


[PATCH 1/2] x86/sysfb: add support for 64bit EFI lfb_base

2016-11-15 Thread David Herrmann
The screen_info object was extended to support 64bit lfb_base addresses
in:

commit ae2ee627dc87a70910de91b791b3cd0e9c6facdd
Author: Matt Fleming 
Date:   Tue Aug 25 16:32:55 2015 +0100

efifb: Add support for 64-bit frame buffer addresses

However, the x86 simple-framebuffer setup code never made use of it. Fix
it to properly assemble and verify the lfb_base before advertising
simple-framebuffer devices.

In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the
screen_info->ext_lfb_base field will contain the upper 32bit of the
actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as
does not overflow the physical address type.

Signed-off-by: David Herrmann 
---
 arch/x86/kernel/sysfb_simplefb.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c
index 764a29f..35b8641 100644
--- a/arch/x86/kernel/sysfb_simplefb.c
+++ b/arch/x86/kernel/sysfb_simplefb.c
@@ -67,6 +67,20 @@ __init int create_simplefb(const struct screen_info *si,
struct platform_device *pd;
struct resource res;
unsigned long len;
+   u64 base;
+
+   /*
+* If the 64BIT_BASE capability is set, ext_lfb_base will contain the
+* upper half of the base address. Assemble the address, then make sure
+* it is valid and we can actually access it.
+*/
+   base = si->lfb_base;
+   if (si->capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+   base |= (u64)si->ext_lfb_base << 32;
+   if (!base || (u64)(resource_size_t)base != base) {
+   printk(KERN_DEBUG "sysfb: inaccessible VRAM base\n");
+   return -EINVAL;
+   }
 
/* don't use lfb_size as it may contain the whole VMEM instead of only
 * the part that is occupied by the framebuffer */
@@ -81,8 +95,8 @@ __init int create_simplefb(const struct screen_info *si,
memset(&res, 0, sizeof(res));
res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
res.name = simplefb_resname;
-   res.start = si->lfb_base;
-   res.end = si->lfb_base + len - 1;
+   res.start = base;
+   res.end = res.start + len - 1;
if (res.end <= res.start)
return -EINVAL;
 
-- 
2.10.2



[PATCH 2/2] x86/sysfb: fix lfb_size calculation

2016-11-15 Thread David Herrmann
The screen_info.lfb_size field is shifted by 16 bits *only* in case of
VBE. This has historical reasons since VBE advertised it similarly.
However, in case of EFI framebuffers, the size is no longer shifted. Fix
the x86 simple-framebuffer setup code to use the correct size in the
non-VBE case.

While at it, avoid variable abbreviations and rename 'len' to 'length',
and use the correct types matching the screen_info definition.

Signed-off-by: David Herrmann 
---
 arch/x86/kernel/sysfb_simplefb.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c
index 35b8641..85195d4 100644
--- a/arch/x86/kernel/sysfb_simplefb.c
+++ b/arch/x86/kernel/sysfb_simplefb.c
@@ -66,8 +66,8 @@ __init int create_simplefb(const struct screen_info *si,
 {
struct platform_device *pd;
struct resource res;
-   unsigned long len;
-   u64 base;
+   u64 base, size;
+   u32 length;
 
/*
 * If the 64BIT_BASE capability is set, ext_lfb_base will contain the
@@ -82,11 +82,20 @@ __init int create_simplefb(const struct screen_info *si,
return -EINVAL;
}
 
-   /* don't use lfb_size as it may contain the whole VMEM instead of only
-* the part that is occupied by the framebuffer */
-   len = mode->height * mode->stride;
-   len = PAGE_ALIGN(len);
-   if (len > (u64)si->lfb_size << 16) {
+   /*
+* Don't use lfb_size as IORESOURCE size, since it may contain the
+* entire VMEM, and thus require huge mappings. Use just the part we
+* need, that is, the part where the framebuffer is located. But verify
+* that it does not exceed the advertised VMEM.
+* Note that in case of VBE, the lfb_size is shifted by 16 bits for
+* historical reasons.
+*/
+   size = si->lfb_size;
+   if (si->orig_video_isVGA == VIDEO_TYPE_VLFB)
+   size <<= 16;
+   length = mode->height * mode->stride;
+   length = PAGE_ALIGN(length);
+   if (length > size) {
printk(KERN_WARNING "sysfb: VRAM smaller than advertised\n");
return -EINVAL;
}
@@ -96,7 +105,7 @@ __init int create_simplefb(const struct screen_info *si,
res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
res.name = simplefb_resname;
res.start = base;
-   res.end = res.start + len - 1;
+   res.end = res.start + length - 1;
if (res.end <= res.start)
return -EINVAL;
 
-- 
2.10.2



[PATCH 0/2] x86 Simplefb Fixes

2016-11-15 Thread David Herrmann
Hi

Two fixes for the simplefb setup code in arch/x86/. One to support Matt's 64bit
EFI fb-base-address, and one to fix a longer standing verification bug.

Neither fix is critical. Additionally, most people still use efifb/vesafb
instead of simplefb, so probably does not affect any normal distribution.
However, SimpleDRM is close to being merged, so x86-sysfb might get into
some distros.

Thanks
David

David Herrmann (2):
  x86/sysfb: add support for 64bit EFI lfb_base
  x86/sysfb: fix lfb_size calculation

 arch/x86/kernel/sysfb_simplefb.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

-- 
2.10.2



Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-11-02 Thread David Herrmann
Hi

On Thu, Oct 27, 2016 at 2:45 AM, Kirill A. Shutemov
 wrote:
> On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
>> Long story short: We have uid<->uid quotas so far, which prevent DoS
>> attacks, unless you get access to a ridiculous amount of local UIDs.
>> Details on which resources are accounted can be found in the wiki [1].
>
> Does only root user_ns uid count as separate or per-ns too?
>
> In first case we will have vitually unbounded access to UIDs.
>
> The second case can cap number of user namespaces a user can create while
> using bus1 inside.
>
> Or am I missing something?

We use the exact same mechanism as "struct user_struct" (as defined in
linux/sched.h). One instance corresponds to each kuid_t currently in
use. This is analogous to task, epoll, inotify, fanotify, mqueue,
pipes, keys, ... resource accounting.

Could you elaborate on what problem you see?

Thanks
David


Re: [RFC v1 04/14] bus1: util - fixed list utility library

2016-10-27 Thread David Herrmann
Hi

On Thu, Oct 27, 2016 at 2:37 PM, Peter Zijlstra  wrote:
> On Wed, Oct 26, 2016 at 09:18:00PM +0200, David Herrmann wrote:
>> + e = kmalloc_array(sizeof(*e), BUS1_FLIST_BATCH + 1, gfp);
>
>> +#define BUS1_FLIST_BATCH (1024)
>
>> +struct bus1_flist {
>> + union {
>> + struct bus1_flist *next;
>> + void *ptr;
>> + };
>> +};
>
> So that's an allocation of 8*(1024+1), or slightly more than 2 pages.
>
> kmalloc will round up to the next power of two, so you'll end up with an
> allocation of 16*1024, wasting a whopping 8184 bytes per such allocation
> in slack space.
>
> Please consider using 1023 or something for your batch size, 511 would
> get you to exactly 1 page which would be even better.

Thanks for the hint! 511 looks like the obvious choice. Maybe even
(PAGE_SIZE / sizeof(long) - 1). I will put a comment next to the
definition.

Thanks
David


Re: [RFC v1 04/14] bus1: util - fixed list utility library

2016-10-27 Thread David Herrmann
Hi

On Thu, Oct 27, 2016 at 2:56 PM, Arnd Bergmann  wrote:
> On Thursday, October 27, 2016 2:48:46 PM CEST David Herrmann wrote:
>> On Thu, Oct 27, 2016 at 2:37 PM, Peter Zijlstra  wrote:
>> > On Wed, Oct 26, 2016 at 09:18:00PM +0200, David Herrmann wrote:
>> >> + e = kmalloc_array(sizeof(*e), BUS1_FLIST_BATCH + 1, gfp);
>> >
>> >> +#define BUS1_FLIST_BATCH (1024)
>> >
>> >> +struct bus1_flist {
>> >> + union {
>> >> + struct bus1_flist *next;
>> >> + void *ptr;
>> >> + };
>> >> +};
>> >
>> > So that's an allocation of 8*(1024+1), or slightly more than 2 pages.
>> >
>> > kmalloc will round up to the next power of two, so you'll end up with an
>> > allocation of 16*1024, wasting a whopping 8184 bytes per such allocation
>> > in slack space.
>> >
>> > Please consider using 1023 or something for your batch size, 511 would
>> > get you to exactly 1 page which would be even better.
>>
>> Thanks for the hint! 511 looks like the obvious choice. Maybe even
>> (PAGE_SIZE / sizeof(long) - 1). I will put a comment next to the
>> definition.
>>
>>
>
> PAGE_SIZE can be up to 64KB though, so that might lead wasting a lot
> of memory.

The bus1-flist implementation never over-allocates. It is a fixed size
list, so it only allocates as much memory as needed. The issue PeterZ
pointed out is passing suitable sizes to kmalloc(), which internally
over-allocates to power-of-2 bounds (or some similar bounds). So we
only ever waste space here if kmalloc() internally rounds up. The code
in bus1-flist allocates exactly the needed space.

Thanks
David


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-26 Thread David Herrmann
Hi

On Wed, Oct 26, 2016 at 9:39 PM, Linus Torvalds
 wrote:
> So the thing that tends to worry me about these is resource management.
>
> If I understood the documentation correctly, this has per-user
> resource management, which guarantees that at least the system won't
> run out of memory. Good. The act of sending a message transfers the
> resource to the receiver end. Fine.
>
> However, the usual problem ends up being that a bad user can basically
> DoS a system agent, especially since for obvious performance reasons
> the send/receive has to be asynchronous.
>
> So the usual DoS model is that some user just sends a lot of messages
> to a system agent, filling up the system agent resource quota, and
> basically killing the system. No, it didn't run out of memory, but the
> system agent may not be able to do anything more, since it is now out
> of resources.
>
> Keeping the resource management with the sender doesn't solve the
> problem, it just reverses it: now the attack will be to send a lot of
> queries to the system agent, but then just refuse to listen to the
> replies - again causing the system agent to run out of resources.
>
> Usually the way this is resolved this is by forcing a
> "request-and-reply" resource management model, where the person who
> sends out a request is not only the one who is accounted for the
> request, but also accounted for the reply buffer. That way the system
> agent never runs out of resources, because it's always the requesting
> party that has its resources accounted, never the system agent.
>
> You may well have solved this, but can you describe what the solution
> is without forcing people to read the code and try to analyze it?

All accounting on bus1 is done on a UID-basis. This is the initial
model that tries to match POSIX semantics. More advanced accounting is
left as a future extension (like cgroup-based, etc.). So whenever we
talk about "user accounting", we talk about the user-abstraction in
bus1 that right now is based on UIDs, but could be extended for other
schemes.

All bus1 resources are owned by a peer, and each peer has a user
assigned (which right now corresponds to file->f_cred->uid). Whenever
a peer allocates resources, it is accounted on its user. There are
global limits per user which cannot be exceeded. Additionally, each
peer can set its own per-peer limits, to further partition the
per-user limits. Of course, per-user limits override per-peer limits,
if necessary.

Now this is all trivial and obvious. It works like any resource
accounting in the kernel. It becomes tricky when we try to transfer
resources. Before SEND, a resource is always accounted on the sender.
After RECV, a resource is accounted on the receiver. That is, resource
ownership is transferred. In most cases this is obvious: memory is
copied from one address-space into another, or file-descriptors are
added into the file-table of another process, etc.

Lastly, when a resource is queued, we decided to go with
receiver-accounting. This means, at the time of SEND resource
ownership is transferred (unlike sender-accounting, which would
transfer it at time of RECV). The reasons are manifold, but mainly we
want RECV to not fail due to accounting, resource exhaustion, etc. We
wanted SEND to do the heavy-lifting, and RECV to just dequeue. By
avoiding sender-based accounting, we avoid attacks where a receiver
does not dequeue messages and thus exhausts the sender's limits. The
issue left is senders DoS'ing a target user. To mitigate this, we
implemented a quota system. Whenever a sender wants to transfer
resources to a receiver, it only gets access to a subset of the
receivers resource limits. The inflight resources are accounted on a
uid<->uid basis, and the current algorithm allows a receiver access to
at most half the limit of the destination not currently used by anyone
else.

Example:
Imagine a receiver with a limit of 1024 handles. A sender transmits a
message to that receiver. It gets access to half the limit not used by
anyone else, hence 512 handles. It does not matter how many senders
there are, nor how many messages are sent, it will reach its quota at
512. As long as they all belong to the same user, they will share the
quota and can queue at most 512 handles. If a second sending user
comes into play, it gets half the remaining not used by anyone else,
which ends up being 256. And so on... If the peer dequeues messages in
between, the numbers get higher again. But if you do the math, the
most you can get is 50% of the targets resources, if you're the only
sender. In all other cases you get less (like intertwined transfers,
etc).

We did look into sender-based inflight accounting, but the same set of
issues arises. Sure, a Request+Reply model would make this easier to
handle, but we want to explicitly support a Subscribe+Event{n} model.
In this case there is more than one Reply to a message.

Long story short: We have uid<->uid quotas so far, which prevent DoS
attacks, unless 

[RFC v1 02/14] bus1: provide stub cdev /dev/bus1

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

Add the CONFIG_BUS1 option to enable the bus1 kernel messaging bus. If
enabled, provide the bus1.ko module with a stub cdev /dev/bus1. So far
it does not expose any API, but the full intended uapi is provided in
include/uapi/linux/bus1.h already.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 include/uapi/linux/bus1.h | 138 ++
 init/Kconfig  |  17 ++
 ipc/Makefile  |   1 +
 ipc/bus1/Makefile |   6 ++
 ipc/bus1/main.c   |  80 +++
 ipc/bus1/main.h   |  74 +
 ipc/bus1/tests.c  |  19 +++
 ipc/bus1/tests.h  |  32 +++
 8 files changed, 367 insertions(+)
 create mode 100644 include/uapi/linux/bus1.h
 create mode 100644 ipc/bus1/Makefile
 create mode 100644 ipc/bus1/main.c
 create mode 100644 ipc/bus1/main.h
 create mode 100644 ipc/bus1/tests.c
 create mode 100644 ipc/bus1/tests.h

diff --git a/include/uapi/linux/bus1.h b/include/uapi/linux/bus1.h
new file mode 100644
index 000..8ec3357
--- /dev/null
+++ b/include/uapi/linux/bus1.h
@@ -0,0 +1,138 @@
+#ifndef _UAPI_LINUX_BUS1_H
+#define _UAPI_LINUX_BUS1_H
+
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include 
+#include 
+
+#define BUS1_FD_MAX(256)
+
+#define BUS1_IOCTL_MAGIC   0x96
+#define BUS1_HANDLE_INVALID((__u64)-1)
+#define BUS1_OFFSET_INVALID((__u64)-1)
+
+enum {
+   BUS1_HANDLE_FLAG_MANAGED= 1ULL <<  0,
+   BUS1_HANDLE_FLAG_REMOTE = 1ULL <<  1,
+};
+
+enum {
+   BUS1_PEER_FLAG_WANT_SECCTX  = 1ULL <<  0,
+};
+
+enum {
+   BUS1_PEER_RESET_FLAG_FLUSH  = 1ULL <<  0,
+   BUS1_PEER_RESET_FLAG_FLUSH_SEED = 1ULL <<  1,
+};
+
+struct bus1_cmd_peer_reset {
+   __u64 flags;
+   __u64 peer_flags;
+   __u32 max_slices;
+   __u32 max_handles;
+   __u32 max_inflight_bytes;
+   __u32 max_inflight_fds;
+} __attribute__((__aligned__(8)));
+
+struct bus1_cmd_handle_transfer {
+   __u64 flags;
+   __u64 src_handle;
+   __u64 dst_fd;
+   __u64 dst_handle;
+} __attribute__((__aligned__(8)));
+
+enum {
+   BUS1_NODES_DESTROY_FLAG_RELEASE_HANDLES = 1ULL <<  0,
+};
+
+struct bus1_cmd_nodes_destroy {
+   __u64 flags;
+   __u64 ptr_nodes;
+   __u64 n_nodes;
+} __attribute__((__aligned__(8)));
+
+enum {
+   BUS1_SEND_FLAG_CONTINUE = 1ULL <<  0,
+   BUS1_SEND_FLAG_SEED = 1ULL <<  1,
+};
+
+struct bus1_cmd_send {
+   __u64 flags;
+   __u64 ptr_destinations;
+   __u64 ptr_errors;
+   __u64 n_destinations;
+   __u64 ptr_vecs;
+   __u64 n_vecs;
+   __u64 ptr_handles;
+   __u64 n_handles;
+   __u64 ptr_fds;
+   __u64 n_fds;
+} __attribute__((__aligned__(8)));
+
+enum {
+   BUS1_RECV_FLAG_PEEK = 1ULL <<  0,
+   BUS1_RECV_FLAG_SEED = 1ULL <<  1,
+   BUS1_RECV_FLAG_INSTALL_FDS  = 1ULL <<  2,
+};
+
+enum {
+   BUS1_MSG_NONE,
+   BUS1_MSG_DATA,
+   BUS1_MSG_NODE_DESTROY,
+   BUS1_MSG_NODE_RELEASE,
+};
+
+enum {
+   BUS1_MSG_FLAG_HAS_SECCTX= 1ULL <<  0,
+   BUS1_MSG_FLAG_CONTINUE  = 1ULL <<  1,
+};
+
+struct bus1_cmd_recv {
+   __u64 flags;
+   __u64 max_offset;
+   struct {
+   __u64 type;
+   __u64 flags;
+   __u64 destination;
+   __u32 uid;
+   __u32 gid;
+   __u32 pid;
+   __u32 tid;
+   __u64 offset;
+   __u64 n_bytes;
+   __u64 n_handles;
+   __u64 n_fds;
+   __u64 n_secctx;
+   } __attribute__((__aligned__(8))) msg;
+} __attribute__((__aligned__(8)));
+
+enum {
+   BUS1_CMD_PEER_DISCONNECT= _IOWR(BUS1_IOCTL_MAGIC, 0x00,
+   __u64),
+   BUS1_CMD_PEER_QUERY = _IOWR(BUS1_IOCTL_MAGIC, 0x01,
+   struct bus1_cmd_peer_reset),
+   BUS1_CMD_PEER_RESET = _IOWR(BUS1_IOCTL_MAGIC, 0x02,
+   struct bus1_cmd_peer_reset),
+   BUS1_CMD_HANDLE_RELEASE = _IOWR(BUS1_IOCTL_MAGIC, 0x10,
+   

[RFC v1 03/14] bus1: util - active reference utility library

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

The bus1_active object implements active references. They work
similarly to plain object reference counters, but allow disabling
any new references from being taken.

Each bus1_active object goes through a set of states:
NEW: Initial state, no active references can be acquired
ACTIVE: Live state, active references can be acquired
DRAINING: Deactivated but lingering, no active references
  can be acquired
DRAINED: Deactivated and all active references were dropped
RELEASED: Fully drained and synchronously released

Initially, all bus1_active objects are in state NEW. As soon as they're
activated, they enter ACTIVE and active references can be acquired.
This is the normal, live state. Once the object is deactivated, it
enters state DRAINING. No new active references can be acquired, but
some threads might still own active references. Once all those are
dropped, the object enters state DRAINED. Now the object can be
released a *single* time, before it enters state RELEASED and is
finished. It cannot be re-used anymore.

Active-references are very useful to track threads that invoke callbacks
on an object. As long as a callback is running, an active reference is
held, and as such the object is usually protected from being destroyed.
The destructor of the object needs to deactivate *and* drain the object,
before releasing resources.

Active references will be used heavy by the upcoming bus1_peer object.
Whenever a peer operates on a remote peer, it must acquire and hold an
active reference on that remote peer. This guarantees that the remote
peer will wait for this operation to finish before possibly
disconnecting from the bus.
In concept, active-references can be seen as rw-locks. However, they
have much more strict state-transitions. Prior art can be seen in
super-blocks ('atomic_t s_active'), and kernfs ('atomic_t active').

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile  |   3 +-
 ipc/bus1/util/active.c | 419 +
 ipc/bus1/util/active.h | 154 ++
 3 files changed, 575 insertions(+), 1 deletion(-)
 create mode 100644 ipc/bus1/util/active.c
 create mode 100644 ipc/bus1/util/active.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index d3a4491..9e491691 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,5 +1,6 @@
 bus1-y :=  \
-   main.o
+   main.o  \
+   util/active.o
 
 obj-$(CONFIG_BUS1) += bus1.o
 
diff --git a/ipc/bus1/util/active.c b/ipc/bus1/util/active.c
new file mode 100644
index 000..5f5fdaa
--- /dev/null
+++ b/ipc/bus1/util/active.c
@@ -0,0 +1,419 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include "active.h"
+
+/*
+ * Bias values track states of "active references". They're all negative. If an
+ * object is active, its active-ref-counter is >=0 and tracks all active
+ * references. Once an object is deactivated, we subtract ACTIVE_BIAS. This
+ * means, the counter is now negative but still counts the active references.
+ * Once it drops to exactly ACTIVE_BIAS, we know all active references were
+ * dropped. Exactly one thread will change it to ACTIVE_RELEASE now, perform
+ * cleanup and then put it into ACTIVE_DONE. Once released, all other threads
+ * that tried deactivating the node will now be woken up (thus, they wait until
+ * the object is fully done).
+ * The initial state during object setup is ACTIVE_NEW. If an object is
+ * directly deactivated without having ever been active, it is put into
+ * ACTIVE_RELEASE_DIRECT instead of ACTIVE_BIAS. This tracks this one-bit state
+ * across deactivation. The task putting it into ACTIVE_RELEASE now knows
+ * whether the object was active before or not.
+ *
+ * We support lockdep annotations for 'active references'. We treat active
+ * references as a read-trylock, and deactivation as a write-lock.
+ *
+ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
+ * to avoid overflows if multiplied by -1.
+ */
+#define BUS1_ACTIVE_RELEASE_DIRECT (BUS1_ACTIVE_BIAS - 1)
+#define BUS1_ACTIVE_RELEASE(BUS1_ACTIVE_BIAS - 2)
+#define BUS1_ACTIVE_DONE   (BUS1_ACTIVE_BIAS - 3)
+#define BUS1_ACTIVE_NEW(BUS1_ACTIVE_BIAS - 4)
+#define _BUS1_ACTIVE_RESERVED  (BUS1_ACTIVE_BIAS - 5)
+
+/**
+ * bus1_active_init_private() - initialize object
+ * @active:object to initialize
+ *
+ * This initializes an act

[RFC v1 08/14] bus1: implement peer management context

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

A peer context provides access to the bus1 system. A peer itself is not
a routable entity, but rather only a local anchor to serve as gateway to
the bus. To participate on the bus, you need to allocate a peer. This
peer manages all your state on the bus, including all allocated nodes,
owned handles, incoming messages, and more.

A peer is split into 3 sections:
  - A static section that is initialized at peer creation and never
changes
  - A peer-local section that is only ever accessed by ioctls done by
the peer itself.
  - A data section that might be accessed by remote peers when
interacting with this peer.

All peers on the system operate on the same level. There is no context
a peer is linked into. Hence, you can never lock multiple peers at the
same time. Instead, peers provide active-references. Before performing
an operation on a peer, an active reference must be acquired, and hold
as long as the operation goes on. When done, the reference is released
again. When a peer is disconnected, no more active references can be
acquired, and any outstanding operation is waited for before the peer
is destroyed.

Additionally to active-references, there are 2 locks: A peer-local lock
and a data lock. The peer-local lock is used to synchronize operations
done by the peer itself. It is never acquired by a remote peer. The
data lock protects the data of the peer, which might be modified by
remote peers. The data lock nests underneath the local-lock.
Furthermore, the data-lock critical sections must be kept small and
never block indefinitely. Remote peers might wait for data-locks, hence
they must rely on not being DoSed. The local peer lock, however, is
private to the peer itself. Not such restrictions apply. It is mostly
used to give the impression of atomic operations (i.e., making the API
appear consistent and coherent).

This only adds the peer context, the ioctls will be implemented in
follow-up patches.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile |   2 +
 ipc/bus1/main.c   |  17 +++
 ipc/bus1/main.h   |  14 ++
 ipc/bus1/peer.c   | 145 +
 ipc/bus1/peer.h   | 146 ++
 ipc/bus1/util.c   |  52 +++
 ipc/bus1/util.h   |  51 +++
 7 files changed, 427 insertions(+)
 create mode 100644 ipc/bus1/peer.c
 create mode 100644 ipc/bus1/peer.h
 create mode 100644 ipc/bus1/util.c
 create mode 100644 ipc/bus1/util.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index 94d79e0..c689917 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,6 +1,8 @@
 bus1-y :=  \
main.o  \
+   peer.o  \
user.o  \
+   util.o  \
util/active.o   \
util/flist.o\
util/pool.o \
diff --git a/ipc/bus1/main.c b/ipc/bus1/main.c
index 526347d..51034f3 100644
--- a/ipc/bus1/main.c
+++ b/ipc/bus1/main.c
@@ -15,24 +15,41 @@
 #include 
 #include 
 #include "main.h"
+#include "peer.h"
 #include "tests.h"
 #include "user.h"
 
 static int bus1_fop_open(struct inode *inode, struct file *file)
 {
+   struct bus1_peer *peer;
+
+   peer = bus1_peer_new();
+   if (IS_ERR(peer))
+   return PTR_ERR(peer);
+
+   file->private_data = peer;
return 0;
 }
 
 static int bus1_fop_release(struct inode *inode, struct file *file)
 {
+   bus1_peer_free(file->private_data);
return 0;
 }
 
+static void bus1_fop_show_fdinfo(struct seq_file *m, struct file *file)
+{
+   struct bus1_peer *peer = file->private_data;
+
+   seq_printf(m, KBUILD_MODNAME "-peer:\t%16llx\n", peer->id);
+}
+
 const struct file_operations bus1_fops = {
.owner  = THIS_MODULE,
.open   = bus1_fop_open,
.release= bus1_fop_release,
.llseek = noop_llseek,
+   .show_fdinfo= bus1_fop_show_fdinfo,
 };
 
 static struct miscdevice bus1_misc = {
diff --git a/ipc/bus1/main.h b/ipc/bus1/main.h
index 76fce66..dd319d9 100644
--- a/ipc/bus1/main.h
+++ b/ipc/bus1/main.h
@@ -49,6 +49,20 @@
  * ordered, including unicasts, multicasts, and notifications.
  */
 
+/**
+ * Locking
+ *
+ * Most of the bus1 objects form a hierarchy, as such, their locks must be
+ * ordered. Not all orders are explicitly defined (e.g., they might define
+ * orthogonal hierarchies), but this list gives a rough overview:
+ *
+ *   bus1_peer.active
+ * bus1_peer.local.lock
+ *   bus1_peer.data.lock
+ * bus1_user.lock
+ * bus1_user_lock
+ */
+
 struct dentry;
 struct file_operations;
 
diff --git a/ipc/bus1/peer.c b/ipc/bus1/peer.c
new file mode 100644
index 000..a6fbca01
--- /dev/null
+++ b/ipc/bus1/peer.c

[RFC v1 09/14] bus1: provide transaction context for multicasts

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

The transaction engine is an object that lives on the stack and is used
to stage and commit multicasts properly. Unlike unicasts, a multicast
cannot just be queued on each destination, but must be properly
synchronized. This requires us to first stage each message on their
respective destination, then sync and tick the clocks, and eventual
commit all messages.

The transaction context implements this logic for both, unicasts and
multicasts. It hides the timestamp handling and takes care to properly
synchronize accesses to the peer queues.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile |   1 +
 ipc/bus1/peer.c   |   2 +
 ipc/bus1/peer.h   |   3 +
 ipc/bus1/tx.c | 360 ++
 ipc/bus1/tx.h | 102 
 5 files changed, 468 insertions(+)
 create mode 100644 ipc/bus1/tx.c
 create mode 100644 ipc/bus1/tx.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index c689917..e3c7dd7 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,6 +1,7 @@
 bus1-y :=  \
main.o  \
peer.o  \
+   tx.o\
user.o  \
util.o  \
util/active.o   \
diff --git a/ipc/bus1/peer.c b/ipc/bus1/peer.c
index a6fbca01..3421f8c 100644
--- a/ipc/bus1/peer.c
+++ b/ipc/bus1/peer.c
@@ -70,6 +70,7 @@ struct bus1_peer *bus1_peer_new(void)
 
/* initialize data section */
mutex_init(&peer->data.lock);
+   bus1_queue_init(&peer->data.queue);
 
/* initialize peer-private section */
mutex_init(&peer->local.lock);
@@ -131,6 +132,7 @@ struct bus1_peer *bus1_peer_free(struct bus1_peer *peer)
mutex_destroy(&peer->local.lock);
 
/* deinitialize data section */
+   bus1_queue_deinit(&peer->data.queue);
mutex_destroy(&peer->data.lock);
 
/* deinitialize constant fields */
diff --git a/ipc/bus1/peer.h b/ipc/bus1/peer.h
index 277fcf8..149ddf6 100644
--- a/ipc/bus1/peer.h
+++ b/ipc/bus1/peer.h
@@ -54,6 +54,7 @@
 #include 
 #include "user.h"
 #include "util/active.h"
+#include "util/queue.h"
 
 struct cred;
 struct dentry;
@@ -71,6 +72,7 @@ struct pid_namespace;
  * @active:active references
  * @debugdir:  debugfs root of this peer, or NULL/ERR_PTR
  * @data.lock: data lock
+ * @data.queue:message queue
  * @local.lock:local peer runtime lock
  */
 struct bus1_peer {
@@ -86,6 +88,7 @@ struct bus1_peer {
 
struct {
struct mutex lock;
+   struct bus1_queue queue;
} data;
 
struct {
diff --git a/ipc/bus1/tx.c b/ipc/bus1/tx.c
new file mode 100644
index 000..6ff8949
--- /dev/null
+++ b/ipc/bus1/tx.c
@@ -0,0 +1,360 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "peer.h"
+#include "tx.h"
+#include "util/active.h"
+#include "util/queue.h"
+
+static void bus1_tx_push(struct bus1_tx *tx,
+struct bus1_queue_node **list,
+struct bus1_queue_node *qnode)
+{
+   struct bus1_peer *peer = qnode->owner;
+
+   /*
+* Push @qnode onto one of the lists in @tx (specified as @list). Note
+* that each list has different locking/ordering requirements, which
+* the caller has to verify. This helper does not check them.
+*
+* Whenever something is pushed on a list, we make sure it has the tx
+* set as group. Furthermore, we tell lockdep that its peer was
+* released. This is required to allow holding hundreds of peers in a
+* multicast without exceeding the lockdep limits of allowed locks held
+* in parallel.
+* Note that pushing a qnode on a list consumes the qnode together with
+* its set owner. The caller must not access it, except by popping it
+* from the list or using one of the internal list-iterators. In other
+* words, we say that a caller must be aware of lockdep limitations
+* whenever they hold an unlimited number of peers. However, if they
+* make sure they only ever hold a fixed number, but use transaction
+* lists to stash them, the transaction lists make sure to properly
+* avoid lockdep limitations.
+*/
+
+   WARN_ON(qnode->group && tx != qnode->group);
+   WARN_ON(qn

[RFC v1 05/14] bus1: util - pool utility library

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

A bus1-pool is a shmem-backed memory pool shared between userspace and
the kernel. The pool is used to transfer memory from the kernel to
userspace without requiring userspace to pre-allocate space.

The pool is managed in slices, which are published to userspace when
they are ready to be read and must be released by userspace when
userspace is done with them.

Userspace has read-only access to its pools and the kernel has
read-write access, but published slices are not altered.

This pool implementation will be used by bus1 message transactions to
support single-copy data transfers, directly from the sender's address
space into the pool of the destination peer. The allocation algorithm
is based on the Android Binder code and has served their needs well for
many years now.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile|   3 +-
 ipc/bus1/util/pool.c | 572 +++
 ipc/bus1/util/pool.h | 164 +++
 3 files changed, 738 insertions(+), 1 deletion(-)
 create mode 100644 ipc/bus1/util/pool.c
 create mode 100644 ipc/bus1/util/pool.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index 6db6d13..ca8e19d 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,7 +1,8 @@
 bus1-y :=  \
main.o  \
util/active.o   \
-   util/flist.o
+   util/flist.o\
+   util/pool.o
 
 obj-$(CONFIG_BUS1) += bus1.o
 
diff --git a/ipc/bus1/util/pool.c b/ipc/bus1/util/pool.c
new file mode 100644
index 000..2ddbffb
--- /dev/null
+++ b/ipc/bus1/util/pool.c
@@ -0,0 +1,572 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pool.h"
+
+static struct bus1_pool_slice *bus1_pool_slice_new(size_t offset, size_t size)
+{
+   struct bus1_pool_slice *slice;
+
+   if (offset > U32_MAX || size == 0 || size > BUS1_POOL_SLICE_SIZE_MAX)
+   return ERR_PTR(-EMSGSIZE);
+
+   slice = kmalloc(sizeof(*slice), GFP_KERNEL);
+   if (!slice)
+   return ERR_PTR(-ENOMEM);
+
+   slice->offset = offset;
+   slice->size = size;
+
+   return slice;
+}
+
+static struct bus1_pool_slice *
+bus1_pool_slice_free(struct bus1_pool_slice *slice)
+{
+   if (!slice)
+   return NULL;
+
+   kfree(slice);
+
+   return NULL;
+}
+
+/* insert slice into the free tree */
+static void bus1_pool_slice_link_free(struct bus1_pool_slice *slice,
+ struct bus1_pool *pool)
+{
+   struct rb_node **n, *prev = NULL;
+   struct bus1_pool_slice *ps;
+
+   n = &pool->slices_free.rb_node;
+   while (*n) {
+   prev = *n;
+   ps = container_of(prev, struct bus1_pool_slice, rb);
+   if (slice->size < ps->size)
+   n = &prev->rb_left;
+   else
+   n = &prev->rb_right;
+   }
+
+   rb_link_node(&slice->rb, prev, n);
+   rb_insert_color(&slice->rb, &pool->slices_free);
+}
+
+/* insert slice into the busy tree */
+static void bus1_pool_slice_link_busy(struct bus1_pool_slice *slice,
+ struct bus1_pool *pool)
+{
+   struct rb_node **n, *prev = NULL;
+   struct bus1_pool_slice *ps;
+
+   n = &pool->slices_busy.rb_node;
+   while (*n) {
+   prev = *n;
+   ps = container_of(prev, struct bus1_pool_slice, rb);
+   if (WARN_ON(slice->offset == ps->offset))
+   n = &prev->rb_right; /* add anyway */
+   else if (slice->offset < ps->offset)
+   n = &prev->rb_left;
+   else /* if (slice->offset > ps->offset) */
+   n = &prev->rb_right;
+   }
+
+   rb_link_node(&slice->rb, prev, n);
+   rb_insert_color(&slice->rb, &pool->slices_busy);
+
+   pool->allocated_size += slice->size;
+}
+
+/* find free slice big enough to hold @size bytes */
+static struct bus1_pool_slice *
+bus1_pool_slice_find_by_size(struct bus1_pool *pool, size_t size)
+{
+   struct bus1_pool_slice *ps, *closest = NULL;
+   struct rb_node *n;
+
+   n = pool->slices_free.rb_node;
+   while (n) {
+   ps = container_of(n, struct bus1_pool_s

[RFC v1 13/14] bus1: limit and protect resources

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

This adds resource-counters to peers and users. They limit the number
of objects that a peer can operate on. The root limits per user can be
configured as a module option. Everything else is just based on them.

This also adds LSM hooks. They are not integrated with ./security/, yet,
but just provide the hooks as discussed with the SELinux maintainers.
Since the operations on bus1 are very similar in nature to Binder, the
hooks are the same as well (since the LSM people seemed pretty happy
with the Binder hooks, anyway).

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/message.c  |  47 +-
 ipc/bus1/peer.c |  95 ++-
 ipc/bus1/peer.h |   2 +
 ipc/bus1/security.h |  45 +
 ipc/bus1/user.c | 475 
 ipc/bus1/user.h |  75 -
 6 files changed, 728 insertions(+), 11 deletions(-)
 create mode 100644 ipc/bus1/security.h

diff --git a/ipc/bus1/message.c b/ipc/bus1/message.c
index 4c5c905..6145d5f 100644
--- a/ipc/bus1/message.c
+++ b/ipc/bus1/message.c
@@ -26,6 +26,7 @@
 #include "handle.h"
 #include "message.h"
 #include "peer.h"
+#include "security.h"
 #include "tx.h"
 #include "user.h"
 #include "util.h"
@@ -242,9 +243,16 @@ int bus1_factory_seal(struct bus1_factory *f)
struct bus1_handle *h;
struct bus1_flist *e;
size_t i;
+   int r;
 
lockdep_assert_held(&f->peer->local.lock);
 
+   r = bus1_user_charge(&f->peer->user->limits.n_handles,
+&f->peer->data.limits.n_handles,
+f->n_handles_charge);
+   if (r < 0)
+   return r;
+
for (i = 0, e = f->handles;
 i < f->n_handles;
 e = bus1_flist_next(e, &i)) {
@@ -291,11 +299,29 @@ struct bus1_message *bus1_factory_instantiate(struct 
bus1_factory *f,
transmit_secctx = f->has_secctx &&
  (READ_ONCE(peer->flags) & BUS1_PEER_FLAG_WANT_SECCTX);
 
+   r = bus1_user_charge(&peer->user->limits.n_slices,
+&peer->data.limits.n_slices, 1);
+   if (r < 0)
+   return ERR_PTR(r);
+
+   r = bus1_user_charge(&peer->user->limits.n_handles,
+&peer->data.limits.n_handles, f->n_handles);
+   if (r < 0) {
+   bus1_user_discharge(&peer->user->limits.n_slices,
+   &peer->data.limits.n_slices, 1);
+   return ERR_PTR(r);
+   }
+
size = sizeof(*m) + bus1_flist_inline_size(f->n_handles) +
   f->n_files * sizeof(struct file *);
m = kmalloc(size, GFP_KERNEL);
-   if (!m)
+   if (!m) {
+   bus1_user_discharge(&peer->user->limits.n_handles,
+   &peer->data.limits.n_handles, f->n_handles);
+   bus1_user_discharge(&peer->user->limits.n_slices,
+   &peer->data.limits.n_slices, 1);
return ERR_PTR(-ENOMEM);
+   }
 
/* set to default first, so the destructor can be called anytime */
kref_init(&m->ref);
@@ -329,6 +355,8 @@ struct bus1_message *bus1_factory_instantiate(struct 
bus1_factory *f,
m->slice = bus1_pool_alloc(&peer->data.pool, size);
mutex_unlock(&peer->data.lock);
if (IS_ERR(m->slice)) {
+   bus1_user_discharge(&peer->user->limits.n_slices,
+   &peer->data.limits.n_slices, 1);
r = PTR_ERR(m->slice);
m->slice = NULL;
goto error;
@@ -376,6 +404,11 @@ struct bus1_message *bus1_factory_instantiate(struct 
bus1_factory *f,
 
/* import files */
while (m->n_files < f->n_files) {
+   r = security_bus1_transfer_file(f->peer, peer,
+   f->files[m->n_files]);
+   if (r < 0)
+   goto error;
+
m->files[m->n_files] = get_file(f->files[m->n_files]);
++m->n_files;
}
@@ -436,10 +469,15 @@ void bus1_message_free(struct kref *k)
bus1_handle_unref(e->ptr);
}
}
+   bus1_user_discharge(&peer->user->limits.n_handles,
+   &peer->data.limits.n_handles, m->n_handles_charge);
bus1_flist_deinit(m->handles, m->n_handles);
 
if (m->slice) {
mutex_lock(&peer->data.lock);
+   if (!bus1_pool_slice_is_public(m->slice))
+   bus1_user_discharge(&peer->user->l

[RFC v1 07/14] bus1: tracking user contexts

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

Different users can communicate via bus1, and many resources are shared
between multiple users. The bus1_user object represents the UID of a
user, like "struct user_struct" does in the kernel core. It is used to
account global resources, apply limits, and calculate quotas if
different UIDs communicate with each other.

All dynamic resources have global per-user limits, which cannot be
exceeded by a user. They prevent a single user from exhausting local
resources. Each peer that is created is always owned by the user that
initialized it. All resources allocated on that peer are accounted on
that pinned user. Additionally to global resources, there are local
limits per peer, that can be controlled by each peer individually
(e.g., specifying a maximum pool size). Those local limits allow a user
to distribute the globally available resources across its peer
instances.

Since bus1 allows communication across UID boundaries, any such
transmission of resources must be properly accounted. Bus1 employs
dynamic quotas to fairly distribute available resources. Those quotas
make sure that available resources of a peer cannot be exhausted by
remote UIDs, but are fairly divided among all communicating peers.

This only implements the user tracking, the resource limits will be
added in follow-up patches.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile |   1 +
 ipc/bus1/main.c   |   3 ++
 ipc/bus1/user.c   | 153 ++
 ipc/bus1/user.h   |  67 
 4 files changed, 224 insertions(+)
 create mode 100644 ipc/bus1/user.c
 create mode 100644 ipc/bus1/user.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index 3c90657..94d79e0 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,5 +1,6 @@
 bus1-y :=  \
main.o  \
+   user.o  \
util/active.o   \
util/flist.o\
util/pool.o \
diff --git a/ipc/bus1/main.c b/ipc/bus1/main.c
index 02412a7..526347d 100644
--- a/ipc/bus1/main.c
+++ b/ipc/bus1/main.c
@@ -16,6 +16,7 @@
 #include 
 #include "main.h"
 #include "tests.h"
+#include "user.h"
 
 static int bus1_fop_open(struct inode *inode, struct file *file)
 {
@@ -64,6 +65,7 @@ static int __init bus1_modinit(void)
 
 error:
debugfs_remove(bus1_debugdir);
+   bus1_user_modexit();
return r;
 }
 
@@ -71,6 +73,7 @@ static void __exit bus1_modexit(void)
 {
misc_deregister(&bus1_misc);
debugfs_remove(bus1_debugdir);
+   bus1_user_modexit();
pr_info("unloaded\n");
 }
 
diff --git a/ipc/bus1/user.c b/ipc/bus1/user.c
new file mode 100644
index 000..0498ab4
--- /dev/null
+++ b/ipc/bus1/user.c
@@ -0,0 +1,153 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "user.h"
+
+static DEFINE_MUTEX(bus1_user_lock);
+static DEFINE_IDR(bus1_user_idr);
+
+/**
+ * bus1_user_modexit() - clean up global resources of user accounting
+ *
+ * This function cleans up any remaining global resources that were allocated
+ * by the user accounting helpers. The caller must make sure that no user
+ * object is referenced anymore, before calling this. This function just clears
+ * caches and verifies nothing is leaked.
+ *
+ * This is meant to be called on module-exit.
+ */
+void bus1_user_modexit(void)
+{
+   WARN_ON(!idr_is_empty(&bus1_user_idr));
+   idr_destroy(&bus1_user_idr);
+   idr_init(&bus1_user_idr);
+}
+
+static struct bus1_user *bus1_user_new(void)
+{
+   struct bus1_user *user;
+
+   user = kmalloc(sizeof(*user), GFP_KERNEL);
+   if (!user)
+   return ERR_PTR(-ENOMEM);
+
+   kref_init(&user->ref);
+   user->uid = INVALID_UID;
+   mutex_init(&user->lock);
+
+   return user;
+}
+
+static void bus1_user_free(struct kref *ref)
+{
+   struct bus1_user *user = container_of(ref, struct bus1_user, ref);
+
+   lockdep_assert_held(&bus1_user_lock);
+
+   if (likely(uid_valid(user->uid)))
+   idr_remove(&bus1_user_idr, __kuid_val(user->uid));
+   mutex_destroy(&user->lock);
+   kfree_rcu(user, rcu);
+}
+
+/**
+ * bus1_user_ref_by_uid() - get a user object for a uid
+ * @uid:   uid of the user
+ *
+ * Find and return the user object for the uid if it exists, otherwise create
+ * it first.
+ *
+ * Return: A user object

[RFC v1 06/14] bus1: util - queue utility library

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

(Please refer to 'Lamport Timestamps', the concept of
 'happened-before', and 'causal ordering'. The queue implementation
 has its roots in Lamport Timestamps, treating a set of local CPUs
 as a distributed system to avoid any global synchronization.)

A bus1 message queue is a FIFO, i.e., messages are linearly ordered by
the time they were sent. Moreover, atomic delivery of messages to
multiple queues are supported, without any global synchronization, i.e.,
the order of message delivery is consistent across queues.

Messages can be destined for multiple queues, hence, we need to be
careful that all queues get a consistent order of incoming messages. We
define the concept of `global order' to provide a basic set of
guarantees. This global order is a partial order on the set of all
messages. The order is defined as:

1) If a message B was queued *after* a message A, then: A < B

2) If a message B was queued *after* a message A was dequeued,
   then: A < B

3) If a message B was dequeued *after* a message A on the same queue,
   then: A < B

(Note: Causality is honored. `after' and `before' do not refer to
 the same task, nor the same queue, but rather any kind of
 synchronization between the two operations.)

The queue object implements this global order in a lockless fashion. It
solely relies on a distributed clock on each queue. Each message to be
sent causes a clock tick on the local clock and on all destination
clocks. Furthermore, all clocks are synchronized, meaning they're
fast-forwarded in case they're behind the highest of all participating
peers. No global state tracking is involved.

During a message transaction, we first queue a message as 'staging'
entry in each destination with a preliminary timestamp. This timestamp
is explicitly odd numbered. Any odd numbered timestamp is considered
'staging' and causes *any* message ordered after it to be blocked until
it is no longer staging. This allows us to queue the message in parallel
with any racing multicast, and be guaranteed that all possible conflicts
are blocked until we eventually commit a transaction. To commit a
transaction (after all staging entries are queued), we choose the
highest timestamp we have seen across all destinations and re-queue all
our entries on each peer using that timestamp. Here we use a commit
timestamp (even numbered).

With this in mind, we define that a client can only dequeue messages
from its queue that have an even timestamp. Furthermore, if there is a
message queued with an odd timestamp that is lower than the even
timestamp of another message, then neither message can be dequeued.
They're considered to be in-flight conflicts. This guarantees that two
concurrent multicast messages can be queued without any *global* locks,
but either can only be dequeued by a peer if their ordering has been
established (via commit timestamps).

NOTE: A fully committed message is not guaranteed to be ready to be
  dequeued as it may be blocked by a staging entry. This means
  that there is an arbitrary (though bounded) time from a
  message transaction completing when the queue may still appear
  to be empty. In other words, message transmission is not
  instantaneous. It would be possible to change this at the
  cost of shortly blocking each message transaction on all other
  conflicting tasks.

The queue implementation uses an rb-tree (ordered by timestamps and
sender), with a cached pointer to the front of the queue. It will be
embedded in every peer participating on the bus1 kernel message bus1.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile |   3 +-
 ipc/bus1/util/queue.c | 445 ++
 ipc/bus1/util/queue.h | 351 +++
 3 files changed, 798 insertions(+), 1 deletion(-)
 create mode 100644 ipc/bus1/util/queue.c
 create mode 100644 ipc/bus1/util/queue.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index ca8e19d..3c90657 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -2,7 +2,8 @@ bus1-y :=   \
main.o  \
util/active.o   \
util/flist.o\
-   util/pool.o
+   util/pool.o \
+   util/queue.o
 
 obj-$(CONFIG_BUS1) += bus1.o
 
diff --git a/ipc/bus1/util/queue.c b/ipc/bus1/util/queue.c
new file mode 100644
index 000..38d7b98
--- /dev/null
+++ b/ipc/bus1/util/queue.c
@@ -0,0 +1,445 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */

[RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-26 Thread David Herrmann
Hi

This proposal introduces bus1.ko, a kernel messaging bus. This is not a request
for inclusion, yet. It is rather an initial draft and a Request For Comments.

While bus1 emerged out of the kdbus project, bus1 was started from scratch and
the concepts have little in common. In a nutshell, bus1 provides a
capability-based IPC system, similar in nature to Android Binder, Cap'n Proto,
and seL4. The module is completely generic and does neither require nor mandate
a user-space counter-part.

 o Description

Bus1 is a local IPC system, which provides a decentralized infrastructure to
share objects between local peers. The main building blocks are nodes and
handles. Nodes represent objects of a local peer, while handles represent
descriptors that point to a node. Nodes can be created and destroyed by any
peer, and they will always remain owned by their respective creator. Handles
on the other hand, are used to refer to nodes and can be passed around with
messages as auxiliary data. Whenever a handle is transferred, the receiver
will get its own handle allocated, pointing to the same node as the original
handle.

Any peer can send messages directed at one of their handles. This will
transfer the message to the owner of the node the handle points to. If a
peer does not posess a handle to a given node, it will not be able to send a
message to that node. That is, handles provide exclusive access management.
Anyone that somehow acquired a handle to a node is privileged to further
send this handle to other peers. As such, access management is transitive.
Once a peer acquired a handle, it cannot be revoked again. However, a node
owner can, at anytime, destroy a node. This will effectively unbind all
existing handles to that node on any peer, notifying each one of the
destruction.

Unlike nodes and handles, peers cannot be addressed directly. In fact, peers
are completely disconnected entities. A peer is merely an anchor of a set of
nodes and handles, including an incoming message queue for any of those.
Whether multiple nodes are all part of the same peer, or part of different
peers does not affect the remote view of those. Peers solely exist as
management entity and command dispatcher to local processes.

The set of actors on a system is completely decentralized. There is no
global component involved that provides a central registry or discovery
mechanism. Furthermore, communication between peers only involves those
peers, and does not affect any other peer in any way. No global
communication lock is taken. However, any communication is still globally
ordered, including unicasts, multicasts, and notifications.

 o Prior Art

The concepts behind bus1 are almost identical to capability systems like
Android Binder, Google Mojo, Cap'n Proto, seL4, and more. Bus1 differs from
them by supporting Global Ordering, Multicasts, Resource Accounting, No
Global Locking, No Global Context.

While the bus1 UAPI does not expose all features (like soft-references as
supported by Binder), the in-kernel code includes support for it. Multiple
UAPIs can be supported on top of the in-kernel bus1 code, including support
for the Binder UAPI. Efforts on this are still on-going.

 o Documentation

The first patch in this series provides the bus1(7) man-page. It explains
all concepts in bus1 in more detail. Furthermore, it describes the API that
is available on bus1 file descriptors. The pre-compiled man-page is
available at:

http://www.bus1.org/bus1.html

There is also a great bunch of in-source documentation available. All
cross-source-file APIs have KernelDoc annotations. Furthermore, we have an
introduction for each subsystem, to be found in the header files. The total
number in lines of code for bus1 is roughly ~4.5k. The remaining ~5k LOC
are comments and documentation.

 o Upstream

The upstream development repository is available on github:

http://github.com/bus1/bus1

It is an out-of-tree repository that allows easy and fast development of
new bus1 features. The in-tree integration repository is available at:

http://github.com/bus1/linux

 o Conferences

Tom and I will be attending Linux Plumbers Conf next week. Please do not
hesitate to contact us there in person. There will also be a presentation
[1] of bus1 on the last day of the conference.

Thanks
Tom & David

[1] https://www.linuxplumbersconf.org/2016/ocw/proposals/3819

Tom Gundersen (14):
  bus1: add bus1(7) man-page
  bus1: provide stub cdev /dev/bus1
  bus1: util - active reference utility library
  bus1: util - fixed list utility library
  bus1: util - pool utility library
  bus1: util - queue utility library
  bus1: tracking user contexts
  bus1: implement peer management context
  bus1: provide transaction context for multicasts
  

[RFC v1 14/14] bus1: basic user-space kselftests

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

This adds kselftests integration and provides some basic API tests for
bus1.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 tools/testing/selftests/bus1/.gitignore   |   2 +
 tools/testing/selftests/bus1/Makefile |  19 ++
 tools/testing/selftests/bus1/bus1-ioctl.h | 111 +++
 tools/testing/selftests/bus1/test-api.c   | 532 ++
 tools/testing/selftests/bus1/test-io.c| 198 +++
 tools/testing/selftests/bus1/test.h   | 114 +++
 6 files changed, 976 insertions(+)
 create mode 100644 tools/testing/selftests/bus1/.gitignore
 create mode 100644 tools/testing/selftests/bus1/Makefile
 create mode 100644 tools/testing/selftests/bus1/bus1-ioctl.h
 create mode 100644 tools/testing/selftests/bus1/test-api.c
 create mode 100644 tools/testing/selftests/bus1/test-io.c
 create mode 100644 tools/testing/selftests/bus1/test.h

diff --git a/tools/testing/selftests/bus1/.gitignore 
b/tools/testing/selftests/bus1/.gitignore
new file mode 100644
index 000..76ecb9c
--- /dev/null
+++ b/tools/testing/selftests/bus1/.gitignore
@@ -0,0 +1,2 @@
+test-api
+test-io
diff --git a/tools/testing/selftests/bus1/Makefile 
b/tools/testing/selftests/bus1/Makefile
new file mode 100644
index 000..cbcf689
--- /dev/null
+++ b/tools/testing/selftests/bus1/Makefile
@@ -0,0 +1,19 @@
+# Makefile for bus1 selftests
+
+CC = $(CROSS_COMPILE)gcc
+CFLAGS += -D_FILE_OFFSET_BITS=64 -Wall -g -O2
+CFLAGS += -I../../../../include/uapi/
+CFLAGS += -I../../../../include/
+CFLAGS += -I../../../../usr/include/
+
+TEST_PROGS := test-api test-io
+
+all: $(TEST_PROGS)
+
+%: %.c bus1-ioctl.h test.h ../../../../usr/include/linux/bus1.h
+   $(CC) $(CFLAGS) $< -o $@
+
+include ../lib.mk
+
+clean:
+   $(RM) $(TEST_PROGS)
diff --git a/tools/testing/selftests/bus1/bus1-ioctl.h 
b/tools/testing/selftests/bus1/bus1-ioctl.h
new file mode 100644
index 000..552bd5d
--- /dev/null
+++ b/tools/testing/selftests/bus1/bus1-ioctl.h
@@ -0,0 +1,111 @@
+#pragma once
+
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+static inline int
+bus1_ioctl(int fd, unsigned int cmd, void *arg)
+{
+   return (ioctl(fd, cmd, arg) >= 0) ? 0: -errno;
+}
+
+static inline int
+bus1_ioctl_peer_disconnect(int fd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_PEER_DISCONNECT) == sizeof(uint64_t),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_PEER_DISCONNECT, NULL);
+}
+
+static inline int
+bus1_ioctl_peer_query(int fd, struct bus1_cmd_peer_reset *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_PEER_QUERY) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_PEER_QUERY, cmd);
+}
+
+static inline int
+bus1_ioctl_peer_reset(int fd, struct bus1_cmd_peer_reset *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_PEER_RESET) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_PEER_RESET, cmd);
+}
+
+static inline int
+bus1_ioctl_handle_release(int fd, uint64_t *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_HANDLE_RELEASE) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_HANDLE_RELEASE, cmd);
+}
+
+static inline int
+bus1_ioctl_handle_transfer(int fd, struct bus1_cmd_handle_transfer *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_HANDLE_TRANSFER) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_HANDLE_TRANSFER, cmd);
+}
+
+static inline int
+bus1_ioctl_nodes_destroy(int fd, struct bus1_cmd_nodes_destroy *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_NODES_DESTROY) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_NODES_DESTROY, cmd);
+}
+
+static inline int
+bus1_ioctl_slice_release(int fd, uint64_t *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_SLICE_RELEASE) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_SLICE_RELEASE, cmd);
+}
+
+static inline int
+bus1_ioctl_send(int fd, struct bus1_cmd_send *cmd)
+{
+   static_assert(_IOC_SIZE(BUS1_CMD_SEND) == sizeof(*cmd),
+ "ioctl is called with invalid argument size");
+
+   return bus1_ioctl(fd, BUS1_CMD_SEND, cmd);
+}
+
+static inline int
+bus

[RFC v1 10/14] bus1: add handle management

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

The object system on a bus is based on 'nodes' and 'handles'. Any peer
can allocate new, local objects at any time. The creator automatically
becomes the sole owner of the object. References to objects can be
passed as payload of messages. The recipient will then gain their own
reference to the object as well. Additionally, an object can be the
destination of a message, in which case the message is always sent to
the original creator (and thus the owner) of the object.

Internally, objects are called 'nodes'. A reference to an object is a
'handle'. Whenever a new node is created, the owner implicitly gains an
handle as well. In fact, handles are the only way to refer to a node.
The node itself is entirely hidden in the implementation, and visible
in the API as an “anchor handle”.

Whenever a handle is passed as payload of a message, the target peer
will gain a handle linked to the same underlying node. This works
regardless of whether the sender is the owner of the underlying node,
or not.

Each peer can identify all its handles (both owned and un-owned) by a
64-bit integer. The namespace is local to each peer, and the numbers
cannot be compared with the numbers of other peers (in fact, they are
very likely to clash, but might still have *different* underlying
nodes). However, if a peer receives a reference to the same node
multiple times, the resulting handle will be the same. The kernel keeps
count of how often each peer owns a handle.

If a peer no longer requires a specific handle, it can release it. If
the peer releases its last reference to a handle, the handle will be
destroyed.

The owner of a node (and *only* the owner) can trigger the destruction
of a node (even if other peers still own handles to it). In this case,
all peers that own a handle are notified of this fact. Once all handles
to a specific node have been released (except for the handle internally
pinned in the node itself), the owner of the node is notified of this,
so it can potentially destroy both any linked state and the node itself.

Node destruction is fully synchronized with any transaction. That is, a
node and all its handles are valid in every message that is transmitted
*before* the notification of its destruction. Furthermore, no message
after this notification will carry the ID of such a destroyed node. Note
that message transactions are asynchronous. That is, there is no unique
point in time that a message is synchronized with another message.
Hence, whether a specific handle passed with a message is still valid or
not, cannot be predicted by the sender, but only by one of the
receivers.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile |   1 +
 ipc/bus1/handle.c | 823 ++
 ipc/bus1/handle.h | 312 +
 ipc/bus1/peer.c   |   3 +
 ipc/bus1/peer.h   |   2 +
 ipc/bus1/util.h   |  83 ++
 6 files changed, 1224 insertions(+)
 create mode 100644 ipc/bus1/handle.c
 create mode 100644 ipc/bus1/handle.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index e3c7dd7..b87cddb 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,4 +1,5 @@
 bus1-y :=  \
+   handle.o\
main.o  \
peer.o  \
tx.o\
diff --git a/ipc/bus1/handle.c b/ipc/bus1/handle.c
new file mode 100644
index 000..10f224e
--- /dev/null
+++ b/ipc/bus1/handle.c
@@ -0,0 +1,823 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "handle.h"
+#include "peer.h"
+#include "tx.h"
+#include "util.h"
+#include "util/queue.h"
+
+static void bus1_handle_init(struct bus1_handle *h, struct bus1_peer *holder)
+{
+   kref_init(&h->ref);
+   atomic_set(&h->n_weak, 0);
+   atomic_set(&h->n_user, 0);
+   h->holder = holder;
+   h->anchor = NULL;
+   h->tlink = NULL;
+   RB_CLEAR_NODE(&h->rb_to_peer);
+   h->id = BUS1_HANDLE_INVALID;
+}
+
+static void bus1_handle_deinit(struct bus1_handle *h)
+{
+   if (h == h->anchor) {
+   WARN_ON(atomic_read(&h->node.n_strong) != 0);
+   WARN_ON(!RB_EMPTY_ROOT(&h->node.map_handles));
+   } else if (h->anchor) {
+   WARN_ON(!RB_EMPTY_NODE(&h->remote.rb_to_anchor));
+   bus1_handle_unref(h->anchor);
+   }
+
+   bus1_queue_node_deinit(&h->qnode);
+   WARN_ON(!RB_EMPTY_NODE(&h-&

[RFC v1 11/14] bus1: implement message transmission

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

While notifications already work and simply require linking bus1_handle
objects into the destination queue, real messages require proper
payloads. This implements two core objects: Message objects and
factories.

The message factory is similar to transaction contexts, and lives
completely on the stack. It is used to import the parameters given by
user-space in a SEND ioctl. It parses and validates them. With this
message factors we can now instantiate many messages, one for each
destination of a multicast.

Messages need to carry a bunch of data, mainly:
  - metadata: This just matches what Unix-sockets do (uid, gid, pid,
  tid, and secctx)
  - payload: Random memory passed in as iovec-array by user-space
  - files: Set of file-descriptors, very similar to SCM_RIGHTS
  - handles: Set of local handles to transfer to the destination

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile  |   1 +
 ipc/bus1/message.c | 613 +
 ipc/bus1/message.h | 171 +++
 ipc/bus1/peer.c|   2 +
 ipc/bus1/peer.h|   2 +
 ipc/bus1/util.c| 162 ++
 ipc/bus1/util.h|   7 +
 7 files changed, 958 insertions(+)
 create mode 100644 ipc/bus1/message.c
 create mode 100644 ipc/bus1/message.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index b87cddb..05434bda 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,6 +1,7 @@
 bus1-y :=  \
handle.o\
main.o  \
+   message.o   \
peer.o  \
tx.o\
user.o  \
diff --git a/ipc/bus1/message.c b/ipc/bus1/message.c
new file mode 100644
index 000..4c5c905
--- /dev/null
+++ b/ipc/bus1/message.c
@@ -0,0 +1,613 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "handle.h"
+#include "message.h"
+#include "peer.h"
+#include "tx.h"
+#include "user.h"
+#include "util.h"
+#include "util/flist.h"
+#include "util/pool.h"
+#include "util/queue.h"
+
+static size_t bus1_factory_size(struct bus1_cmd_send *param)
+{
+   /* make sure @size cannot overflow */
+   BUILD_BUG_ON(UIO_MAXIOV > U16_MAX);
+   BUILD_BUG_ON(BUS1_FD_MAX > U16_MAX);
+
+   /* make sure we do not violate alignment rules */
+   BUILD_BUG_ON(__alignof(struct bus1_flist) < __alignof(struct iovec));
+   BUILD_BUG_ON(__alignof(struct iovec) < __alignof(struct file *));
+
+   return sizeof(struct bus1_factory) +
+  bus1_flist_inline_size(param->n_handles) +
+  param->n_vecs * sizeof(struct iovec) +
+  param->n_fds * sizeof(struct file *);
+}
+
+/**
+ * bus1_factory_new() - create new message factory
+ * @peer:  peer to operate as
+ * @param: factory parameters
+ * @stack: optional stack for factory, or NULL
+ * @n_stack:   size of space at @stack
+ *
+ * This allocates a new message factory. It imports data from @param and
+ * prepares the factory for a transaction. From this factory, messages can be
+ * instantiated. This is used both for unicasts and multicasts.
+ *
+ * If @stack is given, this tries to place the factory on the specified stack
+ * space. The caller must guarantee that the factory does not outlive the stack
+ * frame. If this is not wanted, pass 0 as @n_stack.
+ * In either case, if the stack frame is too small, this will allocate the
+ * factory on the heap.
+ *
+ * Return: Pointer to factory, or ERR_PTR on failure.
+ */
+struct bus1_factory *bus1_factory_new(struct bus1_peer *peer,
+ struct bus1_cmd_send *param,
+ void *stack,
+ size_t n_stack)
+{
+   const struct iovec __user *ptr_vecs;
+   const u64 __user *ptr_handles;
+   const int __user *ptr_fds;
+   struct bus1_factory *f;
+   struct bus1_flist *e;
+   struct file *file;
+   size_t i, size;
+   bool is_new;
+   int r, fd;
+   u32 sid;
+   u64 id;
+
+   lockdep_assert_held(&peer->local.lock);
+
+   size = bus1_factory_size(param);
+   if (unlikely(size > n_stack)) {
+   f = kmalloc(size, GFP_TEMPORARY);
+   if (!f)

[RFC v1 01/14] bus1: add bus1(7) man-page

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

Add new directory ./Documentation/bus1/ and include DocBook scripts to
build the bus1(7) man-page. This documents the bus1 character-device,
including all the file-operations you can perform on it.

Furthermore, the man-page also introduces the core bus1 concepts and
explains how they work.

Build the bus1-documentation via:

$ make -C Documentation/bus1/ mandocs
$ make -C Documentation/bus1/ htmldocs

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 Documentation/bus1/.gitignore |   2 +
 Documentation/bus1/Makefile   |  41 ++
 Documentation/bus1/bus1.xml   | 833 ++
 Documentation/bus1/stylesheet.xsl |  16 +
 4 files changed, 892 insertions(+)
 create mode 100644 Documentation/bus1/.gitignore
 create mode 100644 Documentation/bus1/Makefile
 create mode 100644 Documentation/bus1/bus1.xml
 create mode 100644 Documentation/bus1/stylesheet.xsl

diff --git a/Documentation/bus1/.gitignore b/Documentation/bus1/.gitignore
new file mode 100644
index 000..b4a77cc
--- /dev/null
+++ b/Documentation/bus1/.gitignore
@@ -0,0 +1,2 @@
+*.7
+*.html
diff --git a/Documentation/bus1/Makefile b/Documentation/bus1/Makefile
new file mode 100644
index 000..d2b9e61
--- /dev/null
+++ b/Documentation/bus1/Makefile
@@ -0,0 +1,41 @@
+cmd = $(cmd_$(1))
+srctree = $(shell pwd)
+src = .
+obj = $(srctree)/$(src)
+
+DOCS := \
+   bus1.xml
+
+XMLFILES := $(addprefix $(obj)/,$(DOCS))
+MANFILES := $(patsubst %.xml, %.7, $(XMLFILES))
+HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES))
+
+XMLTO_ARGS := \
+   -m $(srctree)/$(src)/stylesheet.xsl \
+   --skip-validation \
+   --stringparam funcsynopsis.style=ansi \
+   --stringparam man.output.quietly=1 \
+   --stringparam man.authors.section.enabled=0 \
+   --stringparam man.copyright.section.enabled=0
+
+quiet_cmd_db2man = MAN $@
+  cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $<
+%.7: %.xml
+   @(which xmlto > /dev/null 2>&1) || \
+(echo "*** You need to install xmlto ***"; \
+ exit 1)
+   $(call cmd,db2man)
+
+quiet_cmd_db2html = HTML$@
+  cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $<
+%.html: %.xml
+   @(which xmlto > /dev/null 2>&1) || \
+(echo "*** You need to install xmlto ***"; \
+ exit 1)
+   $(call cmd,db2html)
+
+mandocs: $(MANFILES)
+
+htmldocs: $(HTMLFILES)
+
+clean-files := $(MANFILES) $(HTMLFILES)
diff --git a/Documentation/bus1/bus1.xml b/Documentation/bus1/bus1.xml
new file mode 100644
index 000..40b55c0
--- /dev/null
+++ b/Documentation/bus1/bus1.xml
@@ -0,0 +1,833 @@
+ 
+http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd";>
+
+
+
+  
+bus1
+    bus1
+
+
+  
+Documentation
+David
+Herrmann
+  
+  
+Documentation
+Tom
+Gundersen
+  
+
+  
+
+  
+bus1
+7
+  
+
+  
+bus1
+Kernel Message Bus
+  
+
+  
+
+  #include <linux/bus1.h>
+
+  
+
+   
+Description
+
+The bus1 Kernel Message Bus defines and implements a distributed object model.
+It allows local processes to send messages to objects owned by remote 
processes,
+as well as share their own objects with others. Object ownership is static and
+cannot be transferred. Access to remote objects is prohibited, unless it was
+explicitly granted. Processes can transmit messages to a remote object via the
+message bus, transferring a data payload, object access rights, file
+descriptors, or other auxiliary data.
+
+
+To participate on the message bus, a peer context must be created. Peer 
contexts
+are kernel objects, identified by a file descriptor. They are not bound to any
+process, but can be shared freely. The peer context provides a message queue to
+store all incoming messages, a registry for all locally owned objects, and
+tracks access rights to remote objects. A peer context never serves as
+routing entity, but merely as anchor for peer-owned resources. Any message on
+the bus is always destined for an object, and the bus takes care to transfer a
+message into the message queue of the peer context that owns this object.
+
+
+The message bus manages object access using capabilities. That is, by default
+only the owner of an object is granted access rights. No other peer can access
+the object, nor are they aware of the existance of the object. However, access
+rights can be transmitted as auxiliary data with any message, effectively
+granting them to the receiver of the message. This even works transitively, 
that
+is, any peer that was granted access to an object can pass on those rights, 
even
+if they do not own the object. But mind that no access rights can ever be
+revoked, besides the owner destroying the object.
+
+
+
+  Nodes and Handles
+  
+Each peer context comes with a registry of owned objects, wh

[RFC v1 12/14] bus1: hook up file-operations

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

This hooks up all the file-operations on a bus1-file-descriptor. It
implements the ioctls as defined in the UAPI, as well as mmap() and
poll() support.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/main.c |  46 +++
 ipc/bus1/peer.c | 934 +++-
 ipc/bus1/peer.h |   8 +
 3 files changed, 987 insertions(+), 1 deletion(-)

diff --git a/ipc/bus1/main.c b/ipc/bus1/main.c
index 51034f3..d5a726a 100644
--- a/ipc/bus1/main.c
+++ b/ipc/bus1/main.c
@@ -14,10 +14,17 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include "main.h"
 #include "peer.h"
 #include "tests.h"
 #include "user.h"
+#include "util/active.h"
+#include "util/pool.h"
+#include "util/queue.h"
 
 static int bus1_fop_open(struct inode *inode, struct file *file)
 {
@@ -37,6 +44,41 @@ static int bus1_fop_release(struct inode *inode, struct file 
*file)
return 0;
 }
 
+static unsigned int bus1_fop_poll(struct file *file,
+ struct poll_table_struct *wait)
+{
+   struct bus1_peer *peer = file->private_data;
+   unsigned int mask;
+
+   poll_wait(file, &peer->waitq, wait);
+
+   /* access queue->front unlocked */
+   rcu_read_lock();
+   if (bus1_active_is_deactivated(&peer->active)) {
+   mask = POLLHUP;
+   } else {
+   mask = POLLOUT | POLLWRNORM;
+   if (bus1_queue_is_readable_rcu(&peer->data.queue))
+   mask |= POLLIN | POLLRDNORM;
+   }
+   rcu_read_unlock();
+
+   return mask;
+}
+
+static int bus1_fop_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct bus1_peer *peer = file->private_data;
+   int r;
+
+   if (!bus1_peer_acquire(peer))
+   return -ESHUTDOWN;
+
+   r = bus1_pool_mmap(&peer->data.pool, vma);
+   bus1_peer_release(peer);
+   return r;
+}
+
 static void bus1_fop_show_fdinfo(struct seq_file *m, struct file *file)
 {
struct bus1_peer *peer = file->private_data;
@@ -48,7 +90,11 @@ const struct file_operations bus1_fops = {
.owner  = THIS_MODULE,
.open   = bus1_fop_open,
.release= bus1_fop_release,
+   .poll   = bus1_fop_poll,
.llseek = noop_llseek,
+   .mmap   = bus1_fop_mmap,
+   .unlocked_ioctl = bus1_peer_ioctl,
+   .compat_ioctl   = bus1_peer_ioctl,
.show_fdinfo= bus1_fop_show_fdinfo,
 };
 
diff --git a/ipc/bus1/peer.c b/ipc/bus1/peer.c
index 0ff7a98..f0da4a7 100644
--- a/ipc/bus1/peer.c
+++ b/ipc/bus1/peer.c
@@ -23,11 +23,52 @@
 #include 
 #include 
 #include 
+#include 
+#include "handle.h"
 #include "main.h"
+#include "message.h"
 #include "peer.h"
+#include "tx.h"
 #include "user.h"
 #include "util.h"
 #include "util/active.h"
+#include "util/pool.h"
+#include "util/queue.h"
+
+static struct bus1_queue_node *
+bus1_peer_free_qnode(struct bus1_queue_node *qnode)
+{
+   struct bus1_message *m;
+   struct bus1_handle *h;
+
+   /*
+* Queue-nodes are generic entities that can only be destroyed by who
+* created them. That is, they have no embedded release callback.
+* Instead, we must detect them by type. Since the queue logic is kept
+* generic, it cannot provide this helper. Instead, we have this small
+* destructor here, which simply dispatches to the correct handler.
+*/
+
+   if (qnode) {
+   switch (bus1_queue_node_get_type(qnode)) {
+   case BUS1_MSG_DATA:
+   m = container_of(qnode, struct bus1_message, qnode);
+   bus1_message_unref(m);
+   break;
+   case BUS1_MSG_NODE_DESTROY:
+   case BUS1_MSG_NODE_RELEASE:
+   h = container_of(qnode, struct bus1_handle, qnode);
+   bus1_handle_unref(h);
+   break;
+   case BUS1_MSG_NONE:
+   default:
+   WARN(1, "Unknown message type\n");
+   break;
+   }
+   }
+
+   return NULL;
+}
 
 /**
  * bus1_peer_new() - allocate new peer
@@ -47,6 +88,7 @@ struct bus1_peer *bus1_peer_new(void)
const struct cred *cred = current_cred();
struct bus1_peer *peer;
struct bus1_user *user;
+   int r;
 
user = bus1_user_ref_by_uid(cred->uid);
if (IS_ERR(user))
@@ -75,9 +117,14 @@ struct bus1_peer *bus1_peer_new(void)
 
/* initialize peer-private section */
mutex_init(&peer->local.lock);
+   peer-

[RFC v1 04/14] bus1: util - fixed list utility library

2016-10-26 Thread David Herrmann
From: Tom Gundersen 

This implements a fixed-size list called bus1_flist. The size of
the list must be constant over the lifetime of the list. The list
can hold one arbitrary pointer per node.

Fixed lists are a combination of a linked list and a static array.
That is, fixed lists behave like linked lists (no random access, but
arbitrary size), but compare in speed with arrays (consequetive
accesses are fast). Unlike fixed arrays, fixed lists can hold huge
number of elements without requiring vmalloc, but solely relying on
small-size kmalloc allocations.

Internally, fixed lists are a singly-linked list of static arrays.
This guarantees that iterations behave almost like on an array,
except when crossing a batch-border.

Fixed lists can replace fixed-size arrays whenever you need to support
large number of elements, but don't need random access. Fixed lists
have ALMOST the same memory requirements as fixed-size arrays, except
one pointer of state per 'BUS1_FLIST_BATCH' elements. If only a small
size (i.e., it only requires one batch) is stored in a fixed list,
then its memory requirements and iteration time are equivalent to
fixed-size arrays.

Fixed lists will be required by the upcoming bus1 message-transactions.
They must support large auxiliary data transfers, in case users want to
send their entire handle state via the bus.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 ipc/bus1/Makefile |   3 +-
 ipc/bus1/util/flist.c | 116 +
 ipc/bus1/util/flist.h | 202 ++
 3 files changed, 320 insertions(+), 1 deletion(-)
 create mode 100644 ipc/bus1/util/flist.c
 create mode 100644 ipc/bus1/util/flist.h

diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile
index 9e491691..6db6d13 100644
--- a/ipc/bus1/Makefile
+++ b/ipc/bus1/Makefile
@@ -1,6 +1,7 @@
 bus1-y :=  \
main.o  \
-   util/active.o
+   util/active.o   \
+   util/flist.o
 
 obj-$(CONFIG_BUS1) += bus1.o
 
diff --git a/ipc/bus1/util/flist.c b/ipc/bus1/util/flist.c
new file mode 100644
index 000..b8b0d4e
--- /dev/null
+++ b/ipc/bus1/util/flist.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright (C) 2013-2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include "flist.h"
+
+/**
+ * bus1_flist_populate() - populate an flist
+ * @list:  flist to operate on
+ * @n: number of elements
+ * @gfp:   GFP to use for allocations
+ *
+ * Populate an flist. This pre-allocates the backing memory for an flist that
+ * was statically initialized via bus1_flist_init(). This is NOT needed if the
+ * list was allocated via bus1_flist_new().
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int bus1_flist_populate(struct bus1_flist *list, size_t n, gfp_t gfp)
+{
+   if (gfp & __GFP_ZERO)
+   memset(list, 0, bus1_flist_inline_size(n));
+
+   if (unlikely(n > BUS1_FLIST_BATCH)) {
+   /* Never populate twice! */
+   WARN_ON(list[BUS1_FLIST_BATCH].next);
+
+   n -= BUS1_FLIST_BATCH;
+   list[BUS1_FLIST_BATCH].next = bus1_flist_new(n, gfp);
+   if (!list[BUS1_FLIST_BATCH].next)
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+/**
+ * bus1_flist_new() - allocate new flist
+ * @n: number of elements
+ * @gfp:   GFP to use for allocations
+ *
+ * This allocates a new flist ready to store @n elements.
+ *
+ * Return: Pointer to flist, NULL if out-of-memory.
+ */
+struct bus1_flist *bus1_flist_new(size_t n, gfp_t gfp)
+{
+   struct bus1_flist list, *e, *slot;
+   size_t remaining;
+
+   list.next = NULL;
+   slot = &list;
+   remaining = n;
+
+   while (remaining >= BUS1_FLIST_BATCH) {
+   e = kmalloc_array(sizeof(*e), BUS1_FLIST_BATCH + 1, gfp);
+   if (!e)
+   return bus1_flist_free(list.next, n);
+
+   slot->next = e;
+   slot = &e[BUS1_FLIST_BATCH];
+   slot->next = NULL;
+
+   remaining -= BUS1_FLIST_BATCH;
+   }
+
+   if (remaining > 0) {
+   slot->next = kmalloc_array(remaining, sizeof(*e), gfp);
+   if (!slot->next)
+   return bus1_flist_free(list.next, n);
+   }
+
+   return list.next;
+}
+
+/**
+ * bus1_flist_free() - free flist
+ * @list:  flist to operate on, or NULL
+ * @n: number of elements
+ *
+ * This deallocates an flist previously crea

Re: drm: NULL pointer dereference in drm_mode_object_find()

2016-09-21 Thread David Herrmann
Hi

On Fri, Aug 19, 2016 at 7:10 PM, Alexander Potapenko  wrote:
> Hello,
>
> the program below triggers a NULL deref in DRM code when ran on QEMU:
>
> ===
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: [< inline >] __list_add ./include/linux/list.h:44
> IP: [< inline >] list_add_tail ./include/linux/list.h:77
> IP: [< inline >] __mutex_lock_common kernel/locking/mutex.c:543
> IP: [] __mutex_lock_slowpath+0x6f/0x100
> kernel/locking/mutex.c:824
> PGD 1c555067 PUD 1c554067 PMD 0
> Oops: 0002 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 2517 Comm: crash_drm_mode_ Not tainted 4.8.0-rc2+ #1157
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88001c40a700 task.stack: 88001c984000
> RIP: 0010:[]  []
> __mutex_lock_slowpath+0x6f/0x100
> RSP: 0018:88001c987cb0  EFLAGS: 00010282
> RAX:  RBX: 88001d5212a0 RCX: c100
> RDX: 0001 RSI: 88001c40a700 RDI: 88001d5212a4
> RBP: 88001c987cf8 R08: 88001c984000 R09: 
> R10:  R11:  R12: 88001c40a700
> R13: 88001d5212a4 R14:  R15: 88001d5212a8
> FS:  00dc9880() GS:88001f00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 1c8a9000 CR4: 000406f0
> Stack:
>  88001d5212a8   811a398f
>  88001d5212a0 88001d5212a0  
>  81a6eb20 88001c987d10 818e85ba 88001d521000
> Call Trace:
>  [< inline >] __mutex_fastpath_lock 
> ./arch/x86/include/asm/mutex_64.h:28
>  [] mutex_lock+0x1a/0x30 kernel/locking/mutex.c:102
>  [] _object_find+0x23/0xc0 drivers/gpu/drm/drm_crtc.c:329
>  [< inline >] drm_mode_object_find drivers/gpu/drm/drm_crtc.c:360
>  [< inline >] drm_crtc_find ./include/drm/drm_crtc.h:2999
>  [] drm_mode_page_flip_ioctl+0x4e/0x300
> drivers/gpu/drm/drm_crtc.c:5414
>  [] drm_ioctl+0x2a2/0x460 drivers/gpu/drm/drm_ioctl.c:721
>  [< inline >] vfs_ioctl fs/ioctl.c:43
>  [] do_vfs_ioctl+0x8d/0x580 fs/ioctl.c:675
>  [< inline >] SYSC_ioctl fs/ioctl.c:690
>  [] SyS_ioctl+0x74/0x80 fs/ioctl.c:681
>  [] entry_SYSCALL_64_fastpath+0x13/0x8f
> arch/x86/entry/entry_64.S:207
> Code: e8 37 23 00 00 8b 03 83 f8 01 0f 84 95 00 00 00 48 8b 43 10 4c
> 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48>
> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04
> RIP  [< inline >] __list_add ./include/linux/list.h:44
> RIP  [< inline >] list_add_tail ./include/linux/list.h:77
> RIP  [< inline >] __mutex_lock_common kernel/locking/mutex.c:543
> RIP  [] __mutex_lock_slowpath+0x6f/0x100
> kernel/locking/mutex.c:824
>  RSP 
> CR2: 
> ---[ end trace 3cef4eb618ac6bb6 ]---
> ===
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include 
> #include 
> #include 
>
> int main()
> {
>   int fd = open("/dev/dri/card0", 0);
>   mmap(0x20036000ul, 0x1000ul, 0x3ul, 0x32ul, 0xul, 0x0ul);
>   memcpy((void*)0x20036ad7,
>  "\x1e\xa4\x45\xdc\xca\x11\xff\x25\x72\x65\x7e\x4a\x56\x54\x35"
>  "\x67\xe3\x8b\x41\x5c\x6d\x69\xa5\xf9\x88\x29\xb8\xc9\x6a\x45"
>  "\x76\xa9\xe7\x14\xd1\xf6\xa3\x59\x07\x4d\xe5\xc8\x39\xbf\x33"
>  "\xb9\x3d\x21\xd1\xaf\x16\x4d\xbc\xbf\xb1\x0a\x92\x97\xd9\x91"
>  "\x4d\xd8\xf8\xa1\xa6\xa3\x20\x02\x2c\x5e\x8f\x87\x05\x8b\xeb"
>  "\x9a\xb9\xbc\xa6\x60\x45\x8d\x19\x01\x7d\xb7\xef\x64\x62\x2e"
>  "\x5e\x3d\xfe\x65\xbf\xe2\x80\x89\x36\x48\x73\xc6\xa2\x6e\xe2"
>  "\x1a\x8f\x1b\x11\x6f\x49\x20\xeb\x74\x2d\x41\xb9\x8b\xb4\x8e"
>  "\x8b\xf5\x6d\xb7\xb1\xa3\xcb\xc4\xc2\x7f\x6d\xef\x32\xef\xa7"
>  "\x1c\x17\x03\x60\x7b\x31\x1f\x66",
>  143);
>   ioctl(fd, 0xbb0ul, 0x20036ad7ul, 0, 0, 0);
>   return 0;
> }
>
> I build the ToT kernel (commit
> 952b159f2919a8d514f13999f9f463bddcc1dae7, Aug 18) with defconfig and
> CONFIG_DRM_VGEM=y.

Can you make sure you have this commit:


commit 6f00975c619064a18c23fd3aced325ae165a73b9
Author: Daniel Vetter 
Date:   Sat Aug 20 12:22:11 2016 +0200

drm: Reject page_flip for !DRIVER_MODESET


Thanks
David


Re: drm: NULL pointer dereference in drm_mode_object_find()

2016-09-21 Thread David Herrmann
Hi

On Tue, Sep 20, 2016 at 11:25 AM, Alexander Potapenko  wrote:
> On Tue, Sep 20, 2016 at 11:21 AM, David Herrmann  
> wrote:
>> Hi
>>
>> On Mon, Sep 5, 2016 at 10:30 AM, Dmitry Vyukov  wrote:
>>> On Fri, Aug 19, 2016 at 7:10 PM, Alexander Potapenko  
>>> wrote:
>>>> Hello,
>>>>
>>>> the program below triggers a NULL deref in DRM code when ran on QEMU:
>>>>
>>>> ===
>>>> BUG: unable to handle kernel NULL pointer dereference at   (null)
>>>> IP: [< inline >] __list_add ./include/linux/list.h:44
>>>> IP: [< inline >] list_add_tail ./include/linux/list.h:77
>>>> IP: [< inline >] __mutex_lock_common kernel/locking/mutex.c:543
>>>> IP: [] __mutex_lock_slowpath+0x6f/0x100
>>>> kernel/locking/mutex.c:824
>>>> PGD 1c555067 PUD 1c554067 PMD 0
>>>> Oops: 0002 [#1] SMP
>>>> Modules linked in:
>>>> CPU: 0 PID: 2517 Comm: crash_drm_mode_ Not tainted 4.8.0-rc2+ #1157
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
>>>> 01/01/2011
>>>> task: 88001c40a700 task.stack: 88001c984000
>>>> RIP: 0010:[]  []
>>>> __mutex_lock_slowpath+0x6f/0x100
>>>> RSP: 0018:88001c987cb0  EFLAGS: 00010282
>>>> RAX:  RBX: 88001d5212a0 RCX: c100
>>>> RDX: 0001 RSI: 88001c40a700 RDI: 88001d5212a4
>>>> RBP: 88001c987cf8 R08: 88001c984000 R09: 
>>>> R10:  R11:  R12: 88001c40a700
>>>> R13: 88001d5212a4 R14:  R15: 88001d5212a8
>>>> FS:  00dc9880() GS:88001f00() 
>>>> knlGS:
>>>> CS:  0010 DS:  ES:  CR0: 80050033
>>>> CR2:  CR3: 1c8a9000 CR4: 000406f0
>>>> Stack:
>>>>  88001d5212a8   811a398f
>>>>  88001d5212a0 88001d5212a0  
>>>>  81a6eb20 88001c987d10 818e85ba 88001d521000
>>>> Call Trace:
>>>>  [< inline >] __mutex_fastpath_lock 
>>>> ./arch/x86/include/asm/mutex_64.h:28
>>>>  [] mutex_lock+0x1a/0x30 kernel/locking/mutex.c:102
>>>>  [] _object_find+0x23/0xc0 drivers/gpu/drm/drm_crtc.c:329
>>>>  [< inline >] drm_mode_object_find drivers/gpu/drm/drm_crtc.c:360
>>>>  [< inline >] drm_crtc_find ./include/drm/drm_crtc.h:2999
>>>>  [] drm_mode_page_flip_ioctl+0x4e/0x300
>>>> drivers/gpu/drm/drm_crtc.c:5414
>>>>  [] drm_ioctl+0x2a2/0x460 drivers/gpu/drm/drm_ioctl.c:721
>>>>  [< inline >] vfs_ioctl fs/ioctl.c:43
>>>>  [] do_vfs_ioctl+0x8d/0x580 fs/ioctl.c:675
>>>>  [< inline >] SYSC_ioctl fs/ioctl.c:690
>>>>  [] SyS_ioctl+0x74/0x80 fs/ioctl.c:681
>>>>  [] entry_SYSCALL_64_fastpath+0x13/0x8f
>>>> arch/x86/entry/entry_64.S:207
>>>> Code: e8 37 23 00 00 8b 03 83 f8 01 0f 84 95 00 00 00 48 8b 43 10 4c
>>>> 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48>
>>>> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04
>>>> RIP  [< inline >] __list_add ./include/linux/list.h:44
>>>> RIP  [< inline >] list_add_tail ./include/linux/list.h:77
>>>> RIP  [< inline >] __mutex_lock_common kernel/locking/mutex.c:543
>>>> RIP  [] __mutex_lock_slowpath+0x6f/0x100
>>>> kernel/locking/mutex.c:824
>>>>  RSP 
>>>> CR2: 
>>>> ---[ end trace 3cef4eb618ac6bb6 ]---
>>>> ===
>>>>
>>>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>>>> #include 
>>>> #include 
>>>> #include 
>>>>
>>>> int main()
>>>> {
>>>>   int fd = open("/dev/dri/card0", 0);
>>>>   mmap(0x20036000ul, 0x1000ul, 0x3ul, 0x32ul, 0xul, 0x0ul);
>>>>   memcpy((void*)0x20036ad7,
>>>>  "\x1e\xa4\x45\xdc\xca\x11\xff\x25\x72\x65\x7e\x4a\x56\x54\x35"
>>>>  "\x67\xe3\x8b\x41\x5c\x6d\x69\xa5\xf9\x88\x29\xb8\xc9\x6a\x45"
>>>>  "\x76\xa9

Re: drm: NULL pointer dereference in drm_mode_object_find()

2016-09-20 Thread David Herrmann
Hi

On Mon, Sep 5, 2016 at 10:30 AM, Dmitry Vyukov  wrote:
> On Fri, Aug 19, 2016 at 7:10 PM, Alexander Potapenko  
> wrote:
>> Hello,
>>
>> the program below triggers a NULL deref in DRM code when ran on QEMU:
>>
>> ===
>> BUG: unable to handle kernel NULL pointer dereference at   (null)
>> IP: [< inline >] __list_add ./include/linux/list.h:44
>> IP: [< inline >] list_add_tail ./include/linux/list.h:77
>> IP: [< inline >] __mutex_lock_common kernel/locking/mutex.c:543
>> IP: [] __mutex_lock_slowpath+0x6f/0x100
>> kernel/locking/mutex.c:824
>> PGD 1c555067 PUD 1c554067 PMD 0
>> Oops: 0002 [#1] SMP
>> Modules linked in:
>> CPU: 0 PID: 2517 Comm: crash_drm_mode_ Not tainted 4.8.0-rc2+ #1157
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 88001c40a700 task.stack: 88001c984000
>> RIP: 0010:[]  []
>> __mutex_lock_slowpath+0x6f/0x100
>> RSP: 0018:88001c987cb0  EFLAGS: 00010282
>> RAX:  RBX: 88001d5212a0 RCX: c100
>> RDX: 0001 RSI: 88001c40a700 RDI: 88001d5212a4
>> RBP: 88001c987cf8 R08: 88001c984000 R09: 
>> R10:  R11:  R12: 88001c40a700
>> R13: 88001d5212a4 R14:  R15: 88001d5212a8
>> FS:  00dc9880() GS:88001f00() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2:  CR3: 1c8a9000 CR4: 000406f0
>> Stack:
>>  88001d5212a8   811a398f
>>  88001d5212a0 88001d5212a0  
>>  81a6eb20 88001c987d10 818e85ba 88001d521000
>> Call Trace:
>>  [< inline >] __mutex_fastpath_lock 
>> ./arch/x86/include/asm/mutex_64.h:28
>>  [] mutex_lock+0x1a/0x30 kernel/locking/mutex.c:102
>>  [] _object_find+0x23/0xc0 drivers/gpu/drm/drm_crtc.c:329
>>  [< inline >] drm_mode_object_find drivers/gpu/drm/drm_crtc.c:360
>>  [< inline >] drm_crtc_find ./include/drm/drm_crtc.h:2999
>>  [] drm_mode_page_flip_ioctl+0x4e/0x300
>> drivers/gpu/drm/drm_crtc.c:5414
>>  [] drm_ioctl+0x2a2/0x460 drivers/gpu/drm/drm_ioctl.c:721
>>  [< inline >] vfs_ioctl fs/ioctl.c:43
>>  [] do_vfs_ioctl+0x8d/0x580 fs/ioctl.c:675
>>  [< inline >] SYSC_ioctl fs/ioctl.c:690
>>  [] SyS_ioctl+0x74/0x80 fs/ioctl.c:681
>>  [] entry_SYSCALL_64_fastpath+0x13/0x8f
>> arch/x86/entry/entry_64.S:207
>> Code: e8 37 23 00 00 8b 03 83 f8 01 0f 84 95 00 00 00 48 8b 43 10 4c
>> 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48>
>> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04
>> RIP  [< inline >] __list_add ./include/linux/list.h:44
>> RIP  [< inline >] list_add_tail ./include/linux/list.h:77
>> RIP  [< inline >] __mutex_lock_common kernel/locking/mutex.c:543
>> RIP  [] __mutex_lock_slowpath+0x6f/0x100
>> kernel/locking/mutex.c:824
>>  RSP 
>> CR2: 
>> ---[ end trace 3cef4eb618ac6bb6 ]---
>> ===
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include 
>> #include 
>> #include 
>>
>> int main()
>> {
>>   int fd = open("/dev/dri/card0", 0);
>>   mmap(0x20036000ul, 0x1000ul, 0x3ul, 0x32ul, 0xul, 0x0ul);
>>   memcpy((void*)0x20036ad7,
>>  "\x1e\xa4\x45\xdc\xca\x11\xff\x25\x72\x65\x7e\x4a\x56\x54\x35"
>>  "\x67\xe3\x8b\x41\x5c\x6d\x69\xa5\xf9\x88\x29\xb8\xc9\x6a\x45"
>>  "\x76\xa9\xe7\x14\xd1\xf6\xa3\x59\x07\x4d\xe5\xc8\x39\xbf\x33"
>>  "\xb9\x3d\x21\xd1\xaf\x16\x4d\xbc\xbf\xb1\x0a\x92\x97\xd9\x91"
>>  "\x4d\xd8\xf8\xa1\xa6\xa3\x20\x02\x2c\x5e\x8f\x87\x05\x8b\xeb"
>>  "\x9a\xb9\xbc\xa6\x60\x45\x8d\x19\x01\x7d\xb7\xef\x64\x62\x2e"
>>  "\x5e\x3d\xfe\x65\xbf\xe2\x80\x89\x36\x48\x73\xc6\xa2\x6e\xe2"
>>  "\x1a\x8f\x1b\x11\x6f\x49\x20\xeb\x74\x2d\x41\xb9\x8b\xb4\x8e"
>>  "\x8b\xf5\x6d\xb7\xb1\xa3\xcb\xc4\xc2\x7f\x6d\xef\x32\xef\xa7"
>>  "\x1c\x17\x03\x60\x7b\x31\x1f\x66",
>>  143);
>>   ioctl(fd, 0xbb0ul, 0x20036ad7ul, 0, 0, 0);
>>   return 0;
>> }
>>
>> I build the ToT kernel (commit
>> 952b159f2919a8d514f13999f9f463bddcc1dae7, Aug 18) with defconfig and
>> CONFIG_DRM_VGEM=y.
>
> +dri-devel
>
> I am also hitting this on 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of
> linux-next.

Can you tell us which DRM driver this is? vgem does not specify
DRIVER_MODESET, so the page-flip ioctl should not be hooked up. Also,
the mmap() operation should fail on any GEM driver. *confused*

Thanks
David


Re: drm: WARNING in ioremap_wc

2016-09-02 Thread David Herrmann
Hey

On Fri, Sep 2, 2016 at 2:23 PM, Dmitry Vyukov  wrote:
> On Mon, Aug 29, 2016 at 8:05 AM, Daniel Vetter  wrote:
>> On Sun, Aug 28, 2016 at 07:36:59PM +0200, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> The following program triggers WARNING in ioremap_wc:
>>
>> Yup, that should also be fixed in linux-next. Probably better to not
>> report more for 4.8-rc kernels for now ;-)
>>
>> If you have some time to test, you need to cherry pick the following two
>> patches I think:
>>
>> fa5386459f06 ("drm: Used DRM_LEGACY for all legacy functions")
>> 3cbf6a5deb2f ("drm: Mark up legacy/dri1 drivers with DRM_LEGACY")
>>
>> If that's confirmed we can cherry-pick them over to -fixes.
>
>
> Hi Daniel,
>
> I've switched to linux-next and don't see the WARNINGs anymore.
> I wonder how they were discovered? Is somebody else running syzkaller?
> Or they were discovered in some other way?

We independently discovered that some DRM-KMS drivers enabled the DRI1
ioctls, and fixed this. Hence, you should be unable to trigger
anything in drm_bufs.c, drm_context.c, and friends.

Thanks
David


Re: [PATCH v4 3/5] drm: add SimpleDRM driver

2016-09-01 Thread David Herrmann
Hey

On Mon, Aug 22, 2016 at 10:25 PM, Noralf Trønnes  wrote:
> The SimpleDRM driver binds to simple-framebuffer devices and provides a
> DRM/KMS API. It provides only a single CRTC+encoder+connector combination
> plus one initial mode.
>
> Userspace can create dumb-buffers which can be blit into the real
> framebuffer similar to UDL. No access to the real framebuffer is allowed
> (compared to earlier version of this driver) to avoid security issues.
> Furthermore, this way we can support arbitrary modes as long as we have a
> conversion-helper.
>
> The driver was originally written by David Herrmann in 2014.
> My main contribution is to make use of drm_simple_kms_helper and
> rework the probe path to avoid use of the deprecated drm_platform_init()
> and drm_driver.{load,unload}().
> Additions have also been made for later changes to the Device Tree binding
> document, like support for clocks, regulators and having the node under
> /chosen. This code was lifted verbatim from simplefb.c.
>
> Cc: dh.herrm...@gmail.com
> Cc: l...@skynet.be
> Signed-off-by: Noralf Trønnes 
> ---
>
> Changes from version 3:
> - Reworked gem code to match udl
> - Dropped PRIME support
> - Dropped dirty_info_property, it's gone
> - Don't use drm_device.platformdev it's deprecated
> - Remove struct sdrm_device #ifdef's
> - Split out sdrm_fb_init() from sdrm_fb_create(), needed by fbdev code
> - Simplify drm_clip validation by extending the check in
>   sdrm_dirty() and drop the one in sdrm_blit()
> - Removed sdrm_dirty_all_unlocked() which was unused.
>
> Changes from version 2:
> - Remove superfluos module.h includes
> - Move includes from header to source files
> - Set plane.fb before flushing in pipe update, or else the previous one
>   gets flushed
> - Added check for vblank event in sdrm_display_pipe_update()
>
> Changes from version 1:
> - Move platform_set_drvdata() before drm_dev_register()
> - Remove drm_legacy_mmap() call.
> - Set mode_config.{min,max}_{width,height} to the actual dimensions
>   of the native framebuffer
> - Remove plane positioning since it won't work with the simple display pipe,
>   meaning sdrm_display_pipe_check() isn't necessary either
> - Support the additions to the Device Tree binding document, including
>   clocks, regulators and having the node under /chosen
>
> Changes from previous version:
> - Remove FB_SIMPLE=n dependency to avoid kconfig recursive error
> - Changed module name to match kconfig help text: sdrm -> simpledrm
> - Use drm_simple_display_pipe
> - Replace deprecated drm_platform_init()
> - sdrm_dumb_create(): drm_gem_object_unreference() -> *_unlocked()
> - sdrm_dumb_map_offset(): drm_gem_object_lookup() remove drm_device parameter
> - sdrm_drm_mmap() changes:
>   Remove struct_mutex locking
>   Add drm_vma_offset_{lock,unlock}_lookup()
>   drm_mmap() -> drm_legacy_mmap()
> - dma_buf_begin_cpu_access() doesn't require start and length anymore
> - Use drm_cvt_mode() instead of open coding a mode
> - Fix format conversion. In the intermediate step, store the 8/6/5 bit color
>   value in the upper part of the 16-bit color variable, not the lower.
> - Support clips == NULL in sdrm_dirty()
> - Set mode_config.preferred_depth
> - Attach mode_config.dirty_info_property to connector
>
>  drivers/gpu/drm/Kconfig  |   2 +
>  drivers/gpu/drm/Makefile |   1 +
>  drivers/gpu/drm/simpledrm/Kconfig|  19 +
>  drivers/gpu/drm/simpledrm/Makefile   |   4 +
>  drivers/gpu/drm/simpledrm/simpledrm.h|  86 +
>  drivers/gpu/drm/simpledrm/simpledrm_damage.c | 235 
>  drivers/gpu/drm/simpledrm/simpledrm_drv.c| 543 
> +++
>  drivers/gpu/drm/simpledrm/simpledrm_gem.c| 202 ++
>  drivers/gpu/drm/simpledrm/simpledrm_kms.c| 234 
>  9 files changed, 1326 insertions(+)
>  create mode 100644 drivers/gpu/drm/simpledrm/Kconfig
>  create mode 100644 drivers/gpu/drm/simpledrm/Makefile
>  create mode 100644 drivers/gpu/drm/simpledrm/simpledrm.h
>  create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_damage.c
>  create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_drv.c
>  create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_gem.c
>  create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_kms.c
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index fc35731..a54cc8d 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -290,3 +290,5 @@ source "drivers/gpu/drm/arc/Kconfig"
>  source "drivers/gpu/drm/hisilicon/Kconfig"
>
>  source "drivers/gpu/drm/mediatek/Kconfi

Re: [PATCH 0/2] drm: add SimpleDRM driver

2016-08-04 Thread David Herrmann
Hi

On Thu, Aug 4, 2016 at 5:34 PM, Luc Verhaegen  wrote:
> Do we really want to recreate a 400+ email thread again, or are we
> capable of learning from the past?

No we don't. And no-one intends to. I am fully aware of the discussion
that introduced the clock-dependencies to simplefb, and I gladly
accept patches that add such support to SimpleDRM. Did anyone say
otherwise? This series adds initial support for the devices _we_ know
and can test (which is x86 and RPi, in my case). If someone wants
support for more devices, please send patches. Why does this have to
be included (or even discussed) as part of this submission?

Thanks
David


Re: [PATCH 1/3 v4] drm: Introduce drm_connector_register_all() helper

2016-03-29 Thread David Herrmann
Hi

On Tue, Mar 29, 2016 at 12:02 PM, Alexey Brodkin
 wrote:
> As a pair to already existing drm_connector_unregister_all() we're adding
> generic implementation of what is already done in some drivers.
>
> Once this helper is implemented we'll be ready to switch existing
> driver-specific implementations with the generic one.
>
> Signed-off-by: Alexey Brodkin 
> Cc: Daniel Vetter 
> Cc: David Airlie 
> Cc: David Herrmann 
> ---
>
> Changes v3 -> v4:
>  * In drm_connector_register_all() fail path which calls unregister_all()
>is moved outside of loop&locked section (as suggested by Daniel)
>
> Changes v2 -> v3:
>  * Updated title with capital after colon
>  * Simplified failure path with direct and unconditional invocation of
>unregister_all()
>  * Updated kerneldoc description of the drm_connector_register_all()
>
> Changes v1 -> v2:
>  * Rename drm_connector_unplug_all() to drm_connector_unregister_all()
>  * Use drm_for_each_connector() instead of list_for_each_entry()
>  * Updated kerneldoc for drm_dev_register()
>
>  drivers/gpu/drm/drm_crtc.c | 39 +++
>  drivers/gpu/drm/drm_drv.c  |  6 +-
>  include/drm/drm_crtc.h |  3 ++-
>  3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 7675826..3e4cdb1 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -1079,6 +1079,45 @@ void drm_connector_unregister(struct drm_connector 
> *connector)
>  EXPORT_SYMBOL(drm_connector_unregister);
>
>  /**
> + * drm_connector_register_all - register all connectors
> + * @dev: drm device
> + *
> + * This function registers all connectors in sysfs and other places so that
> + * userspace can start to access them. Drivers can call it after calling
> + * drm_dev_register() to complete the device registration, if they don't call
> + * drm_connector_register() on each connector individually.
> + *
> + * When a device is unplugged and should be removed from userspace access,
> + * call drm_connector_unregister_all(), which is the inverse of this
> + * function.
> + *
> + * Returns:
> + * Zero on success, error code on failure.
> + */
> +int drm_connector_register_all(struct drm_device *dev)
> +{
> +   struct drm_connector *connector;
> +   int ret;
> +
> +   mutex_lock(&dev->mode_config.mutex);
> +
> +   drm_for_each_connector(connector, dev) {
> +   ret = drm_connector_register(connector);
> +   if (ret)
> +   goto err;
> +   }
> +
> +   mutex_unlock(&dev->mode_config.mutex);
> +
> +   return 0;
> +
> +err:
> +   drm_connector_unregister_all(dev);

You _must_ unlock the mutex before returning.

Thanks
David

> +   return ret;
> +}
> +EXPORT_SYMBOL(drm_connector_register_all);
> +
> +/**
>   * drm_connector_unregister_all - unregister connector userspace interfaces
>   * @dev: drm device
>   *
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 167c8d3..2c9a2b6 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -715,7 +715,11 @@ EXPORT_SYMBOL(drm_dev_unref);
>   *
>   * Register the DRM device @dev with the system, advertise device to 
> user-space
>   * and start normal device operation. @dev must be allocated via 
> drm_dev_alloc()
> - * previously.
> + * previously. Right after drm_dev_register() the driver should call
> + * drm_connector_register_all() to register all connectors in sysfs. This is
> + * a separate call for backward compatibility with drivers still using
> + * the deprecated ->load() callback, where connectors are registered from 
> within
> + * the ->load() callback.
>   *
>   * Never call this twice on any device!
>   *
> diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
> index 12f2bd4..6231f6c 100644
> --- a/include/drm/drm_crtc.h
> +++ b/include/drm/drm_crtc.h
> @@ -2253,7 +2253,8 @@ void drm_connector_unregister(struct drm_connector 
> *connector);
>
>  extern void drm_connector_cleanup(struct drm_connector *connector);
>  extern unsigned int drm_connector_index(struct drm_connector *connector);
> -/* helper to unregister all connectors from sysfs for device */
> +/* helpers to {un}register all connectors from sysfs for device */
> +extern int drm_connector_register_all(struct drm_device *dev);
>  extern void drm_connector_unregister_all(struct drm_device *dev);
>
>  extern int drm_bridge_add(struct drm_bridge *bridge);
> --
> 2.5.0
>


Re: [PATCH 2/4 v3] drm: Introduce drm_connector_register_all() helper

2016-03-23 Thread David Herrmann
Hi

On Wed, Mar 23, 2016 at 9:42 AM, Alexey Brodkin
 wrote:
> As a pair to already existing drm_connector_unregister_all() we're adding
> generic implementation of what is already done in some drivers.
>
> Once this helper is implemented we'll be ready to switch existing
> driver-specific implementations with the generic one.
>
> Signed-off-by: Alexey Brodkin 
> Cc: Daniel Vetter 
> Cc: David Airlie 
> ---
>
> Changes v2 -> v3:
>  * Updated title with capital after colon
>  * Simplified failure path with direct and unconditional invocation of
>unregister_all()
>  * Updated kerneldoc description of the drm_connector_register_all()
>
> Changes v1 -> v2:
>  * Rename drm_connector_unplug_all() to drm_connector_unregister_all()
>  * Use drm_for_each_connector() instead of list_for_each_entry()
>  * Updated kerneldoc for drm_dev_register()
>
>  drivers/gpu/drm/drm_crtc.c | 43 +++
>  drivers/gpu/drm/drm_drv.c  |  6 +-
>  include/drm/drm_crtc.h |  3 ++-
>  3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 65488a6..21eea11 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -1081,6 +1081,49 @@ void drm_connector_unregister(struct drm_connector 
> *connector)
>  EXPORT_SYMBOL(drm_connector_unregister);
>
>  /**
> + * drm_connector_register_all - register all connectors
> + * @dev: drm device
> + *
> + * This function registers all connectors in sysfs and other places so that
> + * userspace can start to access them. Drivers can call it after calling
> + * drm_dev_register() to complete the device registration, if they don't call
> + * drm_connector_register() on each connector individually.
> + *
> + * When a device is unplugged and should be removed from userspace access,
> + * call drm_connector_unregister_all(), which is the inverse of this
> + * function.
> + *
> + * Returns:
> + * Zero on success, error code on failure.
> + */
> +int drm_connector_register_all(struct drm_device *dev)
> +{
> +   struct drm_connector *connector;
> +   int ret;
> +
> +   mutex_lock(&dev->mode_config.mutex);
> +
> +   drm_for_each_connector(connector, dev) {
> +   ret = drm_connector_register(connector);
> +   if (ret) {
> +   /*
> +* We may safely call unregister_all() here within
> +* area locked with mutex because unregister_all()
> +* doesn't use locks inside (see a comment in that
> +* function).
> +*/

Ugh? unregister_all() says:

/* FIXME: taking the mode config mutex ends up in a clash with sysfs */

This strongly contradicts your comment. Anyway, regardless how you
want to fix it: You better unlock the mode-config mutex before
returning below.

Thanks
David

> +   drm_connector_unregister_all(dev);
> +   return ret;
> +   }
> +   }
> +
> +   mutex_unlock(&dev->mode_config.mutex);
> +
> +   return 0;
> +}
> +EXPORT_SYMBOL(drm_connector_register_all);
> +
> +/**
>   * drm_connector_unregister_all - unregister connector userspace interfaces
>   * @dev: drm device
>   *
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 167c8d3..2c9a2b6 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -715,7 +715,11 @@ EXPORT_SYMBOL(drm_dev_unref);
>   *
>   * Register the DRM device @dev with the system, advertise device to 
> user-space
>   * and start normal device operation. @dev must be allocated via 
> drm_dev_alloc()
> - * previously.
> + * previously. Right after drm_dev_register() the driver should call
> + * drm_connector_register_all() to register all connectors in sysfs. This is
> + * a separate call for backward compatibility with drivers still using
> + * the deprecated ->load() callback, where connectors are registered from 
> within
> + * the ->load() callback.
>   *
>   * Never call this twice on any device!
>   *
> diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
> index 42d9f4d..6a34117 100644
> --- a/include/drm/drm_crtc.h
> +++ b/include/drm/drm_crtc.h
> @@ -2214,7 +2214,8 @@ void drm_connector_unregister(struct drm_connector 
> *connector);
>
>  extern void drm_connector_cleanup(struct drm_connector *connector);
>  extern unsigned int drm_connector_index(struct drm_connector *connector);
> -/* helper to unregister all connectors from sysfs for device */
> +/* helpers to {un}register all connectors from sysfs for device */
> +extern int drm_connector_register_all(struct drm_device *dev);
>  extern void drm_connector_unregister_all(struct drm_device *dev);
>
>  extern int drm_bridge_add(struct drm_bridge *bridge);
> --
> 2.5.0
>


Re: [PATCH v2 3/9] efi/x86: efifb: move DMI based quirks handling out of generic code

2016-03-21 Thread David Herrmann
Hi

On Fri, Mar 18, 2016 at 5:47 PM, Ard Biesheuvel
 wrote:
> The efifb quirks handling based on DMI identification of the platform is
> specific to x86, so move it to x86 arch code.
>
> Signed-off-by: Ard Biesheuvel 

Fine with me:

Acked-by: David Herrmann 

Thanks
David

> ---
>  arch/x86/include/asm/efi.h  |  2 ++
>  arch/x86/kernel/sysfb_efi.c | 15 +++
>  drivers/video/fbdev/efifb.c | 15 ---
>  3 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 7d5187b66108..790adbc3a64a 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -154,6 +154,8 @@ static inline bool efi_runtime_supported(void)
>  extern struct console early_efi_console;
>  extern void parse_efi_setup(u64 phys_addr, u32 data_len);
>
> +extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt);
> +
>  #ifdef CONFIG_EFI_MIXED
>  extern void efi_thunk_runtime_setup(void);
>  extern efi_status_t efi_thunk_set_virtual_address_map(
> diff --git a/arch/x86/kernel/sysfb_efi.c b/arch/x86/kernel/sysfb_efi.c
> index b285d4e8c68e..e21a8a7ddcff 100644
> --- a/arch/x86/kernel/sysfb_efi.c
> +++ b/arch/x86/kernel/sysfb_efi.c
> @@ -68,6 +68,21 @@ struct efifb_dmi_info efifb_dmi_list[] = {
> [M_UNKNOWN] = { NULL, 0, 0, 0, 0, OVERRIDE_NONE }
>  };
>
> +void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
> +{
> +   int i;
> +
> +   for (i = 0; i < M_UNKNOWN; i++) {
> +   if (efifb_dmi_list[i].base != 0 &&
> +   !strcmp(opt, efifb_dmi_list[i].optname)) {
> +   si->lfb_base = efifb_dmi_list[i].base;
> +   si->lfb_linelength = efifb_dmi_list[i].stride;
> +   si->lfb_width = efifb_dmi_list[i].width;
> +   si->lfb_height = efifb_dmi_list[i].height;
> +   }
> +   }
> +}
> +
>  #define choose_value(dmivalue, fwvalue, field, flags) ({   \
> typeof(fwvalue) _ret_ = fwvalue;\
> if ((flags) & (field))  \
> diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> index 95d293b7445a..dd594369b8a6 100644
> --- a/drivers/video/fbdev/efifb.c
> +++ b/drivers/video/fbdev/efifb.c
> @@ -8,6 +8,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -15,7 +16,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>
>  static bool request_mem_succeeded = false;
>
> @@ -85,21 +86,13 @@ static struct fb_ops efifb_ops = {
>  static int efifb_setup(char *options)
>  {
> char *this_opt;
> -   int i;
>
> if (options && *options) {
> while ((this_opt = strsep(&options, ",")) != NULL) {
> if (!*this_opt) continue;
>
> -   for (i = 0; i < M_UNKNOWN; i++) {
> -   if (efifb_dmi_list[i].base != 0 &&
> -   !strcmp(this_opt, 
> efifb_dmi_list[i].optname)) {
> -   screen_info.lfb_base = 
> efifb_dmi_list[i].base;
> -   screen_info.lfb_linelength = 
> efifb_dmi_list[i].stride;
> -   screen_info.lfb_width = 
> efifb_dmi_list[i].width;
> -   screen_info.lfb_height = 
> efifb_dmi_list[i].height;
> -   }
> -   }
> +   efifb_setup_from_dmi(&screen_info, this_opt);
> +
> if (!strncmp(this_opt, "base:", 5))
> screen_info.lfb_base = 
> simple_strtoul(this_opt+5, NULL, 0);
> else if (!strncmp(this_opt, "stride:", 7))
> --
> 2.5.0
>


Re: [PATCH v2 8/9] efi/arm*: wire up struct screen_info to efi-framebuffer platform device

2016-03-21 Thread David Herrmann
Hi

On Fri, Mar 18, 2016 at 5:47 PM, Ard Biesheuvel
 wrote:
> This adds code to the ARM and arm64 EFI init routines to expose a platform
> device of type 'efi-framebuffer' if struct screen_info has been populated
> appropriately from the GOP protocol by the stub. Since the framebuffer may
> potentially be located in system RAM, make sure that the region is reserved
> and marked MEMBLOCK_NOMAP.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/arm-init.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index acd44de30504..f5f13b871d42 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
> @@ -76,6 +77,10 @@ static void __init init_screen_info(void)
> screen_info = *si;
> early_memunmap(si, sizeof(*si));
> }
> +
> +   if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI &&
> +   memblock_is_map_memory(screen_info.lfb_base))
> +   memblock_mark_nomap(screen_info.lfb_base, 
> screen_info.lfb_size);
>  }
>
>  static int __init uefi_init(void)
> @@ -235,3 +240,16 @@ void __init efi_init(void)
>
> init_screen_info();
>  }
> +
> +static int __init register_gop_device(void)
> +{
> +   void *pd;
> +
> +   if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
> +   return 0;
> +
> +   /* the efifb driver accesses screen_info directly, no need to pass it 
> */
> +   pd = platform_device_register_simple("efi-framebuffer", 0, NULL, 0);

I'd really appreciate if you pass "screen_info" as data, just like x86
does. Sure, we still use the global screen_info variable from efifb,
but we should try to reduce the number of global variables and rather
make drivers use the platform-data of the platform device.

Thanks
David


Re: [PATCH] HID: wiimote: Fix wiimote mp scale linearization

2016-03-19 Thread David Herrmann
Hi

On Wed, Mar 16, 2016 at 5:59 PM, Cyan Ogilvie  wrote:
> The wiimote motion plus gyros use two scales to report fast and slow
> rotation - below 440 deg/s uses 8192/440 units / deg/s, and above uses
> 8192/2000 units / deg/s.
>
> Previously this driver attempted to linearize the two by scaling the fast
> rate by 18 and the slow by 9, but this results in a scale of
> 8192*9/440 = ~167.564 for slow and 8192*18/2000 = 73.728 for fast.
>
> Correct the fast motion scale factor so that both report ~167.564
> units / deg/s
>
> Signed-off-by: Cyan Ogilvie 
> ---
>  drivers/hid/hid-wiimote-modules.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hid/hid-wiimote-modules.c 
> b/drivers/hid/hid-wiimote-modules.c
> index 4390eee..c830ed3 100644
> --- a/drivers/hid/hid-wiimote-modules.c
> +++ b/drivers/hid/hid-wiimote-modules.c
> @@ -2049,9 +2049,11 @@ static void wiimod_mp_in_mp(struct wiimote_data 
> *wdata, const __u8 *ext)
>  *   -+--+-+-+
>  * The single bits Yaw, Roll, Pitch in the lower right corner specify
>  * whether the wiimote is rotating fast (0) or slow (1). Speed for 
> slow
> -* roation is 440 deg/s and for fast rotation 2000 deg/s. To get a
> -* linear scale we multiply by 2000/440 = ~4.5454 which is 18 for fast
> -* and 9 for slow.

Oh dear! My initial reasoning was completely off. I somehow upscaled
4.5 to 18, and 1 to 9, when it should have been 1 to 4. Oops!

> +* roation is 8192/440 units / deg/s and for fast rotation 8192/2000
> +* units / deg/s. To get a linear scale for fast rotation we multiply
> +* by 2000/440 = ~4.5454 and scale both fast and slow by 9 to match 
> the
> +* previous scale reported by this driver.
> +* This leaves a linear scale with 8192*9/440 (~167.564) units / 
> deg/s.
>  * If the wiimote is not rotating the sensor reports 2^13 = 8192.
>  * Ext specifies whether an extension is connected to the motionp.
>  * which is parsed by wiimote-core.
> @@ -2070,15 +2072,15 @@ static void wiimod_mp_in_mp(struct wiimote_data 
> *wdata, const __u8 *ext)
> z -= 8192;
>
> if (!(ext[3] & 0x02))
> -   x *= 18;
> +   x = (x * 2000 * 9) / 440;
> else
> x *= 9;
> if (!(ext[4] & 0x02))
> -   y *= 18;
> +   y = (y * 2000 * 9) / 440;
> else
> y *= 9;
> if (!(ext[3] & 0x01))
> -   z *= 18;
> +   z = (z * 2000 * 9) / 440;

So 2000/440 ~= 4.5, so this is exactly what the initial comment said.
Patch looks good to me:

Reviewed-by: David Herrmann 

Thanks
David

> else
> z *= 9;
>
> --
> 2.1.4
>


Re: [PATCH, RESEND] drm: avoid uninitialized timestamp use in wait_vblank

2016-02-26 Thread David Herrmann
Hi

On Thu, Feb 25, 2016 at 10:09 PM, Arnd Bergmann  wrote:
> gcc warns about the timestamp in drm_wait_vblank being possibly
> used without an initialization:
>
> drivers/gpu/drm/drm_irq.c: In function 'drm_wait_vblank':
> drivers/gpu/drm/drm_irq.c:1755:28: warning: 'now.tv_usec' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>vblwait->reply.tval_usec = now.tv_usec;
> drivers/gpu/drm/drm_irq.c:1754:27: warning: 'now.tv_sec' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>vblwait->reply.tval_sec = now.tv_sec;
>
> This can happen if drm_vblank_count_and_time() returns 0 in its
> error path. To sanitize the error case, I'm changing that function
> to return a zero timestamp when it fails.
>
> Signed-off-by: Arnd Bergmann 
> Fixes: e6ae8687a87b ("drm: idiot-proof vblank")
> ---
>  drivers/gpu/drm/drm_irq.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> I'm going through the maybe-unused warnings in randconfig builds, this one is
> apparently not a false positive, although it only happens if something
> else has already gone wrong.
>
> Originally sent out on Jan 13, but I received no reply, so resending now.
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 881c5a6c180c..6f41ddfbe061 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -997,8 +997,10 @@ u32 drm_vblank_count_and_time(struct drm_device *dev, 
> unsigned int pipe,
> int count = DRM_TIMESTAMP_MAXRETRIES;
> u32 cur_vblank;
>
> -   if (WARN_ON(pipe >= dev->num_crtcs))
> +   if (WARN_ON(pipe >= dev->num_crtcs)) {
> +   *vblanktime = (struct timeval) { 0 };

The '0' is redundant. Anyway, this is:

Reviewed-by: David Herrmann 

(CC: Daniel, for drm-misc)

Thanks
David

> return 0;
> +   }
>
> /*
>  * Vblank timestamps are read lockless. To ensure consistency the 
> vblank
> --
> 2.7.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread David Herrmann
Hi

On Wed, Feb 3, 2016 at 12:36 PM, Simon McVittie
 wrote:
> Am I right in saying that the advice I give to D-Bus users should be
> something like this?
>
> * system services should not send fds at all, unless they trust the
>   dbus-daemon
> * system services should not send fds via D-Bus that will be delivered
>   to recipients that they do not trust
> * sending fds to an untrusted recipient would enable that recipient to
>   carry out a denial-of-service attack (on what? the sender? the
>   dbus-daemon?)

With the revised patch from Hannes, this should no longer be needed.
My original concern was only about accounting inflight-fds on the
file-owner, rather than the sender.

However, with Hannes' revised patch, a different DoS attack against
dbus-daemon is possible. Imagine a peer that receives batches of FDs,
but never dequeues them. They will be accounted on the inflight-limit
of dbus-daemon, as such causing messages of independent peers to be
rejected in case they carry FDs.
Preferably, dbus-daemon would avoid queuing more than 16 FDs on a
single destination (total). But that would require POLLOUT to be
capped by the number of queued fds. A possible workaround is to add
CAP_SYS_RESOURCE to dbus-daemon.

Thanks
David


Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-02 Thread David Herrmann
Hi

On Sun, Jan 10, 2016 at 7:54 AM, Willy Tarreau  wrote:
> It is possible for a process to allocate and accumulate far more FDs than
> the process' limit by sending them over a unix socket then closing them
> to keep the process' fd count low.
>
> This change addresses this problem by keeping track of the number of FDs
> in flight per user and preventing non-privileged processes from having
> more FDs in flight than their configured FD limit.

This is not really what this patch does. This patch rather prevents
processes from having more of their owned *files* in flight than their
configured FD limit. See below for details..

> Reported-by: socketp...@gmail.com
> Reported-by: Tetsuo Handa 
> Mitigates: CVE-2013-4312 (Linux 2.0+)
> Suggested-by: Linus Torvalds 
> Acked-by: Hannes Frederic Sowa 
> Signed-off-by: Willy Tarreau 
> ---
> v2: add reported-by, mitigates and acked-by.
>
> It would be nice if (if accepted) it would be backported to -stable as the
> issue is currently exploitable.
> ---
>  include/linux/sched.h |  1 +
>  net/unix/af_unix.c| 24 
>  net/unix/garbage.c| 13 -
>  3 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index edad7a4..fbf25f1 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -830,6 +830,7 @@ struct user_struct {
> unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? 
> */
>  #endif
> unsigned long locked_shm; /* How many pages of mlocked shm ? */
> +   unsigned long unix_inflight;/* How many files in flight in unix 
> sockets */
>
>  #ifdef CONFIG_KEYS
> struct key *uid_keyring;/* UID specific keyring */
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 45aebd9..d6d7b43 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1499,6 +1499,21 @@ static void unix_destruct_scm(struct sk_buff *skb)
> sock_wfree(skb);
>  }
>
> +/*
> + * The "user->unix_inflight" variable is protected by the garbage
> + * collection lock, and we just read it locklessly here. If you go
> + * over the limit, there might be a tiny race in actually noticing
> + * it across threads. Tough.

This limit is checked once per transaction. IIRC, a transaction can
carry 255 files. Thus, it is easy to exceed this limit without any
race involved.

> + */
> +static inline bool too_many_unix_fds(struct task_struct *p)
> +{
> +   struct user_struct *user = current_user();
> +
> +   if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE)))
> +   return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
> +   return false;
> +}
> +
>  #define MAX_RECURSION_LEVEL 4
>
>  static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
> @@ -1507,6 +1522,9 @@ static int unix_attach_fds(struct scm_cookie *scm, 
> struct sk_buff *skb)
> unsigned char max_level = 0;
> int unix_sock_count = 0;
>
> +   if (too_many_unix_fds(current))
> +   return -ETOOMANYREFS;
> +

Ignoring the capabilities, this effectively resolves to:

if (current_cred()->user->unix_inflight > rlimit(RLIMIT_NOFILE))
return -ETOOMANYREFS;

It limits the number of inflight FDs of the *current* user to its *own* limit.

But..

> for (i = scm->fp->count - 1; i >= 0; i--) {
> struct sock *sk = unix_get_socket(scm->fp->fp[i]);
>
> @@ -1528,10 +1546,8 @@ static int unix_attach_fds(struct scm_cookie *scm, 
> struct sk_buff *skb)
> if (!UNIXCB(skb).fp)
> return -ENOMEM;
>
> -   if (unix_sock_count) {
> -   for (i = scm->fp->count - 1; i >= 0; i--)
> -   unix_inflight(scm->fp->fp[i]);
> -   }
> +   for (i = scm->fp->count - 1; i >= 0; i--)
> +   unix_inflight(scm->fp->fp[i]);
> return max_level;
>  }
>
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index a73a226..8fcdc22 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -120,11 +120,11 @@ void unix_inflight(struct file *fp)
>  {
> struct sock *s = unix_get_socket(fp);
>
> +   spin_lock(&unix_gc_lock);
> +
> if (s) {
> struct unix_sock *u = unix_sk(s);
>
> -   spin_lock(&unix_gc_lock);
> -
> if (atomic_long_inc_return(&u->inflight) == 1) {
> BUG_ON(!list_empty(&u->link));
> list_add_tail(&u->link, &gc_inflight_list);
> @@ -132,25 +132,28 @@ void unix_inflight(struct file *fp)
> BUG_ON(list_empty(&u->link));
> }
> unix_tot_inflight++;
> -   spin_unlock(&unix_gc_lock);
> }
> +   fp->f_cred->user->unix_inflight++;

..this modifies the limit of the owner of the file you send. As such,
if you only send files that you don't own, you will continuously bump
the limit of the file-owner, but never 

Re: [PATCH 1/2] ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO

2016-01-22 Thread David Herrmann
Hi

On Fri, Jan 22, 2016 at 12:03 AM, John Stultz  wrote:
> Recently, in commit 37cf4dc3370f
> ("time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow")
> I forgot to check if the timeval being passed was actually a
> timespec (as is signaled with ADJ_NANO).
>
> This resulted in that patch breaking ADJ_SETOFFSET users who set
> ADJ_NANO, by rejecting valid timespecs that were compared with
> valid timeval ranges.
>
> This patch addresses this by checking for the ADJ_NANO flag and
> using the timepsec check instead in that case.
>
> Cc: Sasha Levin 
> Cc: Richard Cochran 
> Cc: Thomas Gleixner 
> Cc: Prarit Bhargava 
> Cc: Harald Hoyer 
> Cc: Kay Sievers 
> Cc: David Herrmann 
> Reported-by: Harald Hoyer 
> Reported-by: Kay Sievers 
> Signed-off-by: John Stultz 

Thanks for picking this up. Looks good to me:

Reviewed-by: David Herrmann 

Thanks
David

> ---
>  kernel/time/ntp.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 36f2ca0..6df8927 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -685,8 +685,18 @@ int ntp_validate_timex(struct timex *txc)
> if (!capable(CAP_SYS_TIME))
> return -EPERM;
>
> -   if (!timeval_inject_offset_valid(&txc->time))
> -   return -EINVAL;
> +   if (txc->modes & ADJ_NANO) {
> +   struct timespec ts;
> +
> +   ts.tv_sec = txc->time.tv_sec;
> +   ts.tv_nsec = txc->time.tv_usec;
> +   if (!timespec_inject_offset_valid(&ts))
> +   return -EINVAL;
> +
> +   } else {
> +   if (!timeval_inject_offset_valid(&txc->time))
> +   return -EINVAL;
> +   }
> }
>
> /*
> --
> 1.9.1
>


Re: [PATCH] vgaarb: fix signal handling in vga_get()

2015-12-14 Thread David Herrmann
Hi

On Mon, Dec 14, 2015 at 9:19 AM, Kirill A. Shutemov
 wrote:
> On Thu, Dec 10, 2015 at 11:28:58AM +0100, David Herrmann wrote:
>> Hi
>>
>> On Mon, Nov 30, 2015 at 3:17 AM, Kirill A. Shutemov
>>  wrote:
>> > There are few defects in vga_get() related to signal hadning:
>> >
>> >   - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
>> > case;
>> >
>> >   - if we found pending signal we must remove ourself from wait queue
>> > and change task state back to running;
>> >
>> >   - -ERESTARTSYS is more appropriate, I guess.
>> >
>> > Signed-off-by: Kirill A. Shutemov 
>> > ---
>> >
>> > Alex, I try to get KVM with VGA passthrough working properly. I have i915
>> > (HD 4600) on the host and GTX 580 for the guest. The guest GPU is not
>> > capabale of EFI, so I have to use x-vga=on. It's kinda work with your
>> > patch for i915.enable_hd_vgaarb=1. But guest refuse to initialize the GPU
>> > after KVM was not shut down correctly, resulting in host crash like this:
>> >
>> > BUG: unable to handle kernel paging request at 880870187ed8
>> > IP: [] 0x880870187ed8
>> > PGD 2129067 PUD 800841e3
>> > Oops: 0011 [#1] PREEMPT SMP
>> > Modules linked in: iwlmvm iwlwifi
>> > CPU: 6 PID: 3983 Comm: qemu-system-x86 Not tainted 4.3.0-gentoo #6
>> > Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, 
>> > BIOS F5a 06/12/2014
>> > task: 88087a91 ti: 8808632c task.ti: 8808632c
>> > RIP: 0010:[]  [] 0x880870187ed8
>> > RSP: 0018:8808632c3d08  EFLAGS: 00010006
>> > RAX: 880870187db0 RBX: 70187f58 RCX: 
>> > RDX:  RSI: 0003 RDI: 880870187db0
>> > RBP: 8808632c3d48 R08:  R09: 
>> > R10: 000103c0 R11: 0293 R12: 81ea03c8
>> > R13: 8104c7cb R14:  R15: 0003
>> > FS:  7f984f9b2700() GS:88089f38() 
>> > knlGS:
>> > CS:  0010 DS:  ES:  CR0: 80050033
>> > CR2: 880870187ed8 CR3: 0008645f8000 CR4: 001426e0
>> > Stack:
>> >  810cc83d 632c3d28  81ea03c0
>> >  0046 0003  
>> >  8808632c3d80 810cca44 88087af63800 0286
>> > Call Trace:
>> >  [] ? __wake_up_common+0x4d/0x80
>> >  [] __wake_up+0x34/0x50
>> >  [] __vga_put+0x73/0xd0
>> >  [] vga_put+0x54/0x80
>> >  [] vfio_pci_vga_rw+0x1d2/0x220
>> >  [] vfio_pci_rw+0x33/0x60
>> >  [] vfio_pci_write+0x17/0x20
>> >  [] vfio_device_fops_write+0x26/0x30
>> >  [] __vfs_write+0x23/0xe0
>> >  [] ? __vfs_read+0x23/0xd0
>> >  [] ? do_vfs_ioctl+0x2b5/0x490
>> >  [] vfs_write+0xa4/0x190
>> >  [] SyS_pwrite64+0x66/0xa0
>> >  [] entry_SYSCALL_64_fastpath+0x12/0x6a
>> > Code: 88 ff ff e0 7e 18 70 08 88 ff ff 00 8c 57 76 08 88 ff ff 20 7f 18 70 
>> > 08 88 ff ff 08 7f 18 70 08 88 ff ff 94 51 1a 81 ff ff ff ff <09> 00 00 00 
>> > 00 00 00 00 01 8c 57 76 08 88 ff ff 00 8c 57 76 08
>> > RIP  [] 0x880870187ed8
>> >  RSP 
>> > CR2: 880870187ed8
>> >
>> > The patch fixes the crash, but doesn't help with getting GPU in guest
>> > working again.
>> >
>> > Any ideas?
>> >
>> > ---
>> >  drivers/gpu/vga/vgaarb.c | 6 --
>> >  1 file changed, 4 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
>> > index 3166e4bc4eb6..9abcaa53bd25 100644
>> > --- a/drivers/gpu/vga/vgaarb.c
>> > +++ b/drivers/gpu/vga/vgaarb.c
>> > @@ -395,8 +395,10 @@ int vga_get(struct pci_dev *pdev, unsigned int rsrc, 
>> > int interruptible)
>> > set_current_state(interruptible ?
>> >   TASK_INTERRUPTIBLE :
>> >   TASK_UNINTERRUPTIBLE);
>> > -   if (signal_pending(current)) {
>> > -   rc = -EINTR;
>> > +   if (interruptible && signal_pending(current)) {
>> > +   __set_current_state(TASK_RUNNING);
>> > +   remove_wait_queue(&vga_wait_queue, &wai

Re: [PATCH] vgaarb: fix signal handling in vga_get()

2015-12-10 Thread David Herrmann
Hi

On Mon, Nov 30, 2015 at 3:17 AM, Kirill A. Shutemov
 wrote:
> There are few defects in vga_get() related to signal hadning:
>
>   - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
> case;
>
>   - if we found pending signal we must remove ourself from wait queue
> and change task state back to running;
>
>   - -ERESTARTSYS is more appropriate, I guess.
>
> Signed-off-by: Kirill A. Shutemov 
> ---
>
> Alex, I try to get KVM with VGA passthrough working properly. I have i915
> (HD 4600) on the host and GTX 580 for the guest. The guest GPU is not
> capabale of EFI, so I have to use x-vga=on. It's kinda work with your
> patch for i915.enable_hd_vgaarb=1. But guest refuse to initialize the GPU
> after KVM was not shut down correctly, resulting in host crash like this:
>
> BUG: unable to handle kernel paging request at 880870187ed8
> IP: [] 0x880870187ed8
> PGD 2129067 PUD 800841e3
> Oops: 0011 [#1] PREEMPT SMP
> Modules linked in: iwlmvm iwlwifi
> CPU: 6 PID: 3983 Comm: qemu-system-x86 Not tainted 4.3.0-gentoo #6
> Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, BIOS 
> F5a 06/12/2014
> task: 88087a91 ti: 8808632c task.ti: 8808632c
> RIP: 0010:[]  [] 0x880870187ed8
> RSP: 0018:8808632c3d08  EFLAGS: 00010006
> RAX: 880870187db0 RBX: 70187f58 RCX: 
> RDX:  RSI: 0003 RDI: 880870187db0
> RBP: 8808632c3d48 R08:  R09: 
> R10: 000103c0 R11: 0293 R12: 81ea03c8
> R13: 8104c7cb R14:  R15: 0003
> FS:  7f984f9b2700() GS:88089f38() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 880870187ed8 CR3: 0008645f8000 CR4: 001426e0
> Stack:
>  810cc83d 632c3d28  81ea03c0
>  0046 0003  
>  8808632c3d80 810cca44 88087af63800 0286
> Call Trace:
>  [] ? __wake_up_common+0x4d/0x80
>  [] __wake_up+0x34/0x50
>  [] __vga_put+0x73/0xd0
>  [] vga_put+0x54/0x80
>  [] vfio_pci_vga_rw+0x1d2/0x220
>  [] vfio_pci_rw+0x33/0x60
>  [] vfio_pci_write+0x17/0x20
>  [] vfio_device_fops_write+0x26/0x30
>  [] __vfs_write+0x23/0xe0
>  [] ? __vfs_read+0x23/0xd0
>  [] ? do_vfs_ioctl+0x2b5/0x490
>  [] vfs_write+0xa4/0x190
>  [] SyS_pwrite64+0x66/0xa0
>  [] entry_SYSCALL_64_fastpath+0x12/0x6a
> Code: 88 ff ff e0 7e 18 70 08 88 ff ff 00 8c 57 76 08 88 ff ff 20 7f 18 70 08 
> 88 ff ff 08 7f 18 70 08 88 ff ff 94 51 1a 81 ff ff ff ff <09> 00 00 00 00 00 
> 00 00 01 8c 57 76 08 88 ff ff 00 8c 57 76 08
> RIP  [] 0x880870187ed8
>  RSP 
> CR2: 880870187ed8
>
> The patch fixes the crash, but doesn't help with getting GPU in guest
> working again.
>
> Any ideas?
>
> ---
>  drivers/gpu/vga/vgaarb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
> index 3166e4bc4eb6..9abcaa53bd25 100644
> --- a/drivers/gpu/vga/vgaarb.c
> +++ b/drivers/gpu/vga/vgaarb.c
> @@ -395,8 +395,10 @@ int vga_get(struct pci_dev *pdev, unsigned int rsrc, int 
> interruptible)
> set_current_state(interruptible ?
>   TASK_INTERRUPTIBLE :
>   TASK_UNINTERRUPTIBLE);
> -   if (signal_pending(current)) {
> -   rc = -EINTR;
> +   if (interruptible && signal_pending(current)) {
> +   __set_current_state(TASK_RUNNING);
> +   remove_wait_queue(&vga_wait_queue, &wait);
> +   rc = -ERESTARTSYS;
> break;

All 3 points are valid, and the patch looks good to me:

Reviewed-by: David Herrmann 

However, there seems to be a race between vga_lock and putting the
thread asleep. We should fix that as well. See the hunk below
(completely untested.. why is VGA still in use? *sigh*).

Thanks
David

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index a0b4334..82cf1e3 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -359,8 +359,8 @@ static void __vga_put
 int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible)
 {
struct vga_device *vgadev, *conflict;
+   DECLARE_WAITQUEUE(wait, current);
unsigned long flags;
-   wait_queue_t wait;
int rc = 0;

vga_check_first_use();
@@ -371,6 +371,11 @@ int vga_get
return 0;

for (;;) {
+   add_wait_queue(&vga_wait_qu

Re: kdbus refactoring?

2015-11-09 Thread David Herrmann
Hi

On Mon, Nov 9, 2015 at 10:14 AM, Peter Zijlstra  wrote:
> On Mon, Nov 09, 2015 at 09:56:04AM +0100, David Herrmann wrote:
>
>> The submission and following year of development followed the 'kernel
>> way'.
>
> This is not true; the last complete posting I could find was:
>
>   Date: Mon,  9 Mar 2015 14:09:06 +0100
>   Subject: [PATCH v4 00/14] Add kdbus implementation

..which was the last time we proposed it for inclusion.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdbus refactoring?

2015-11-09 Thread David Herrmann
Hi

On Mon, Nov 9, 2015 at 9:48 AM, Peter Zijlstra  wrote:
> On Mon, Nov 09, 2015 at 09:34:37AM +0100, David Herrmann wrote:
>> Hi
>>
>> On Mon, Nov 9, 2015 at 1:53 AM, Josh Triplett  wrote:
>> > Quoting Documentation/development-process/1.Intro:
>> [...]
>> >> Years of experience with the kernel development community have taught a
>> >> clear lesson: kernel code which is designed and developed behind closed
>> >> doors invariably has problems which are only revealed when the code is
>> >> released into the community.  Sometimes these problems are severe,
>> >> requiring months or years of effort before the code can be brought up to
>> >> the kernel community's standards.
>> [...]
>> > And I've seen you specifically recommend having such conversations early
>> > and often.
>>
>> I think comparing kdbus to "behind closed doors" development models is
>> unfair. We chose to center our development around DBus, not the
>> kernel.
>
> And yet it will be the kernel people you ask to take your code. You
> don't see something funny with that?

No.

>> Anybody who is interested in kdbus discussions could have
>> easily joined the DBus and systemd communication channels (and *many*
>> people did). I see little reason in cross-posting everything to LKML,
>> especially given that our communication is rarely mail-based.
>
> So you want to develop kernel code, but can't be arsed to do it the
> kernel way? Gheez, no wonder this all is going so well..

The submission and following year of development followed the 'kernel
way'. I don't see why the design phase needs to follow your style,
though.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdbus refactoring?

2015-11-09 Thread David Herrmann
Hi

On Mon, Nov 9, 2015 at 1:53 AM, Josh Triplett  wrote:
> Quoting Documentation/development-process/1.Intro:
[...]
>> Years of experience with the kernel development community have taught a
>> clear lesson: kernel code which is designed and developed behind closed
>> doors invariably has problems which are only revealed when the code is
>> released into the community.  Sometimes these problems are severe,
>> requiring months or years of effort before the code can be brought up to
>> the kernel community's standards.
[...]
> And I've seen you specifically recommend having such conversations early
> and often.

I think comparing kdbus to "behind closed doors" development models is
unfair. We chose to center our development around DBus, not the
kernel. Anybody who is interested in kdbus discussions could have
easily joined the DBus and systemd communication channels (and *many*
people did). I see little reason in cross-posting everything to LKML,
especially given that our communication is rarely mail-based.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] thinkpad_acpi: don't yell on unsupported brightness interfaces

2015-11-06 Thread David Herrmann
Hi Darren!

On Wed, Oct 21, 2015 at 4:44 PM, Darren Hart  wrote:
> On Wed, Oct 21, 2015 at 12:33:52PM -0200, Henrique de Moraes Holschuh wrote:
>> On Wed, Oct 21, 2015, at 08:46, David Herrmann wrote:
>> > The thinkpad_acpi driver currently emits error messages on unsupported
>> > brightness interfaces, giving the impression that someone will implement
>> > those. However, this error is spit out on nearly every thinkpad in
>> > production since 2 years now. Furthermore, the backlight interfaces on
>> > those devices are supported by the i915 driver just fine.
>> >
>> > Downgrade the error message to a normal pr_info() and stop telling people
>> > to report it to IBM.
>>
>> IBM?  Those reports go directly to me.
>>
>> > Signed-off-by: David Herrmann 
>>
>> Acked-by: Henrique de Moraes Holschuh 
>
> Thanks, Queued.

Any particular reason this didn't show up in:
[GIT PULL] platform-drivers-x86 for 4.4-1

Just a kind reminder, in case it slipped through the cracks.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] Input: uinput - add new UINPUT_DEV_SETUP and UI_ABS_SETUP ioctl

2015-10-25 Thread David Herrmann
Hi

On Sun, Oct 25, 2015 at 12:53 AM, Dmitry Torokhov
 wrote:
> Hi Benjamin,
>
> On Tue, Aug 25, 2015 at 11:12:59AM -0400, Benjamin Tissoires wrote:
>> +static int uinput_abs_setup(struct uinput_device *udev,
>> + struct uinput_setup __user *arg, size_t size)
>> +{
>> + struct uinput_abs_setup setup = {};
>> + struct input_dev *dev;
>> +
>> + if (size > sizeof(setup))
>> + return -E2BIG;
>> + if (udev->state == UIST_CREATED)
>> + return -EINVAL;
>> + if (copy_from_user(&setup, arg, size))
>> + return -EFAULT;
>> + if (setup.code > ABS_MAX)
>> + return -ERANGE;
>> +
>> + dev = udev->dev;
>> +
>> + input_alloc_absinfo(dev);
>> + if (!dev->absinfo)
>> + return -ENOMEM;
>> +
>> + set_bit(setup.code, dev->absbit);
>> + dev->absinfo[setup.code] = setup.absinfo;
>> +
>> + /*
>> +  * We restore the state to UIST_NEW_DEVICE because the user has to call
>> +  * UI_DEV_SETUP in the last place before UI_DEV_CREATE to check the
>> +  * validity of the absbits.
>> +  */
>
> Do we have to be this strict? It seems to me that with the new IOCTLs
> you naturally want to use the new ioctl to setup the device, then adjust
> various axes and bits and then validate everything.

Indeed, we now force the order to be abs-setup first, then
device-setup as last step. Appended is a follow-up patch to cleanup
ABS handling in uinput. It is untested. Benjamin, care to give it a
spin?

Thanks
David


>From 2568f83bcc5c4b8aeb149be2c5fc5a743812f0fe Mon Sep 17 00:00:00 2001
From: David Herrmann 
Date: Sun, 25 Oct 2015 10:34:13 +0100
Subject: [PATCH] Input: uinput - rework ABS validation

Rework the uinput ABS validation to check passed absinfo data
immediately, but do ABS initialization as last step in UI_DEV_CREATE. The
behavior observed by user-space is not changed, as ABS initialization was
never checked for errors.

With this in place, the order of device-initialization and
abs-configuration is no longer fixed. User-space can initialize the
device and afterwards set absinfo just fine.

Signed-off-by: David Herrmann 
---
 drivers/input/misc/uinput.c | 89 +++--
 1 file changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c
index baac903..1d93037 100644
--- a/drivers/input/misc/uinput.c
+++ b/drivers/input/misc/uinput.c
@@ -256,13 +256,22 @@ static void uinput_destroy_device(...)
 static int uinput_create_device(struct uinput_device *udev)
 {
  struct input_dev *dev = udev->dev;
- int error;
+ int error, nslot;

  if (udev->state != UIST_SETUP_COMPLETE) {
  printk(KERN_DEBUG "%s: write device info first\n", UINPUT_NAME);
  return -EINVAL;
  }

+ if (test_bit(ABS_MT_SLOT, dev->absbit)) {
+ nslot = input_abs_get_max(dev, ABS_MT_SLOT) + 1;
+ error = input_mt_init_slots(dev, nslot, 0);
+ if (error)
+ goto fail1;
+ } else if (test_bit(ABS_MT_POSITION_X, dev->absbit)) {
+ input_set_events_per_packet(dev, 60);
+ }
+
  if (udev->ff_effects_max) {
  error = input_ff_create(dev, udev->ff_effects_max);
  if (error)
@@ -308,10 +317,35 @@ static int uinput_open(...)
  return 0;
 }

+static int uinput_validate_absinfo(struct input_dev *dev, unsigned int code,
+   const struct input_absinfo *abs)
+{
+ int min, max;
+
+ min = abs->minimum;
+ max = abs->maximum;
+
+ if ((min != 0 || max != 0) && max <= min) {
+ printk(KERN_DEBUG
+   "%s: invalid abs[%02x] min:%d max:%d\n",
+   UINPUT_NAME, code, min, max);
+ return -EINVAL;
+ }
+
+ if (abs->flat > max - min) {
+ printk(KERN_DEBUG
+   "%s: abs_flat #%02x out of range: %d (min:%d/max:%d)\n",
+   UINPUT_NAME, code, abs->flat, min, max);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
 static int uinput_validate_absbits(struct input_dev *dev)
 {
  unsigned int cnt;
- int nslot;
+ int error;

  if (!test_bit(EV_ABS, dev->evbit))
  return 0;
@@ -321,38 +355,12 @@ static int uinput_validate_absbits(struct input_dev *dev)
  */

  for_each_set_bit(cnt, dev->absbit, ABS_CNT) {
- int min, max;
-
- min = input_abs_get_min(dev, cnt);
- max = input_abs_get_max(dev, cnt);
-
- if ((min != 0 || max != 0) && max <= min) {
- printk(KERN_DEBUG
- "%s: invalid abs[%02x] min:%d max:%d\n",
- UINPUT_NAME, cnt,
- input_abs_get_min(dev, cnt),
- input_abs_get_max(dev, cnt));
+ if (!dev->absinfo)
  return -EINVAL;
- }
-
- if (input_abs_get_flat(dev, cnt) >
-input_abs_get_max(dev, cnt) - input_abs_get_min(dev, cnt)) {
- printk(KERN_DEBUG
- "%s: abs_flat #%02x out of range: %d "
- "(min:%d/max:%d)\n",
- UINPUT_NAME, cnt,
- input_a

[PATCH] thinkpad_acpi: don't yell on unsupported brightness interfaces

2015-10-21 Thread David Herrmann
The thinkpad_acpi driver currently emits error messages on unsupported
brightness interfaces, giving the impression that someone will implement
those. However, this error is spit out on nearly every thinkpad in
production since 2 years now. Furthermore, the backlight interfaces on
those devices are supported by the i915 driver just fine.

Downgrade the error message to a normal pr_info() and stop telling people
to report it to IBM.

Signed-off-by: David Herrmann 
---
 drivers/platform/x86/thinkpad_acpi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/platform/x86/thinkpad_acpi.c 
b/drivers/platform/x86/thinkpad_acpi.c
index 131dd74..0bed473 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -6459,8 +6459,7 @@ static void __init 
tpacpi_detect_brightness_capabilities(void)
pr_info("detected a 8-level brightness capable ThinkPad\n");
break;
default:
-   pr_err("Unsupported brightness interface, "
-  "please contact %s\n", TPACPI_MAIL);
+   pr_info("Unsupported brightness interface\n");
tp_features.bright_unkfw = 1;
bright_maxlvl = b - 1;
}
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7] Input: Add userio module

2015-10-08 Thread David Herrmann
Hi

On Mon, Oct 5, 2015 at 5:55 PM,   wrote:
> From: Stephen Chandler Paul 
>
> Debugging input devices, specifically laptop touchpads, can be tricky
> without having the physical device handy. Here we try to remedy that
> with userio. This module allows an application to connect to a character
> device provided by the kernel, and emulate any serio device. In
> combination with userspace programs that can record PS/2 devices and
> replay them through the /dev/userio device, this allows developers to
> debug driver issues on the PS/2 level with devices simply by requesting
> a recording from the user experiencing the issue without having to have
> the physical hardware in front of them.
>
> Signed-off-by: Stephen Chandler Paul 

Apart from the serio-unregistration, all I have left is bike-shedding.
Regardless of those changes, this is:

Reviewed-by: David Herrmann 

See below for some details..

> ---
> Changes
> * Removed useless if (!userio) check in userio_device_write()
>
>  Documentation/input/userio.txt |  70 ++
>  MAINTAINERS|   6 +
>  drivers/input/serio/Kconfig|  11 ++
>  drivers/input/serio/Makefile   |   1 +
>  drivers/input/serio/userio.c   | 286 
> +
>  include/linux/miscdevice.h |   1 +
>  include/uapi/linux/userio.h|  44 +++
>  7 files changed, 419 insertions(+)
>  create mode 100644 Documentation/input/userio.txt
>  create mode 100644 drivers/input/serio/userio.c
>  create mode 100644 include/uapi/linux/userio.h
>
> diff --git a/Documentation/input/userio.txt b/Documentation/input/userio.txt
> new file mode 100644
> index 000..0880c0f
> --- /dev/null
> +++ b/Documentation/input/userio.txt
> @@ -0,0 +1,70 @@
> + The userio Protocol
> +(c) 2015 Stephen Chandler Paul 
> + Sponsored by Red Hat
> +
> +
> +1. Introduction
> +~~~
> +  This module is intended to try to make the lives of input driver developers
> +easier by allowing them to test various serio devices (mainly the various
> +touchpads found on laptops) without having to have the physical device in 
> front
> +of them. userio accomplishes this by allowing any privileged userspace 
> program
> +to directly interact with the kernel's serio driver and control a virtual 
> serio
> +port from there.
> +
> +2. Usage overview
> +~
> +  In order to interact with the userio kernel module, one simply opens the
> +/dev/userio character device in their applications. Commands are sent to the
> +kernel module by writing to the device, and any data received from the serio
> +driver is read as-is from the /dev/userio device. All of the structures and
> +macros you need to interact with the device are defined in  
> and
> +.
> +
> +3. Command Structure
> +
> +  The struct used for sending commands to /dev/userio is as follows:
> +
> +   struct userio_cmd {
> +   __u8 type;
> +   __u8 data;
> +   };
> +
> +  "type" describes the type of command that is being sent. This can be any 
> one
> +of the USERIO_CMD macros defined in . "data" is the argument
> +that goes along with the command. In the event that the command doesn't have 
> an
> +argument, this field can be left untouched and will be ignored by the kernel.
> +Each command should be sent by writing the struct directly to the character
> +device. In the event that the command you send is invalid, an error will be
> +returned by the character device and a more descriptive error will be printed
> +to the kernel log. Only one command can be sent at a time, any additional 
> data
> +written to the character device after the initial command will be ignored.
> +  To close the virtual serio port, just close /dev/userio.
> +
> +4. Commands
> +~~~
> +
> +4.1 USERIO_CMD_REGISTER
> +~~~
> +  Registers the port with the serio driver and begins transmitting data back 
> and
> +forth. Registration can only be performed once a port type is set with
> +USERIO_CMD_SET_PORT_TYPE. Has no argument.
> +
> +4.2 USERIO_CMD_SET_PORT_TYPE
> +
> +  Sets the type of port we're emulating, where "data" is the port type being
> +set. Can be any of the macros from . For example: SERIO_8042
> +would set the port type to be a normal PS/2 port.
> +
> +4.3 USERIO_CMD_SEND_INTERRUPT
> +~
> +  Sends an interrupt through the virtual serio port to

Re: [PATCH 00/44] kdbus cleanups

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
> Hi all,
>
> This is a set of various kdbus code cleanups. Patches are ordered by
> increasing complexity, starting with docs and comments fixes and
> one-liners.
>
> Patch 29 is the revised version of
> http://lkml.kernel.org/g/1435497454-10464-6-git-send-email-ser...@s15v.net
>
> Feel free to ask to change layout of this, split/join, etc if necessary.

So I reviewed all of the patches, most of them look good. Some comments:
 - Please justify your changes in the commit-message. Always.
 - Please don't split patches based on modified functions. If you fix
typos, do them subsystem-wide. If you fix common errors like "don't
init 'name' to NULL before calling kdbus_pin_dst()", then please do
that for all functions. In other words, group your changes logically,
not based on location.
 - If you do cleanups, explain why you do them. I commented on some of
the changes, which IMO reduce readability.

Anyway, looks good.

Thanks a lot!
David

> Thanks, Sergei
>
> Sergei Zviagintsev (44):
>   Documentation/kdbus: Document new name registry flags
>   uapi: kdbus.h: Kernel-doc fixes
>   kdbus: Kernel-docs and comments trivial fixes
>   kdbus: Update kernel-doc for struct kdbus_pool
>   kdbus: Add comment on merging free pool slices
>   kdbus: Fix kernel-doc for struct kdbus_gaps
>   kdbus: Fix comment on translation of caps between namespaces
>   kdbus: Rename var in kdbus_meta_export_caps()
>   kdbus: Remove unused KDBUS_MSG_MAX_SIZE constant
>   kdbus: Use conditional operator
>   kdbus: Cosmetic fix of kdbus_name_is_valid()
>   kdbus: Use conventional list macros in __kdbus_pool_slice_release()
>   kdbus: Use list_next_entry() in kdbus_queue_entry_unlink()
>   kdbus: Simplify expression in kdbus_get_memfd()
>   kdbus: Simplify bitwise expression in kdbus_meta_get_mask()
>   kdbus: Drop redundant code from kdbus_name_acquire()
>   kdbus: Drop duplicated code from kdbus_pool_slice_alloc()
>   kdbus: Add var initialization to kdbus_conn_entry_insert()
>   kdbus: Drop useless initialization from kdbus_conn_reply()
>   kdbus: Drop useless initialization from kdbus_cmd_hello()
>   kdbus: Cleanup tests in kdbus_cmd_send()
>   kdbus: Cleanup error path in kdbus_staging_new_user()
>   kdbus: Cleanup kdbus_conn_call()
>   kdbus: Cleanup kdbus_conn_unicast()
>   kdbus: Cleanup kdbus_cmd_conn_info()
>   kdbus: Cleanup kdbus_pin_dst()
>   kdbus: Cleanup kdbus_conn_new()
>   kdbus: Cleanup kdbus_queue_entry_new()
>   kdbus: Improve tests on incrementing quota
>   kdbus: Cleanup kdbus_meta_proc_mask()
>   kdbus: Cleanup kdbus_conn_move_messages()
>   kdbus: Remove duplicated code from kdbus_conn_lock2()
>   kdbus: Improve kdbus_staging_reserve()
>   kdbus: Improve kdbus_conn_entry_sync_attach()
>   kdbus: Drop goto from kdbus_queue_entry_link()
>   kdbus: Improve kdbus_name_release()
>   kdbus: Fix error path in kdbus_meta_proc_collect_cgroup()
>   kdbus: Fix error path in kdbus_user_lookup()
>   kdbus: Cleanup kdbus_user_lookup()
>   kdbus: Cleanup kdbus_item_validate_name()
>   kdbus: Fix memfd install algorithm
>   kdbus: Check if fd is allocated before trying to free it
>   kdbus: Give up on failed fd allocation
>   kdbus: Cleanup kdbus_gaps_install()
>
>  Documentation/kdbus/kdbus.name.xml |  42 +-
>  include/uapi/linux/kdbus.h |  43 +-
>  ipc/kdbus/connection.c | 157 
> +++--
>  ipc/kdbus/connection.h |  19 ++---
>  ipc/kdbus/domain.c |  38 +
>  ipc/kdbus/fs.c |   2 +-
>  ipc/kdbus/item.c   |  26 +++---
>  ipc/kdbus/limits.h |   3 -
>  ipc/kdbus/message.c|  81 +--
>  ipc/kdbus/message.h|   9 ++-
>  ipc/kdbus/metadata.c   |  79 ++-
>  ipc/kdbus/names.c  |  32 
>  ipc/kdbus/node.c   |   4 +-
>  ipc/kdbus/pool.c   |  26 +++---
>  ipc/kdbus/queue.c  |  51 ++--
>  ipc/kdbus/queue.h  |   2 +-
>  16 files changed, 298 insertions(+), 316 deletions(-)
>
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 43/44] kdbus: Give up on failed fd allocation

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:32 PM, Sergei Zviagintsev  wrote:
> If we failed to allocate a file descriptor, do not try to do it again on
> the next iteration. There are few chances that we will have success, and
> we can simplify the algorithm assuming that valid fd numbers are not
> mixed with -1 values.

Why? This is no a fast-path, and the penalty is accounted on the
actual culprit (the caller). I don't see why we should optimize for
that case. If the fd-allocation fails, you clearly did something
wrong.

Thanks
David

> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/message.c | 42 --
>  1 file changed, 24 insertions(+), 18 deletions(-)
>
> diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
> index da685049d66c..75e6213e7ed5 100644
> --- a/ipc/kdbus/message.c
> +++ b/ipc/kdbus/message.c
> @@ -123,7 +123,7 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct 
> kdbus_pool_slice *slice,
>  {
> bool incomplete_fds = false;
> struct kvec kvec;
> -   size_t i, n_fds;
> +   size_t i, n_fds, n_memfds;
> int ret, *fds;
>
> if (!gaps) {
> @@ -140,25 +140,31 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct 
> kdbus_pool_slice *slice,
> }
>
> fds = kmalloc_array(n_fds, sizeof(*fds), GFP_TEMPORARY);
> -   n_fds = 0;
> if (!fds)
> return -ENOMEM;
>
> /* 1) allocate fds and copy them over */
>
> +   n_fds = 0; /* n_fds now tracks the number of allocated fds */
> if (gaps->n_fds > 0) {
> for (i = 0; i < gaps->n_fds; ++i) {
> int fd;
>
> fd = get_unused_fd_flags(O_CLOEXEC);
> -   if (fd < 0)
> +   if (fd < 0) {
> incomplete_fds = true;
> +   break;
> +   }
>
> WARN_ON(!gaps->fd_files[i]);
>
> -   fds[n_fds++] = fd < 0 ? -1 : fd;
> +   fds[n_fds++] = fd;
> }
>
> +   /* If we couldn't allocate a fd, fill the rest with -1 */
> +   for ( ; i < gaps->n_fds; ++i)
> +   fds[i] = -1;
> +
> /*
>  * The file-descriptor array can only be present once per
>  * message. Hence, prepare all fds and then copy them over 
> with
> @@ -175,18 +181,18 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct 
> kdbus_pool_slice *slice,
> goto exit;
> }
>
> -   for (i = 0; i < gaps->n_memfds; ++i) {
> +   n_memfds = 0;
> +   for (i = 0; i < gaps->n_memfds && !incomplete_fds; ++i) {
> int memfd;
>
> memfd = get_unused_fd_flags(O_CLOEXEC);
> if (memfd < 0) {
> incomplete_fds = true;
> -   fds[n_fds++] = -1;
> /* memfds are initialized to -1, skip copying it */
> -   continue;
> +   break;
> }
>
> -   fds[n_fds++] = memfd;
> +   fds[gaps->n_fds + n_memfds++] = memfd;
>
> /*
>  * memfds have to be copied individually as they each are put
> @@ -208,21 +214,21 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct 
> kdbus_pool_slice *slice,
>
> /* 2) install fds now that everything was successful */
>
> -   for (i = 0; i < gaps->n_fds; ++i)
> -   if (fds[i] >= 0)
> -   fd_install(fds[i], get_file(gaps->fd_files[i]));
> -   for (i = 0; i < gaps->n_memfds; ++i)
> -   if (fds[gaps->n_fds + i] >= 0)
> -   fd_install(fds[gaps->n_fds + i],
> -  get_file(gaps->memfd_files[i]));
> +   for (i = 0; i < n_fds; ++i)
> +   fd_install(fds[i], get_file(gaps->fd_files[i]));
> +   for (i = 0; i < n_memfds; ++i)
> +   fd_install(fds[gaps->n_fds + i],
> +  get_file(gaps->memfd_files[i]));
>
> ret = 0;
>
>  exit:
> -   if (ret < 0)
> +   if (ret < 0) {
> for (i = 0; i < n_fds; ++i)
> -   if (fds[i] >= 0)
> -   put_unused_fd(fds[i]);
> +   put_unused_fd(fds[i]);
> +   for (i = 0; i < n_memfds; ++i)
> +   put_unused_fd(fds[gaps->n_fds + i]);
> +   }
> kfree(fds);
> *out_incomplete = incomplete_fds;
> return ret;
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 38/44] kdbus: Fix error path in kdbus_user_lookup()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:32 PM, Sergei Zviagintsev  wrote:
> If idr_alloc() fails, we shouldn't call idr_remove() as latter produces
> warning when called on non-allocated ids. Split cleanup code into three
> parts for differrent cleanup scenarios before and after idr_alloc().
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/domain.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
> index ac9f760c150d..31cd09fb572f 100644
> --- a/ipc/kdbus/domain.c
> +++ b/ipc/kdbus/domain.c
> @@ -208,7 +208,7 @@ struct kdbus_user *kdbus_user_lookup(struct kdbus_domain 
> *domain, kuid_t uid)
> u = kzalloc(sizeof(*u), GFP_KERNEL);
> if (!u) {
> ret = -ENOMEM;
> -   goto exit;
> +   goto exit_unlock;
> }
>
> kref_init(&u->kref);
> @@ -225,7 +225,7 @@ struct kdbus_user *kdbus_user_lookup(struct kdbus_domain 
> *domain, kuid_t uid)
> ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid),
> __kuid_val(uid) + 1, GFP_KERNEL);
> if (ret < 0)
> -   goto exit;
> +   goto exit_free;
> }
> }
>
> @@ -235,19 +235,19 @@ struct kdbus_user *kdbus_user_lookup(struct 
> kdbus_domain *domain, kuid_t uid)
>  */
> ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL);
> if (ret < 0)
> -   goto exit;
> +   goto exit_idr;
>

Why not simply assign "u->uid = uid;" _after_ doing the idr operations?

Thanks
David

> u->id = ret;
> mutex_unlock(&domain->lock);
> return u;
>
> -exit:
> -   if (u) {
> -   if (uid_valid(u->uid))
> -   idr_remove(&domain->user_idr, __kuid_val(u->uid));
> -   kdbus_domain_unref(u->domain);
> -   kfree(u);
> -   }
> +exit_idr:
> +   if (uid_valid(u->uid))
> +   idr_remove(&domain->user_idr, __kuid_val(u->uid));
> +exit_free:
> +   kdbus_domain_unref(u->domain);
> +   kfree(u);
> +exit_unlock:
> mutex_unlock(&domain->lock);
> return ERR_PTR(ret);
>  }
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 31/44] kdbus: Cleanup kdbus_conn_move_messages()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:32 PM, Sergei Zviagintsev  wrote:
>  - Move `r' and `ret' to scopes where they are used. Drop redundant
>initialization of `ret'.
>
>  - Initialize `bus' on declaration.
>
>  - Replace list_for_each_entry_safe() with list_for_each_entry() when
>iterating over list of replies.
>
>  - Drop redundant `continue'.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index b32b4f981618..6ee688d3de53 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -1293,25 +1293,24 @@ void kdbus_conn_move_messages(struct kdbus_conn 
> *conn_dst,
>   u64 name_id)
>  {
> struct kdbus_queue_entry *e, *e_tmp;
> -   struct kdbus_reply *r, *r_tmp;
> -   struct kdbus_bus *bus;
> +   struct kdbus_bus *bus = conn_src->ep->bus;
> struct kdbus_conn *c;
> LIST_HEAD(msg_list);
> -   int i, ret = 0;
> +   int i;
>
> if (WARN_ON(conn_src == conn_dst))
> return;
>
> -   bus = conn_src->ep->bus;
> -
> /* lock order: domain -> bus -> ep -> names -> conn */
> down_read(&bus->conn_rwlock);
> hash_for_each(bus->conn_hash, i, c, hentry) {
> +   struct kdbus_reply *r;
> +

Why make 'r' local?

> if (c == conn_src || c == conn_dst)
> continue;
>
> mutex_lock(&c->lock);
> -   list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
> +   list_for_each_entry(r, &c->reply_list, entry) {

Looks good.

> if (r->reply_src != conn_src)
> continue;
>
> @@ -1328,6 +1327,8 @@ void kdbus_conn_move_messages(struct kdbus_conn 
> *conn_dst,
>
> kdbus_conn_lock2(conn_src, conn_dst);
> list_for_each_entry_safe(e, e_tmp, &conn_src->queue.msg_list, entry) {
> +   int ret;
> +

Why make it local?

> /* filter messages for a specific name */
> if (name_id > 0 && e->dst_name_id != name_id)
> continue;
> @@ -1343,7 +1344,6 @@ void kdbus_conn_move_messages(struct kdbus_conn 
> *conn_dst,
> if (ret < 0) {
> kdbus_conn_lost_message(conn_dst);
> kdbus_queue_entry_free(e);
> -   continue;

Looks good.

Thanks
David

> }
> }
> kdbus_conn_unlock2(conn_src, conn_dst);
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 30/44] kdbus: Cleanup kdbus_meta_proc_mask()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
>  - Simplify if-statements
>
>'proc < KDBUS_META_PROC_NORMAL' test only makes sense when we call
>kdbus_proc_permission(). Include it into 'prv_pid != req_pid' branch,
>remove redundant assignment of `proc' var and reduce its scope.
>
>  - Drop redundant binary operation
>
>In 'proc < KDBUS_META_PROC_NORMAL' case we firstly do
>'wanted &= ~flags' and then 'wanted & flags' in the return statement,
>which is the same as 'wanted & 0'. Return 0 instead.
>
>  - Cosmetic cleanup of the return statement.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/metadata.c | 32 
>  1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
> index 61215a078359..b8d094d9fb56 100644
> --- a/ipc/kdbus/metadata.c
> +++ b/ipc/kdbus/metadata.c
> @@ -1224,7 +1224,6 @@ static u64 kdbus_meta_proc_mask(struct pid *prv_pid,
> u64 wanted)
>  {
> struct pid_namespace *prv_ns, *req_ns;
> -   unsigned int proc;
>
> prv_ns = ns_of_pid(prv_pid);
> req_ns = ns_of_pid(req_pid);
> @@ -1243,30 +1242,23 @@ static u64 kdbus_meta_proc_mask(struct pid *prv_pid,
>  * provider and requestor are the same. If not, we perform rather
>  * expensive /proc permission checks.
>  */
> -   if (prv_pid == req_pid)
> -   proc = KDBUS_META_PROC_NORMAL;
> -   else
> -   proc = kdbus_proc_permission(req_ns, req_cred, prv_pid);
> -
> -   /* you need /proc access to read standard process attributes */
> -   if (proc < KDBUS_META_PROC_NORMAL)
> -   wanted &= ~(KDBUS_ATTACH_TID_COMM |
> -   KDBUS_ATTACH_PID_COMM |
> -   KDBUS_ATTACH_SECLABEL |
> -   KDBUS_ATTACH_CMDLINE |
> -   KDBUS_ATTACH_CGROUP |
> -   KDBUS_ATTACH_AUDIT |
> -   KDBUS_ATTACH_CAPS |
> -   KDBUS_ATTACH_EXE);
> +   if (prv_pid != req_pid) {
> +   unsigned int proc = kdbus_proc_permission(req_ns, req_cred,
> + prv_pid);

Please keep "proc" declared non-local. Also, we don't do direct
assignments of non-static data in declarations.

Thanks
David

> +
> +   /* you need /proc access to read standard process attributes 
> */
> +   if (proc < KDBUS_META_PROC_NORMAL)
> +   return 0;
> +   }
>
> /* clear all non-/proc flags */
> return wanted & (KDBUS_ATTACH_TID_COMM |
>  KDBUS_ATTACH_PID_COMM |
>  KDBUS_ATTACH_SECLABEL |
> -KDBUS_ATTACH_CMDLINE |
> -KDBUS_ATTACH_CGROUP |
> -KDBUS_ATTACH_AUDIT |
> -KDBUS_ATTACH_CAPS |
> +KDBUS_ATTACH_CMDLINE  |
> +KDBUS_ATTACH_CGROUP   |
> +KDBUS_ATTACH_AUDIT|
> +KDBUS_ATTACH_CAPS |
>  KDBUS_ATTACH_EXE);
>  }
>
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 26/44] kdbus: Cleanup kdbus_pin_dst()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
>  - Reduce scope of `owner' var to the block where it is actually used.
>Use conditional operator to set its value.
>
>  - Drop initialization of `dst' as it gets value later.
>
>  - Drop separate exit point as we have only one use case of it.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 17 ++---
>  1 file changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index 6a73ac3f444d..ace587ee951a 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -1048,10 +1048,8 @@ static int kdbus_pin_dst(struct kdbus_bus *bus,
>  struct kdbus_conn **out_dst)
>  {
> const struct kdbus_msg *msg = staging->msg;
> -   struct kdbus_name_owner *owner = NULL;
> struct kdbus_name_entry *name = NULL;
> -   struct kdbus_conn *dst = NULL;
> -   int ret;
> +   struct kdbus_conn *dst;
>
> lockdep_assert_held(&bus->name_registry->rwlock);
>
> @@ -1061,14 +1059,15 @@ static int kdbus_pin_dst(struct kdbus_bus *bus,
> return -ENXIO;
>
> if (!kdbus_conn_is_ordinary(dst)) {
> -   ret = -ENXIO;
> -   goto error;
> +   kdbus_conn_unref(dst);
> +   return -ENXIO;

Looks good.

> }
> } else {
> +   struct kdbus_name_owner *owner;
> +

I'd prefer if you avoid making this block-local.

> name = kdbus_name_lookup_unlocked(bus->name_registry,
>   staging->dst_name);
> -   if (name)
> -   owner = kdbus_name_get_owner(name);
> +   owner = name ? kdbus_name_get_owner(name) : NULL;

Looks good.

> if (!owner)
> return -ESRCH;
>
> @@ -1094,10 +1093,6 @@ static int kdbus_pin_dst(struct kdbus_bus *bus,
> *out_name = name;
> *out_dst = dst;
> return 0;
> -
> -error:
> -   kdbus_conn_unref(dst);
> -   return ret;

Looks good.

Thanks
David

>  }
>
>  static int kdbus_conn_reply(struct kdbus_conn *src,
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 25/44] kdbus: Cleanup kdbus_cmd_conn_info()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
>  - Move `entry' and `owner' to the scope where they are used. Drop
>redundand initialization of `entry'. Use conditional operator to set
>the value of `owner'.
>
>  - Set `ret' to zero right after call to kdbus_pool_slice_copy_kvec(),
>not in the end of function.
>
>  - Remove redundant goto.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index b3c5f20a57d8..6a73ac3f444d 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -1702,8 +1702,6 @@ int kdbus_cmd_conn_info(struct kdbus_conn *conn, void 
> __user *argp)
>  {
> struct kdbus_meta_conn *conn_meta = NULL;
> struct kdbus_pool_slice *slice = NULL;
> -   struct kdbus_name_entry *entry = NULL;
> -   struct kdbus_name_owner *owner = NULL;
> struct kdbus_conn *owner_conn = NULL;
> struct kdbus_item *meta_items = NULL;
> struct kdbus_info info = {};
> @@ -1739,9 +1737,12 @@ int kdbus_cmd_conn_info(struct kdbus_conn *conn, void 
> __user *argp)
> name = argv[1].item ? argv[1].item->str : NULL;
>
> if (name) {
> +   struct kdbus_name_entry *entry;
> +   struct kdbus_name_owner *owner;
> +
> entry = kdbus_name_lookup_unlocked(bus->name_registry, name);
> -   if (entry)
> -   owner = kdbus_name_get_owner(entry);
> +   owner = entry ? kdbus_name_get_owner(entry) : NULL;
> +

This looks good to me.

> if (!owner ||
> !kdbus_conn_policy_see_name(conn, current_cred(), name) ||
> (cmd->id != 0 && owner->conn->id != cmd->id)) {
> @@ -1804,17 +1805,14 @@ int kdbus_cmd_conn_info(struct kdbus_conn *conn, void 
> __user *argp)
> ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
> if (ret < 0)
> goto exit;
> +   ret = 0;
>
> kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
>
> if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
> kdbus_member_set_user(&cmd->info_size, argp,
> - typeof(*cmd), info_size)) {
> + typeof(*cmd), info_size))
> ret = -EFAULT;
> -   goto exit;
> -   }
> -
> -   ret = 0;

Again, why? Now you have a random "ret = 0;" somewhere in between,
instead of directly at the tail of the success-path.

Thanks
David

>
>  exit:
> up_read(&bus->name_registry->rwlock);
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 24/44] kdbus: Cleanup kdbus_conn_unicast()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
> Do not initialize `name' and `ret' as values are assigned to them at
> the first use by kdbus_pin_dst(). Simplify handling of
> kdbus_conn_entry_insert() return value and drop useless goto.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index db49f282a1bf..b3c5f20a57d8 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -1229,12 +1229,12 @@ static int kdbus_conn_unicast(struct kdbus_conn *src,
>   struct kdbus_staging *staging)
>  {
> const struct kdbus_msg *msg = staging->msg;
> -   struct kdbus_name_entry *name = NULL;
> +   struct kdbus_name_entry *name;
> struct kdbus_reply *wait = NULL;
> struct kdbus_conn *dst = NULL;
> struct kdbus_bus *bus = src->ep->bus;
> bool is_signal = (msg->flags & KDBUS_MSG_SIGNAL);
> -   int ret = 0;
> +   int ret;
>
> if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
> WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY) &&
> @@ -1245,7 +1245,6 @@ static int kdbus_conn_unicast(struct kdbus_conn *src,
> down_read(&bus->name_registry->rwlock);
>
> /* find and pin destination */
> -

If a comment is addressed to a whole following block, we usually put a
newline after it. Only if the comment is only addressed at the next
code-line, we don't.

> ret = kdbus_pin_dst(bus, staging, &name, &dst);
> if (ret < 0)
> goto exit;
> @@ -1276,11 +1275,10 @@ static int kdbus_conn_unicast(struct kdbus_conn *src,
> kdbus_bus_eavesdrop(bus, src, staging);
>
> ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
> -   if (ret < 0 && !is_signal)
> -   goto exit;
>
> /* signals are treated like broadcasts, recv-errors are ignored */
> -   ret = 0;
> +   if (is_signal)
> +   ret = 0;

Why? Just to reduce the line-count? You break the code-flow here, by
making the success-path conditional, instead of the error-path.

Thanks
David

>
>  exit:
> up_read(&bus->name_registry->rwlock);
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/44] kdbus: Cleanup kdbus_conn_call()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
> Do not initialize `wait' and `name' as values are assigned to them at
> first use: `wait' gets its value from kdbus_reply_find(), `name' is set
> by kdbus_pin_dst().
>
> Remove redundant code. goto isn't required as we reached exit point
> already. Setting `ret' to zero is unnecessary because
> kdbus_conn_entry_insert() returns 0 on success.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index a4d7414ecaea..db49f282a1bf 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -1159,8 +1159,8 @@ static struct kdbus_reply *kdbus_conn_call(struct 
> kdbus_conn *src,
>ktime_t exp)
>  {
> const struct kdbus_msg *msg = staging->msg;
> -   struct kdbus_name_entry *name = NULL;
> -   struct kdbus_reply *wait = NULL;
> +   struct kdbus_name_entry *name;
> +   struct kdbus_reply *wait;
> struct kdbus_conn *dst = NULL;
> struct kdbus_bus *bus = src->ep->bus;
> int ret;
> @@ -1212,14 +1212,8 @@ static struct kdbus_reply *kdbus_conn_call(struct 
> kdbus_conn *src,
> }
>
> /* send message */
> -
> kdbus_bus_eavesdrop(bus, src, staging);
> -
> ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
> -   if (ret < 0)
> -   goto exit;
> -
> -   ret = 0;

Who says kdbus_conn_entry_insert() returns 0? It might be >0. I'd
prefer the explicit check.

Thanks
David

>
>  exit:
> up_read(&bus->name_registry->rwlock);
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 21/44] kdbus: Cleanup tests in kdbus_cmd_send()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
> Rearrange tests a little to make them look cleaner.

What's wrong with 'else if'?

Thanks
David

> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index 93da7f539f74..a4d7414ecaea 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -1977,7 +1977,7 @@ int kdbus_cmd_send(struct kdbus_conn *conn, struct file 
> *f, void __user *argp)
> ret = kdbus_args_parse(&args, argp, &cmd);
> if (ret < 0)
> goto exit;
> -   else if (ret > 0 && !cmd->msg_address) /* negotiation without msg */
> +   if (ret > 0 && !cmd->msg_address) /* negotiation without msg */
> goto exit;
>
> ret2 = kdbus_args_parse_msg(&msg_args, KDBUS_PTR(cmd->msg_address),
> @@ -1985,10 +1985,11 @@ int kdbus_cmd_send(struct kdbus_conn *conn, struct 
> file *f, void __user *argp)
> if (ret2 < 0) { /* cannot parse message */
> ret = ret2;
> goto exit;
> -   } else if (ret2 > 0 && !ret) { /* msg-negot implies cmd-negot */
> -   ret = -EINVAL;
> +   }
> +   if (ret > 0) /* negotiation */
> goto exit;
> -   } else if (ret > 0) { /* negotiation */
> +   if (ret2 > 0) { /* msg-negot implies cmd-negot */
> +   ret = -EINVAL;
> goto exit;
> }
>
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/44] kdbus: Add var initialization to kdbus_conn_entry_insert()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
> Assign zero to `ret' in the beginning of function instead of doing it
> in the end.
>
> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/connection.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index 4f3cd370ecd9..185ed3ba1bce 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -889,7 +889,7 @@ int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
> const struct kdbus_name_entry *name)
>  {
> struct kdbus_queue_entry *entry;
> -   int ret;
> +   int ret = 0;
>
> kdbus_conn_lock2(conn_src, conn_dst);
>
> @@ -916,8 +916,6 @@ int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
> kdbus_queue_entry_enqueue(entry, reply);
> wake_up_interruptible(&conn_dst->wait);
>
> -   ret = 0;
> -

Not a big fan of this. It makes it less obvious, and this style is
wrong in several cases (but not here). We often only check for "ret <
0", but generally want >0 to be turned into 0 on return.

It does not matter in this specific case, but I prefer making return
codes explicit, rather than relying on a previous initialization to be
still valid.

What's your rationale here?

Thanks
David

>  exit_unlock:
> kdbus_conn_unlock2(conn_src, conn_dst);
> return ret;
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/44] kdbus: Simplify bitwise expression in kdbus_meta_get_mask()

2015-10-08 Thread David Herrmann
Hi

On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev  wrote:
> Replace the expression with more concise and readable equivalent. It can
> be proven by opening parentheses:
>
> r & ~((p | i) & r) == r & (~(p | i) | ~r) ==
> (r & ~(p | i)) | (r & ~r) == r & ~(p | i) == r & ~p & ~i

But why? The current code follows the description, and does exactly
the same. It shows that it merges the "provided" and "implied" masks,
and then extracts the flags that are missing compared to the required
mask.

I cannot follow why your code is more obvious?
David

> Signed-off-by: Sergei Zviagintsev 
> ---
>  ipc/kdbus/metadata.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
> index 788b4d9c7ecb..61215a078359 100644
> --- a/ipc/kdbus/metadata.c
> +++ b/ipc/kdbus/metadata.c
> @@ -1321,7 +1321,7 @@ static u64 kdbus_meta_get_mask(struct pid *prv_pid, u64 
> prv_mask,
>  * the sender, but still requested by the receiver. If any are left,
>  * perform rather expensive /proc access checks for them.
>  */
> -   missing = req_mask & ~((prv_mask | impl_mask) & req_mask);
> +   missing = req_mask & ~prv_mask & ~impl_mask;
> if (missing)
> proc_mask = kdbus_meta_proc_mask(prv_pid, req_pid, req_cred,
>  missing);
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   8   9   >