Re: [PATCH v2] HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
Hey On Mon, Nov 19, 2018 at 1:52 PM Jiri Kosina wrote: > > > [ David added to CC ] > > On Wed, 14 Nov 2018, Eric Biggers wrote: > > > From: Eric Biggers > > > > When a UHID_CREATE command is written to the uhid char device, a > > copy_from_user() is done from a user pointer embedded in the command. > > When the address limit is KERNEL_DS, e.g. as is the case during > > sys_sendfile(), this can read from kernel memory. Alternatively, > > information can be leaked from a setuid binary that is tricked to write > > to the file descriptor. Therefore, forbid UHID_CREATE in these cases. > > > > No other commands in uhid_char_write() are affected by this bug and > > UHID_CREATE is marked as "obsolete", so apply the restriction to > > UHID_CREATE only rather than to uhid_char_write() entirely. > > > > Thanks to Dmitry Vyukov for adding uhid definitions to syzkaller and to > > Jann Horn for commit 9da3f2b740544 ("x86/fault: BUG() when uaccess > > helpers fault on kernel addresses"), allowing this bug to be found. > > > > Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com > > Fixes: d365c6cfd337 ("HID: uhid: add UHID_CREATE and UHID_DESTROY events") > > Cc: # v3.6+ > > Cc: Jann Horn > > Cc: Andy Lutomirski > > Signed-off-by: Eric Biggers > > Thanks for the patch. I however believe the fix below is more generic, and > would prefer taking that one in case noone sees any major flaw in that > I've overlooked. Thanks. As Andy rightly pointed out, the credentials check is actually needed. The scenario here is using a uhid-fd as stdout when executing a setuid-program. This will possibly end up reading arbitrary memory from the setuid program and use it as input for the hid-descriptor. To my knowledge, this is a rather small attack surface. UHID is usually a privileged interface, which in itself already gives you ridiculous privileges. Furthermore, it only allows read-access if you happen to be able to craft the message the setuid program writes (which must be a valid user-space pointer, pointing to a valid hid descriptor). But people have been creative in the past, and they will find a way to use this. So I do think Eric's patch here is the way to go. Lastly, this only guards UHID_CREATE, which is already a deprecated interface for several years. I don't think we can drop it any time soon, but at least the other uhid interfaces should be safe. Thanks David
Re: [PATCH v2] HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
Hi On Wed, Nov 14, 2018 at 10:55 PM Eric Biggers wrote: > From: Eric Biggers > > When a UHID_CREATE command is written to the uhid char device, a > copy_from_user() is done from a user pointer embedded in the command. > When the address limit is KERNEL_DS, e.g. as is the case during > sys_sendfile(), this can read from kernel memory. Alternatively, > information can be leaked from a setuid binary that is tricked to write > to the file descriptor. Therefore, forbid UHID_CREATE in these cases. > > No other commands in uhid_char_write() are affected by this bug and > UHID_CREATE is marked as "obsolete", so apply the restriction to > UHID_CREATE only rather than to uhid_char_write() entirely. > > Thanks to Dmitry Vyukov for adding uhid definitions to syzkaller and to > Jann Horn for commit 9da3f2b740544 ("x86/fault: BUG() when uaccess > helpers fault on kernel addresses"), allowing this bug to be found. > > Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com > Fixes: d365c6cfd337 ("HID: uhid: add UHID_CREATE and UHID_DESTROY events") > Cc: # v3.6+ > Cc: Jann Horn > Cc: Andy Lutomirski > Signed-off-by: Eric Biggers > --- > drivers/hid/uhid.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c > index 3c55073136064..051639c09f728 100644 > --- a/drivers/hid/uhid.c > +++ b/drivers/hid/uhid.c > @@ -12,6 +12,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -722,6 +723,17 @@ static ssize_t uhid_char_write(struct file *file, const > char __user *buffer, > > switch (uhid->input_buf.type) { > case UHID_CREATE: > + /* > +* 'struct uhid_create_req' contains a __user pointer which is > +* copied from, so it's unsafe to allow this with elevated > +* privileges (e.g. from a setuid binary) or via > kernel_write(). > +*/ > + if (file->f_cred != current_cred() || uaccess_kernel()) { I think `uaccess_kernel()` would be enough. UHID does not check any credentials. If you believe this should be there nevertheless, feel free to keep it. Either way: Acked-by: David Herrmann Thanks David > + pr_err_once("UHID_CREATE from different security > context by process %d (%s), this is not allowed.\n", > + task_tgid_vnr(current), current->comm); > + ret = -EACCES; > + goto unlock; > + } > ret = uhid_dev_create(uhid, &uhid->input_buf); > break; > case UHID_CREATE2: > -- > 2.19.1.930.g4563a0d9d0-goog >
Re: [PATCH v2] HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
Hey On Thu, Nov 15, 2018 at 9:14 AM Benjamin Tissoires wrote: > > On Thu, Nov 15, 2018 at 12:20 AM Dmitry Torokhov wrote: > > > I think it's best not to make > > > assumptions about how the interface will be used and to be consistent > > > with how > > > other ->write() methods in the kernel handle the misfeature where a __user > > > pointer in the write() or read() payload is dereferenced. > > > > I can see that you might want to check credentials, etc, if interface > > can be accessed by unprivileged process, however is it a big no no for > > uhid/userio/uinput devices. > > Yep, any sane distribution would restrict the permissions of > uhid/userio/uinput to only be accessed by root. If that ever changes, > there is already an issue with the system and it was compromised > either by a terribly dizzy sysadmin. UHID is safe to be used by a non-root user. This does not imply that you should open up access to the world, but you are free to have a dedicated group or user with access to uhid. I agree that in most common desktop-scenarios you should not grant world-access to it, though. > > > > > Temporarily switching > > > to USER_DS would only avoid one of the two problems. > > > > So because of the above there is only one problem. If your system > > opened access to uhid to random processes you have much bigger > > problems than exposing some data from a suid binary. You can spam "rm > > -rf .; rm -rf /" though uhid if there is interactive session > > somewhere. > > > > > > > > Do you think the proposed restrictions would actually break anything? > > > > It would break if someone uses UHID_CREATE with sendpage. I do not > > know if anyone does. If we were certain there are no users we'd simply > > removed UHID_CREATE altogether. > > AFAICT, there are 2 users of uhid: > - bluez for BLE devices (using HID over GATT) > - hid-replay for debugging. There are several more (eg., android bt-broadcom), and UHID_CREATE is actively used. Dropping support for it will break these use-cases. Thanks David
Re: [PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"
Hi On Thu, Nov 15, 2018 at 12:09 AM Kees Cook wrote: > On Wed, Nov 14, 2018 at 9:40 AM, Laura Abbott wrote: [...] > > Can we switch to strscpy instead? This will quiet gcc and avoid the > > issues with strlcpy. > > Yes please: it looks like these strings are expected to be NUL > terminated, so strscpy() without the "- 1" and min() logic would be > the correct solution here. "the correct solution"? To my knowledge the original code was correct as well. Am I missing something? >If @hid is already zero, then this would > just be: > >strscpy(hid->name, ev->u.create2.name, sizeof(hid->name)); >strscpy(hid->phys, ev->u.create2.phys, sizeof(hid->phys)); >strscpy(hid->uniq, ev->u.create2.uniq, sizeof(hid->uniq)); > > If they are NOT NUL terminated, then keep using strncpy() but mark the > fields in the struct with the __nonstring attribute. They are supposed to be NUL terminated, but for compatibility reasons we allow them to be not. So I don't think your proposal is safe. Thanks David
[PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"
This reverts commit 336fd4f5f25157e9e8bd50e898a1bbcd99eaea46. Please note that `strlcpy()` does *NOT* do what you think it does. strlcpy() *ALWAYS* reads the full input string, regardless of the 'length' parameter. That is, if the input is not zero-terminated, strlcpy() will *READ* beyond input boundaries. It does this, because it always returns the size it *would* copy if the target was big enough, not the truncated size it actually copied. The original code was perfectly fine. The hid device is zero-initialized and the strncpy() functions copied up to n-1 characters. The result is always zero-terminated this way. This is the third time someone tried to replace strncpy with strlcpy in this function, and gets it wrong. I now added a comment that should at least make people reconsider. Signed-off-by: David Herrmann --- drivers/hid/uhid.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c index fefedc0b4dc6..0dfdd0ac7120 100644 --- a/drivers/hid/uhid.c +++ b/drivers/hid/uhid.c @@ -496,12 +496,13 @@ static int uhid_dev_create2(struct uhid_device *uhid, goto err_free; } - len = min(sizeof(hid->name), sizeof(ev->u.create2.name)); - strlcpy(hid->name, ev->u.create2.name, len); - len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)); - strlcpy(hid->phys, ev->u.create2.phys, len); - len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)); - strlcpy(hid->uniq, ev->u.create2.uniq, len); + /* @hid is zero-initialized, strncpy() is correct, strlcpy() not */ + len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1; + strncpy(hid->name, ev->u.create2.name, len); + len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1; + strncpy(hid->phys, ev->u.create2.phys, len); + len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1; + strncpy(hid->uniq, ev->u.create2.uniq, len); hid->ll_driver = &uhid_hid_driver; hid->bus = ev->u.create2.bus; -- 2.19.1
[PATCH] HID: uhid: prevent splice(2)
The kernel has a default implementation of splice(2) for writing from a pipe into an arbitrary file. This behavior can be overriden by providing an f_op.splice_write() callback. Unfortunately, the default implementation of splice_write() takes page by page from the source pipe, calls kmap() and passes the mapped page as kernel-address to f_op.write(). Thus, it uses standard write(2) to implement splice(2). However, since the page is kernel-mapped, they have to `set_fs(get_ds())`. This is mostly fine, but UHID takes command-streams through write(2), and thus it might interpret the data taken as pointers. If called with KERNEL_DS, you can trick UHID to allow kernel-space pointers as well. As a simple fix, prevent splice(2) on UHID. It is unsecure, but it is also non-functional. We need a linear mapping of the input in UHID, so chunked input from splice(2) makes no sense, anyway. Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com Signed-off-by: David Herrmann --- drivers/hid/uhid.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c index 3c5507313606..fefedc0b4dc6 100644 --- a/drivers/hid/uhid.c +++ b/drivers/hid/uhid.c @@ -753,6 +753,15 @@ static ssize_t uhid_char_write(struct file *file, const char __user *buffer, return ret ? ret : count; } +static ssize_t uhid_char_splice_write(struct pipe_inode_info *pipe, + struct file *out, + loff_t *ppos, + size_t len, + unsigned int flags) +{ + return -EOPNOTSUPP; +} + static __poll_t uhid_char_poll(struct file *file, poll_table *wait) { struct uhid_device *uhid = file->private_data; @@ -771,6 +780,7 @@ static const struct file_operations uhid_fops = { .release= uhid_char_release, .read = uhid_char_read, .write = uhid_char_write, + .splice_write = uhid_char_splice_write, .poll = uhid_char_poll, .llseek = no_llseek, }; -- 2.19.1
Re: BUG: GPF in non-whitelisted uaccess (non-canonical address?)
Hey On Wed, Nov 14, 2018 at 1:25 AM syzbot wrote: > syzbot has found a reproducer for the following crash on: > > HEAD commit:ccda4af0f4b9 Linux 4.20-rc2 > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=13b4e77b40 > kernel config: https://syzkaller.appspot.com/x/.config?x=4a0a89f12ca9b0f5 > dashboard link: https://syzkaller.appspot.com/bug?extid=72473edc9bf4eb1c6556 > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1646a22540 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=108a653340 > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+72473edc9bf4eb1c6...@syzkaller.appspotmail.com > [...] > BUG: GPF in non-whitelisted uaccess (non-canonical address?) This uses sendpage(2) to feed data from a file into a uhid chardev. The default behavior of the kernel is to create a temporary pipe, then splice from the file into the pipe, and then splice again from the pipe into uhid. The kernel provides default implementations for splicing between files and any other file. The default implementation of `.splice_write()` uses kmap() to map the page from the pipe and then uses the __kernel_write() (which uses .f_op->write()) to push the data into the target file. The problem is, __kernel_write() sets the address-space to KERNEL_DS `set_fs(get_ds())`, thus granting the UHID request access to kernel memory. I see several ways to fix that, the most simple solution is to simply prevent splice/sendpage on uhid (by setting f_op.splice_write to a dummy). Alternatively, we can implement a proper splice helper that takes the page directly, rather than through the __kernel_write() default implementation. Thanks David
Re: [PATCH] HID: uhid: fix a missing-check bug
Hey On Sat, May 5, 2018 at 6:17 AM, Wenwen Wang wrote: > In uhid_event_from_user(), if it is in_compat_syscall(), the 'type' of the > event is first fetched from the 'buffer' in userspace and checked. If the > 'type' is UHID_CREATE, it is a messed up request with compat pointer, which > could be more than 256 bytes, so it is better allocated from the heap, as > mentioned in the comment. Its fields are then prepared one by one instead > of using a whole copy. For all other cases, the event object is copied > directly from user space. In other words, based on the 'type', the memory > size and structure of the event object vary. > > Given that the 'buffer' resides in userspace, a malicious userspace process > can race to change the 'type' between the two copies, which will cause > inconsistency issues, potentially security issues. Plus, various operations > such as uhid_dev_destroy() and uhid_dev_input() are performed based on > 'type' in function uhid_char_write(). If 'type' is modified by user, there > could be some issues such as uninitialized uses. > > To fix this problem, we need to recheck the type after the second fetch to > make sure it is not UHID_CREATE. > > Signed-off-by: Wenwen Wang > --- > drivers/hid/uhid.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c > index 3c55073..0220385 100644 > --- a/drivers/hid/uhid.c > +++ b/drivers/hid/uhid.c > @@ -447,11 +447,17 @@ static int uhid_event_from_user(const char __user > *buffer, size_t len, > event->u.create.country = compat->country; > > kfree(compat); > - return 0; > + } else { > + if (copy_from_user(event, buffer, > + min(len, sizeof(*event > + return -EFAULT; > + if (event->type == UHID_CREATE) > + return -EINVAL; You are correct, the race against malicious/buggy user-space exists. However, it is harmless. Lets assume user-space rewrites the input, all that happens is that we end up with garbage in the kernel. However, input-validation is done *after* this function, hence everything will work just fine. If user-space rewrites input-buffers, they better be aware that this might mean random garbage is pushed into the kernel. IOW, this race simply allows user-space to circumvent the "compat conversion". But if user-space wants that... let them. No point in protecting them from doing something stupid. Thanks David > } > - /* All others can be copied directly */ > + return 0; > } > > + /* Others can be copied directly */ > if (copy_from_user(event, buffer, min(len, sizeof(*event > return -EFAULT; > > -- > 2.7.4 >
[PATCH v2 0/4] Introduce LSM-hook for socketpair(2)
Hi This is v2 of the socketpair(2) LSM hook introduction. Changes since v1 are: - Added ACKs from previous series. - Moved the hook into generic socketpair(2) handling. The hook is now called security_socket_socketpair(), just like the other hooks on the socket layer. There is no AF_UNIX specific code, anymore. - Added SMACK support. - Still *NO* AppArmor support, since upstream AppArmor lacks SO_PEERSEC support and requires downstream patches (in particular, the apparmor_unix_stream_connect() function is mostly a stub and was never synced with Ubuntu downstream modifications). Old cover letter follows (only trivial changes). Thanks David This series adds a new LSM hook for the socketpair(2) syscall. The idea is to allow SO_PEERSEC to be called on AF_UNIX sockets created via socketpair(2), and return the same information as if you emulated socketpair(2) via a temporary listener socket. Right now SO_PEERSEC will return the unlabeled credentials for a socketpair, rather than the actual credentials of the creating process. A simple call to: socketpair(AF_UNIX, SOCK_STREAM, 0, out); can be emulated via a temporary listener socket bound to a unique, random name in the abstract namespace. By connecting to this listener socket, accept(2) will return the second part of the pair. If SO_PEERSEC is queried on these, the correct credentials of the creating process are returned. A simple comparison between the behavior of SO_PEERSEC on socketpair(2) and an emulated socketpair is included in the dbus-broker test-suite [1]. This patch series tries to close this gap and makes both behave the same. A new LSM-hook is added which allows LSMs to cache the correct peer information on newly created socket-pairs. Apart from fixing this behavioral difference, the dbus-broker project actually needs to query the credentials of socketpairs, and currently must resort to racy procfs(2) queries to get the LSM credentials of its controller socket. Several parts of the dbus-broker project allow you to pass in a socket during execve(2), which will be used by the child process to accept control-commands from its parent. One natural way to create this communication channel is to use socketpair(2). However, right now SO_PEERSEC does not return any useful information, hence, the child-process would need other means to retrieve this information. By avoiding socketpair(2) and using the hacky-emulated version, this is not an issue. There was a previous discussion on this matter [2] roughly a year ago. Back then there was the suspicion that proper SO_PEERSEC would confuse applications. However, we could not find any evidence backing this suspicion. Furthermore, we now actually see the contrary. Lack of SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials. Hence, we propose to implement full SO_PEERSEC for socketpairs. Thanks David [1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c [2] https://www.spinics.net/lists/selinux/msg22674.html David Herrmann (3): security: add hook for socketpair() net: hook socketpair() into LSM selinux: provide socketpair callback Tom Gundersen (1): smack: provide socketpair callback include/linux/lsm_hooks.h | 7 +++ include/linux/security.h | 7 +++ net/socket.c | 7 +++ security/security.c| 6 ++ security/selinux/hooks.c | 13 + security/smack/smack_lsm.c | 22 ++ 6 files changed, 62 insertions(+) -- 2.17.0
[PATCH v2 3/4] selinux: provide socketpair callback
Make sure to implement the new socketpair callback so the SO_PEERSEC call on socketpair(2)s will return correct information. Acked-by: Serge Hallyn Acked-by: Stephen Smalley Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- security/selinux/hooks.c | 13 + 1 file changed, 13 insertions(+) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 4cafe6a19167..02ebd1585eaf 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4569,6 +4569,18 @@ static int selinux_socket_post_create(struct socket *sock, int family, return err; } +static int selinux_socket_socketpair(struct socket *socka, +struct socket *sockb) +{ + struct sk_security_struct *sksec_a = socka->sk->sk_security; + struct sk_security_struct *sksec_b = sockb->sk->sk_security; + + sksec_a->peer_sid = sksec_b->sid; + sksec_b->peer_sid = sksec_a->sid; + + return 0; +} + /* Range of port numbers used to automatically bind. Need to determine whether we should perform a name_bind permission check between the socket and the port number. */ @@ -6999,6 +7011,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(socket_create, selinux_socket_create), LSM_HOOK_INIT(socket_post_create, selinux_socket_post_create), + LSM_HOOK_INIT(socket_socketpair, selinux_socket_socketpair), LSM_HOOK_INIT(socket_bind, selinux_socket_bind), LSM_HOOK_INIT(socket_connect, selinux_socket_connect), LSM_HOOK_INIT(socket_listen, selinux_socket_listen), -- 2.17.0
[PATCH v2 4/4] smack: provide socketpair callback
From: Tom Gundersen Make sure to implement the new socketpair callback so the SO_PEERSEC call on socketpair(2)s will return correct information. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- security/smack/smack_lsm.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 0b414836bebd..dcb976f98df2 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -2842,6 +2842,27 @@ static int smack_socket_post_create(struct socket *sock, int family, return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET); } +/** + * smack_socket_socketpair - create socket pair + * @socka: one socket + * @sockb: another socket + * + * Cross reference the peer labels for SO_PEERSEC + * + * Returns 0 on success, and error code otherwise + */ +static int smack_socket_socketpair(struct socket *socka, + struct socket *sockb) +{ + struct socket_smack *asp = socka->sk->sk_security; + struct socket_smack *bsp = sockb->sk->sk_security; + + asp->smk_packet = bsp->smk_out; + bsp->smk_packet = asp->smk_out; + + return 0; +} + #ifdef SMACK_IPV6_PORT_LABELING /** * smack_socket_bind - record port binding information. @@ -4724,6 +4745,7 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(unix_may_send, smack_unix_may_send), LSM_HOOK_INIT(socket_post_create, smack_socket_post_create), + LSM_HOOK_INIT(socket_socketpair, smack_socket_socketpair), #ifdef SMACK_IPV6_PORT_LABELING LSM_HOOK_INIT(socket_bind, smack_socket_bind), #endif -- 2.17.0
[PATCH v2 1/4] security: add hook for socketpair()
Right now the LSM labels for socketpairs are always uninitialized, since there is no security hook for the socketpair() syscall. This patch adds the required hooks so LSMs can properly label socketpairs. This allows SO_PEERSEC to return useful information on those sockets. Note that the behavior of socketpair() can be emulated by creating a listener socket, connecting to it, and then discarding the initial listener socket. With this workaround, SO_PEERSEC would return the caller's security context. However, with socketpair(), the uninitialized context is returned unconditionally. This is unexpected and makes socketpair() less useful in situations where the security context is crucial to the application. With the new socketpair-hook this disparity can be solved by making socketpair() return the expected security context. Acked-by: Serge Hallyn Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- include/linux/lsm_hooks.h | 7 +++ include/linux/security.h | 7 +++ security/security.c | 6 ++ 3 files changed, 20 insertions(+) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 9d0b286f3dba..8f1131c8dd54 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -757,6 +757,11 @@ * @type contains the requested communications type. * @protocol contains the requested protocol. * @kern set to 1 if a kernel socket. + * @socket_socketpair: + * Check permissions before creating a fresh pair of sockets. + * @socka contains the first socket structure. + * @sockb contains the second socket structure. + * Return 0 if permission is granted and the connection was established. * @socket_bind: * Check permission before socket protocol layer bind operation is * performed and the socket @sock is bound to the address specified in the @@ -1656,6 +1661,7 @@ union security_list_options { int (*socket_create)(int family, int type, int protocol, int kern); int (*socket_post_create)(struct socket *sock, int family, int type, int protocol, int kern); + int (*socket_socketpair)(struct socket *socka, struct socket *sockb); int (*socket_bind)(struct socket *sock, struct sockaddr *address, int addrlen); int (*socket_connect)(struct socket *sock, struct sockaddr *address, @@ -1922,6 +1928,7 @@ struct security_hook_heads { struct hlist_head unix_may_send; struct hlist_head socket_create; struct hlist_head socket_post_create; + struct hlist_head socket_socketpair; struct hlist_head socket_bind; struct hlist_head socket_connect; struct hlist_head socket_listen; diff --git a/include/linux/security.h b/include/linux/security.h index 200920f521a1..4ff3ba457e56 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1191,6 +1191,7 @@ int security_unix_may_send(struct socket *sock, struct socket *other); int security_socket_create(int family, int type, int protocol, int kern); int security_socket_post_create(struct socket *sock, int family, int type, int protocol, int kern); +int security_socket_socketpair(struct socket *socka, struct socket *sockb); int security_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen); int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen); int security_socket_listen(struct socket *sock, int backlog); @@ -1262,6 +1263,12 @@ static inline int security_socket_post_create(struct socket *sock, return 0; } +static inline int security_socket_socketpair(struct socket *socka, +struct socket *sockb) +{ + return 0; +} + static inline int security_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen) diff --git a/security/security.c b/security/security.c index 7bc2fde023a7..68f46d849abe 100644 --- a/security/security.c +++ b/security/security.c @@ -1358,6 +1358,12 @@ int security_socket_post_create(struct socket *sock, int family, protocol, kern); } +int security_socket_socketpair(struct socket *socka, struct socket *sockb) +{ + return call_int_hook(socket_socketpair, 0, socka, sockb); +} +EXPORT_SYMBOL(security_socket_socketpair); + int security_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen) { return call_int_hook(socket_bind, 0, sock, address, addrlen); -- 2.17.0
[PATCH v2 2/4] net: hook socketpair() into LSM
Use the newly created LSM-hook for socketpair(). The default hook return-value is 0, so behavior stays the same unless LSMs start using this hook. Acked-by: Serge Hallyn Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- net/socket.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/net/socket.c b/net/socket.c index f10f1d947c78..667a7b397134 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1420,6 +1420,13 @@ int __sys_socketpair(int family, int type, int protocol, int __user *usockvec) goto out; } + err = security_socket_socketpair(sock1, sock2); + if (unlikely(err)) { + sock_release(sock2); + sock_release(sock1); + goto out; + } + err = sock1->ops->socketpair(sock1, sock2); if (unlikely(err < 0)) { sock_release(sock2); -- 2.17.0
Re: [PATCH 0/3] Introduce LSM-hook for socketpair(2)
Hey On Wed, Apr 25, 2018 at 9:02 PM, James Morris wrote: > On Wed, 25 Apr 2018, Paul Moore wrote: > >> On Wed, Apr 25, 2018 at 2:44 PM, James Morris wrote: >> > On Mon, 23 Apr 2018, David Herrmann wrote: >> >> This patch series tries to close this gap and makes both behave the >> >> same. A new LSM-hook is added which allows LSMs to cache the correct >> >> peer information on newly created socket-pairs. >> > >> > Looks okay to me. >> > >> > Once it's respun with the Smack backend and maybe the hook name change, >> > I'll merge this unless DaveM wants it to go in via his networking tree. >> >> Note my objection to the hook placement in patch 2/3; I think we >> should move the hook out of the AF_UNIX layer and up into the socket >> layer. > > I vote for this as it maintains the intended abstraction of the socket > API. Sounds good, I changed it. I will send v2 shortly. Thanks David
[PATCH 1/3] security: add hook for socketpair(AF_UNIX, ...)
Right now the LSM labels for socketpairs are always uninitialized, since there is no security hook for the socketpair() syscall. This patch adds the required hooks so LSMs can properly label socketpairs. This allows SO_PEERSEC to return useful information on those sockets. Note that the behavior of socketpair() can be emulated by creating a listener socket, connecting to it, and then discarding the initial listener socket. With this workaround, SO_PEERSEC would return the caller's security context. However, with socketpair(), the uninitialized context is returned unconditionally. This is unexpected and makes socketpair() less useful in situations where the security context is crucial to the application. With the new socketpair-hook this disparity can be solved by making socketpair() return the expected security context. Signed-off-by: David Herrmann --- include/linux/lsm_hooks.h | 8 include/linux/security.h | 7 +++ security/security.c | 6 ++ 3 files changed, 21 insertions(+) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 9d0b286f3dba..2a23c75c1541 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -717,6 +717,12 @@ * @other contains the peer sock structure. * @newsk contains the new sock structure. * Return 0 if permission is granted. + * @unix_stream_socketpair: + * Check permissions before establishing a Unix domain stream connection + * for a fresh pair of sockets. + * @socka contains the first sock structure. + * @sockb contains the second sock structure. + * Return 0 if permission is granted and the connection was established. * @unix_may_send: * Check permissions before connecting or sending datagrams from @sock to * @other. @@ -1651,6 +1657,7 @@ union security_list_options { #ifdef CONFIG_SECURITY_NETWORK int (*unix_stream_connect)(struct sock *sock, struct sock *other, struct sock *newsk); + int (*unix_stream_socketpair)(struct sock *socka, struct sock *sockb); int (*unix_may_send)(struct socket *sock, struct socket *other); int (*socket_create)(int family, int type, int protocol, int kern); @@ -1919,6 +1926,7 @@ struct security_hook_heads { struct hlist_head inode_getsecctx; #ifdef CONFIG_SECURITY_NETWORK struct hlist_head unix_stream_connect; + struct hlist_head unix_stream_socketpair; struct hlist_head unix_may_send; struct hlist_head socket_create; struct hlist_head socket_post_create; diff --git a/include/linux/security.h b/include/linux/security.h index 200920f521a1..be275deeda10 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1187,6 +1187,7 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32 #ifdef CONFIG_SECURITY_NETWORK int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk); +int security_unix_stream_socketpair(struct sock *socka, struct sock *sockb); int security_unix_may_send(struct socket *sock, struct socket *other); int security_socket_create(int family, int type, int protocol, int kern); int security_socket_post_create(struct socket *sock, int family, @@ -1242,6 +1243,12 @@ static inline int security_unix_stream_connect(struct sock *sock, return 0; } +static inline int security_unix_stream_socketpair(struct sock *socka, + struct sock *sockb) +{ + return 0; +} + static inline int security_unix_may_send(struct socket *sock, struct socket *other) { diff --git a/security/security.c b/security/security.c index 7bc2fde023a7..3dfd374e84e5 100644 --- a/security/security.c +++ b/security/security.c @@ -1340,6 +1340,12 @@ int security_unix_stream_connect(struct sock *sock, struct sock *other, struct s } EXPORT_SYMBOL(security_unix_stream_connect); +int security_unix_stream_socketpair(struct sock *socka, struct sock *sockb) +{ + return call_int_hook(unix_stream_socketpair, 0, socka, sockb); +} +EXPORT_SYMBOL(security_unix_stream_socketpair); + int security_unix_may_send(struct socket *sock, struct socket *other) { return call_int_hook(unix_may_send, 0, sock, other); -- 2.17.0
[PATCH 2/3] net/unix: hook unix_socketpair() into LSM
Use the newly created LSM-hook for unix_socketpair(). The default hook return-value is 0, so behavior stays the same unless LSMs start using this hook. Signed-off-by: David Herrmann --- net/unix/af_unix.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 68bb70a62afe..bc9705ace9b1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1371,6 +1371,11 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, static int unix_socketpair(struct socket *socka, struct socket *sockb) { struct sock *ska = socka->sk, *skb = sockb->sk; + int err; + + err = security_unix_stream_socketpair(ska, skb); + if (err) + return err; /* Join our sockets back to back */ sock_hold(ska); -- 2.17.0
[PATCH 3/3] selinux: provide unix_stream_socketpair callback
Make sure to implement the new unix_stream_socketpair callback so the SO_PEERSEC call on socketpair(2)s will return correct information. Signed-off-by: David Herrmann --- security/selinux/hooks.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 4cafe6a19167..828881d9a41d 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4905,6 +4905,18 @@ static int selinux_socket_unix_stream_connect(struct sock *sock, return 0; } +static int selinux_socket_unix_stream_socketpair(struct sock *socka, +struct sock *sockb) +{ + struct sk_security_struct *sksec_a = socka->sk_security; + struct sk_security_struct *sksec_b = sockb->sk_security; + + sksec_a->peer_sid = sksec_b->sid; + sksec_b->peer_sid = sksec_a->sid; + + return 0; +} + static int selinux_socket_unix_may_send(struct socket *sock, struct socket *other) { @@ -6995,6 +7007,8 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(inode_getsecctx, selinux_inode_getsecctx), LSM_HOOK_INIT(unix_stream_connect, selinux_socket_unix_stream_connect), + LSM_HOOK_INIT(unix_stream_socketpair, + selinux_socket_unix_stream_socketpair), LSM_HOOK_INIT(unix_may_send, selinux_socket_unix_may_send), LSM_HOOK_INIT(socket_create, selinux_socket_create), -- 2.17.0
[PATCH 0/3] Introduce LSM-hook for socketpair(2)
Hi This series adds a new LSM hook for the socketpair(2) syscall. The idea is to allow SO_PEERSEC to be called on AF_UNIX sockets created via socketpair(2), and return the same information as if you emulated socketpair(2) via a temporary listener socket. Right now SO_PEERSEC will return the unlabeled credentials for a socketpair, rather than the actual credentials of the creating process. A simple call to: socketpair(AF_UNIX, SOCK_STREAM, 0, out); can be emulated via a temporary listener socket bound to a unique, random name in the abstract namespace. By connecting to this listener socket, accept(2) will return the second part of the pair. If SO_PEERSEC is queried on these, the correct credentials of the creating process are returned. A simple comparison between the behavior of SO_PEERSEC on socketpair(2) and an emulated socketpair is included in the dbus-broker test-suite [1]. This patch series tries to close this gap and makes both behave the same. A new LSM-hook is added which allows LSMs to cache the correct peer information on newly created socket-pairs. Apart from fixing this behavioral difference, the dbus-broker project actually needs to query the credentials of socketpairs, and currently must resort to racy procfs(2) queries to get the LSM credentials of its controller socket. Several parts of the dbus-broker project allow you to pass in a socket during execve(2), which will be used by the child process to accept control-commands from its parent. One natural way to create this communication channel is to use socketpair(2). However, right now SO_PEERSEC does not return any useful information, hence, the child-process would need other means to retrieve this information. By avoiding socketpair(2) and using the hacky-emulated version, this is not an issue. There was a previous discussion on this matter [2] roughly a year ago. Back then there was the suspicion that proper SO_PEERSEC would confuse applications. However, we could not find any evidence backing this suspicion. Furthermore, we now actually see the contrary. Lack of SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials. Hence, we propose to implement full SO_PEERSEC for socketpairs. This series only adds SELinux backends, since that is what we need for RHEL. I will gladly extend the other LSMs if needed. Thanks David [1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c [2] https://www.spinics.net/lists/selinux/msg22674.html David Herrmann (3): security: add hook for socketpair(AF_UNIX, ...) net/unix: hook unix_socketpair() into LSM selinux: provide unix_stream_socketpair callback include/linux/lsm_hooks.h | 8 include/linux/security.h | 7 +++ net/unix/af_unix.c| 5 + security/security.c | 6 ++ security/selinux/hooks.c | 14 ++ 5 files changed, 40 insertions(+) -- 2.17.0
Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME
Hi On Fri, Apr 20, 2018 at 7:44 AM, Sergey Senozhatsky wrote: > On (04/20/18 06:37), David Herrmann wrote: >> >> I get lots of timer-errors on Arch-Linux booting current master, after >> a suspend/resume cycle. Just a selection of errors I see on resume: > > Hello David, > Any chance you can revert the patches in question and test? I'm running > ARCH (4.17.0-rc1-dbg-00042-gaa03ddd9c434) and suspend/resume cycle does > not trigger any errors. Except for this one > > kernel: do_IRQ: 0.55 No irq handler for vector I can easily reproduce it by sleeping for >5min, so the systemd watchdog timers are triggered. The patches don't revert cleanly, so I didn't look into booting without them, yet. I will try just linking the monotonic clock to the monotonic_active clock later. Also, doesn't this hunk in 72199320d49d need a 'break;': diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c index b258bee13b02..6259dbc0191a 100644 --- a/kernel/time/posix-stubs.c +++ b/kernel/time/posix-stubs.c @@ -73,6 +73,8 @@ int do_clock_gettime(clockid_t which_clock, struct timespec64 *tp) case CLOCK_BOOTTIME: get_monotonic_boottime64(tp); break; + case CLOCK_MONOTONIC_ACTIVE: + ktime_get_active_ts64(tp); default: return -EINVAL; }
Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME
Hey On Tue, Mar 13, 2018 at 7:11 PM, John Stultz wrote: > On Mon, Mar 12, 2018 at 11:36 PM, Ingo Molnar wrote: >> Ok, I have edited all the changelogs accordingly (and also flipped around the >> 'clock MONOTONIC' language to the more readable 'the MONOTONIC clock' >> variant), >> the resulting titles are (in order): >> >> 72199320d49d: timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock >> d6ed449afdb3: timekeeping: Make the MONOTONIC clock behave like the >> BOOTTIME clock >> f2d6fdbfd238: Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior >> d6c7270e913d: timekeeping: Remove boot time specific code >> 7250a4047aa6: posix-timers: Unify MONOTONIC and BOOTTIME clock behavior >> 127bfa5f4342: hrtimer: Unify MONOTONIC and BOOTTIME clock behavior >> 92af4dcb4e1c: tracing: Unify the "boot" and "mono" tracing clocks >> >> I'll push these out after testing. > > I'm still anxious about userspace effects given how much I've seen the > current behavior documented, and wouldn't pushed for this myself (I'm > a worrier), but at least I'm not seeing any failures in initial > testing w/ kselftest so far. I get lots of timer-errors on Arch-Linux booting current master, after a suspend/resume cycle. Just a selection of errors I see on resume: systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT rtkit-daemon[742]: The canary thread is apparently starving. Taking action. systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)! systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)! kernel: e1000e :00:1f.6: Failed to restore TIMINCA clock rate delta: -22 Lots of crashes with SIGABRT due to these. I did not bisect it, but it sounds related to me. Also, user-space uses CLOCK_MONOTONIC for watchdog timers. That is, a process is required to respond to a watchdog-request in a given MONOTONIC time-frame. If this jumps during suspend/resume, watchdogs will fire immediately. I don't see how this can work with the new MONOTONIC behavior? Thanks David
Re: [PATCH] platform/x86: thinkpad_acpi: suppress warning about palm detection
Hi Andy On Fri, Jan 12, 2018 at 2:50 PM, Andy Shevchenko wrote: > On Fri, Jan 12, 2018 at 1:04 PM, David Herrmann wrote: >> This patch prevents the thinkpad_acpi driver from warning about 2 event >> codes returned for keyboard palm-detection. No behavioral changes, >> other than suppressing the warning in the kernel log. The events are >> still forwarded via acpi-netlink channels. >> >> We could, optionally, decide to forward the event through a >> input-switch on the tpacpi input device. However, so far no suitable >> input-code exists, and no similar drivers report such events. Hence, >> leave it an acpi event for now. >> >> Note that the event-codes are named based on empirical studies. On the >> ThinkPad X1 5th Gen the sensor can be found underneath the arrow key. >> > >> Cc: Matthew Thode > > Shouldn't be Suggested-by or even Signed-off-by? The patch is different (Matthew originally suppressed the ACPI event), so I did not copy the sign-off. Please add Suggested-by, if that is an acceptable tag. >> Signed-off-by: David Herrmann > > >> + case TP_HKEY_EV_PALM_DETECTED: >> + case TP_HKEY_EV_PALM_UNDETECTED: > >> + /* palm detected hovering the keyboard, forward to user-space >> +* via netlink for consumption */ > > Comment style is > /* > * Multi line comment. > * This is an example. > */ All other 6 comments in this function follow the style I used here, so I tried to be consistent. But feel free to amend this change. Thanks a lot! David
[PATCH] platform/x86: thinkpad_acpi: suppress warning about palm detection
This patch prevents the thinkpad_acpi driver from warning about 2 event codes returned for keyboard palm-detection. No behavioral changes, other than suppressing the warning in the kernel log. The events are still forwarded via acpi-netlink channels. We could, optionally, decide to forward the event through a input-switch on the tpacpi input device. However, so far no suitable input-code exists, and no similar drivers report such events. Hence, leave it an acpi event for now. Note that the event-codes are named based on empirical studies. On the ThinkPad X1 5th Gen the sensor can be found underneath the arrow key. Cc: Matthew Thode Signed-off-by: David Herrmann --- drivers/platform/x86/thinkpad_acpi.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 117be48ff4de..128f91af716e 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -214,6 +214,10 @@ enum tpacpi_hkey_event_t { /* AC-related events */ TP_HKEY_EV_AC_CHANGED = 0x6040, /* AC status changed */ + /* Further user-interface events */ + TP_HKEY_EV_PALM_DETECTED= 0x60b0, /* palm hoveres keyboard */ + TP_HKEY_EV_PALM_UNDETECTED = 0x60b1, /* palm removed */ + /* Misc */ TP_HKEY_EV_RFKILL_CHANGED = 0x7000, /* rfkill switch changed */ }; @@ -4079,6 +4083,12 @@ static bool hotkey_notify_6xxx(const u32 hkey, *send_acpi_ev = false; break; + case TP_HKEY_EV_PALM_DETECTED: + case TP_HKEY_EV_PALM_UNDETECTED: + /* palm detected hovering the keyboard, forward to user-space +* via netlink for consumption */ + return true; + default: pr_warn("unknown possible thermal alarm or keyboard event received\n"); known = false; -- 2.15.1
Re: [PATCH 4/6] hugetlbfs: implement memfd sealing
Hi On Sat, Nov 4, 2017 at 12:31 AM, Mike Kravetz wrote: > On 11/03/2017 10:56 AM, Mike Kravetz wrote: >> On 11/03/2017 10:41 AM, David Herrmann wrote: >>> Hi >>> >>> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz >>> wrote: >>>> On 11/03/2017 10:03 AM, David Herrmann wrote: >>>>> Hi >>>>> >>>>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau >>>>> wrote: >>>>>> Implements memfd sealing, similar to shmem: >>>>>> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in >>>>>> memfd_add_seals(). write() doesn't exist for hugetlbfs. >>>>>> - SHRINK: added similar check as shmem_setattr() >>>>>> - GROW: added similar check as shmem_setattr() & shmem_fallocate() >>>>>> >>>>>> Except write() operation that doesn't exist with hugetlbfs, that >>>>>> should make sealing as close as it can be to shmem support. >>>>> >>>>> SEAL, SHRINK, and GROW look fine to me. >>>>> >>>>> Regarding WRITE >>>> >>>> The commit message may not be clear. However, hugetlbfs does not support >>>> the write system call (or aio). The only way to modify contents of a >>>> hugetlbfs file is via mmap or hole punch/truncate. So, we do not really >>>> need to worry about those special (a)io cases for hugetlbfs. >>> >>> This is not about the write(2) syscall. Please consider this scenario >>> about shmem: >>> >>> You create a memfd via memfd_create() and map it writable. You now >>> call another kernel syscall that takes as input _any mapped page >>> range_. You pass your mapped memfd-addresses to it. Those syscalls >>> tend to use get_user_pages() to pin arbitrary user-mapped pages, as >>> such this also affects shmem. In this case, those pages might stay >>> mapped even if you munmap() your memfd! >>> >>> One example of this is using AIO-read() on any other file that >>> supports it, passing your mapped memfd as buffer to _read into_. The >>> operations supported on the memfd are irrelevant here. >>> The selftests contain a FUSE-based test for this, since FUSE allows >>> user-space to GUP pages for an arbitrary amount of time. >>> >>> The original fix for this is: >>> >>> commit 05f65b5c70909ef686f865f0a85406d74d75f70f >>> Author: David Herrmann >>> Date: Fri Aug 8 14:25:36 2014 -0700 >>> >>> shm: wait for pins to be released when sealing >>> >>> Please have a look at this. Your patches use shmem_add_seals() almost >>> unchanged, and as such you call into shmem_wait_for_pins() on >>> hugetlbfs. I would really like to see an explicit ACK that this works >>> on hugetlbfs. >> >> Thanks for the explanation. I missed that in your first reply. I'll >> look into this for hugetlbfs. > > I reviewed the routines in the above commit and did not see anything that > would prevent them from working properly with hugetlbfs. I modified the > fuse test to use hugetlbfs based mapping. I also instrumented the above > routines and verified that tags were set/checked/cleared as designed for > hugetlb pages. So, that is an ACK on working with hugetlbfs. > > This does bring up the point that the fuse seals test should also be > modified to work with hugetlbfs as part of this series. Perfect! Looks all good to me then! Thanks David
Re: [PATCH 4/6] hugetlbfs: implement memfd sealing
Hi On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz wrote: > On 11/03/2017 10:03 AM, David Herrmann wrote: >> Hi >> >> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau >> wrote: >>> Implements memfd sealing, similar to shmem: >>> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in >>> memfd_add_seals(). write() doesn't exist for hugetlbfs. >>> - SHRINK: added similar check as shmem_setattr() >>> - GROW: added similar check as shmem_setattr() & shmem_fallocate() >>> >>> Except write() operation that doesn't exist with hugetlbfs, that >>> should make sealing as close as it can be to shmem support. >> >> SEAL, SHRINK, and GROW look fine to me. >> >> Regarding WRITE > > The commit message may not be clear. However, hugetlbfs does not support > the write system call (or aio). The only way to modify contents of a > hugetlbfs file is via mmap or hole punch/truncate. So, we do not really > need to worry about those special (a)io cases for hugetlbfs. This is not about the write(2) syscall. Please consider this scenario about shmem: You create a memfd via memfd_create() and map it writable. You now call another kernel syscall that takes as input _any mapped page range_. You pass your mapped memfd-addresses to it. Those syscalls tend to use get_user_pages() to pin arbitrary user-mapped pages, as such this also affects shmem. In this case, those pages might stay mapped even if you munmap() your memfd! One example of this is using AIO-read() on any other file that supports it, passing your mapped memfd as buffer to _read into_. The operations supported on the memfd are irrelevant here. The selftests contain a FUSE-based test for this, since FUSE allows user-space to GUP pages for an arbitrary amount of time. The original fix for this is: commit 05f65b5c70909ef686f865f0a85406d74d75f70f Author: David Herrmann Date: Fri Aug 8 14:25:36 2014 -0700 shm: wait for pins to be released when sealing Please have a look at this. Your patches use shmem_add_seals() almost unchanged, and as such you call into shmem_wait_for_pins() on hugetlbfs. I would really like to see an explicit ACK that this works on hugetlbfs. Thanks David
Re: [PATCH 4/6] hugetlbfs: implement memfd sealing
Hi On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau wrote: > Implements memfd sealing, similar to shmem: > - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in > memfd_add_seals(). write() doesn't exist for hugetlbfs. > - SHRINK: added similar check as shmem_setattr() > - GROW: added similar check as shmem_setattr() & shmem_fallocate() > > Except write() operation that doesn't exist with hugetlbfs, that > should make sealing as close as it can be to shmem support. SEAL, SHRINK, and GROW look fine to me. Regarding WRITE you need to make sure there are no page references left around. For instance, on shmem any process might trigger the kernel to GUP mapped shmem pages for asynchronous IO, then unmap the file and request F_SEAL_WRITE. In this case the seal must be rejected *iff* the pages are still pinned. shmem does this by requiring the page-refcounts to be 0. Preferably there would be some better infrastructure that tells us whether someone operates on those pages, but this does not exist right now. See shmem_wait_for_pins() for details. I have little knowledge on how hugetlbs integrate with the page-cache and radix-tree, hence I'd prefer if someone can explicitly ACK that shmem_wait_for_pins() is suitable for hugetlbfs. Otherwise, this series looks good to me (minus the #ifdef mess..). Thanks David
Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
Hi On Fri, Nov 3, 2017 at 9:24 AM, Michal Hocko wrote: > On Fri 03-11-17 00:46:18, Hugh Dickins wrote: >> On Thu, 2 Nov 2017, Michal Hocko wrote: >> > From: Michal Hocko >> > >> > syzkaller has reported the following lockdep splat >> > == >> > WARNING: possible circular locking dependency detected >> > 4.13.0-next-20170911+ #19 Not tainted >> > -- >> > syz-executor5/6914 is trying to acquire lock: >> > (cpu_hotplug_lock.rw_sem){}, at: [] >> > get_online_cpus include/linux/cpu.h:126 [inline] >> > (cpu_hotplug_lock.rw_sem){}, at: [] >> > lru_add_drain_all+0xe/0x20 mm/swap.c:729 >> > >> > but task is already holding lock: >> > (&sb->s_type->i_mutex_key#9){}, at: [] inode_lock >> > include/linux/fs.h:712 [inline] >> > (&sb->s_type->i_mutex_key#9){}, at: [] >> > shmem_add_seals+0x197/0x1060 mm/shmem.c:2768 >> > >> > more details [1] and dependencies explained [2]. The problem seems to be >> > the usage of lru_add_drain_all from shmem_wait_for_pins. While the lock >> > dependency is subtle as hell and we might want to make lru_add_drain_all >> > less dependent on the hotplug locks the usage of lru_add_drain_all seems >> > dubious here. The whole function cares only about radix tree tags, page >> > count and page mapcount. None of those are touched from the draining >> > context. So it doesn't make much sense to drain pcp caches. Moreover >> > this looks like a wrong thing to do because it basically induces >> > unpredictable latency to the call because draining is not for free >> > (especially on larger machines with many cpus). >> > >> > Let's simply drop the call to lru_add_drain_all to address both issues. >> > >> > [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d...@google.com >> > [2] >> > http://lkml.kernel.org/r/http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muo...@hirez.programming.kicks-ass.net >> > >> > Cc: David Herrmann >> > Cc: Hugh Dickins >> > Signed-off-by: Michal Hocko >> >> NAK. shmem_wait_for_pins() is waiting for temporary pins on the pages >> to go away, and using lru_add_drain_all() in the usual way, to lower >> the refcount of pages temporarily pinned in a pagevec somewhere. Page >> count is touched by draining pagevecs: I'm surprised to see you say >> that it isn't - or have pagevec page references been eliminated by >> a recent commit that I missed? > > I must be missing something here. __pagevec_lru_add_fn merely about > moving the page into the appropriate LRU list, pagevec_move_tail only > rotates, lru_deactivate_file_fn moves from active to inactive LRUs, > lru_lazyfree_fn moves from anon to file LRUs and activate_page_drain > just moves to the active list. None of those operations touch the page > count AFAICS. So I would agree that some pages might be pinned outside > of the LRU (lru_add_pvec) and thus unreclaimable but does this really > matter. Or what else I am missing? Yes, we need to make sure those page-pins are dropped. shmem_wait_for_pins() literally just waits for all those to be cleared, since there is no way to tell whether a page is still inflight for some pending async WRITE operation. Hence, if the pagevecs keep pinning those pages, we must fail the shmem-seal operation, as we cannot guarantee there are no further WRITEs to this file. The refcount is our only way to tell. I think the caller could just call lru_add_drain_all() between mapping_deny_writable() and shmem_wait_for_pins(), releasing the inode-lock in between. But that means we drain it even if shmem_tag_pins() does not find anything (presumably the common case). It would also have weird interactions with parallel inode-operations, in case the seal-operation fails and is reverted. Not sure I like that. Thanks David
Re: [PATCH] hid/uhid: fix a double-fetch bug when copying event from user
Hey On Wed, Sep 20, 2017 at 12:12 AM, Dmitry Torokhov wrote: > On Tue, Sep 19, 2017 at 2:54 PM, Meng Xu wrote: >> >> >> On 09/19/2017 05:31 PM, Dmitry Torokhov wrote: >>> >>> On Mon, Sep 18, 2017 at 10:21 PM, Meng Xu wrote: When in_compat_syscall(), a user could make type != UHID_CREATE when get_user(type, buffer) [first fetch] and later make event->type == UHID_CREATE in copy_from_user(event, buffer, ...) [second fetch]. By doing so, an attacker might circumvent the specific logic to handle the type == UHID_CREATE case and later cause undefined behaviors. This patch enforces that event->type is overriden to the type value copied in the first fetch and thus, mitigate this race condition attack. >>> >>> I do not believe this mitigates anything, as copy_form_user() is not >>> an atomic operation and we can have 2nd thread "scrambling" the memory >>> while 1st does the ioctl. >> >> Yes, what you described is the root cause of this double-fetch situation >> and that is why I am proposing this patch. >>> >>> >>> We also should not expect that we always get consistent data from >>> userspace and the rest of the driver should be able to cope with >>> (reject) such data. >> >> Yes, that is also true and we should never assume to get consistent >> data from userspace. That is why in this case, we can have user >> started with UHID_INPUT just to skip the large chunk of code in >> if (type == UHID_CREATE) {} and then replace it with UHID_CREATE >> and take the function uhid_dev_create(uhid, &uhid->input_buf). >> This is exactly what this patch tries to mitigate. > > OK, so the driver should be able to withstand equivalent of "dd > if=/dev/random of=/dev/uhid". The point of special handling of > UHID_CREATE in uhid_event_from_user() is to convert the layout of the > UHID event from 32 to 64 bit layout because we made a mistake and have > a pointer data there. We do not really care about contents. We do not > care it if changes. We do not care of the pointer is valid. If there > was garbage, or there will be garbage after conversion, it does not > care one bit. uhid_dev_create() has to validate the input and reject > invalid input. I agree. With current code, worst case scenario is someone shortcutting the compat-conversion by setting UHID_CREATE after uhid_event_from_user() copied it. However, this does no harm, as Dmitry explained. If user-space wants to shortcut the conversion, let them do so... Thanks David
[PATCH] net/unix: drop obsolete fd-recursion limits
All unix sockets now account inflight FDs to the respective sender. This was introduced in: commit 712f4aad406bb1ed67f3f98d04c044191f0ff593 Author: willy tarreau Date: Sun Jan 10 07:54:56 2016 +0100 unix: properly account for FDs passed over unix sockets and further refined in: commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 Author: Hannes Frederic Sowa Date: Wed Feb 3 02:11:03 2016 +0100 unix: correctly track in-flight fds in sending process user_struct Hence, regardless of the stacking depth of FDs, the total number of inflight FDs is limited, and accounted. There is no known way for a local user to exceed those limits or exploit the accounting. Furthermore, the GC logic is independent of the recursion/stacking depth as well. It solely depends on the total number of inflight FDs, regardless of their layout. Lastly, the current `recursion_level' suffers a TOCTOU race, since it checks and inherits depths only at queue time. If we consider `A<-B' to mean `queue-B-on-A', the following sequence circumvents the recursion level easily: A<-B B<-C C<-D ... Y<-Z resulting in: A<-B<-C<-...<-Z With all of this in mind, lets drop the recursion limit. It has no additional security value, anymore. On the contrary, it randomly confuses message brokers that try to forward file-descriptors, since any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client maliciously modifies the FD while inflight. Cc: Alban Crequy Cc: Simon McVittie Signed-off-by: David Herrmann --- include/net/af_unix.h | 1 - net/unix/af_unix.c| 24 +--- 2 files changed, 1 insertion(+), 24 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 678e4d6fa317..3b3194b2fc65 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -58,7 +58,6 @@ struct unix_sock { struct list_headlink; atomic_long_t inflight; spinlock_t lock; - unsigned char recursion_level; unsigned long gc_flags; #define UNIX_GC_CANDIDATE 0 #define UNIX_GC_MAYBE_CYCLE1 diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 7b52a380d710..5c53f22d62e8 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1528,26 +1528,13 @@ static inline bool too_many_unix_fds(struct task_struct *p) return false; } -#define MAX_RECURSION_LEVEL 4 - static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) { int i; - unsigned char max_level = 0; if (too_many_unix_fds(current)) return -ETOOMANYREFS; - for (i = scm->fp->count - 1; i >= 0; i--) { - struct sock *sk = unix_get_socket(scm->fp->fp[i]); - - if (sk) - max_level = max(max_level, - unix_sk(sk)->recursion_level); - } - if (unlikely(max_level > MAX_RECURSION_LEVEL)) - return -ETOOMANYREFS; - /* * Need to duplicate file references for the sake of garbage * collection. Otherwise a socket in the fps might become a @@ -1559,7 +1546,7 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) for (i = scm->fp->count - 1; i >= 0; i--) unix_inflight(scm->fp->user, scm->fp->fp[i]); - return max_level; + return 0; } static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds) @@ -1649,7 +1636,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, struct sk_buff *skb; long timeo; struct scm_cookie scm; - int max_level; int data_len = 0; int sk_locked; @@ -1701,7 +1687,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, err = unix_scm_to_skb(&scm, skb, true); if (err < 0) goto out_free; - max_level = err + 1; skb_put(skb, len - data_len); skb->data_len = data_len; @@ -1819,8 +1804,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, __net_timestamp(skb); maybe_add_creds(skb, sock, other); skb_queue_tail(&other->sk_receive_queue, skb); - if (max_level > unix_sk(other)->recursion_level) - unix_sk(other)->recursion_level = max_level; unix_state_unlock(other); other->sk_data_ready(other); sock_put(other); @@ -1855,7 +1838,6 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, int sent = 0; struct scm_cookie scm; bool fds_sent = false; - int max_level; int data_len; wait_for_unix_gc(); @@ -1905,7 +1887,6 @@ static int unix_stream_sendmsg(struc
Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex
Hi On Wed, Jun 14, 2017 at 1:58 PM, Arnd Bergmann wrote: > Does that mean that we can have a concurrent hid_device_remove() > and hid_device_probe() on the same device, using different > drivers and actually still need the driver_lock for that? I would assume > that the driver core handles that part at least. Nope. Only one device can be probed per physical device. Driver core locking is sufficient. >> Also note that hid_input_report() might be called from interrupt >> context, hence it can never call into kref_put() or similar (unless we >> want to guarantee that unbinding can run in interrupt context). > > I was thinking that we could do most of the unbinding in > hid_device_remove() and only do a small part (the final kfree > at the minimum) in the release() callback that gets called from > kref_put(), but I guess that also isn't easy to retrofit. Not a big fan of putting such restrictions on unbinding. Also unlikely to retrofit now. But I also think it is not needed. >> If we really want to get rid of the semaphore, a spinlock might do >> fine as well. Then again, some hid device drivers might expect their >> transport driver to *not* run in irq context, and thus break under a >> spinlock. So if you want to fix this, we need to audit the hid device >> drivers. > > Do you mean the hdrv->report or hdrv->raw_event might not work > in atomic context, or the probe/release callbacks might not work > there? Correct. I assume that there are hid-device-drivers that use raw_event (or other) callbacks, that assume that they're *not* in atomic context. For instance, the bluetooth ll-drivers call hid_input_report() from a workqueue. Hence, any device-driver running on bluetooth might have put kmalloc(GFP_KERNEL) calls into its callbacks without ever noticing that this is a bad idea. This is obviously not correct, since the device driver might very well be probed on USB and then fault. But... yeah... I wouldn't bet on all drivers to be correct in that regard. > If it's only the former, we could do something like the (untested, > needs rebasing etc) patch below, which only holds the spinlock > during hid_input_report(). We test the io_started flag under the > lock, which makes the flag sufficiently atomic, and the release > function will wait for any concurrent hid_input_report() to complete > before resetting the flag. > > For reference only, do not apply. > Signed-off-by: Arnd Bergmann I like the patch. It should be sufficient for what we want. If it breaks things, lets fix those device drivers then. Thanks David > diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c > index 5f87dbe28336..300c65bd40a1 100644 > --- a/drivers/hid/hid-core.c > +++ b/drivers/hid/hid-core.c > @@ -1532,8 +1532,11 @@ int hid_input_report(struct hid_device *hid, > int type, u8 *data, int size, int i > if (!hid) > return -ENODEV; > > - if (down_trylock(&hid->driver_input_lock)) > - return -EBUSY; > + spin_lock(&hid->driver_input_lock); > + if (!hid->io_started) { > + ret = -EBUSY; > + goto unlock; > + } > > if (!hid->driver) { > ret = -ENODEV; > @@ -1568,7 +1571,7 @@ int hid_input_report(struct hid_device *hid, int > type, u8 *data, int size, int i > ret = hid_report_raw_event(hid, type, data, size, interrupt); > > unlock: > - up(&hid->driver_input_lock); > + spin_unlock(&hid->driver_input_lock); > return ret; > } > EXPORT_SYMBOL_GPL(hid_input_report); > @@ -2317,11 +2320,6 @@ static int hid_device_probe(struct device *dev) > > if (down_interruptible(&hdev->driver_lock)) > return -EINTR; > - if (down_interruptible(&hdev->driver_input_lock)) { > - ret = -EINTR; > - goto unlock_driver_lock; > - } > - hdev->io_started = false; > > if (!hdev->driver) { > id = hid_match_device(hdev, hdrv); > @@ -2345,7 +2343,7 @@ static int hid_device_probe(struct device *dev) > } > unlock: > if (!hdev->io_started) > - up(&hdev->driver_input_lock); > + hid_device_io_start(hdev); > unlock_driver_lock: > up(&hdev->driver_lock); > return ret; > @@ -2363,7 +2361,7 @@ static int hid_device_remove(struct device *dev) > ret = -EINTR; > goto unlock_driver_lock; > } > - hdev->io_started = false; > + hid_device_io_stop(hdev); > > hdrv = hdev->driver; > if (hdrv) { > @@ -2375,8 +2373,11 @@ static int hid_device_remove(struct device *dev) > hdev->driver = NULL; > } > > - if (!hdev->io_started) > - up(&hdev->driver_input_lock); > + if (!hdev->io_started) { > + dev_warn(dev, "hdrv->remove left io active\n"); > + hid_device_io_stop(hdev); > + } > + > unlock_driver_lock: > up(&hdev->driver_lock); >
Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex
Hey On Wed, Jun 14, 2017 at 9:20 AM, Arnd Bergmann wrote: > On Wed, Jun 14, 2017 at 7:22 AM, Binoy Jayan wrote: >> Hi, >> >> On 14 June 2017 at 01:55, Arnd Bergmann wrote: >> The mutex code clearly states mutex_trylock() must not be used in interrupt context (see kernel/locking/mutex.c), hence we used a semaphore here. Unless the mutex code is changed to allow this, we cannot switch away from semaphores. >>> >>> Right, that makes a lot of sense. I don't think changing the mutex >>> code is an option here, but I wonder if we can replace the semaphore >>> with something simpler anyway. >>> >>> From what I can tell, it currently does two things: >>> >>> 1. it acts as a simple flag to prevent hid_input_report from derefencing >>> the hid->driver pointer during initialization and exit. I think this >>> could >>> be done equally well using a simple atomic set_bit()/test_bit() or >>> similar. >>> >>> 2. it prevents the hid->driver pointer from becoming invalid while an >>> asynchronous hid_input_report() is in progress. This actually seems to >>> be a reference counting problem rather than a locking problem. >>> I don't immediately see how to better address it, or how exactly this >>> could go wrong in practice, but I would naively expect that either >>> hdev->driver->remove() needs to wait for the last user of hdev->driver >>> to complete, or we need kref_get/kref_put in hid_input_report() >>> to trigger the actual release function. The HID design is explained in detail in ./Documentation/hid/hid-transport.txt, in case you want some background information. The problem here is that the low-level transport driver has a lifetime that is independent of the hid device-driver. So the transport driver needs to be able to tell the HID layer about coming/going devices, as well as incoming traffic. At the same time, the HID layer can bind upper-layer hid device drivers *anytime* (since it is exposed via the driver core interfaces in /sys to bind drivers). The locking architecture is very similar to 's_active' on super-blocks, or 'active' on kernfs-nodes. However, the big difference here is that drivers can be rebound. Hence, we're not limited to just one driver lifetime, which is why we went with a semaphore instead. Also note that hid_input_report() might be called from interrupt context, hence it can never call into kref_put() or similar (unless we want to guarantee that unbinding can run in interrupt context). If we really want to get rid of the semaphore, a spinlock might do fine as well. Then again, some hid device drivers might expect their transport driver to *not* run in irq context, and thus break under a spinlock. So if you want to fix this, we need to audit the hid device drivers. David
Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex
Hi On Tue, Jun 13, 2017 at 11:56 AM, Benjamin Tissoires wrote: >> > - struct semaphore driver_lock; /* >> > protects the current driver, except during input */ >> > + struct mutex driver_lock; /* >> > protects the current driver, except during input */ >> > struct semaphore driver_input_lock; /* >> > protects the current driver */ > > Unless I am mistaken, this one could also be converted to a mutex (in a > separate patch, of course). The mutex code clearly states mutex_trylock() must not be used in interrupt context (see kernel/locking/mutex.c), hence we used a semaphore here. Unless the mutex code is changed to allow this, we cannot switch away from semaphores. Otherwise, this patch (given Benjamin's comments are addressed) looks good: Reviewed-by: David Herrmann Thanks David
Re: [PATCH v7] Input: psxpad-spi - Add PlayStation 1/2 joypads via SPI interface Driver
Hey On Mon, May 1, 2017 at 9:47 PM, Dmitry Torokhov wrote: > On Sun, Apr 30, 2017 at 02:11:03PM +0900, Tomohiro Yoshidomi wrote: >> PlayStation 1/2 joypads can be connected directly to the SPI interface. >> >> Signed-off-by: Tomohiro Yoshidomi > > Benjamin, David, I think this driver now matches the gamepad spec that > we have in Drivers/input/gamepad.rst. Could you give it an one-over, as > far as mappings go, otherwise I think it can be applied and merged with > my 2nd pull request. Looks good to me: Acked-by: David Herrmann The FF code could be select'ed rather than conditional. Everything else looks fine. I didn't check the SPI internals though. Never touched SPI, sorry. Thanks David
Re: Re: [PATCH] Input: evdev: fix queueing of SYN_DROPPED event for EVIOCG[type] IOCTL case
Hi On Tue, Jan 31, 2017 at 5:09 PM, Aniroop Mathur wrote: > On Sun, Jan 22, 2017 at 6:45 PM, David Herrmann wrote: >> On Thu, Nov 24, 2016 at 9:11 PM, Aniroop Mathur wrote: >>> continue; >>> } else if (head != i) { >>> /* move entry to fill the gap */ >>> @@ -151,6 +155,7 @@ static void __evdev_flush_queue(struct evdev_client >>> *client, unsigned int type) >>> } >>> >>> client->head = head; >>> + return drop_count; >>> } >>> >>> static void __evdev_queue_syn_dropped(struct evdev_client *client) >>> @@ -920,6 +925,7 @@ static int evdev_handle_get_val(struct evdev_client >>> *client, >>> int ret; >>> unsigned long *mem; >>> size_t len; >>> + unsigned int drop_count = 0; >>> >>> len = BITS_TO_LONGS(maxbit) * sizeof(unsigned long); >>> mem = kmalloc(len, GFP_KERNEL); >>> @@ -933,12 +939,12 @@ static int evdev_handle_get_val(struct evdev_client >>> *client, >>> >>> spin_unlock(&dev->event_lock); >>> >>> - __evdev_flush_queue(client, type); >>> + drop_count = __evdev_flush_queue(client, type); >>> >>> spin_unlock_irq(&client->buffer_lock); >>> >>> ret = bits_to_user(mem, maxbit, maxlen, p, compat); >>> - if (ret < 0) >>> + if (ret < 0 && drop_count > 0) >>> evdev_queue_syn_dropped(client); >> >> I don't see the point. If bits_to_user() fails, you get EFAULT. >> User-space cannot assume anything is still valid if they get EFAULT. >> This is not like ENOMEM or other errors that you can recover from. >> EFAULT means _programming_ error, not runtime exception. >> >> IOW, EFAULT is special nearly everywhere in the kernel. Usually, >> whenever a syscall returns an error, you can rely on it to not have >> modified state. EFAULT is an exception on most paths, since it might >> occur when copying over results, and it is overly expensive to handle >> EFAULT gracefully there (you'd have to copy _results_ to user-space, >> before making them visible to the system). >> >> Long story short: I don't see the point in this patch. This path is >> *never* triggered, except if your client is buggy. And even if you >> trigger it, placing SYN_DROPPED in the queue does no harm at all. >> >> Care to elaborate why exactly you want this modification? >> David >> > > Sure, I will elaborate for you. > Basically, the bug is that if the last event dropped in the queue is > EV_SYN/SYN_REPORT and next event inserted is EV_SYN/SYN_DROPPED, then the > userspace client will ignore the next complete event packet as per rule > defined in the document although the client should not ignore the events > until EV_SYN/SYN_REPORT because the events for this case are not partial > events but the full packet indeed. So we need to make sure whenever this > happens we immediately insert another EV_SYN/SYN_REPORT after EV_SYN/DROPPED > event so that client do not ignore the next full packet. > I already fixed this bug and you can see the patch here (not submitted yet) > https://patchwork.kernel.org/patch/9362233/ > > For this patch, we had no problem with the case of kernel buffer overrun and > also had no problem for the case of clock change request, but only had problem > for the case of EVIOCG ioctl call which I have already explained in this patch > description. > In short, if we insert SYN_DROPPED event wrongly then client will ignore > events until SYN_REPORT event which we do not want to happen. So that is > why I want this modification in order to have correct insertion of > SYN_DROPPED event and hence go ahead with another patch I mentioned above. Fair enough. A SYN_DROPPED should be followed by a SYN_REPORT. The normal insertion path guarantees that (since it keeps the last event alive), the other 2 fake SYN_DROPPED insertions don't. But... > Next, you have also mentioned that this path is never triggered which I am not > sure of. However, if this path is never triggered then it is best to delete it > to avoid such confusion but I dont think thats a good idea. And if this path > can be triggered rarely (even once) which I believe it can like in case of > buggy > client you mentioned or in case of bit flip or for any possible reason, then > we need to make this modification. ...you seem to misunderstand when this code-path is triggered. This is an EFAULT handler. So it is
Re: [PATCH] Input: evdev: fix queueing of SYN_DROPPED event for EVIOCG[type] IOCTL case
Hi On Thu, Nov 24, 2016 at 9:11 PM, Aniroop Mathur wrote: > Currently, when EVIOCG[type] ioctl call is issued and bits_to_user fails, > then SYN_DROPPED event is inserted in the event queue always. > > However, it is not compulsory that some events are flushed out on every > EVIOCG[type] ioctl call like in case of empty event queue and in case when > EVIOCG[type] ioctl is issued for say A type of events but event queue does > not have any A type of events but some other type of events. > > Therefore, insert SYN_DROPPED event only when some events have been flushed > out from event queue plus bits_to_user fails. > > Signed-off-by: Aniroop Mathur > --- > drivers/input/evdev.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c > index e9ae3d5..f8b295e 100644 > --- a/drivers/input/evdev.c > +++ b/drivers/input/evdev.c > @@ -108,9 +108,11 @@ static bool __evdev_is_filtered(struct evdev_client > *client, > } > > /* flush queued events of type @type, caller must hold client->buffer_lock */ > -static void __evdev_flush_queue(struct evdev_client *client, unsigned int > type) > +static unsigned int __evdev_flush_queue(struct evdev_client *client, > + unsigned int type) > { > unsigned int i, head, num; > + unsigned int drop_count = 0; > unsigned int mask = client->bufsize - 1; > bool is_report; > struct input_event *ev; > @@ -129,9 +131,11 @@ static void __evdev_flush_queue(struct evdev_client > *client, unsigned int type) > > if (ev->type == type) { > /* drop matched entry */ > + drop_count++; > continue; > } else if (is_report && !num) { > /* drop empty SYN_REPORT groups */ > + drop_count++; Dropping an empty report-group should not be considered in `drop_count` here. Empty report groups carry no information and should not affect client's state. You should only increment `drop_count` in the first block. > continue; > } else if (head != i) { > /* move entry to fill the gap */ > @@ -151,6 +155,7 @@ static void __evdev_flush_queue(struct evdev_client > *client, unsigned int type) > } > > client->head = head; > + return drop_count; > } > > static void __evdev_queue_syn_dropped(struct evdev_client *client) > @@ -920,6 +925,7 @@ static int evdev_handle_get_val(struct evdev_client > *client, > int ret; > unsigned long *mem; > size_t len; > + unsigned int drop_count = 0; > > len = BITS_TO_LONGS(maxbit) * sizeof(unsigned long); > mem = kmalloc(len, GFP_KERNEL); > @@ -933,12 +939,12 @@ static int evdev_handle_get_val(struct evdev_client > *client, > > spin_unlock(&dev->event_lock); > > - __evdev_flush_queue(client, type); > + drop_count = __evdev_flush_queue(client, type); > > spin_unlock_irq(&client->buffer_lock); > > ret = bits_to_user(mem, maxbit, maxlen, p, compat); > - if (ret < 0) > + if (ret < 0 && drop_count > 0) > evdev_queue_syn_dropped(client); I don't see the point. If bits_to_user() fails, you get EFAULT. User-space cannot assume anything is still valid if they get EFAULT. This is not like ENOMEM or other errors that you can recover from. EFAULT means _programming_ error, not runtime exception. IOW, EFAULT is special nearly everywhere in the kernel. Usually, whenever a syscall returns an error, you can rely on it to not have modified state. EFAULT is an exception on most paths, since it might occur when copying over results, and it is overly expensive to handle EFAULT gracefully there (you'd have to copy _results_ to user-space, before making them visible to the system). Long story short: I don't see the point in this patch. This path is *never* triggered, except if your client is buggy. And even if you trigger it, placing SYN_DROPPED in the queue does no harm at all. Care to elaborate why exactly you want this modification? David > > kfree(mem); > -- > 2.6.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-input" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] input: Add disable sysfs entry for every input device
Hi On Sun, Dec 25, 2016 at 11:04 AM, Pali Rohár wrote: > This patch allows user to disable events from any input device so events > would not be delivered to userspace. > > Currently there is no way to disable particular input device by kernel. > User for different reasons would need it for integrated PS/2 keyboard or > touchpad in notebook or touchscreen on mobile device to prevent sending > events. E.g. mobile phone in pocket or broken integrated PS/2 keyboard. > > This is just a RFC patch, not tested yet. Original post about motivation > about this patch is there: https://lkml.org/lkml/2014/11/29/92 > > Signed-off-by: Pali Rohár > --- > drivers/input/input.c | 35 +++ > include/linux/input.h |4 > 2 files changed, 39 insertions(+) Don't open the device, if you don't want events from it. Really. I assume the reason behind this is that you don't know how to make your user-space ignore devices. But with this patch in place, you now end up with user-space trying to use the device, but not getting any events. Anyone trying to debug this will go nuts because the setup looks like the device is used, but ends up being muted. I strongly recommend improving your user-space code to do what you want it to do (meaning, make your user-space be configurable, if you need this). Btw., as a workaround, you can always disable input-drivers that you don't want. Or you can run EVIOCGRAB on a device to get the same effect as your patch. Thanks David > diff --git a/drivers/input/input.c b/drivers/input/input.c > index d95c34e..9f0da7e 100644 > --- a/drivers/input/input.c > +++ b/drivers/input/input.c > @@ -430,6 +430,9 @@ void input_event(struct input_dev *dev, > { > unsigned long flags; > > + if (unlikely(dev->disabled)) > + return; > + > if (is_event_supported(type, dev->evbit, EV_MAX)) { > > spin_lock_irqsave(&dev->event_lock, flags); > @@ -457,6 +460,9 @@ void input_inject_event(struct input_handle *handle, > struct input_handle *grab; > unsigned long flags; > > + if (unlikely(dev->disabled)) > + return; > + > if (is_event_supported(type, dev->evbit, EV_MAX)) { > spin_lock_irqsave(&dev->event_lock, flags); > > @@ -1389,12 +1395,41 @@ static ssize_t input_dev_show_properties(struct > device *dev, > } > static DEVICE_ATTR(properties, S_IRUGO, input_dev_show_properties, NULL); > > +static ssize_t input_dev_show_disable(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct input_dev *input_dev = to_input_dev(dev); > + > + return snprintf(buf, PAGE_SIZE, "%d\n", input_dev->disabled ? 1 : 0); > +} > +static ssize_t input_dev_store_disable(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t count) > +{ > + struct input_dev *input_dev = to_input_dev(dev); > + int disable; > + int ret; > + > + ret = kstrtoint(buf, 0, &disable); > + if (ret) > + return ret; > + > + if (disable != 0 && disable != 1) > + return -EINVAL; > + > + input_dev->disabled = disable; > + return count; > +} > +static DEVICE_ATTR(disable, S_IRUGO | S_IWUSR, input_dev_show_disable, > input_dev_store_disable); > + > static struct attribute *input_dev_attrs[] = { > &dev_attr_name.attr, > &dev_attr_phys.attr, > &dev_attr_uniq.attr, > &dev_attr_modalias.attr, > &dev_attr_properties.attr, > + &dev_arrr_disable.attr, > NULL > }; > > diff --git a/include/linux/input.h b/include/linux/input.h > index a65e3b2..e390b56 100644 > --- a/include/linux/input.h > +++ b/include/linux/input.h > @@ -117,6 +117,8 @@ struct input_value { > * @vals: array of values queued in the current frame > * @devres_managed: indicates that devices is managed with devres framework > * and needs not be explicitly unregistered or freed. > + * @disabled: indicates that device is in disabled state and kernel drop > + * all events from it > */ > struct input_dev { > const char *name; > @@ -187,6 +189,8 @@ struct input_dev { > struct input_value *vals; > > bool devres_managed; > + > + bool disabled; > }; > #define to_input_dev(d) container_of(d, struct input_dev, dev) > > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-input" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drm: add fourcc codes for 16bit R and GR
Hi On Mon, Jan 2, 2017 at 11:41 AM, Rainer Hochecker wrote: > From: Rainer Hochecker > > Add fourcc codes for 16bit planes. Required by mesa for > eglCreateImageKHR to access P010 surfaces created by vaapi. > > Signed-off-by: Rainer Hochecker > --- > include/uapi/drm/drm_fourcc.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > index a5890bf..e6ab638 100644 > --- a/include/uapi/drm/drm_fourcc.h > +++ b/include/uapi/drm/drm_fourcc.h > @@ -41,10 +41,16 @@ extern "C" { > /* 8 bpp Red */ > #define DRM_FORMAT_R8 fourcc_code('R', '8', ' ', ' ') /* [7:0] R */ > > +/* 16 bpp Red */ > +#define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R */ > + > /* 16 bpp RG */ > #define DRM_FORMAT_RG88fourcc_code('R', 'G', '8', '8') /* > [15:0] R:G 8:8 little endian */ > #define DRM_FORMAT_GR88fourcc_code('G', 'R', '8', '8') /* > [15:0] G:R 8:8 little endian */ > > +/* 32 bpp GR */ > +#define DRM_FORMAT_GR16fourcc_code('G', 'R', '1', '6') /* > [31:0] G:R 16:16 little endian */ > + Shouldn't it be 'G', 'R', '3', '2'? Also, please put dri-devel on CC. Thanks David > /* 8 bpp RGB */ > #define DRM_FORMAT_RGB332 fourcc_code('R', 'G', 'B', '8') /* [7:0] > R:G:B 3:3:2 */ > #define DRM_FORMAT_BGR233 fourcc_code('B', 'G', 'R', '8') /* [7:0] > B:G:R 2:3:3 */ > -- > 2.9.3 >
Re: [RFC PATCH 0/3] staging: remove fbdev drivers
Hey On Fri, Dec 9, 2016 at 2:33 PM, Daniel Vetter wrote: >> > So it is possible, only reason vram dumb buffers look worse is that there's >> > only 3 and no one cares about them, vs about 20 and a very active community >> > of contributors (also for core drm improvements) for the other case. >> >> Well, we could move offb to drm while at it I suppose that would be another >> one (offb is the "dumb driver based on pre-programmed output by firmware). > > One of the still in-flight drm drivers is the simpledrm thing meant for > all kinds of firmware drivers like efifb and similar things on arm for > pre-programmed output set up by firmware. I.e. no modeset support and > otherwise a lot of fake to make it work as drm driver, but the idea that > it's good enough until your real drm driver takes over. The x86 platform device fixups for SimpleDRM went in some weeks ago, so maybe I should resend the patches. The driver could easily do 'offb'-like devices as well. Trivial to add. Anyway, Benjamin is right, we always do shadow buffering for trivial drivers. Even in SimpleDRM I blit the shadow buffer on page-flip or dirty-ioctl. Reason is that we cannot easily expose the real framebuffer in DRM via FB-objects. But I also never saw a use-case for it, since all trivial devices I worked with were only either used as fallback or nobody cared for performance. The generic DRM API is designed for dynamic FB allocation. If your hardware does not allow you to change the scanout source, you will have a hard time trying to expose the static buffers via the dynamic FB-object API. Furthermore, all DRM user-space expects dynamic FB management to work, preferably without a ridiculously low memory limit. That's also why I never bothered changing the drivers. Despite all of this I still see no reason why a driver could not expose the static, real frambuffers via private ioctls. You can get all your fancy acceleration that way. Then fix user-space to use this API. If enough drivers end up with something similar, move it into the core. Just like we always do in DRM. Thanks David
Re: drm: GPF in drm_getcap
Hi On Sat, Nov 26, 2016 at 7:07 PM, Dmitry Vyukov wrote: > grep "card0" dmesg: > [5.298617] device: 'card0': device_add > [5.298946] PM: Adding info for No Bus:card0 > [6.436178] device: 'card0': device_add > [6.436488] PM: Adding info for No Bus:card0 > > > # ls -l /dev/dri/card0 > crw-rw---T 1 root video 226, 0 Nov 26 18:05 /dev/dri/card0 > > # ls -lt /sys/class/drm/card0/device/ > ls: cannot access /sys/class/drm/card0/device/: No such file or directory > > # ls -lt /sys/class/drm/card0/device/driver > ls: cannot access /sys/class/drm/card0/device/driver: No such file or > directory Looks like vgem. Something like this should help: https://gist.github.com/dvdhrm/1bcdf4f3485aa1614a0198a7b90515e2 I wonder whether it would be more appropriate to return -ENOTSUPP rather than 0. Thanks David
Re: drm: GPF in drm_getcap
Hi On Sat, Nov 26, 2016 at 6:50 PM, Dmitry Vyukov wrote: > On Sat, Nov 26, 2016 at 6:35 PM, David Herrmann wrote: >> Hi >> >> On Sat, Nov 26, 2016 at 6:17 PM, Dmitry Vyukov wrote: >>> On Fri, Sep 9, 2016 at 1:56 PM, Dmitry Vyukov wrote: >>>> Hello, >>>> >>>> The following program triggers GPF in drm_getcap: >>>> >>>> // autogenerated by syzkaller (http://github.com/google/syzkaller) >>>> #include >>>> #include >>>> #include >>>> #include >>>> #include >>>> #include >>>> #include >>>> #include >>>> >>>> int main() >>>> { >>>> int fd = open("/dev/dri/card0", O_RDONLY); >>>> uint64_t data[2] = {0x11, 0x80}; >>>> ioctl(fd, 0xc010640cul /*DRM_IOCTL_GET_CAP*/, data); >>>> return 0; >>>> } >>>> >>>> >>>> general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN >>>> Modules linked in: >>>> CPU: 1 PID: 5745 Comm: syz-executor Not tainted 4.8.0-rc5-next-20160905+ >>>> #14 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >>>> 01/01/2011 >>>> task: 8800310dc540 task.stack: 88003cbc >>>> RIP: 0010:[] [] >>>> drm_getcap+0x34b/0x4f0 drivers/gpu/drm/drm_ioctl.c:260 >>>> RSP: 0018:88003cbc7c28 EFLAGS: 00010202 >>>> RAX: 0058 RBX: 88003cbc7cf8 RCX: c90001db >>>> RDX: 005d RSI: 88003cbc7cf8 RDI: 02c0 >>>> RBP: 88003cbc7c50 R08: ed0007978fa1 R09: ed0007978fa0 >>>> R10: 88003cbc7d07 R11: ed0007978fa1 R12: fff0 >>>> R13: dc00 R14: 88003bcc6850 R15: fff2 >>>> FS: 7fcbf4e03700() GS:88003ed0() >>>> knlGS: >>>> CS: 0010 DS: ES: CR0: 80050033 >>>> CR2: 006dce00 CR3: 66135000 CR4: 06e0 >>>> DR0: 001e DR1: 001e DR2: >>>> DR3: DR6: 0ff0 DR7: 0600 >>>> Stack: >>>> 88003c26db00 88003cbc7cf8 875a3000 88cf0ee0 >>>> fff2 88003cbc7dc0 834cb57c e200 >>>> 1101 875a1ba0 882ae930 0010 >>>> Call Trace: >>>> [] drm_ioctl+0x54c/0xaf0 drivers/gpu/drm/drm_ioctl.c:728 >>>> [< inline >] vfs_ioctl fs/ioctl.c:43 >>>> [] do_vfs_ioctl+0x18c/0x1080 fs/ioctl.c:675 >>>> [< inline >] SYSC_ioctl fs/ioctl.c:690 >>>> [] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:681 >>>> [] entry_SYSCALL_64_fastpath+0x23/0xc1 >>>> Code: 3c 28 00 0f 85 88 01 00 00 49 8b 44 24 10 49 39 c6 4c 8d 60 f0 >>>> 74 82 e8 64 19 10 fe 49 8d bc 24 d0 02 00 00 48 89 f8 48 c1 e8 03 <42> >>>> 80 3c 28 00 0f 85 6f 01 00 00 4d 8b bc 24 d0 02 00 00 49 8d >>>> RIP [] drm_getcap+0x34b/0x4f0 >>>> drivers/gpu/drm/drm_ioctl.c:260 >>>> RSP >>>> ---[ end trace c6e1afa8cd73b880 ]--- >>>> >>>> >>>> On commit 4affa544adb8077403893e62b9e327fcf87de6f7 (Sep 8) of linux-next. >>> >>> ping >>> >>> Still happens on 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). >> >> I suspect this is because we run drm_for_each_crtc() in >> drm_getcap(DRM_PAGE_FLIP_TARGET) on a legacy driver (meaning >> mode_config is not initialized). @danvet, how about always >> initializing mode_config to 0/empty/dummy? >> >> Dmitry, what driver do you run this on? And is CONFIG_DRM_LEGACY enabled? > > > CONFIG_DRM_LEGACY is enabled. > > How can I understand what driver is used? > This happens inside of qemu. This is the device: > crw-rw---T 1 root video 226, 0 Nov 26 17:45 /dev/dri/card0 Usually by looking into `dmesg` and grepping for 'card0', or by inspecting: /sys/class/drm/card0/device/ or more importantly looking at the symlink: /sys/class/drm/card0/device/driver Thanks David
Re: drm: GPF in drm_getcap
Hi On Sat, Nov 26, 2016 at 6:17 PM, Dmitry Vyukov wrote: > On Fri, Sep 9, 2016 at 1:56 PM, Dmitry Vyukov wrote: >> Hello, >> >> The following program triggers GPF in drm_getcap: >> >> // autogenerated by syzkaller (http://github.com/google/syzkaller) >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> >> int main() >> { >> int fd = open("/dev/dri/card0", O_RDONLY); >> uint64_t data[2] = {0x11, 0x80}; >> ioctl(fd, 0xc010640cul /*DRM_IOCTL_GET_CAP*/, data); >> return 0; >> } >> >> >> general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN >> Modules linked in: >> CPU: 1 PID: 5745 Comm: syz-executor Not tainted 4.8.0-rc5-next-20160905+ #14 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> task: 8800310dc540 task.stack: 88003cbc >> RIP: 0010:[] [] >> drm_getcap+0x34b/0x4f0 drivers/gpu/drm/drm_ioctl.c:260 >> RSP: 0018:88003cbc7c28 EFLAGS: 00010202 >> RAX: 0058 RBX: 88003cbc7cf8 RCX: c90001db >> RDX: 005d RSI: 88003cbc7cf8 RDI: 02c0 >> RBP: 88003cbc7c50 R08: ed0007978fa1 R09: ed0007978fa0 >> R10: 88003cbc7d07 R11: ed0007978fa1 R12: fff0 >> R13: dc00 R14: 88003bcc6850 R15: fff2 >> FS: 7fcbf4e03700() GS:88003ed0() knlGS: >> CS: 0010 DS: ES: CR0: 80050033 >> CR2: 006dce00 CR3: 66135000 CR4: 06e0 >> DR0: 001e DR1: 001e DR2: >> DR3: DR6: 0ff0 DR7: 0600 >> Stack: >> 88003c26db00 88003cbc7cf8 875a3000 88cf0ee0 >> fff2 88003cbc7dc0 834cb57c e200 >> 1101 875a1ba0 882ae930 0010 >> Call Trace: >> [] drm_ioctl+0x54c/0xaf0 drivers/gpu/drm/drm_ioctl.c:728 >> [< inline >] vfs_ioctl fs/ioctl.c:43 >> [] do_vfs_ioctl+0x18c/0x1080 fs/ioctl.c:675 >> [< inline >] SYSC_ioctl fs/ioctl.c:690 >> [] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:681 >> [] entry_SYSCALL_64_fastpath+0x23/0xc1 >> Code: 3c 28 00 0f 85 88 01 00 00 49 8b 44 24 10 49 39 c6 4c 8d 60 f0 >> 74 82 e8 64 19 10 fe 49 8d bc 24 d0 02 00 00 48 89 f8 48 c1 e8 03 <42> >> 80 3c 28 00 0f 85 6f 01 00 00 4d 8b bc 24 d0 02 00 00 49 8d >> RIP [] drm_getcap+0x34b/0x4f0 >> drivers/gpu/drm/drm_ioctl.c:260 >> RSP >> ---[ end trace c6e1afa8cd73b880 ]--- >> >> >> On commit 4affa544adb8077403893e62b9e327fcf87de6f7 (Sep 8) of linux-next. > > ping > > Still happens on 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). I suspect this is because we run drm_for_each_crtc() in drm_getcap(DRM_PAGE_FLIP_TARGET) on a legacy driver (meaning mode_config is not initialized). @danvet, how about always initializing mode_config to 0/empty/dummy? Dmitry, what driver do you run this on? And is CONFIG_DRM_LEGACY enabled? Thanks David
[tip:x86/urgent] x86/sysfb: Fix lfb_size calculation
Commit-ID: f96acec8c8020807429d21324547f4b904c37177 Gitweb: http://git.kernel.org/tip/f96acec8c8020807429d21324547f4b904c37177 Author: David Herrmann AuthorDate: Tue, 15 Nov 2016 13:01:58 +0100 Committer: Ingo Molnar CommitDate: Wed, 16 Nov 2016 09:38:23 +0100 x86/sysfb: Fix lfb_size calculation The screen_info.lfb_size field is shifted by 16 bits *only* in case of VBE. This has historical reasons since VBE advertised it similarly. However, in case of EFI framebuffers, the size is no longer shifted. Fix the x86 simple-framebuffer setup code to use the correct size in the non-VBE case. While at it, avoid variable abbreviations and rename 'len' to 'length', and use the correct types matching the screen_info definition. Signed-off-by: David Herrmann Acked-by: Thomas Gleixner Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Tom Gundersen Link: http://lkml.kernel.org/r/20161115120158.15388-3-dh.herrm...@gmail.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/sysfb_simplefb.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c index 35b8641..85195d4 100644 --- a/arch/x86/kernel/sysfb_simplefb.c +++ b/arch/x86/kernel/sysfb_simplefb.c @@ -66,8 +66,8 @@ __init int create_simplefb(const struct screen_info *si, { struct platform_device *pd; struct resource res; - unsigned long len; - u64 base; + u64 base, size; + u32 length; /* * If the 64BIT_BASE capability is set, ext_lfb_base will contain the @@ -82,11 +82,20 @@ __init int create_simplefb(const struct screen_info *si, return -EINVAL; } - /* don't use lfb_size as it may contain the whole VMEM instead of only -* the part that is occupied by the framebuffer */ - len = mode->height * mode->stride; - len = PAGE_ALIGN(len); - if (len > (u64)si->lfb_size << 16) { + /* +* Don't use lfb_size as IORESOURCE size, since it may contain the +* entire VMEM, and thus require huge mappings. Use just the part we +* need, that is, the part where the framebuffer is located. But verify +* that it does not exceed the advertised VMEM. +* Note that in case of VBE, the lfb_size is shifted by 16 bits for +* historical reasons. +*/ + size = si->lfb_size; + if (si->orig_video_isVGA == VIDEO_TYPE_VLFB) + size <<= 16; + length = mode->height * mode->stride; + length = PAGE_ALIGN(length); + if (length > size) { printk(KERN_WARNING "sysfb: VRAM smaller than advertised\n"); return -EINVAL; } @@ -96,7 +105,7 @@ __init int create_simplefb(const struct screen_info *si, res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; res.name = simplefb_resname; res.start = base; - res.end = res.start + len - 1; + res.end = res.start + length - 1; if (res.end <= res.start) return -EINVAL;
[tip:x86/urgent] x86/sysfb: Add support for 64bit EFI lfb_base
Commit-ID: 9164b4ceb7b492a77c7fe770a4b9d1375c9cd45a Gitweb: http://git.kernel.org/tip/9164b4ceb7b492a77c7fe770a4b9d1375c9cd45a Author: David Herrmann AuthorDate: Tue, 15 Nov 2016 13:01:57 +0100 Committer: Ingo Molnar CommitDate: Wed, 16 Nov 2016 09:38:22 +0100 x86/sysfb: Add support for 64bit EFI lfb_base The screen_info object was extended to support 64-bit lfb_base addresses in: ae2ee627dc87 ("efifb: Add support for 64-bit frame buffer addresses") However, the x86 simple-framebuffer setup code never made use of it. Fix it to properly assemble and verify the lfb_base before advertising simple-framebuffer devices. In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the screen_info->ext_lfb_base field will contain the upper 32bit of the actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as does not overflow the physical address type. Signed-off-by: David Herrmann Acked-by: Thomas Gleixner Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Tom Gundersen Link: http://lkml.kernel.org/r/20161115120158.15388-2-dh.herrm...@gmail.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/sysfb_simplefb.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c index 764a29f..35b8641 100644 --- a/arch/x86/kernel/sysfb_simplefb.c +++ b/arch/x86/kernel/sysfb_simplefb.c @@ -67,6 +67,20 @@ __init int create_simplefb(const struct screen_info *si, struct platform_device *pd; struct resource res; unsigned long len; + u64 base; + + /* +* If the 64BIT_BASE capability is set, ext_lfb_base will contain the +* upper half of the base address. Assemble the address, then make sure +* it is valid and we can actually access it. +*/ + base = si->lfb_base; + if (si->capabilities & VIDEO_CAPABILITY_64BIT_BASE) + base |= (u64)si->ext_lfb_base << 32; + if (!base || (u64)(resource_size_t)base != base) { + printk(KERN_DEBUG "sysfb: inaccessible VRAM base\n"); + return -EINVAL; + } /* don't use lfb_size as it may contain the whole VMEM instead of only * the part that is occupied by the framebuffer */ @@ -81,8 +95,8 @@ __init int create_simplefb(const struct screen_info *si, memset(&res, 0, sizeof(res)); res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; res.name = simplefb_resname; - res.start = si->lfb_base; - res.end = si->lfb_base + len - 1; + res.start = base; + res.end = res.start + len - 1; if (res.end <= res.start) return -EINVAL;
[PATCH 1/2] x86/sysfb: add support for 64bit EFI lfb_base
The screen_info object was extended to support 64bit lfb_base addresses in: commit ae2ee627dc87a70910de91b791b3cd0e9c6facdd Author: Matt Fleming Date: Tue Aug 25 16:32:55 2015 +0100 efifb: Add support for 64-bit frame buffer addresses However, the x86 simple-framebuffer setup code never made use of it. Fix it to properly assemble and verify the lfb_base before advertising simple-framebuffer devices. In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the screen_info->ext_lfb_base field will contain the upper 32bit of the actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as does not overflow the physical address type. Signed-off-by: David Herrmann --- arch/x86/kernel/sysfb_simplefb.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c index 764a29f..35b8641 100644 --- a/arch/x86/kernel/sysfb_simplefb.c +++ b/arch/x86/kernel/sysfb_simplefb.c @@ -67,6 +67,20 @@ __init int create_simplefb(const struct screen_info *si, struct platform_device *pd; struct resource res; unsigned long len; + u64 base; + + /* +* If the 64BIT_BASE capability is set, ext_lfb_base will contain the +* upper half of the base address. Assemble the address, then make sure +* it is valid and we can actually access it. +*/ + base = si->lfb_base; + if (si->capabilities & VIDEO_CAPABILITY_64BIT_BASE) + base |= (u64)si->ext_lfb_base << 32; + if (!base || (u64)(resource_size_t)base != base) { + printk(KERN_DEBUG "sysfb: inaccessible VRAM base\n"); + return -EINVAL; + } /* don't use lfb_size as it may contain the whole VMEM instead of only * the part that is occupied by the framebuffer */ @@ -81,8 +95,8 @@ __init int create_simplefb(const struct screen_info *si, memset(&res, 0, sizeof(res)); res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; res.name = simplefb_resname; - res.start = si->lfb_base; - res.end = si->lfb_base + len - 1; + res.start = base; + res.end = res.start + len - 1; if (res.end <= res.start) return -EINVAL; -- 2.10.2
[PATCH 2/2] x86/sysfb: fix lfb_size calculation
The screen_info.lfb_size field is shifted by 16 bits *only* in case of VBE. This has historical reasons since VBE advertised it similarly. However, in case of EFI framebuffers, the size is no longer shifted. Fix the x86 simple-framebuffer setup code to use the correct size in the non-VBE case. While at it, avoid variable abbreviations and rename 'len' to 'length', and use the correct types matching the screen_info definition. Signed-off-by: David Herrmann --- arch/x86/kernel/sysfb_simplefb.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c index 35b8641..85195d4 100644 --- a/arch/x86/kernel/sysfb_simplefb.c +++ b/arch/x86/kernel/sysfb_simplefb.c @@ -66,8 +66,8 @@ __init int create_simplefb(const struct screen_info *si, { struct platform_device *pd; struct resource res; - unsigned long len; - u64 base; + u64 base, size; + u32 length; /* * If the 64BIT_BASE capability is set, ext_lfb_base will contain the @@ -82,11 +82,20 @@ __init int create_simplefb(const struct screen_info *si, return -EINVAL; } - /* don't use lfb_size as it may contain the whole VMEM instead of only -* the part that is occupied by the framebuffer */ - len = mode->height * mode->stride; - len = PAGE_ALIGN(len); - if (len > (u64)si->lfb_size << 16) { + /* +* Don't use lfb_size as IORESOURCE size, since it may contain the +* entire VMEM, and thus require huge mappings. Use just the part we +* need, that is, the part where the framebuffer is located. But verify +* that it does not exceed the advertised VMEM. +* Note that in case of VBE, the lfb_size is shifted by 16 bits for +* historical reasons. +*/ + size = si->lfb_size; + if (si->orig_video_isVGA == VIDEO_TYPE_VLFB) + size <<= 16; + length = mode->height * mode->stride; + length = PAGE_ALIGN(length); + if (length > size) { printk(KERN_WARNING "sysfb: VRAM smaller than advertised\n"); return -EINVAL; } @@ -96,7 +105,7 @@ __init int create_simplefb(const struct screen_info *si, res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; res.name = simplefb_resname; res.start = base; - res.end = res.start + len - 1; + res.end = res.start + length - 1; if (res.end <= res.start) return -EINVAL; -- 2.10.2
[PATCH 0/2] x86 Simplefb Fixes
Hi Two fixes for the simplefb setup code in arch/x86/. One to support Matt's 64bit EFI fb-base-address, and one to fix a longer standing verification bug. Neither fix is critical. Additionally, most people still use efifb/vesafb instead of simplefb, so probably does not affect any normal distribution. However, SimpleDRM is close to being merged, so x86-sysfb might get into some distros. Thanks David David Herrmann (2): x86/sysfb: add support for 64bit EFI lfb_base x86/sysfb: fix lfb_size calculation arch/x86/kernel/sysfb_simplefb.c | 39 +++ 1 file changed, 31 insertions(+), 8 deletions(-) -- 2.10.2
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
Hi On Thu, Oct 27, 2016 at 2:45 AM, Kirill A. Shutemov wrote: > On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote: >> Long story short: We have uid<->uid quotas so far, which prevent DoS >> attacks, unless you get access to a ridiculous amount of local UIDs. >> Details on which resources are accounted can be found in the wiki [1]. > > Does only root user_ns uid count as separate or per-ns too? > > In first case we will have vitually unbounded access to UIDs. > > The second case can cap number of user namespaces a user can create while > using bus1 inside. > > Or am I missing something? We use the exact same mechanism as "struct user_struct" (as defined in linux/sched.h). One instance corresponds to each kuid_t currently in use. This is analogous to task, epoll, inotify, fanotify, mqueue, pipes, keys, ... resource accounting. Could you elaborate on what problem you see? Thanks David
Re: [RFC v1 04/14] bus1: util - fixed list utility library
Hi On Thu, Oct 27, 2016 at 2:37 PM, Peter Zijlstra wrote: > On Wed, Oct 26, 2016 at 09:18:00PM +0200, David Herrmann wrote: >> + e = kmalloc_array(sizeof(*e), BUS1_FLIST_BATCH + 1, gfp); > >> +#define BUS1_FLIST_BATCH (1024) > >> +struct bus1_flist { >> + union { >> + struct bus1_flist *next; >> + void *ptr; >> + }; >> +}; > > So that's an allocation of 8*(1024+1), or slightly more than 2 pages. > > kmalloc will round up to the next power of two, so you'll end up with an > allocation of 16*1024, wasting a whopping 8184 bytes per such allocation > in slack space. > > Please consider using 1023 or something for your batch size, 511 would > get you to exactly 1 page which would be even better. Thanks for the hint! 511 looks like the obvious choice. Maybe even (PAGE_SIZE / sizeof(long) - 1). I will put a comment next to the definition. Thanks David
Re: [RFC v1 04/14] bus1: util - fixed list utility library
Hi On Thu, Oct 27, 2016 at 2:56 PM, Arnd Bergmann wrote: > On Thursday, October 27, 2016 2:48:46 PM CEST David Herrmann wrote: >> On Thu, Oct 27, 2016 at 2:37 PM, Peter Zijlstra wrote: >> > On Wed, Oct 26, 2016 at 09:18:00PM +0200, David Herrmann wrote: >> >> + e = kmalloc_array(sizeof(*e), BUS1_FLIST_BATCH + 1, gfp); >> > >> >> +#define BUS1_FLIST_BATCH (1024) >> > >> >> +struct bus1_flist { >> >> + union { >> >> + struct bus1_flist *next; >> >> + void *ptr; >> >> + }; >> >> +}; >> > >> > So that's an allocation of 8*(1024+1), or slightly more than 2 pages. >> > >> > kmalloc will round up to the next power of two, so you'll end up with an >> > allocation of 16*1024, wasting a whopping 8184 bytes per such allocation >> > in slack space. >> > >> > Please consider using 1023 or something for your batch size, 511 would >> > get you to exactly 1 page which would be even better. >> >> Thanks for the hint! 511 looks like the obvious choice. Maybe even >> (PAGE_SIZE / sizeof(long) - 1). I will put a comment next to the >> definition. >> >> > > PAGE_SIZE can be up to 64KB though, so that might lead wasting a lot > of memory. The bus1-flist implementation never over-allocates. It is a fixed size list, so it only allocates as much memory as needed. The issue PeterZ pointed out is passing suitable sizes to kmalloc(), which internally over-allocates to power-of-2 bounds (or some similar bounds). So we only ever waste space here if kmalloc() internally rounds up. The code in bus1-flist allocates exactly the needed space. Thanks David
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
Hi On Wed, Oct 26, 2016 at 9:39 PM, Linus Torvalds wrote: > So the thing that tends to worry me about these is resource management. > > If I understood the documentation correctly, this has per-user > resource management, which guarantees that at least the system won't > run out of memory. Good. The act of sending a message transfers the > resource to the receiver end. Fine. > > However, the usual problem ends up being that a bad user can basically > DoS a system agent, especially since for obvious performance reasons > the send/receive has to be asynchronous. > > So the usual DoS model is that some user just sends a lot of messages > to a system agent, filling up the system agent resource quota, and > basically killing the system. No, it didn't run out of memory, but the > system agent may not be able to do anything more, since it is now out > of resources. > > Keeping the resource management with the sender doesn't solve the > problem, it just reverses it: now the attack will be to send a lot of > queries to the system agent, but then just refuse to listen to the > replies - again causing the system agent to run out of resources. > > Usually the way this is resolved this is by forcing a > "request-and-reply" resource management model, where the person who > sends out a request is not only the one who is accounted for the > request, but also accounted for the reply buffer. That way the system > agent never runs out of resources, because it's always the requesting > party that has its resources accounted, never the system agent. > > You may well have solved this, but can you describe what the solution > is without forcing people to read the code and try to analyze it? All accounting on bus1 is done on a UID-basis. This is the initial model that tries to match POSIX semantics. More advanced accounting is left as a future extension (like cgroup-based, etc.). So whenever we talk about "user accounting", we talk about the user-abstraction in bus1 that right now is based on UIDs, but could be extended for other schemes. All bus1 resources are owned by a peer, and each peer has a user assigned (which right now corresponds to file->f_cred->uid). Whenever a peer allocates resources, it is accounted on its user. There are global limits per user which cannot be exceeded. Additionally, each peer can set its own per-peer limits, to further partition the per-user limits. Of course, per-user limits override per-peer limits, if necessary. Now this is all trivial and obvious. It works like any resource accounting in the kernel. It becomes tricky when we try to transfer resources. Before SEND, a resource is always accounted on the sender. After RECV, a resource is accounted on the receiver. That is, resource ownership is transferred. In most cases this is obvious: memory is copied from one address-space into another, or file-descriptors are added into the file-table of another process, etc. Lastly, when a resource is queued, we decided to go with receiver-accounting. This means, at the time of SEND resource ownership is transferred (unlike sender-accounting, which would transfer it at time of RECV). The reasons are manifold, but mainly we want RECV to not fail due to accounting, resource exhaustion, etc. We wanted SEND to do the heavy-lifting, and RECV to just dequeue. By avoiding sender-based accounting, we avoid attacks where a receiver does not dequeue messages and thus exhausts the sender's limits. The issue left is senders DoS'ing a target user. To mitigate this, we implemented a quota system. Whenever a sender wants to transfer resources to a receiver, it only gets access to a subset of the receivers resource limits. The inflight resources are accounted on a uid<->uid basis, and the current algorithm allows a receiver access to at most half the limit of the destination not currently used by anyone else. Example: Imagine a receiver with a limit of 1024 handles. A sender transmits a message to that receiver. It gets access to half the limit not used by anyone else, hence 512 handles. It does not matter how many senders there are, nor how many messages are sent, it will reach its quota at 512. As long as they all belong to the same user, they will share the quota and can queue at most 512 handles. If a second sending user comes into play, it gets half the remaining not used by anyone else, which ends up being 256. And so on... If the peer dequeues messages in between, the numbers get higher again. But if you do the math, the most you can get is 50% of the targets resources, if you're the only sender. In all other cases you get less (like intertwined transfers, etc). We did look into sender-based inflight accounting, but the same set of issues arises. Sure, a Request+Reply model would make this easier to handle, but we want to explicitly support a Subscribe+Event{n} model. In this case there is more than one Reply to a message. Long story short: We have uid<->uid quotas so far, which prevent DoS attacks, unless
[RFC v1 02/14] bus1: provide stub cdev /dev/bus1
From: Tom Gundersen Add the CONFIG_BUS1 option to enable the bus1 kernel messaging bus. If enabled, provide the bus1.ko module with a stub cdev /dev/bus1. So far it does not expose any API, but the full intended uapi is provided in include/uapi/linux/bus1.h already. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- include/uapi/linux/bus1.h | 138 ++ init/Kconfig | 17 ++ ipc/Makefile | 1 + ipc/bus1/Makefile | 6 ++ ipc/bus1/main.c | 80 +++ ipc/bus1/main.h | 74 + ipc/bus1/tests.c | 19 +++ ipc/bus1/tests.h | 32 +++ 8 files changed, 367 insertions(+) create mode 100644 include/uapi/linux/bus1.h create mode 100644 ipc/bus1/Makefile create mode 100644 ipc/bus1/main.c create mode 100644 ipc/bus1/main.h create mode 100644 ipc/bus1/tests.c create mode 100644 ipc/bus1/tests.h diff --git a/include/uapi/linux/bus1.h b/include/uapi/linux/bus1.h new file mode 100644 index 000..8ec3357 --- /dev/null +++ b/include/uapi/linux/bus1.h @@ -0,0 +1,138 @@ +#ifndef _UAPI_LINUX_BUS1_H +#define _UAPI_LINUX_BUS1_H + +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#include +#include + +#define BUS1_FD_MAX(256) + +#define BUS1_IOCTL_MAGIC 0x96 +#define BUS1_HANDLE_INVALID((__u64)-1) +#define BUS1_OFFSET_INVALID((__u64)-1) + +enum { + BUS1_HANDLE_FLAG_MANAGED= 1ULL << 0, + BUS1_HANDLE_FLAG_REMOTE = 1ULL << 1, +}; + +enum { + BUS1_PEER_FLAG_WANT_SECCTX = 1ULL << 0, +}; + +enum { + BUS1_PEER_RESET_FLAG_FLUSH = 1ULL << 0, + BUS1_PEER_RESET_FLAG_FLUSH_SEED = 1ULL << 1, +}; + +struct bus1_cmd_peer_reset { + __u64 flags; + __u64 peer_flags; + __u32 max_slices; + __u32 max_handles; + __u32 max_inflight_bytes; + __u32 max_inflight_fds; +} __attribute__((__aligned__(8))); + +struct bus1_cmd_handle_transfer { + __u64 flags; + __u64 src_handle; + __u64 dst_fd; + __u64 dst_handle; +} __attribute__((__aligned__(8))); + +enum { + BUS1_NODES_DESTROY_FLAG_RELEASE_HANDLES = 1ULL << 0, +}; + +struct bus1_cmd_nodes_destroy { + __u64 flags; + __u64 ptr_nodes; + __u64 n_nodes; +} __attribute__((__aligned__(8))); + +enum { + BUS1_SEND_FLAG_CONTINUE = 1ULL << 0, + BUS1_SEND_FLAG_SEED = 1ULL << 1, +}; + +struct bus1_cmd_send { + __u64 flags; + __u64 ptr_destinations; + __u64 ptr_errors; + __u64 n_destinations; + __u64 ptr_vecs; + __u64 n_vecs; + __u64 ptr_handles; + __u64 n_handles; + __u64 ptr_fds; + __u64 n_fds; +} __attribute__((__aligned__(8))); + +enum { + BUS1_RECV_FLAG_PEEK = 1ULL << 0, + BUS1_RECV_FLAG_SEED = 1ULL << 1, + BUS1_RECV_FLAG_INSTALL_FDS = 1ULL << 2, +}; + +enum { + BUS1_MSG_NONE, + BUS1_MSG_DATA, + BUS1_MSG_NODE_DESTROY, + BUS1_MSG_NODE_RELEASE, +}; + +enum { + BUS1_MSG_FLAG_HAS_SECCTX= 1ULL << 0, + BUS1_MSG_FLAG_CONTINUE = 1ULL << 1, +}; + +struct bus1_cmd_recv { + __u64 flags; + __u64 max_offset; + struct { + __u64 type; + __u64 flags; + __u64 destination; + __u32 uid; + __u32 gid; + __u32 pid; + __u32 tid; + __u64 offset; + __u64 n_bytes; + __u64 n_handles; + __u64 n_fds; + __u64 n_secctx; + } __attribute__((__aligned__(8))) msg; +} __attribute__((__aligned__(8))); + +enum { + BUS1_CMD_PEER_DISCONNECT= _IOWR(BUS1_IOCTL_MAGIC, 0x00, + __u64), + BUS1_CMD_PEER_QUERY = _IOWR(BUS1_IOCTL_MAGIC, 0x01, + struct bus1_cmd_peer_reset), + BUS1_CMD_PEER_RESET = _IOWR(BUS1_IOCTL_MAGIC, 0x02, + struct bus1_cmd_peer_reset), + BUS1_CMD_HANDLE_RELEASE = _IOWR(BUS1_IOCTL_MAGIC, 0x10, +
[RFC v1 03/14] bus1: util - active reference utility library
From: Tom Gundersen The bus1_active object implements active references. They work similarly to plain object reference counters, but allow disabling any new references from being taken. Each bus1_active object goes through a set of states: NEW: Initial state, no active references can be acquired ACTIVE: Live state, active references can be acquired DRAINING: Deactivated but lingering, no active references can be acquired DRAINED: Deactivated and all active references were dropped RELEASED: Fully drained and synchronously released Initially, all bus1_active objects are in state NEW. As soon as they're activated, they enter ACTIVE and active references can be acquired. This is the normal, live state. Once the object is deactivated, it enters state DRAINING. No new active references can be acquired, but some threads might still own active references. Once all those are dropped, the object enters state DRAINED. Now the object can be released a *single* time, before it enters state RELEASED and is finished. It cannot be re-used anymore. Active-references are very useful to track threads that invoke callbacks on an object. As long as a callback is running, an active reference is held, and as such the object is usually protected from being destroyed. The destructor of the object needs to deactivate *and* drain the object, before releasing resources. Active references will be used heavy by the upcoming bus1_peer object. Whenever a peer operates on a remote peer, it must acquire and hold an active reference on that remote peer. This guarantees that the remote peer will wait for this operation to finish before possibly disconnecting from the bus. In concept, active-references can be seen as rw-locks. However, they have much more strict state-transitions. Prior art can be seen in super-blocks ('atomic_t s_active'), and kernfs ('atomic_t active'). Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 3 +- ipc/bus1/util/active.c | 419 + ipc/bus1/util/active.h | 154 ++ 3 files changed, 575 insertions(+), 1 deletion(-) create mode 100644 ipc/bus1/util/active.c create mode 100644 ipc/bus1/util/active.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index d3a4491..9e491691 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,5 +1,6 @@ bus1-y := \ - main.o + main.o \ + util/active.o obj-$(CONFIG_BUS1) += bus1.o diff --git a/ipc/bus1/util/active.c b/ipc/bus1/util/active.c new file mode 100644 index 000..5f5fdaa --- /dev/null +++ b/ipc/bus1/util/active.c @@ -0,0 +1,419 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include +#include "active.h" + +/* + * Bias values track states of "active references". They're all negative. If an + * object is active, its active-ref-counter is >=0 and tracks all active + * references. Once an object is deactivated, we subtract ACTIVE_BIAS. This + * means, the counter is now negative but still counts the active references. + * Once it drops to exactly ACTIVE_BIAS, we know all active references were + * dropped. Exactly one thread will change it to ACTIVE_RELEASE now, perform + * cleanup and then put it into ACTIVE_DONE. Once released, all other threads + * that tried deactivating the node will now be woken up (thus, they wait until + * the object is fully done). + * The initial state during object setup is ACTIVE_NEW. If an object is + * directly deactivated without having ever been active, it is put into + * ACTIVE_RELEASE_DIRECT instead of ACTIVE_BIAS. This tracks this one-bit state + * across deactivation. The task putting it into ACTIVE_RELEASE now knows + * whether the object was active before or not. + * + * We support lockdep annotations for 'active references'. We treat active + * references as a read-trylock, and deactivation as a write-lock. + * + * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN + * to avoid overflows if multiplied by -1. + */ +#define BUS1_ACTIVE_RELEASE_DIRECT (BUS1_ACTIVE_BIAS - 1) +#define BUS1_ACTIVE_RELEASE(BUS1_ACTIVE_BIAS - 2) +#define BUS1_ACTIVE_DONE (BUS1_ACTIVE_BIAS - 3) +#define BUS1_ACTIVE_NEW(BUS1_ACTIVE_BIAS - 4) +#define _BUS1_ACTIVE_RESERVED (BUS1_ACTIVE_BIAS - 5) + +/** + * bus1_active_init_private() - initialize object + * @active:object to initialize + * + * This initializes an act
[RFC v1 08/14] bus1: implement peer management context
From: Tom Gundersen A peer context provides access to the bus1 system. A peer itself is not a routable entity, but rather only a local anchor to serve as gateway to the bus. To participate on the bus, you need to allocate a peer. This peer manages all your state on the bus, including all allocated nodes, owned handles, incoming messages, and more. A peer is split into 3 sections: - A static section that is initialized at peer creation and never changes - A peer-local section that is only ever accessed by ioctls done by the peer itself. - A data section that might be accessed by remote peers when interacting with this peer. All peers on the system operate on the same level. There is no context a peer is linked into. Hence, you can never lock multiple peers at the same time. Instead, peers provide active-references. Before performing an operation on a peer, an active reference must be acquired, and hold as long as the operation goes on. When done, the reference is released again. When a peer is disconnected, no more active references can be acquired, and any outstanding operation is waited for before the peer is destroyed. Additionally to active-references, there are 2 locks: A peer-local lock and a data lock. The peer-local lock is used to synchronize operations done by the peer itself. It is never acquired by a remote peer. The data lock protects the data of the peer, which might be modified by remote peers. The data lock nests underneath the local-lock. Furthermore, the data-lock critical sections must be kept small and never block indefinitely. Remote peers might wait for data-locks, hence they must rely on not being DoSed. The local peer lock, however, is private to the peer itself. Not such restrictions apply. It is mostly used to give the impression of atomic operations (i.e., making the API appear consistent and coherent). This only adds the peer context, the ioctls will be implemented in follow-up patches. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 2 + ipc/bus1/main.c | 17 +++ ipc/bus1/main.h | 14 ++ ipc/bus1/peer.c | 145 + ipc/bus1/peer.h | 146 ++ ipc/bus1/util.c | 52 +++ ipc/bus1/util.h | 51 +++ 7 files changed, 427 insertions(+) create mode 100644 ipc/bus1/peer.c create mode 100644 ipc/bus1/peer.h create mode 100644 ipc/bus1/util.c create mode 100644 ipc/bus1/util.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index 94d79e0..c689917 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,6 +1,8 @@ bus1-y := \ main.o \ + peer.o \ user.o \ + util.o \ util/active.o \ util/flist.o\ util/pool.o \ diff --git a/ipc/bus1/main.c b/ipc/bus1/main.c index 526347d..51034f3 100644 --- a/ipc/bus1/main.c +++ b/ipc/bus1/main.c @@ -15,24 +15,41 @@ #include #include #include "main.h" +#include "peer.h" #include "tests.h" #include "user.h" static int bus1_fop_open(struct inode *inode, struct file *file) { + struct bus1_peer *peer; + + peer = bus1_peer_new(); + if (IS_ERR(peer)) + return PTR_ERR(peer); + + file->private_data = peer; return 0; } static int bus1_fop_release(struct inode *inode, struct file *file) { + bus1_peer_free(file->private_data); return 0; } +static void bus1_fop_show_fdinfo(struct seq_file *m, struct file *file) +{ + struct bus1_peer *peer = file->private_data; + + seq_printf(m, KBUILD_MODNAME "-peer:\t%16llx\n", peer->id); +} + const struct file_operations bus1_fops = { .owner = THIS_MODULE, .open = bus1_fop_open, .release= bus1_fop_release, .llseek = noop_llseek, + .show_fdinfo= bus1_fop_show_fdinfo, }; static struct miscdevice bus1_misc = { diff --git a/ipc/bus1/main.h b/ipc/bus1/main.h index 76fce66..dd319d9 100644 --- a/ipc/bus1/main.h +++ b/ipc/bus1/main.h @@ -49,6 +49,20 @@ * ordered, including unicasts, multicasts, and notifications. */ +/** + * Locking + * + * Most of the bus1 objects form a hierarchy, as such, their locks must be + * ordered. Not all orders are explicitly defined (e.g., they might define + * orthogonal hierarchies), but this list gives a rough overview: + * + * bus1_peer.active + * bus1_peer.local.lock + * bus1_peer.data.lock + * bus1_user.lock + * bus1_user_lock + */ + struct dentry; struct file_operations; diff --git a/ipc/bus1/peer.c b/ipc/bus1/peer.c new file mode 100644 index 000..a6fbca01 --- /dev/null +++ b/ipc/bus1/peer.c
[RFC v1 09/14] bus1: provide transaction context for multicasts
From: Tom Gundersen The transaction engine is an object that lives on the stack and is used to stage and commit multicasts properly. Unlike unicasts, a multicast cannot just be queued on each destination, but must be properly synchronized. This requires us to first stage each message on their respective destination, then sync and tick the clocks, and eventual commit all messages. The transaction context implements this logic for both, unicasts and multicasts. It hides the timestamp handling and takes care to properly synchronize accesses to the peer queues. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 1 + ipc/bus1/peer.c | 2 + ipc/bus1/peer.h | 3 + ipc/bus1/tx.c | 360 ++ ipc/bus1/tx.h | 102 5 files changed, 468 insertions(+) create mode 100644 ipc/bus1/tx.c create mode 100644 ipc/bus1/tx.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index c689917..e3c7dd7 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,6 +1,7 @@ bus1-y := \ main.o \ peer.o \ + tx.o\ user.o \ util.o \ util/active.o \ diff --git a/ipc/bus1/peer.c b/ipc/bus1/peer.c index a6fbca01..3421f8c 100644 --- a/ipc/bus1/peer.c +++ b/ipc/bus1/peer.c @@ -70,6 +70,7 @@ struct bus1_peer *bus1_peer_new(void) /* initialize data section */ mutex_init(&peer->data.lock); + bus1_queue_init(&peer->data.queue); /* initialize peer-private section */ mutex_init(&peer->local.lock); @@ -131,6 +132,7 @@ struct bus1_peer *bus1_peer_free(struct bus1_peer *peer) mutex_destroy(&peer->local.lock); /* deinitialize data section */ + bus1_queue_deinit(&peer->data.queue); mutex_destroy(&peer->data.lock); /* deinitialize constant fields */ diff --git a/ipc/bus1/peer.h b/ipc/bus1/peer.h index 277fcf8..149ddf6 100644 --- a/ipc/bus1/peer.h +++ b/ipc/bus1/peer.h @@ -54,6 +54,7 @@ #include #include "user.h" #include "util/active.h" +#include "util/queue.h" struct cred; struct dentry; @@ -71,6 +72,7 @@ struct pid_namespace; * @active:active references * @debugdir: debugfs root of this peer, or NULL/ERR_PTR * @data.lock: data lock + * @data.queue:message queue * @local.lock:local peer runtime lock */ struct bus1_peer { @@ -86,6 +88,7 @@ struct bus1_peer { struct { struct mutex lock; + struct bus1_queue queue; } data; struct { diff --git a/ipc/bus1/tx.c b/ipc/bus1/tx.c new file mode 100644 index 000..6ff8949 --- /dev/null +++ b/ipc/bus1/tx.c @@ -0,0 +1,360 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#include +#include +#include +#include +#include +#include "peer.h" +#include "tx.h" +#include "util/active.h" +#include "util/queue.h" + +static void bus1_tx_push(struct bus1_tx *tx, +struct bus1_queue_node **list, +struct bus1_queue_node *qnode) +{ + struct bus1_peer *peer = qnode->owner; + + /* +* Push @qnode onto one of the lists in @tx (specified as @list). Note +* that each list has different locking/ordering requirements, which +* the caller has to verify. This helper does not check them. +* +* Whenever something is pushed on a list, we make sure it has the tx +* set as group. Furthermore, we tell lockdep that its peer was +* released. This is required to allow holding hundreds of peers in a +* multicast without exceeding the lockdep limits of allowed locks held +* in parallel. +* Note that pushing a qnode on a list consumes the qnode together with +* its set owner. The caller must not access it, except by popping it +* from the list or using one of the internal list-iterators. In other +* words, we say that a caller must be aware of lockdep limitations +* whenever they hold an unlimited number of peers. However, if they +* make sure they only ever hold a fixed number, but use transaction +* lists to stash them, the transaction lists make sure to properly +* avoid lockdep limitations. +*/ + + WARN_ON(qnode->group && tx != qnode->group); + WARN_ON(qn
[RFC v1 05/14] bus1: util - pool utility library
From: Tom Gundersen A bus1-pool is a shmem-backed memory pool shared between userspace and the kernel. The pool is used to transfer memory from the kernel to userspace without requiring userspace to pre-allocate space. The pool is managed in slices, which are published to userspace when they are ready to be read and must be released by userspace when userspace is done with them. Userspace has read-only access to its pools and the kernel has read-write access, but published slices are not altered. This pool implementation will be used by bus1 message transactions to support single-copy data transfers, directly from the sender's address space into the pool of the destination peer. The allocation algorithm is based on the Android Binder code and has served their needs well for many years now. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile| 3 +- ipc/bus1/util/pool.c | 572 +++ ipc/bus1/util/pool.h | 164 +++ 3 files changed, 738 insertions(+), 1 deletion(-) create mode 100644 ipc/bus1/util/pool.c create mode 100644 ipc/bus1/util/pool.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index 6db6d13..ca8e19d 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,7 +1,8 @@ bus1-y := \ main.o \ util/active.o \ - util/flist.o + util/flist.o\ + util/pool.o obj-$(CONFIG_BUS1) += bus1.o diff --git a/ipc/bus1/util/pool.c b/ipc/bus1/util/pool.c new file mode 100644 index 000..2ddbffb --- /dev/null +++ b/ipc/bus1/util/pool.c @@ -0,0 +1,572 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "pool.h" + +static struct bus1_pool_slice *bus1_pool_slice_new(size_t offset, size_t size) +{ + struct bus1_pool_slice *slice; + + if (offset > U32_MAX || size == 0 || size > BUS1_POOL_SLICE_SIZE_MAX) + return ERR_PTR(-EMSGSIZE); + + slice = kmalloc(sizeof(*slice), GFP_KERNEL); + if (!slice) + return ERR_PTR(-ENOMEM); + + slice->offset = offset; + slice->size = size; + + return slice; +} + +static struct bus1_pool_slice * +bus1_pool_slice_free(struct bus1_pool_slice *slice) +{ + if (!slice) + return NULL; + + kfree(slice); + + return NULL; +} + +/* insert slice into the free tree */ +static void bus1_pool_slice_link_free(struct bus1_pool_slice *slice, + struct bus1_pool *pool) +{ + struct rb_node **n, *prev = NULL; + struct bus1_pool_slice *ps; + + n = &pool->slices_free.rb_node; + while (*n) { + prev = *n; + ps = container_of(prev, struct bus1_pool_slice, rb); + if (slice->size < ps->size) + n = &prev->rb_left; + else + n = &prev->rb_right; + } + + rb_link_node(&slice->rb, prev, n); + rb_insert_color(&slice->rb, &pool->slices_free); +} + +/* insert slice into the busy tree */ +static void bus1_pool_slice_link_busy(struct bus1_pool_slice *slice, + struct bus1_pool *pool) +{ + struct rb_node **n, *prev = NULL; + struct bus1_pool_slice *ps; + + n = &pool->slices_busy.rb_node; + while (*n) { + prev = *n; + ps = container_of(prev, struct bus1_pool_slice, rb); + if (WARN_ON(slice->offset == ps->offset)) + n = &prev->rb_right; /* add anyway */ + else if (slice->offset < ps->offset) + n = &prev->rb_left; + else /* if (slice->offset > ps->offset) */ + n = &prev->rb_right; + } + + rb_link_node(&slice->rb, prev, n); + rb_insert_color(&slice->rb, &pool->slices_busy); + + pool->allocated_size += slice->size; +} + +/* find free slice big enough to hold @size bytes */ +static struct bus1_pool_slice * +bus1_pool_slice_find_by_size(struct bus1_pool *pool, size_t size) +{ + struct bus1_pool_slice *ps, *closest = NULL; + struct rb_node *n; + + n = pool->slices_free.rb_node; + while (n) { + ps = container_of(n, struct bus1_pool_s
[RFC v1 13/14] bus1: limit and protect resources
From: Tom Gundersen This adds resource-counters to peers and users. They limit the number of objects that a peer can operate on. The root limits per user can be configured as a module option. Everything else is just based on them. This also adds LSM hooks. They are not integrated with ./security/, yet, but just provide the hooks as discussed with the SELinux maintainers. Since the operations on bus1 are very similar in nature to Binder, the hooks are the same as well (since the LSM people seemed pretty happy with the Binder hooks, anyway). Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/message.c | 47 +- ipc/bus1/peer.c | 95 ++- ipc/bus1/peer.h | 2 + ipc/bus1/security.h | 45 + ipc/bus1/user.c | 475 ipc/bus1/user.h | 75 - 6 files changed, 728 insertions(+), 11 deletions(-) create mode 100644 ipc/bus1/security.h diff --git a/ipc/bus1/message.c b/ipc/bus1/message.c index 4c5c905..6145d5f 100644 --- a/ipc/bus1/message.c +++ b/ipc/bus1/message.c @@ -26,6 +26,7 @@ #include "handle.h" #include "message.h" #include "peer.h" +#include "security.h" #include "tx.h" #include "user.h" #include "util.h" @@ -242,9 +243,16 @@ int bus1_factory_seal(struct bus1_factory *f) struct bus1_handle *h; struct bus1_flist *e; size_t i; + int r; lockdep_assert_held(&f->peer->local.lock); + r = bus1_user_charge(&f->peer->user->limits.n_handles, +&f->peer->data.limits.n_handles, +f->n_handles_charge); + if (r < 0) + return r; + for (i = 0, e = f->handles; i < f->n_handles; e = bus1_flist_next(e, &i)) { @@ -291,11 +299,29 @@ struct bus1_message *bus1_factory_instantiate(struct bus1_factory *f, transmit_secctx = f->has_secctx && (READ_ONCE(peer->flags) & BUS1_PEER_FLAG_WANT_SECCTX); + r = bus1_user_charge(&peer->user->limits.n_slices, +&peer->data.limits.n_slices, 1); + if (r < 0) + return ERR_PTR(r); + + r = bus1_user_charge(&peer->user->limits.n_handles, +&peer->data.limits.n_handles, f->n_handles); + if (r < 0) { + bus1_user_discharge(&peer->user->limits.n_slices, + &peer->data.limits.n_slices, 1); + return ERR_PTR(r); + } + size = sizeof(*m) + bus1_flist_inline_size(f->n_handles) + f->n_files * sizeof(struct file *); m = kmalloc(size, GFP_KERNEL); - if (!m) + if (!m) { + bus1_user_discharge(&peer->user->limits.n_handles, + &peer->data.limits.n_handles, f->n_handles); + bus1_user_discharge(&peer->user->limits.n_slices, + &peer->data.limits.n_slices, 1); return ERR_PTR(-ENOMEM); + } /* set to default first, so the destructor can be called anytime */ kref_init(&m->ref); @@ -329,6 +355,8 @@ struct bus1_message *bus1_factory_instantiate(struct bus1_factory *f, m->slice = bus1_pool_alloc(&peer->data.pool, size); mutex_unlock(&peer->data.lock); if (IS_ERR(m->slice)) { + bus1_user_discharge(&peer->user->limits.n_slices, + &peer->data.limits.n_slices, 1); r = PTR_ERR(m->slice); m->slice = NULL; goto error; @@ -376,6 +404,11 @@ struct bus1_message *bus1_factory_instantiate(struct bus1_factory *f, /* import files */ while (m->n_files < f->n_files) { + r = security_bus1_transfer_file(f->peer, peer, + f->files[m->n_files]); + if (r < 0) + goto error; + m->files[m->n_files] = get_file(f->files[m->n_files]); ++m->n_files; } @@ -436,10 +469,15 @@ void bus1_message_free(struct kref *k) bus1_handle_unref(e->ptr); } } + bus1_user_discharge(&peer->user->limits.n_handles, + &peer->data.limits.n_handles, m->n_handles_charge); bus1_flist_deinit(m->handles, m->n_handles); if (m->slice) { mutex_lock(&peer->data.lock); + if (!bus1_pool_slice_is_public(m->slice)) + bus1_user_discharge(&peer->user->l
[RFC v1 07/14] bus1: tracking user contexts
From: Tom Gundersen Different users can communicate via bus1, and many resources are shared between multiple users. The bus1_user object represents the UID of a user, like "struct user_struct" does in the kernel core. It is used to account global resources, apply limits, and calculate quotas if different UIDs communicate with each other. All dynamic resources have global per-user limits, which cannot be exceeded by a user. They prevent a single user from exhausting local resources. Each peer that is created is always owned by the user that initialized it. All resources allocated on that peer are accounted on that pinned user. Additionally to global resources, there are local limits per peer, that can be controlled by each peer individually (e.g., specifying a maximum pool size). Those local limits allow a user to distribute the globally available resources across its peer instances. Since bus1 allows communication across UID boundaries, any such transmission of resources must be properly accounted. Bus1 employs dynamic quotas to fairly distribute available resources. Those quotas make sure that available resources of a peer cannot be exhausted by remote UIDs, but are fairly divided among all communicating peers. This only implements the user tracking, the resource limits will be added in follow-up patches. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 1 + ipc/bus1/main.c | 3 ++ ipc/bus1/user.c | 153 ++ ipc/bus1/user.h | 67 4 files changed, 224 insertions(+) create mode 100644 ipc/bus1/user.c create mode 100644 ipc/bus1/user.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index 3c90657..94d79e0 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,5 +1,6 @@ bus1-y := \ main.o \ + user.o \ util/active.o \ util/flist.o\ util/pool.o \ diff --git a/ipc/bus1/main.c b/ipc/bus1/main.c index 02412a7..526347d 100644 --- a/ipc/bus1/main.c +++ b/ipc/bus1/main.c @@ -16,6 +16,7 @@ #include #include "main.h" #include "tests.h" +#include "user.h" static int bus1_fop_open(struct inode *inode, struct file *file) { @@ -64,6 +65,7 @@ static int __init bus1_modinit(void) error: debugfs_remove(bus1_debugdir); + bus1_user_modexit(); return r; } @@ -71,6 +73,7 @@ static void __exit bus1_modexit(void) { misc_deregister(&bus1_misc); debugfs_remove(bus1_debugdir); + bus1_user_modexit(); pr_info("unloaded\n"); } diff --git a/ipc/bus1/user.c b/ipc/bus1/user.c new file mode 100644 index 000..0498ab4 --- /dev/null +++ b/ipc/bus1/user.c @@ -0,0 +1,153 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "user.h" + +static DEFINE_MUTEX(bus1_user_lock); +static DEFINE_IDR(bus1_user_idr); + +/** + * bus1_user_modexit() - clean up global resources of user accounting + * + * This function cleans up any remaining global resources that were allocated + * by the user accounting helpers. The caller must make sure that no user + * object is referenced anymore, before calling this. This function just clears + * caches and verifies nothing is leaked. + * + * This is meant to be called on module-exit. + */ +void bus1_user_modexit(void) +{ + WARN_ON(!idr_is_empty(&bus1_user_idr)); + idr_destroy(&bus1_user_idr); + idr_init(&bus1_user_idr); +} + +static struct bus1_user *bus1_user_new(void) +{ + struct bus1_user *user; + + user = kmalloc(sizeof(*user), GFP_KERNEL); + if (!user) + return ERR_PTR(-ENOMEM); + + kref_init(&user->ref); + user->uid = INVALID_UID; + mutex_init(&user->lock); + + return user; +} + +static void bus1_user_free(struct kref *ref) +{ + struct bus1_user *user = container_of(ref, struct bus1_user, ref); + + lockdep_assert_held(&bus1_user_lock); + + if (likely(uid_valid(user->uid))) + idr_remove(&bus1_user_idr, __kuid_val(user->uid)); + mutex_destroy(&user->lock); + kfree_rcu(user, rcu); +} + +/** + * bus1_user_ref_by_uid() - get a user object for a uid + * @uid: uid of the user + * + * Find and return the user object for the uid if it exists, otherwise create + * it first. + * + * Return: A user object
[RFC v1 06/14] bus1: util - queue utility library
From: Tom Gundersen (Please refer to 'Lamport Timestamps', the concept of 'happened-before', and 'causal ordering'. The queue implementation has its roots in Lamport Timestamps, treating a set of local CPUs as a distributed system to avoid any global synchronization.) A bus1 message queue is a FIFO, i.e., messages are linearly ordered by the time they were sent. Moreover, atomic delivery of messages to multiple queues are supported, without any global synchronization, i.e., the order of message delivery is consistent across queues. Messages can be destined for multiple queues, hence, we need to be careful that all queues get a consistent order of incoming messages. We define the concept of `global order' to provide a basic set of guarantees. This global order is a partial order on the set of all messages. The order is defined as: 1) If a message B was queued *after* a message A, then: A < B 2) If a message B was queued *after* a message A was dequeued, then: A < B 3) If a message B was dequeued *after* a message A on the same queue, then: A < B (Note: Causality is honored. `after' and `before' do not refer to the same task, nor the same queue, but rather any kind of synchronization between the two operations.) The queue object implements this global order in a lockless fashion. It solely relies on a distributed clock on each queue. Each message to be sent causes a clock tick on the local clock and on all destination clocks. Furthermore, all clocks are synchronized, meaning they're fast-forwarded in case they're behind the highest of all participating peers. No global state tracking is involved. During a message transaction, we first queue a message as 'staging' entry in each destination with a preliminary timestamp. This timestamp is explicitly odd numbered. Any odd numbered timestamp is considered 'staging' and causes *any* message ordered after it to be blocked until it is no longer staging. This allows us to queue the message in parallel with any racing multicast, and be guaranteed that all possible conflicts are blocked until we eventually commit a transaction. To commit a transaction (after all staging entries are queued), we choose the highest timestamp we have seen across all destinations and re-queue all our entries on each peer using that timestamp. Here we use a commit timestamp (even numbered). With this in mind, we define that a client can only dequeue messages from its queue that have an even timestamp. Furthermore, if there is a message queued with an odd timestamp that is lower than the even timestamp of another message, then neither message can be dequeued. They're considered to be in-flight conflicts. This guarantees that two concurrent multicast messages can be queued without any *global* locks, but either can only be dequeued by a peer if their ordering has been established (via commit timestamps). NOTE: A fully committed message is not guaranteed to be ready to be dequeued as it may be blocked by a staging entry. This means that there is an arbitrary (though bounded) time from a message transaction completing when the queue may still appear to be empty. In other words, message transmission is not instantaneous. It would be possible to change this at the cost of shortly blocking each message transaction on all other conflicting tasks. The queue implementation uses an rb-tree (ordered by timestamps and sender), with a cached pointer to the front of the queue. It will be embedded in every peer participating on the bus1 kernel message bus1. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 3 +- ipc/bus1/util/queue.c | 445 ++ ipc/bus1/util/queue.h | 351 +++ 3 files changed, 798 insertions(+), 1 deletion(-) create mode 100644 ipc/bus1/util/queue.c create mode 100644 ipc/bus1/util/queue.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index ca8e19d..3c90657 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -2,7 +2,8 @@ bus1-y := \ main.o \ util/active.o \ util/flist.o\ - util/pool.o + util/pool.o \ + util/queue.o obj-$(CONFIG_BUS1) += bus1.o diff --git a/ipc/bus1/util/queue.c b/ipc/bus1/util/queue.c new file mode 100644 index 000..38d7b98 --- /dev/null +++ b/ipc/bus1/util/queue.c @@ -0,0 +1,445 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */
[RFC v1 00/14] Bus1 Kernel Message Bus
Hi This proposal introduces bus1.ko, a kernel messaging bus. This is not a request for inclusion, yet. It is rather an initial draft and a Request For Comments. While bus1 emerged out of the kdbus project, bus1 was started from scratch and the concepts have little in common. In a nutshell, bus1 provides a capability-based IPC system, similar in nature to Android Binder, Cap'n Proto, and seL4. The module is completely generic and does neither require nor mandate a user-space counter-part. o Description Bus1 is a local IPC system, which provides a decentralized infrastructure to share objects between local peers. The main building blocks are nodes and handles. Nodes represent objects of a local peer, while handles represent descriptors that point to a node. Nodes can be created and destroyed by any peer, and they will always remain owned by their respective creator. Handles on the other hand, are used to refer to nodes and can be passed around with messages as auxiliary data. Whenever a handle is transferred, the receiver will get its own handle allocated, pointing to the same node as the original handle. Any peer can send messages directed at one of their handles. This will transfer the message to the owner of the node the handle points to. If a peer does not posess a handle to a given node, it will not be able to send a message to that node. That is, handles provide exclusive access management. Anyone that somehow acquired a handle to a node is privileged to further send this handle to other peers. As such, access management is transitive. Once a peer acquired a handle, it cannot be revoked again. However, a node owner can, at anytime, destroy a node. This will effectively unbind all existing handles to that node on any peer, notifying each one of the destruction. Unlike nodes and handles, peers cannot be addressed directly. In fact, peers are completely disconnected entities. A peer is merely an anchor of a set of nodes and handles, including an incoming message queue for any of those. Whether multiple nodes are all part of the same peer, or part of different peers does not affect the remote view of those. Peers solely exist as management entity and command dispatcher to local processes. The set of actors on a system is completely decentralized. There is no global component involved that provides a central registry or discovery mechanism. Furthermore, communication between peers only involves those peers, and does not affect any other peer in any way. No global communication lock is taken. However, any communication is still globally ordered, including unicasts, multicasts, and notifications. o Prior Art The concepts behind bus1 are almost identical to capability systems like Android Binder, Google Mojo, Cap'n Proto, seL4, and more. Bus1 differs from them by supporting Global Ordering, Multicasts, Resource Accounting, No Global Locking, No Global Context. While the bus1 UAPI does not expose all features (like soft-references as supported by Binder), the in-kernel code includes support for it. Multiple UAPIs can be supported on top of the in-kernel bus1 code, including support for the Binder UAPI. Efforts on this are still on-going. o Documentation The first patch in this series provides the bus1(7) man-page. It explains all concepts in bus1 in more detail. Furthermore, it describes the API that is available on bus1 file descriptors. The pre-compiled man-page is available at: http://www.bus1.org/bus1.html There is also a great bunch of in-source documentation available. All cross-source-file APIs have KernelDoc annotations. Furthermore, we have an introduction for each subsystem, to be found in the header files. The total number in lines of code for bus1 is roughly ~4.5k. The remaining ~5k LOC are comments and documentation. o Upstream The upstream development repository is available on github: http://github.com/bus1/bus1 It is an out-of-tree repository that allows easy and fast development of new bus1 features. The in-tree integration repository is available at: http://github.com/bus1/linux o Conferences Tom and I will be attending Linux Plumbers Conf next week. Please do not hesitate to contact us there in person. There will also be a presentation [1] of bus1 on the last day of the conference. Thanks Tom & David [1] https://www.linuxplumbersconf.org/2016/ocw/proposals/3819 Tom Gundersen (14): bus1: add bus1(7) man-page bus1: provide stub cdev /dev/bus1 bus1: util - active reference utility library bus1: util - fixed list utility library bus1: util - pool utility library bus1: util - queue utility library bus1: tracking user contexts bus1: implement peer management context bus1: provide transaction context for multicasts
[RFC v1 14/14] bus1: basic user-space kselftests
From: Tom Gundersen This adds kselftests integration and provides some basic API tests for bus1. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- tools/testing/selftests/bus1/.gitignore | 2 + tools/testing/selftests/bus1/Makefile | 19 ++ tools/testing/selftests/bus1/bus1-ioctl.h | 111 +++ tools/testing/selftests/bus1/test-api.c | 532 ++ tools/testing/selftests/bus1/test-io.c| 198 +++ tools/testing/selftests/bus1/test.h | 114 +++ 6 files changed, 976 insertions(+) create mode 100644 tools/testing/selftests/bus1/.gitignore create mode 100644 tools/testing/selftests/bus1/Makefile create mode 100644 tools/testing/selftests/bus1/bus1-ioctl.h create mode 100644 tools/testing/selftests/bus1/test-api.c create mode 100644 tools/testing/selftests/bus1/test-io.c create mode 100644 tools/testing/selftests/bus1/test.h diff --git a/tools/testing/selftests/bus1/.gitignore b/tools/testing/selftests/bus1/.gitignore new file mode 100644 index 000..76ecb9c --- /dev/null +++ b/tools/testing/selftests/bus1/.gitignore @@ -0,0 +1,2 @@ +test-api +test-io diff --git a/tools/testing/selftests/bus1/Makefile b/tools/testing/selftests/bus1/Makefile new file mode 100644 index 000..cbcf689 --- /dev/null +++ b/tools/testing/selftests/bus1/Makefile @@ -0,0 +1,19 @@ +# Makefile for bus1 selftests + +CC = $(CROSS_COMPILE)gcc +CFLAGS += -D_FILE_OFFSET_BITS=64 -Wall -g -O2 +CFLAGS += -I../../../../include/uapi/ +CFLAGS += -I../../../../include/ +CFLAGS += -I../../../../usr/include/ + +TEST_PROGS := test-api test-io + +all: $(TEST_PROGS) + +%: %.c bus1-ioctl.h test.h ../../../../usr/include/linux/bus1.h + $(CC) $(CFLAGS) $< -o $@ + +include ../lib.mk + +clean: + $(RM) $(TEST_PROGS) diff --git a/tools/testing/selftests/bus1/bus1-ioctl.h b/tools/testing/selftests/bus1/bus1-ioctl.h new file mode 100644 index 000..552bd5d --- /dev/null +++ b/tools/testing/selftests/bus1/bus1-ioctl.h @@ -0,0 +1,111 @@ +#pragma once + +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +static inline int +bus1_ioctl(int fd, unsigned int cmd, void *arg) +{ + return (ioctl(fd, cmd, arg) >= 0) ? 0: -errno; +} + +static inline int +bus1_ioctl_peer_disconnect(int fd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_PEER_DISCONNECT) == sizeof(uint64_t), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_PEER_DISCONNECT, NULL); +} + +static inline int +bus1_ioctl_peer_query(int fd, struct bus1_cmd_peer_reset *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_PEER_QUERY) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_PEER_QUERY, cmd); +} + +static inline int +bus1_ioctl_peer_reset(int fd, struct bus1_cmd_peer_reset *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_PEER_RESET) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_PEER_RESET, cmd); +} + +static inline int +bus1_ioctl_handle_release(int fd, uint64_t *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_HANDLE_RELEASE) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_HANDLE_RELEASE, cmd); +} + +static inline int +bus1_ioctl_handle_transfer(int fd, struct bus1_cmd_handle_transfer *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_HANDLE_TRANSFER) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_HANDLE_TRANSFER, cmd); +} + +static inline int +bus1_ioctl_nodes_destroy(int fd, struct bus1_cmd_nodes_destroy *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_NODES_DESTROY) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_NODES_DESTROY, cmd); +} + +static inline int +bus1_ioctl_slice_release(int fd, uint64_t *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_SLICE_RELEASE) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_SLICE_RELEASE, cmd); +} + +static inline int +bus1_ioctl_send(int fd, struct bus1_cmd_send *cmd) +{ + static_assert(_IOC_SIZE(BUS1_CMD_SEND) == sizeof(*cmd), + "ioctl is called with invalid argument size"); + + return bus1_ioctl(fd, BUS1_CMD_SEND, cmd); +} + +static inline int +bus
[RFC v1 10/14] bus1: add handle management
From: Tom Gundersen The object system on a bus is based on 'nodes' and 'handles'. Any peer can allocate new, local objects at any time. The creator automatically becomes the sole owner of the object. References to objects can be passed as payload of messages. The recipient will then gain their own reference to the object as well. Additionally, an object can be the destination of a message, in which case the message is always sent to the original creator (and thus the owner) of the object. Internally, objects are called 'nodes'. A reference to an object is a 'handle'. Whenever a new node is created, the owner implicitly gains an handle as well. In fact, handles are the only way to refer to a node. The node itself is entirely hidden in the implementation, and visible in the API as an “anchor handle”. Whenever a handle is passed as payload of a message, the target peer will gain a handle linked to the same underlying node. This works regardless of whether the sender is the owner of the underlying node, or not. Each peer can identify all its handles (both owned and un-owned) by a 64-bit integer. The namespace is local to each peer, and the numbers cannot be compared with the numbers of other peers (in fact, they are very likely to clash, but might still have *different* underlying nodes). However, if a peer receives a reference to the same node multiple times, the resulting handle will be the same. The kernel keeps count of how often each peer owns a handle. If a peer no longer requires a specific handle, it can release it. If the peer releases its last reference to a handle, the handle will be destroyed. The owner of a node (and *only* the owner) can trigger the destruction of a node (even if other peers still own handles to it). In this case, all peers that own a handle are notified of this fact. Once all handles to a specific node have been released (except for the handle internally pinned in the node itself), the owner of the node is notified of this, so it can potentially destroy both any linked state and the node itself. Node destruction is fully synchronized with any transaction. That is, a node and all its handles are valid in every message that is transmitted *before* the notification of its destruction. Furthermore, no message after this notification will carry the ID of such a destroyed node. Note that message transactions are asynchronous. That is, there is no unique point in time that a message is synchronized with another message. Hence, whether a specific handle passed with a message is still valid or not, cannot be predicted by the sender, but only by one of the receivers. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 1 + ipc/bus1/handle.c | 823 ++ ipc/bus1/handle.h | 312 + ipc/bus1/peer.c | 3 + ipc/bus1/peer.h | 2 + ipc/bus1/util.h | 83 ++ 6 files changed, 1224 insertions(+) create mode 100644 ipc/bus1/handle.c create mode 100644 ipc/bus1/handle.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index e3c7dd7..b87cddb 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,4 +1,5 @@ bus1-y := \ + handle.o\ main.o \ peer.o \ tx.o\ diff --git a/ipc/bus1/handle.c b/ipc/bus1/handle.c new file mode 100644 index 000..10f224e --- /dev/null +++ b/ipc/bus1/handle.c @@ -0,0 +1,823 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include "handle.h" +#include "peer.h" +#include "tx.h" +#include "util.h" +#include "util/queue.h" + +static void bus1_handle_init(struct bus1_handle *h, struct bus1_peer *holder) +{ + kref_init(&h->ref); + atomic_set(&h->n_weak, 0); + atomic_set(&h->n_user, 0); + h->holder = holder; + h->anchor = NULL; + h->tlink = NULL; + RB_CLEAR_NODE(&h->rb_to_peer); + h->id = BUS1_HANDLE_INVALID; +} + +static void bus1_handle_deinit(struct bus1_handle *h) +{ + if (h == h->anchor) { + WARN_ON(atomic_read(&h->node.n_strong) != 0); + WARN_ON(!RB_EMPTY_ROOT(&h->node.map_handles)); + } else if (h->anchor) { + WARN_ON(!RB_EMPTY_NODE(&h->remote.rb_to_anchor)); + bus1_handle_unref(h->anchor); + } + + bus1_queue_node_deinit(&h->qnode); + WARN_ON(!RB_EMPTY_NODE(&h-&
[RFC v1 11/14] bus1: implement message transmission
From: Tom Gundersen While notifications already work and simply require linking bus1_handle objects into the destination queue, real messages require proper payloads. This implements two core objects: Message objects and factories. The message factory is similar to transaction contexts, and lives completely on the stack. It is used to import the parameters given by user-space in a SEND ioctl. It parses and validates them. With this message factors we can now instantiate many messages, one for each destination of a multicast. Messages need to carry a bunch of data, mainly: - metadata: This just matches what Unix-sockets do (uid, gid, pid, tid, and secctx) - payload: Random memory passed in as iovec-array by user-space - files: Set of file-descriptors, very similar to SCM_RIGHTS - handles: Set of local handles to transfer to the destination Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 1 + ipc/bus1/message.c | 613 + ipc/bus1/message.h | 171 +++ ipc/bus1/peer.c| 2 + ipc/bus1/peer.h| 2 + ipc/bus1/util.c| 162 ++ ipc/bus1/util.h| 7 + 7 files changed, 958 insertions(+) create mode 100644 ipc/bus1/message.c create mode 100644 ipc/bus1/message.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index b87cddb..05434bda 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,6 +1,7 @@ bus1-y := \ handle.o\ main.o \ + message.o \ peer.o \ tx.o\ user.o \ diff --git a/ipc/bus1/message.c b/ipc/bus1/message.c new file mode 100644 index 000..4c5c905 --- /dev/null +++ b/ipc/bus1/message.c @@ -0,0 +1,613 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "handle.h" +#include "message.h" +#include "peer.h" +#include "tx.h" +#include "user.h" +#include "util.h" +#include "util/flist.h" +#include "util/pool.h" +#include "util/queue.h" + +static size_t bus1_factory_size(struct bus1_cmd_send *param) +{ + /* make sure @size cannot overflow */ + BUILD_BUG_ON(UIO_MAXIOV > U16_MAX); + BUILD_BUG_ON(BUS1_FD_MAX > U16_MAX); + + /* make sure we do not violate alignment rules */ + BUILD_BUG_ON(__alignof(struct bus1_flist) < __alignof(struct iovec)); + BUILD_BUG_ON(__alignof(struct iovec) < __alignof(struct file *)); + + return sizeof(struct bus1_factory) + + bus1_flist_inline_size(param->n_handles) + + param->n_vecs * sizeof(struct iovec) + + param->n_fds * sizeof(struct file *); +} + +/** + * bus1_factory_new() - create new message factory + * @peer: peer to operate as + * @param: factory parameters + * @stack: optional stack for factory, or NULL + * @n_stack: size of space at @stack + * + * This allocates a new message factory. It imports data from @param and + * prepares the factory for a transaction. From this factory, messages can be + * instantiated. This is used both for unicasts and multicasts. + * + * If @stack is given, this tries to place the factory on the specified stack + * space. The caller must guarantee that the factory does not outlive the stack + * frame. If this is not wanted, pass 0 as @n_stack. + * In either case, if the stack frame is too small, this will allocate the + * factory on the heap. + * + * Return: Pointer to factory, or ERR_PTR on failure. + */ +struct bus1_factory *bus1_factory_new(struct bus1_peer *peer, + struct bus1_cmd_send *param, + void *stack, + size_t n_stack) +{ + const struct iovec __user *ptr_vecs; + const u64 __user *ptr_handles; + const int __user *ptr_fds; + struct bus1_factory *f; + struct bus1_flist *e; + struct file *file; + size_t i, size; + bool is_new; + int r, fd; + u32 sid; + u64 id; + + lockdep_assert_held(&peer->local.lock); + + size = bus1_factory_size(param); + if (unlikely(size > n_stack)) { + f = kmalloc(size, GFP_TEMPORARY); + if (!f)
[RFC v1 01/14] bus1: add bus1(7) man-page
From: Tom Gundersen Add new directory ./Documentation/bus1/ and include DocBook scripts to build the bus1(7) man-page. This documents the bus1 character-device, including all the file-operations you can perform on it. Furthermore, the man-page also introduces the core bus1 concepts and explains how they work. Build the bus1-documentation via: $ make -C Documentation/bus1/ mandocs $ make -C Documentation/bus1/ htmldocs Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- Documentation/bus1/.gitignore | 2 + Documentation/bus1/Makefile | 41 ++ Documentation/bus1/bus1.xml | 833 ++ Documentation/bus1/stylesheet.xsl | 16 + 4 files changed, 892 insertions(+) create mode 100644 Documentation/bus1/.gitignore create mode 100644 Documentation/bus1/Makefile create mode 100644 Documentation/bus1/bus1.xml create mode 100644 Documentation/bus1/stylesheet.xsl diff --git a/Documentation/bus1/.gitignore b/Documentation/bus1/.gitignore new file mode 100644 index 000..b4a77cc --- /dev/null +++ b/Documentation/bus1/.gitignore @@ -0,0 +1,2 @@ +*.7 +*.html diff --git a/Documentation/bus1/Makefile b/Documentation/bus1/Makefile new file mode 100644 index 000..d2b9e61 --- /dev/null +++ b/Documentation/bus1/Makefile @@ -0,0 +1,41 @@ +cmd = $(cmd_$(1)) +srctree = $(shell pwd) +src = . +obj = $(srctree)/$(src) + +DOCS := \ + bus1.xml + +XMLFILES := $(addprefix $(obj)/,$(DOCS)) +MANFILES := $(patsubst %.xml, %.7, $(XMLFILES)) +HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES)) + +XMLTO_ARGS := \ + -m $(srctree)/$(src)/stylesheet.xsl \ + --skip-validation \ + --stringparam funcsynopsis.style=ansi \ + --stringparam man.output.quietly=1 \ + --stringparam man.authors.section.enabled=0 \ + --stringparam man.copyright.section.enabled=0 + +quiet_cmd_db2man = MAN $@ + cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $< +%.7: %.xml + @(which xmlto > /dev/null 2>&1) || \ +(echo "*** You need to install xmlto ***"; \ + exit 1) + $(call cmd,db2man) + +quiet_cmd_db2html = HTML$@ + cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $< +%.html: %.xml + @(which xmlto > /dev/null 2>&1) || \ +(echo "*** You need to install xmlto ***"; \ + exit 1) + $(call cmd,db2html) + +mandocs: $(MANFILES) + +htmldocs: $(HTMLFILES) + +clean-files := $(MANFILES) $(HTMLFILES) diff --git a/Documentation/bus1/bus1.xml b/Documentation/bus1/bus1.xml new file mode 100644 index 000..40b55c0 --- /dev/null +++ b/Documentation/bus1/bus1.xml @@ -0,0 +1,833 @@ + +http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd";> + + + + +bus1 + bus1 + + + +Documentation +David +Herrmann + + +Documentation +Tom +Gundersen + + + + + +bus1 +7 + + + +bus1 +Kernel Message Bus + + + + + #include <linux/bus1.h> + + + + +Description + +The bus1 Kernel Message Bus defines and implements a distributed object model. +It allows local processes to send messages to objects owned by remote processes, +as well as share their own objects with others. Object ownership is static and +cannot be transferred. Access to remote objects is prohibited, unless it was +explicitly granted. Processes can transmit messages to a remote object via the +message bus, transferring a data payload, object access rights, file +descriptors, or other auxiliary data. + + +To participate on the message bus, a peer context must be created. Peer contexts +are kernel objects, identified by a file descriptor. They are not bound to any +process, but can be shared freely. The peer context provides a message queue to +store all incoming messages, a registry for all locally owned objects, and +tracks access rights to remote objects. A peer context never serves as +routing entity, but merely as anchor for peer-owned resources. Any message on +the bus is always destined for an object, and the bus takes care to transfer a +message into the message queue of the peer context that owns this object. + + +The message bus manages object access using capabilities. That is, by default +only the owner of an object is granted access rights. No other peer can access +the object, nor are they aware of the existance of the object. However, access +rights can be transmitted as auxiliary data with any message, effectively +granting them to the receiver of the message. This even works transitively, that +is, any peer that was granted access to an object can pass on those rights, even +if they do not own the object. But mind that no access rights can ever be +revoked, besides the owner destroying the object. + + + + Nodes and Handles + +Each peer context comes with a registry of owned objects, wh
[RFC v1 12/14] bus1: hook up file-operations
From: Tom Gundersen This hooks up all the file-operations on a bus1-file-descriptor. It implements the ioctls as defined in the UAPI, as well as mmap() and poll() support. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/main.c | 46 +++ ipc/bus1/peer.c | 934 +++- ipc/bus1/peer.h | 8 + 3 files changed, 987 insertions(+), 1 deletion(-) diff --git a/ipc/bus1/main.c b/ipc/bus1/main.c index 51034f3..d5a726a 100644 --- a/ipc/bus1/main.c +++ b/ipc/bus1/main.c @@ -14,10 +14,17 @@ #include #include #include +#include +#include +#include +#include #include "main.h" #include "peer.h" #include "tests.h" #include "user.h" +#include "util/active.h" +#include "util/pool.h" +#include "util/queue.h" static int bus1_fop_open(struct inode *inode, struct file *file) { @@ -37,6 +44,41 @@ static int bus1_fop_release(struct inode *inode, struct file *file) return 0; } +static unsigned int bus1_fop_poll(struct file *file, + struct poll_table_struct *wait) +{ + struct bus1_peer *peer = file->private_data; + unsigned int mask; + + poll_wait(file, &peer->waitq, wait); + + /* access queue->front unlocked */ + rcu_read_lock(); + if (bus1_active_is_deactivated(&peer->active)) { + mask = POLLHUP; + } else { + mask = POLLOUT | POLLWRNORM; + if (bus1_queue_is_readable_rcu(&peer->data.queue)) + mask |= POLLIN | POLLRDNORM; + } + rcu_read_unlock(); + + return mask; +} + +static int bus1_fop_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct bus1_peer *peer = file->private_data; + int r; + + if (!bus1_peer_acquire(peer)) + return -ESHUTDOWN; + + r = bus1_pool_mmap(&peer->data.pool, vma); + bus1_peer_release(peer); + return r; +} + static void bus1_fop_show_fdinfo(struct seq_file *m, struct file *file) { struct bus1_peer *peer = file->private_data; @@ -48,7 +90,11 @@ const struct file_operations bus1_fops = { .owner = THIS_MODULE, .open = bus1_fop_open, .release= bus1_fop_release, + .poll = bus1_fop_poll, .llseek = noop_llseek, + .mmap = bus1_fop_mmap, + .unlocked_ioctl = bus1_peer_ioctl, + .compat_ioctl = bus1_peer_ioctl, .show_fdinfo= bus1_fop_show_fdinfo, }; diff --git a/ipc/bus1/peer.c b/ipc/bus1/peer.c index 0ff7a98..f0da4a7 100644 --- a/ipc/bus1/peer.c +++ b/ipc/bus1/peer.c @@ -23,11 +23,52 @@ #include #include #include +#include +#include "handle.h" #include "main.h" +#include "message.h" #include "peer.h" +#include "tx.h" #include "user.h" #include "util.h" #include "util/active.h" +#include "util/pool.h" +#include "util/queue.h" + +static struct bus1_queue_node * +bus1_peer_free_qnode(struct bus1_queue_node *qnode) +{ + struct bus1_message *m; + struct bus1_handle *h; + + /* +* Queue-nodes are generic entities that can only be destroyed by who +* created them. That is, they have no embedded release callback. +* Instead, we must detect them by type. Since the queue logic is kept +* generic, it cannot provide this helper. Instead, we have this small +* destructor here, which simply dispatches to the correct handler. +*/ + + if (qnode) { + switch (bus1_queue_node_get_type(qnode)) { + case BUS1_MSG_DATA: + m = container_of(qnode, struct bus1_message, qnode); + bus1_message_unref(m); + break; + case BUS1_MSG_NODE_DESTROY: + case BUS1_MSG_NODE_RELEASE: + h = container_of(qnode, struct bus1_handle, qnode); + bus1_handle_unref(h); + break; + case BUS1_MSG_NONE: + default: + WARN(1, "Unknown message type\n"); + break; + } + } + + return NULL; +} /** * bus1_peer_new() - allocate new peer @@ -47,6 +88,7 @@ struct bus1_peer *bus1_peer_new(void) const struct cred *cred = current_cred(); struct bus1_peer *peer; struct bus1_user *user; + int r; user = bus1_user_ref_by_uid(cred->uid); if (IS_ERR(user)) @@ -75,9 +117,14 @@ struct bus1_peer *bus1_peer_new(void) /* initialize peer-private section */ mutex_init(&peer->local.lock); + peer-
[RFC v1 04/14] bus1: util - fixed list utility library
From: Tom Gundersen This implements a fixed-size list called bus1_flist. The size of the list must be constant over the lifetime of the list. The list can hold one arbitrary pointer per node. Fixed lists are a combination of a linked list and a static array. That is, fixed lists behave like linked lists (no random access, but arbitrary size), but compare in speed with arrays (consequetive accesses are fast). Unlike fixed arrays, fixed lists can hold huge number of elements without requiring vmalloc, but solely relying on small-size kmalloc allocations. Internally, fixed lists are a singly-linked list of static arrays. This guarantees that iterations behave almost like on an array, except when crossing a batch-border. Fixed lists can replace fixed-size arrays whenever you need to support large number of elements, but don't need random access. Fixed lists have ALMOST the same memory requirements as fixed-size arrays, except one pointer of state per 'BUS1_FLIST_BATCH' elements. If only a small size (i.e., it only requires one batch) is stored in a fixed list, then its memory requirements and iteration time are equivalent to fixed-size arrays. Fixed lists will be required by the upcoming bus1 message-transactions. They must support large auxiliary data transfers, in case users want to send their entire handle state via the bus. Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- ipc/bus1/Makefile | 3 +- ipc/bus1/util/flist.c | 116 + ipc/bus1/util/flist.h | 202 ++ 3 files changed, 320 insertions(+), 1 deletion(-) create mode 100644 ipc/bus1/util/flist.c create mode 100644 ipc/bus1/util/flist.h diff --git a/ipc/bus1/Makefile b/ipc/bus1/Makefile index 9e491691..6db6d13 100644 --- a/ipc/bus1/Makefile +++ b/ipc/bus1/Makefile @@ -1,6 +1,7 @@ bus1-y := \ main.o \ - util/active.o + util/active.o \ + util/flist.o obj-$(CONFIG_BUS1) += bus1.o diff --git a/ipc/bus1/util/flist.c b/ipc/bus1/util/flist.c new file mode 100644 index 000..b8b0d4e --- /dev/null +++ b/ipc/bus1/util/flist.c @@ -0,0 +1,116 @@ +/* + * Copyright (C) 2013-2016 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; either version 2.1 of the License, or (at + * your option) any later version. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include "flist.h" + +/** + * bus1_flist_populate() - populate an flist + * @list: flist to operate on + * @n: number of elements + * @gfp: GFP to use for allocations + * + * Populate an flist. This pre-allocates the backing memory for an flist that + * was statically initialized via bus1_flist_init(). This is NOT needed if the + * list was allocated via bus1_flist_new(). + * + * Return: 0 on success, negative error code on failure. + */ +int bus1_flist_populate(struct bus1_flist *list, size_t n, gfp_t gfp) +{ + if (gfp & __GFP_ZERO) + memset(list, 0, bus1_flist_inline_size(n)); + + if (unlikely(n > BUS1_FLIST_BATCH)) { + /* Never populate twice! */ + WARN_ON(list[BUS1_FLIST_BATCH].next); + + n -= BUS1_FLIST_BATCH; + list[BUS1_FLIST_BATCH].next = bus1_flist_new(n, gfp); + if (!list[BUS1_FLIST_BATCH].next) + return -ENOMEM; + } + + return 0; +} + +/** + * bus1_flist_new() - allocate new flist + * @n: number of elements + * @gfp: GFP to use for allocations + * + * This allocates a new flist ready to store @n elements. + * + * Return: Pointer to flist, NULL if out-of-memory. + */ +struct bus1_flist *bus1_flist_new(size_t n, gfp_t gfp) +{ + struct bus1_flist list, *e, *slot; + size_t remaining; + + list.next = NULL; + slot = &list; + remaining = n; + + while (remaining >= BUS1_FLIST_BATCH) { + e = kmalloc_array(sizeof(*e), BUS1_FLIST_BATCH + 1, gfp); + if (!e) + return bus1_flist_free(list.next, n); + + slot->next = e; + slot = &e[BUS1_FLIST_BATCH]; + slot->next = NULL; + + remaining -= BUS1_FLIST_BATCH; + } + + if (remaining > 0) { + slot->next = kmalloc_array(remaining, sizeof(*e), gfp); + if (!slot->next) + return bus1_flist_free(list.next, n); + } + + return list.next; +} + +/** + * bus1_flist_free() - free flist + * @list: flist to operate on, or NULL + * @n: number of elements + * + * This deallocates an flist previously crea
Re: drm: NULL pointer dereference in drm_mode_object_find()
Hi On Fri, Aug 19, 2016 at 7:10 PM, Alexander Potapenko wrote: > Hello, > > the program below triggers a NULL deref in DRM code when ran on QEMU: > > === > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [< inline >] __list_add ./include/linux/list.h:44 > IP: [< inline >] list_add_tail ./include/linux/list.h:77 > IP: [< inline >] __mutex_lock_common kernel/locking/mutex.c:543 > IP: [] __mutex_lock_slowpath+0x6f/0x100 > kernel/locking/mutex.c:824 > PGD 1c555067 PUD 1c554067 PMD 0 > Oops: 0002 [#1] SMP > Modules linked in: > CPU: 0 PID: 2517 Comm: crash_drm_mode_ Not tainted 4.8.0-rc2+ #1157 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > task: 88001c40a700 task.stack: 88001c984000 > RIP: 0010:[] [] > __mutex_lock_slowpath+0x6f/0x100 > RSP: 0018:88001c987cb0 EFLAGS: 00010282 > RAX: RBX: 88001d5212a0 RCX: c100 > RDX: 0001 RSI: 88001c40a700 RDI: 88001d5212a4 > RBP: 88001c987cf8 R08: 88001c984000 R09: > R10: R11: R12: 88001c40a700 > R13: 88001d5212a4 R14: R15: 88001d5212a8 > FS: 00dc9880() GS:88001f00() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: CR3: 1c8a9000 CR4: 000406f0 > Stack: > 88001d5212a8 811a398f > 88001d5212a0 88001d5212a0 > 81a6eb20 88001c987d10 818e85ba 88001d521000 > Call Trace: > [< inline >] __mutex_fastpath_lock > ./arch/x86/include/asm/mutex_64.h:28 > [] mutex_lock+0x1a/0x30 kernel/locking/mutex.c:102 > [] _object_find+0x23/0xc0 drivers/gpu/drm/drm_crtc.c:329 > [< inline >] drm_mode_object_find drivers/gpu/drm/drm_crtc.c:360 > [< inline >] drm_crtc_find ./include/drm/drm_crtc.h:2999 > [] drm_mode_page_flip_ioctl+0x4e/0x300 > drivers/gpu/drm/drm_crtc.c:5414 > [] drm_ioctl+0x2a2/0x460 drivers/gpu/drm/drm_ioctl.c:721 > [< inline >] vfs_ioctl fs/ioctl.c:43 > [] do_vfs_ioctl+0x8d/0x580 fs/ioctl.c:675 > [< inline >] SYSC_ioctl fs/ioctl.c:690 > [] SyS_ioctl+0x74/0x80 fs/ioctl.c:681 > [] entry_SYSCALL_64_fastpath+0x13/0x8f > arch/x86/entry/entry_64.S:207 > Code: e8 37 23 00 00 8b 03 83 f8 01 0f 84 95 00 00 00 48 8b 43 10 4c > 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> > 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04 > RIP [< inline >] __list_add ./include/linux/list.h:44 > RIP [< inline >] list_add_tail ./include/linux/list.h:77 > RIP [< inline >] __mutex_lock_common kernel/locking/mutex.c:543 > RIP [] __mutex_lock_slowpath+0x6f/0x100 > kernel/locking/mutex.c:824 > RSP > CR2: > ---[ end trace 3cef4eb618ac6bb6 ]--- > === > > // autogenerated by syzkaller (http://github.com/google/syzkaller) > #include > #include > #include > > int main() > { > int fd = open("/dev/dri/card0", 0); > mmap(0x20036000ul, 0x1000ul, 0x3ul, 0x32ul, 0xul, 0x0ul); > memcpy((void*)0x20036ad7, > "\x1e\xa4\x45\xdc\xca\x11\xff\x25\x72\x65\x7e\x4a\x56\x54\x35" > "\x67\xe3\x8b\x41\x5c\x6d\x69\xa5\xf9\x88\x29\xb8\xc9\x6a\x45" > "\x76\xa9\xe7\x14\xd1\xf6\xa3\x59\x07\x4d\xe5\xc8\x39\xbf\x33" > "\xb9\x3d\x21\xd1\xaf\x16\x4d\xbc\xbf\xb1\x0a\x92\x97\xd9\x91" > "\x4d\xd8\xf8\xa1\xa6\xa3\x20\x02\x2c\x5e\x8f\x87\x05\x8b\xeb" > "\x9a\xb9\xbc\xa6\x60\x45\x8d\x19\x01\x7d\xb7\xef\x64\x62\x2e" > "\x5e\x3d\xfe\x65\xbf\xe2\x80\x89\x36\x48\x73\xc6\xa2\x6e\xe2" > "\x1a\x8f\x1b\x11\x6f\x49\x20\xeb\x74\x2d\x41\xb9\x8b\xb4\x8e" > "\x8b\xf5\x6d\xb7\xb1\xa3\xcb\xc4\xc2\x7f\x6d\xef\x32\xef\xa7" > "\x1c\x17\x03\x60\x7b\x31\x1f\x66", > 143); > ioctl(fd, 0xbb0ul, 0x20036ad7ul, 0, 0, 0); > return 0; > } > > I build the ToT kernel (commit > 952b159f2919a8d514f13999f9f463bddcc1dae7, Aug 18) with defconfig and > CONFIG_DRM_VGEM=y. Can you make sure you have this commit: commit 6f00975c619064a18c23fd3aced325ae165a73b9 Author: Daniel Vetter Date: Sat Aug 20 12:22:11 2016 +0200 drm: Reject page_flip for !DRIVER_MODESET Thanks David
Re: drm: NULL pointer dereference in drm_mode_object_find()
Hi On Tue, Sep 20, 2016 at 11:25 AM, Alexander Potapenko wrote: > On Tue, Sep 20, 2016 at 11:21 AM, David Herrmann > wrote: >> Hi >> >> On Mon, Sep 5, 2016 at 10:30 AM, Dmitry Vyukov wrote: >>> On Fri, Aug 19, 2016 at 7:10 PM, Alexander Potapenko >>> wrote: >>>> Hello, >>>> >>>> the program below triggers a NULL deref in DRM code when ran on QEMU: >>>> >>>> === >>>> BUG: unable to handle kernel NULL pointer dereference at (null) >>>> IP: [< inline >] __list_add ./include/linux/list.h:44 >>>> IP: [< inline >] list_add_tail ./include/linux/list.h:77 >>>> IP: [< inline >] __mutex_lock_common kernel/locking/mutex.c:543 >>>> IP: [] __mutex_lock_slowpath+0x6f/0x100 >>>> kernel/locking/mutex.c:824 >>>> PGD 1c555067 PUD 1c554067 PMD 0 >>>> Oops: 0002 [#1] SMP >>>> Modules linked in: >>>> CPU: 0 PID: 2517 Comm: crash_drm_mode_ Not tainted 4.8.0-rc2+ #1157 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >>>> 01/01/2011 >>>> task: 88001c40a700 task.stack: 88001c984000 >>>> RIP: 0010:[] [] >>>> __mutex_lock_slowpath+0x6f/0x100 >>>> RSP: 0018:88001c987cb0 EFLAGS: 00010282 >>>> RAX: RBX: 88001d5212a0 RCX: c100 >>>> RDX: 0001 RSI: 88001c40a700 RDI: 88001d5212a4 >>>> RBP: 88001c987cf8 R08: 88001c984000 R09: >>>> R10: R11: R12: 88001c40a700 >>>> R13: 88001d5212a4 R14: R15: 88001d5212a8 >>>> FS: 00dc9880() GS:88001f00() >>>> knlGS: >>>> CS: 0010 DS: ES: CR0: 80050033 >>>> CR2: CR3: 1c8a9000 CR4: 000406f0 >>>> Stack: >>>> 88001d5212a8 811a398f >>>> 88001d5212a0 88001d5212a0 >>>> 81a6eb20 88001c987d10 818e85ba 88001d521000 >>>> Call Trace: >>>> [< inline >] __mutex_fastpath_lock >>>> ./arch/x86/include/asm/mutex_64.h:28 >>>> [] mutex_lock+0x1a/0x30 kernel/locking/mutex.c:102 >>>> [] _object_find+0x23/0xc0 drivers/gpu/drm/drm_crtc.c:329 >>>> [< inline >] drm_mode_object_find drivers/gpu/drm/drm_crtc.c:360 >>>> [< inline >] drm_crtc_find ./include/drm/drm_crtc.h:2999 >>>> [] drm_mode_page_flip_ioctl+0x4e/0x300 >>>> drivers/gpu/drm/drm_crtc.c:5414 >>>> [] drm_ioctl+0x2a2/0x460 drivers/gpu/drm/drm_ioctl.c:721 >>>> [< inline >] vfs_ioctl fs/ioctl.c:43 >>>> [] do_vfs_ioctl+0x8d/0x580 fs/ioctl.c:675 >>>> [< inline >] SYSC_ioctl fs/ioctl.c:690 >>>> [] SyS_ioctl+0x74/0x80 fs/ioctl.c:681 >>>> [] entry_SYSCALL_64_fastpath+0x13/0x8f >>>> arch/x86/entry/entry_64.S:207 >>>> Code: e8 37 23 00 00 8b 03 83 f8 01 0f 84 95 00 00 00 48 8b 43 10 4c >>>> 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> >>>> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04 >>>> RIP [< inline >] __list_add ./include/linux/list.h:44 >>>> RIP [< inline >] list_add_tail ./include/linux/list.h:77 >>>> RIP [< inline >] __mutex_lock_common kernel/locking/mutex.c:543 >>>> RIP [] __mutex_lock_slowpath+0x6f/0x100 >>>> kernel/locking/mutex.c:824 >>>> RSP >>>> CR2: >>>> ---[ end trace 3cef4eb618ac6bb6 ]--- >>>> === >>>> >>>> // autogenerated by syzkaller (http://github.com/google/syzkaller) >>>> #include >>>> #include >>>> #include >>>> >>>> int main() >>>> { >>>> int fd = open("/dev/dri/card0", 0); >>>> mmap(0x20036000ul, 0x1000ul, 0x3ul, 0x32ul, 0xul, 0x0ul); >>>> memcpy((void*)0x20036ad7, >>>> "\x1e\xa4\x45\xdc\xca\x11\xff\x25\x72\x65\x7e\x4a\x56\x54\x35" >>>> "\x67\xe3\x8b\x41\x5c\x6d\x69\xa5\xf9\x88\x29\xb8\xc9\x6a\x45" >>>> "\x76\xa9
Re: drm: NULL pointer dereference in drm_mode_object_find()
Hi On Mon, Sep 5, 2016 at 10:30 AM, Dmitry Vyukov wrote: > On Fri, Aug 19, 2016 at 7:10 PM, Alexander Potapenko > wrote: >> Hello, >> >> the program below triggers a NULL deref in DRM code when ran on QEMU: >> >> === >> BUG: unable to handle kernel NULL pointer dereference at (null) >> IP: [< inline >] __list_add ./include/linux/list.h:44 >> IP: [< inline >] list_add_tail ./include/linux/list.h:77 >> IP: [< inline >] __mutex_lock_common kernel/locking/mutex.c:543 >> IP: [] __mutex_lock_slowpath+0x6f/0x100 >> kernel/locking/mutex.c:824 >> PGD 1c555067 PUD 1c554067 PMD 0 >> Oops: 0002 [#1] SMP >> Modules linked in: >> CPU: 0 PID: 2517 Comm: crash_drm_mode_ Not tainted 4.8.0-rc2+ #1157 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> task: 88001c40a700 task.stack: 88001c984000 >> RIP: 0010:[] [] >> __mutex_lock_slowpath+0x6f/0x100 >> RSP: 0018:88001c987cb0 EFLAGS: 00010282 >> RAX: RBX: 88001d5212a0 RCX: c100 >> RDX: 0001 RSI: 88001c40a700 RDI: 88001d5212a4 >> RBP: 88001c987cf8 R08: 88001c984000 R09: >> R10: R11: R12: 88001c40a700 >> R13: 88001d5212a4 R14: R15: 88001d5212a8 >> FS: 00dc9880() GS:88001f00() knlGS: >> CS: 0010 DS: ES: CR0: 80050033 >> CR2: CR3: 1c8a9000 CR4: 000406f0 >> Stack: >> 88001d5212a8 811a398f >> 88001d5212a0 88001d5212a0 >> 81a6eb20 88001c987d10 818e85ba 88001d521000 >> Call Trace: >> [< inline >] __mutex_fastpath_lock >> ./arch/x86/include/asm/mutex_64.h:28 >> [] mutex_lock+0x1a/0x30 kernel/locking/mutex.c:102 >> [] _object_find+0x23/0xc0 drivers/gpu/drm/drm_crtc.c:329 >> [< inline >] drm_mode_object_find drivers/gpu/drm/drm_crtc.c:360 >> [< inline >] drm_crtc_find ./include/drm/drm_crtc.h:2999 >> [] drm_mode_page_flip_ioctl+0x4e/0x300 >> drivers/gpu/drm/drm_crtc.c:5414 >> [] drm_ioctl+0x2a2/0x460 drivers/gpu/drm/drm_ioctl.c:721 >> [< inline >] vfs_ioctl fs/ioctl.c:43 >> [] do_vfs_ioctl+0x8d/0x580 fs/ioctl.c:675 >> [< inline >] SYSC_ioctl fs/ioctl.c:690 >> [] SyS_ioctl+0x74/0x80 fs/ioctl.c:681 >> [] entry_SYSCALL_64_fastpath+0x13/0x8f >> arch/x86/entry/entry_64.S:207 >> Code: e8 37 23 00 00 8b 03 83 f8 01 0f 84 95 00 00 00 48 8b 43 10 4c >> 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> >> 89 20 4c 89 64 24 10 eb 19 49 c7 04 24 02 00 00 00 c6 43 04 >> RIP [< inline >] __list_add ./include/linux/list.h:44 >> RIP [< inline >] list_add_tail ./include/linux/list.h:77 >> RIP [< inline >] __mutex_lock_common kernel/locking/mutex.c:543 >> RIP [] __mutex_lock_slowpath+0x6f/0x100 >> kernel/locking/mutex.c:824 >> RSP >> CR2: >> ---[ end trace 3cef4eb618ac6bb6 ]--- >> === >> >> // autogenerated by syzkaller (http://github.com/google/syzkaller) >> #include >> #include >> #include >> >> int main() >> { >> int fd = open("/dev/dri/card0", 0); >> mmap(0x20036000ul, 0x1000ul, 0x3ul, 0x32ul, 0xul, 0x0ul); >> memcpy((void*)0x20036ad7, >> "\x1e\xa4\x45\xdc\xca\x11\xff\x25\x72\x65\x7e\x4a\x56\x54\x35" >> "\x67\xe3\x8b\x41\x5c\x6d\x69\xa5\xf9\x88\x29\xb8\xc9\x6a\x45" >> "\x76\xa9\xe7\x14\xd1\xf6\xa3\x59\x07\x4d\xe5\xc8\x39\xbf\x33" >> "\xb9\x3d\x21\xd1\xaf\x16\x4d\xbc\xbf\xb1\x0a\x92\x97\xd9\x91" >> "\x4d\xd8\xf8\xa1\xa6\xa3\x20\x02\x2c\x5e\x8f\x87\x05\x8b\xeb" >> "\x9a\xb9\xbc\xa6\x60\x45\x8d\x19\x01\x7d\xb7\xef\x64\x62\x2e" >> "\x5e\x3d\xfe\x65\xbf\xe2\x80\x89\x36\x48\x73\xc6\xa2\x6e\xe2" >> "\x1a\x8f\x1b\x11\x6f\x49\x20\xeb\x74\x2d\x41\xb9\x8b\xb4\x8e" >> "\x8b\xf5\x6d\xb7\xb1\xa3\xcb\xc4\xc2\x7f\x6d\xef\x32\xef\xa7" >> "\x1c\x17\x03\x60\x7b\x31\x1f\x66", >> 143); >> ioctl(fd, 0xbb0ul, 0x20036ad7ul, 0, 0, 0); >> return 0; >> } >> >> I build the ToT kernel (commit >> 952b159f2919a8d514f13999f9f463bddcc1dae7, Aug 18) with defconfig and >> CONFIG_DRM_VGEM=y. > > +dri-devel > > I am also hitting this on 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of > linux-next. Can you tell us which DRM driver this is? vgem does not specify DRIVER_MODESET, so the page-flip ioctl should not be hooked up. Also, the mmap() operation should fail on any GEM driver. *confused* Thanks David
Re: drm: WARNING in ioremap_wc
Hey On Fri, Sep 2, 2016 at 2:23 PM, Dmitry Vyukov wrote: > On Mon, Aug 29, 2016 at 8:05 AM, Daniel Vetter wrote: >> On Sun, Aug 28, 2016 at 07:36:59PM +0200, Dmitry Vyukov wrote: >>> Hello, >>> >>> The following program triggers WARNING in ioremap_wc: >> >> Yup, that should also be fixed in linux-next. Probably better to not >> report more for 4.8-rc kernels for now ;-) >> >> If you have some time to test, you need to cherry pick the following two >> patches I think: >> >> fa5386459f06 ("drm: Used DRM_LEGACY for all legacy functions") >> 3cbf6a5deb2f ("drm: Mark up legacy/dri1 drivers with DRM_LEGACY") >> >> If that's confirmed we can cherry-pick them over to -fixes. > > > Hi Daniel, > > I've switched to linux-next and don't see the WARNINGs anymore. > I wonder how they were discovered? Is somebody else running syzkaller? > Or they were discovered in some other way? We independently discovered that some DRM-KMS drivers enabled the DRI1 ioctls, and fixed this. Hence, you should be unable to trigger anything in drm_bufs.c, drm_context.c, and friends. Thanks David
Re: [PATCH v4 3/5] drm: add SimpleDRM driver
Hey On Mon, Aug 22, 2016 at 10:25 PM, Noralf Trønnes wrote: > The SimpleDRM driver binds to simple-framebuffer devices and provides a > DRM/KMS API. It provides only a single CRTC+encoder+connector combination > plus one initial mode. > > Userspace can create dumb-buffers which can be blit into the real > framebuffer similar to UDL. No access to the real framebuffer is allowed > (compared to earlier version of this driver) to avoid security issues. > Furthermore, this way we can support arbitrary modes as long as we have a > conversion-helper. > > The driver was originally written by David Herrmann in 2014. > My main contribution is to make use of drm_simple_kms_helper and > rework the probe path to avoid use of the deprecated drm_platform_init() > and drm_driver.{load,unload}(). > Additions have also been made for later changes to the Device Tree binding > document, like support for clocks, regulators and having the node under > /chosen. This code was lifted verbatim from simplefb.c. > > Cc: dh.herrm...@gmail.com > Cc: l...@skynet.be > Signed-off-by: Noralf Trønnes > --- > > Changes from version 3: > - Reworked gem code to match udl > - Dropped PRIME support > - Dropped dirty_info_property, it's gone > - Don't use drm_device.platformdev it's deprecated > - Remove struct sdrm_device #ifdef's > - Split out sdrm_fb_init() from sdrm_fb_create(), needed by fbdev code > - Simplify drm_clip validation by extending the check in > sdrm_dirty() and drop the one in sdrm_blit() > - Removed sdrm_dirty_all_unlocked() which was unused. > > Changes from version 2: > - Remove superfluos module.h includes > - Move includes from header to source files > - Set plane.fb before flushing in pipe update, or else the previous one > gets flushed > - Added check for vblank event in sdrm_display_pipe_update() > > Changes from version 1: > - Move platform_set_drvdata() before drm_dev_register() > - Remove drm_legacy_mmap() call. > - Set mode_config.{min,max}_{width,height} to the actual dimensions > of the native framebuffer > - Remove plane positioning since it won't work with the simple display pipe, > meaning sdrm_display_pipe_check() isn't necessary either > - Support the additions to the Device Tree binding document, including > clocks, regulators and having the node under /chosen > > Changes from previous version: > - Remove FB_SIMPLE=n dependency to avoid kconfig recursive error > - Changed module name to match kconfig help text: sdrm -> simpledrm > - Use drm_simple_display_pipe > - Replace deprecated drm_platform_init() > - sdrm_dumb_create(): drm_gem_object_unreference() -> *_unlocked() > - sdrm_dumb_map_offset(): drm_gem_object_lookup() remove drm_device parameter > - sdrm_drm_mmap() changes: > Remove struct_mutex locking > Add drm_vma_offset_{lock,unlock}_lookup() > drm_mmap() -> drm_legacy_mmap() > - dma_buf_begin_cpu_access() doesn't require start and length anymore > - Use drm_cvt_mode() instead of open coding a mode > - Fix format conversion. In the intermediate step, store the 8/6/5 bit color > value in the upper part of the 16-bit color variable, not the lower. > - Support clips == NULL in sdrm_dirty() > - Set mode_config.preferred_depth > - Attach mode_config.dirty_info_property to connector > > drivers/gpu/drm/Kconfig | 2 + > drivers/gpu/drm/Makefile | 1 + > drivers/gpu/drm/simpledrm/Kconfig| 19 + > drivers/gpu/drm/simpledrm/Makefile | 4 + > drivers/gpu/drm/simpledrm/simpledrm.h| 86 + > drivers/gpu/drm/simpledrm/simpledrm_damage.c | 235 > drivers/gpu/drm/simpledrm/simpledrm_drv.c| 543 > +++ > drivers/gpu/drm/simpledrm/simpledrm_gem.c| 202 ++ > drivers/gpu/drm/simpledrm/simpledrm_kms.c| 234 > 9 files changed, 1326 insertions(+) > create mode 100644 drivers/gpu/drm/simpledrm/Kconfig > create mode 100644 drivers/gpu/drm/simpledrm/Makefile > create mode 100644 drivers/gpu/drm/simpledrm/simpledrm.h > create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_damage.c > create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_drv.c > create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_gem.c > create mode 100644 drivers/gpu/drm/simpledrm/simpledrm_kms.c > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > index fc35731..a54cc8d 100644 > --- a/drivers/gpu/drm/Kconfig > +++ b/drivers/gpu/drm/Kconfig > @@ -290,3 +290,5 @@ source "drivers/gpu/drm/arc/Kconfig" > source "drivers/gpu/drm/hisilicon/Kconfig" > > source "drivers/gpu/drm/mediatek/Kconfi
Re: [PATCH 0/2] drm: add SimpleDRM driver
Hi On Thu, Aug 4, 2016 at 5:34 PM, Luc Verhaegen wrote: > Do we really want to recreate a 400+ email thread again, or are we > capable of learning from the past? No we don't. And no-one intends to. I am fully aware of the discussion that introduced the clock-dependencies to simplefb, and I gladly accept patches that add such support to SimpleDRM. Did anyone say otherwise? This series adds initial support for the devices _we_ know and can test (which is x86 and RPi, in my case). If someone wants support for more devices, please send patches. Why does this have to be included (or even discussed) as part of this submission? Thanks David
Re: [PATCH 1/3 v4] drm: Introduce drm_connector_register_all() helper
Hi On Tue, Mar 29, 2016 at 12:02 PM, Alexey Brodkin wrote: > As a pair to already existing drm_connector_unregister_all() we're adding > generic implementation of what is already done in some drivers. > > Once this helper is implemented we'll be ready to switch existing > driver-specific implementations with the generic one. > > Signed-off-by: Alexey Brodkin > Cc: Daniel Vetter > Cc: David Airlie > Cc: David Herrmann > --- > > Changes v3 -> v4: > * In drm_connector_register_all() fail path which calls unregister_all() >is moved outside of loop&locked section (as suggested by Daniel) > > Changes v2 -> v3: > * Updated title with capital after colon > * Simplified failure path with direct and unconditional invocation of >unregister_all() > * Updated kerneldoc description of the drm_connector_register_all() > > Changes v1 -> v2: > * Rename drm_connector_unplug_all() to drm_connector_unregister_all() > * Use drm_for_each_connector() instead of list_for_each_entry() > * Updated kerneldoc for drm_dev_register() > > drivers/gpu/drm/drm_crtc.c | 39 +++ > drivers/gpu/drm/drm_drv.c | 6 +- > include/drm/drm_crtc.h | 3 ++- > 3 files changed, 46 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c > index 7675826..3e4cdb1 100644 > --- a/drivers/gpu/drm/drm_crtc.c > +++ b/drivers/gpu/drm/drm_crtc.c > @@ -1079,6 +1079,45 @@ void drm_connector_unregister(struct drm_connector > *connector) > EXPORT_SYMBOL(drm_connector_unregister); > > /** > + * drm_connector_register_all - register all connectors > + * @dev: drm device > + * > + * This function registers all connectors in sysfs and other places so that > + * userspace can start to access them. Drivers can call it after calling > + * drm_dev_register() to complete the device registration, if they don't call > + * drm_connector_register() on each connector individually. > + * > + * When a device is unplugged and should be removed from userspace access, > + * call drm_connector_unregister_all(), which is the inverse of this > + * function. > + * > + * Returns: > + * Zero on success, error code on failure. > + */ > +int drm_connector_register_all(struct drm_device *dev) > +{ > + struct drm_connector *connector; > + int ret; > + > + mutex_lock(&dev->mode_config.mutex); > + > + drm_for_each_connector(connector, dev) { > + ret = drm_connector_register(connector); > + if (ret) > + goto err; > + } > + > + mutex_unlock(&dev->mode_config.mutex); > + > + return 0; > + > +err: > + drm_connector_unregister_all(dev); You _must_ unlock the mutex before returning. Thanks David > + return ret; > +} > +EXPORT_SYMBOL(drm_connector_register_all); > + > +/** > * drm_connector_unregister_all - unregister connector userspace interfaces > * @dev: drm device > * > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > index 167c8d3..2c9a2b6 100644 > --- a/drivers/gpu/drm/drm_drv.c > +++ b/drivers/gpu/drm/drm_drv.c > @@ -715,7 +715,11 @@ EXPORT_SYMBOL(drm_dev_unref); > * > * Register the DRM device @dev with the system, advertise device to > user-space > * and start normal device operation. @dev must be allocated via > drm_dev_alloc() > - * previously. > + * previously. Right after drm_dev_register() the driver should call > + * drm_connector_register_all() to register all connectors in sysfs. This is > + * a separate call for backward compatibility with drivers still using > + * the deprecated ->load() callback, where connectors are registered from > within > + * the ->load() callback. > * > * Never call this twice on any device! > * > diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h > index 12f2bd4..6231f6c 100644 > --- a/include/drm/drm_crtc.h > +++ b/include/drm/drm_crtc.h > @@ -2253,7 +2253,8 @@ void drm_connector_unregister(struct drm_connector > *connector); > > extern void drm_connector_cleanup(struct drm_connector *connector); > extern unsigned int drm_connector_index(struct drm_connector *connector); > -/* helper to unregister all connectors from sysfs for device */ > +/* helpers to {un}register all connectors from sysfs for device */ > +extern int drm_connector_register_all(struct drm_device *dev); > extern void drm_connector_unregister_all(struct drm_device *dev); > > extern int drm_bridge_add(struct drm_bridge *bridge); > -- > 2.5.0 >
Re: [PATCH 2/4 v3] drm: Introduce drm_connector_register_all() helper
Hi On Wed, Mar 23, 2016 at 9:42 AM, Alexey Brodkin wrote: > As a pair to already existing drm_connector_unregister_all() we're adding > generic implementation of what is already done in some drivers. > > Once this helper is implemented we'll be ready to switch existing > driver-specific implementations with the generic one. > > Signed-off-by: Alexey Brodkin > Cc: Daniel Vetter > Cc: David Airlie > --- > > Changes v2 -> v3: > * Updated title with capital after colon > * Simplified failure path with direct and unconditional invocation of >unregister_all() > * Updated kerneldoc description of the drm_connector_register_all() > > Changes v1 -> v2: > * Rename drm_connector_unplug_all() to drm_connector_unregister_all() > * Use drm_for_each_connector() instead of list_for_each_entry() > * Updated kerneldoc for drm_dev_register() > > drivers/gpu/drm/drm_crtc.c | 43 +++ > drivers/gpu/drm/drm_drv.c | 6 +- > include/drm/drm_crtc.h | 3 ++- > 3 files changed, 50 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c > index 65488a6..21eea11 100644 > --- a/drivers/gpu/drm/drm_crtc.c > +++ b/drivers/gpu/drm/drm_crtc.c > @@ -1081,6 +1081,49 @@ void drm_connector_unregister(struct drm_connector > *connector) > EXPORT_SYMBOL(drm_connector_unregister); > > /** > + * drm_connector_register_all - register all connectors > + * @dev: drm device > + * > + * This function registers all connectors in sysfs and other places so that > + * userspace can start to access them. Drivers can call it after calling > + * drm_dev_register() to complete the device registration, if they don't call > + * drm_connector_register() on each connector individually. > + * > + * When a device is unplugged and should be removed from userspace access, > + * call drm_connector_unregister_all(), which is the inverse of this > + * function. > + * > + * Returns: > + * Zero on success, error code on failure. > + */ > +int drm_connector_register_all(struct drm_device *dev) > +{ > + struct drm_connector *connector; > + int ret; > + > + mutex_lock(&dev->mode_config.mutex); > + > + drm_for_each_connector(connector, dev) { > + ret = drm_connector_register(connector); > + if (ret) { > + /* > +* We may safely call unregister_all() here within > +* area locked with mutex because unregister_all() > +* doesn't use locks inside (see a comment in that > +* function). > +*/ Ugh? unregister_all() says: /* FIXME: taking the mode config mutex ends up in a clash with sysfs */ This strongly contradicts your comment. Anyway, regardless how you want to fix it: You better unlock the mode-config mutex before returning below. Thanks David > + drm_connector_unregister_all(dev); > + return ret; > + } > + } > + > + mutex_unlock(&dev->mode_config.mutex); > + > + return 0; > +} > +EXPORT_SYMBOL(drm_connector_register_all); > + > +/** > * drm_connector_unregister_all - unregister connector userspace interfaces > * @dev: drm device > * > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > index 167c8d3..2c9a2b6 100644 > --- a/drivers/gpu/drm/drm_drv.c > +++ b/drivers/gpu/drm/drm_drv.c > @@ -715,7 +715,11 @@ EXPORT_SYMBOL(drm_dev_unref); > * > * Register the DRM device @dev with the system, advertise device to > user-space > * and start normal device operation. @dev must be allocated via > drm_dev_alloc() > - * previously. > + * previously. Right after drm_dev_register() the driver should call > + * drm_connector_register_all() to register all connectors in sysfs. This is > + * a separate call for backward compatibility with drivers still using > + * the deprecated ->load() callback, where connectors are registered from > within > + * the ->load() callback. > * > * Never call this twice on any device! > * > diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h > index 42d9f4d..6a34117 100644 > --- a/include/drm/drm_crtc.h > +++ b/include/drm/drm_crtc.h > @@ -2214,7 +2214,8 @@ void drm_connector_unregister(struct drm_connector > *connector); > > extern void drm_connector_cleanup(struct drm_connector *connector); > extern unsigned int drm_connector_index(struct drm_connector *connector); > -/* helper to unregister all connectors from sysfs for device */ > +/* helpers to {un}register all connectors from sysfs for device */ > +extern int drm_connector_register_all(struct drm_device *dev); > extern void drm_connector_unregister_all(struct drm_device *dev); > > extern int drm_bridge_add(struct drm_bridge *bridge); > -- > 2.5.0 >
Re: [PATCH v2 3/9] efi/x86: efifb: move DMI based quirks handling out of generic code
Hi On Fri, Mar 18, 2016 at 5:47 PM, Ard Biesheuvel wrote: > The efifb quirks handling based on DMI identification of the platform is > specific to x86, so move it to x86 arch code. > > Signed-off-by: Ard Biesheuvel Fine with me: Acked-by: David Herrmann Thanks David > --- > arch/x86/include/asm/efi.h | 2 ++ > arch/x86/kernel/sysfb_efi.c | 15 +++ > drivers/video/fbdev/efifb.c | 15 --- > 3 files changed, 21 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h > index 7d5187b66108..790adbc3a64a 100644 > --- a/arch/x86/include/asm/efi.h > +++ b/arch/x86/include/asm/efi.h > @@ -154,6 +154,8 @@ static inline bool efi_runtime_supported(void) > extern struct console early_efi_console; > extern void parse_efi_setup(u64 phys_addr, u32 data_len); > > +extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt); > + > #ifdef CONFIG_EFI_MIXED > extern void efi_thunk_runtime_setup(void); > extern efi_status_t efi_thunk_set_virtual_address_map( > diff --git a/arch/x86/kernel/sysfb_efi.c b/arch/x86/kernel/sysfb_efi.c > index b285d4e8c68e..e21a8a7ddcff 100644 > --- a/arch/x86/kernel/sysfb_efi.c > +++ b/arch/x86/kernel/sysfb_efi.c > @@ -68,6 +68,21 @@ struct efifb_dmi_info efifb_dmi_list[] = { > [M_UNKNOWN] = { NULL, 0, 0, 0, 0, OVERRIDE_NONE } > }; > > +void efifb_setup_from_dmi(struct screen_info *si, const char *opt) > +{ > + int i; > + > + for (i = 0; i < M_UNKNOWN; i++) { > + if (efifb_dmi_list[i].base != 0 && > + !strcmp(opt, efifb_dmi_list[i].optname)) { > + si->lfb_base = efifb_dmi_list[i].base; > + si->lfb_linelength = efifb_dmi_list[i].stride; > + si->lfb_width = efifb_dmi_list[i].width; > + si->lfb_height = efifb_dmi_list[i].height; > + } > + } > +} > + > #define choose_value(dmivalue, fwvalue, field, flags) ({ \ > typeof(fwvalue) _ret_ = fwvalue;\ > if ((flags) & (field)) \ > diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c > index 95d293b7445a..dd594369b8a6 100644 > --- a/drivers/video/fbdev/efifb.c > +++ b/drivers/video/fbdev/efifb.c > @@ -8,6 +8,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -15,7 +16,7 @@ > #include > #include > #include > -#include > +#include > > static bool request_mem_succeeded = false; > > @@ -85,21 +86,13 @@ static struct fb_ops efifb_ops = { > static int efifb_setup(char *options) > { > char *this_opt; > - int i; > > if (options && *options) { > while ((this_opt = strsep(&options, ",")) != NULL) { > if (!*this_opt) continue; > > - for (i = 0; i < M_UNKNOWN; i++) { > - if (efifb_dmi_list[i].base != 0 && > - !strcmp(this_opt, > efifb_dmi_list[i].optname)) { > - screen_info.lfb_base = > efifb_dmi_list[i].base; > - screen_info.lfb_linelength = > efifb_dmi_list[i].stride; > - screen_info.lfb_width = > efifb_dmi_list[i].width; > - screen_info.lfb_height = > efifb_dmi_list[i].height; > - } > - } > + efifb_setup_from_dmi(&screen_info, this_opt); > + > if (!strncmp(this_opt, "base:", 5)) > screen_info.lfb_base = > simple_strtoul(this_opt+5, NULL, 0); > else if (!strncmp(this_opt, "stride:", 7)) > -- > 2.5.0 >
Re: [PATCH v2 8/9] efi/arm*: wire up struct screen_info to efi-framebuffer platform device
Hi On Fri, Mar 18, 2016 at 5:47 PM, Ard Biesheuvel wrote: > This adds code to the ARM and arm64 EFI init routines to expose a platform > device of type 'efi-framebuffer' if struct screen_info has been populated > appropriately from the GOP protocol by the stub. Since the framebuffer may > potentially be located in system RAM, make sure that the region is reserved > and marked MEMBLOCK_NOMAP. > > Signed-off-by: Ard Biesheuvel > --- > drivers/firmware/efi/arm-init.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c > index acd44de30504..f5f13b871d42 100644 > --- a/drivers/firmware/efi/arm-init.c > +++ b/drivers/firmware/efi/arm-init.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -76,6 +77,10 @@ static void __init init_screen_info(void) > screen_info = *si; > early_memunmap(si, sizeof(*si)); > } > + > + if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI && > + memblock_is_map_memory(screen_info.lfb_base)) > + memblock_mark_nomap(screen_info.lfb_base, > screen_info.lfb_size); > } > > static int __init uefi_init(void) > @@ -235,3 +240,16 @@ void __init efi_init(void) > > init_screen_info(); > } > + > +static int __init register_gop_device(void) > +{ > + void *pd; > + > + if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI) > + return 0; > + > + /* the efifb driver accesses screen_info directly, no need to pass it > */ > + pd = platform_device_register_simple("efi-framebuffer", 0, NULL, 0); I'd really appreciate if you pass "screen_info" as data, just like x86 does. Sure, we still use the global screen_info variable from efifb, but we should try to reduce the number of global variables and rather make drivers use the platform-data of the platform device. Thanks David
Re: [PATCH] HID: wiimote: Fix wiimote mp scale linearization
Hi On Wed, Mar 16, 2016 at 5:59 PM, Cyan Ogilvie wrote: > The wiimote motion plus gyros use two scales to report fast and slow > rotation - below 440 deg/s uses 8192/440 units / deg/s, and above uses > 8192/2000 units / deg/s. > > Previously this driver attempted to linearize the two by scaling the fast > rate by 18 and the slow by 9, but this results in a scale of > 8192*9/440 = ~167.564 for slow and 8192*18/2000 = 73.728 for fast. > > Correct the fast motion scale factor so that both report ~167.564 > units / deg/s > > Signed-off-by: Cyan Ogilvie > --- > drivers/hid/hid-wiimote-modules.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/hid/hid-wiimote-modules.c > b/drivers/hid/hid-wiimote-modules.c > index 4390eee..c830ed3 100644 > --- a/drivers/hid/hid-wiimote-modules.c > +++ b/drivers/hid/hid-wiimote-modules.c > @@ -2049,9 +2049,11 @@ static void wiimod_mp_in_mp(struct wiimote_data > *wdata, const __u8 *ext) > * -+--+-+-+ > * The single bits Yaw, Roll, Pitch in the lower right corner specify > * whether the wiimote is rotating fast (0) or slow (1). Speed for > slow > -* roation is 440 deg/s and for fast rotation 2000 deg/s. To get a > -* linear scale we multiply by 2000/440 = ~4.5454 which is 18 for fast > -* and 9 for slow. Oh dear! My initial reasoning was completely off. I somehow upscaled 4.5 to 18, and 1 to 9, when it should have been 1 to 4. Oops! > +* roation is 8192/440 units / deg/s and for fast rotation 8192/2000 > +* units / deg/s. To get a linear scale for fast rotation we multiply > +* by 2000/440 = ~4.5454 and scale both fast and slow by 9 to match > the > +* previous scale reported by this driver. > +* This leaves a linear scale with 8192*9/440 (~167.564) units / > deg/s. > * If the wiimote is not rotating the sensor reports 2^13 = 8192. > * Ext specifies whether an extension is connected to the motionp. > * which is parsed by wiimote-core. > @@ -2070,15 +2072,15 @@ static void wiimod_mp_in_mp(struct wiimote_data > *wdata, const __u8 *ext) > z -= 8192; > > if (!(ext[3] & 0x02)) > - x *= 18; > + x = (x * 2000 * 9) / 440; > else > x *= 9; > if (!(ext[4] & 0x02)) > - y *= 18; > + y = (y * 2000 * 9) / 440; > else > y *= 9; > if (!(ext[3] & 0x01)) > - z *= 18; > + z = (z * 2000 * 9) / 440; So 2000/440 ~= 4.5, so this is exactly what the initial comment said. Patch looks good to me: Reviewed-by: David Herrmann Thanks David > else > z *= 9; > > -- > 2.1.4 >
Re: [PATCH, RESEND] drm: avoid uninitialized timestamp use in wait_vblank
Hi On Thu, Feb 25, 2016 at 10:09 PM, Arnd Bergmann wrote: > gcc warns about the timestamp in drm_wait_vblank being possibly > used without an initialization: > > drivers/gpu/drm/drm_irq.c: In function 'drm_wait_vblank': > drivers/gpu/drm/drm_irq.c:1755:28: warning: 'now.tv_usec' may be used > uninitialized in this function [-Wmaybe-uninitialized] >vblwait->reply.tval_usec = now.tv_usec; > drivers/gpu/drm/drm_irq.c:1754:27: warning: 'now.tv_sec' may be used > uninitialized in this function [-Wmaybe-uninitialized] >vblwait->reply.tval_sec = now.tv_sec; > > This can happen if drm_vblank_count_and_time() returns 0 in its > error path. To sanitize the error case, I'm changing that function > to return a zero timestamp when it fails. > > Signed-off-by: Arnd Bergmann > Fixes: e6ae8687a87b ("drm: idiot-proof vblank") > --- > drivers/gpu/drm/drm_irq.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > I'm going through the maybe-unused warnings in randconfig builds, this one is > apparently not a false positive, although it only happens if something > else has already gone wrong. > > Originally sent out on Jan 13, but I received no reply, so resending now. > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c > index 881c5a6c180c..6f41ddfbe061 100644 > --- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -997,8 +997,10 @@ u32 drm_vblank_count_and_time(struct drm_device *dev, > unsigned int pipe, > int count = DRM_TIMESTAMP_MAXRETRIES; > u32 cur_vblank; > > - if (WARN_ON(pipe >= dev->num_crtcs)) > + if (WARN_ON(pipe >= dev->num_crtcs)) { > + *vblanktime = (struct timeval) { 0 }; The '0' is redundant. Anyway, this is: Reviewed-by: David Herrmann (CC: Daniel, for drm-misc) Thanks David > return 0; > + } > > /* > * Vblank timestamps are read lockless. To ensure consistency the > vblank > -- > 2.7.0 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2] unix: properly account for FDs passed over unix sockets
Hi On Wed, Feb 3, 2016 at 12:36 PM, Simon McVittie wrote: > Am I right in saying that the advice I give to D-Bus users should be > something like this? > > * system services should not send fds at all, unless they trust the > dbus-daemon > * system services should not send fds via D-Bus that will be delivered > to recipients that they do not trust > * sending fds to an untrusted recipient would enable that recipient to > carry out a denial-of-service attack (on what? the sender? the > dbus-daemon?) With the revised patch from Hannes, this should no longer be needed. My original concern was only about accounting inflight-fds on the file-owner, rather than the sender. However, with Hannes' revised patch, a different DoS attack against dbus-daemon is possible. Imagine a peer that receives batches of FDs, but never dequeues them. They will be accounted on the inflight-limit of dbus-daemon, as such causing messages of independent peers to be rejected in case they carry FDs. Preferably, dbus-daemon would avoid queuing more than 16 FDs on a single destination (total). But that would require POLLOUT to be capped by the number of queued fds. A possible workaround is to add CAP_SYS_RESOURCE to dbus-daemon. Thanks David
Re: [PATCH v2] unix: properly account for FDs passed over unix sockets
Hi On Sun, Jan 10, 2016 at 7:54 AM, Willy Tarreau wrote: > It is possible for a process to allocate and accumulate far more FDs than > the process' limit by sending them over a unix socket then closing them > to keep the process' fd count low. > > This change addresses this problem by keeping track of the number of FDs > in flight per user and preventing non-privileged processes from having > more FDs in flight than their configured FD limit. This is not really what this patch does. This patch rather prevents processes from having more of their owned *files* in flight than their configured FD limit. See below for details.. > Reported-by: socketp...@gmail.com > Reported-by: Tetsuo Handa > Mitigates: CVE-2013-4312 (Linux 2.0+) > Suggested-by: Linus Torvalds > Acked-by: Hannes Frederic Sowa > Signed-off-by: Willy Tarreau > --- > v2: add reported-by, mitigates and acked-by. > > It would be nice if (if accepted) it would be backported to -stable as the > issue is currently exploitable. > --- > include/linux/sched.h | 1 + > net/unix/af_unix.c| 24 > net/unix/garbage.c| 13 - > 3 files changed, 29 insertions(+), 9 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index edad7a4..fbf25f1 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -830,6 +830,7 @@ struct user_struct { > unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? > */ > #endif > unsigned long locked_shm; /* How many pages of mlocked shm ? */ > + unsigned long unix_inflight;/* How many files in flight in unix > sockets */ > > #ifdef CONFIG_KEYS > struct key *uid_keyring;/* UID specific keyring */ > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 45aebd9..d6d7b43 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -1499,6 +1499,21 @@ static void unix_destruct_scm(struct sk_buff *skb) > sock_wfree(skb); > } > > +/* > + * The "user->unix_inflight" variable is protected by the garbage > + * collection lock, and we just read it locklessly here. If you go > + * over the limit, there might be a tiny race in actually noticing > + * it across threads. Tough. This limit is checked once per transaction. IIRC, a transaction can carry 255 files. Thus, it is easy to exceed this limit without any race involved. > + */ > +static inline bool too_many_unix_fds(struct task_struct *p) > +{ > + struct user_struct *user = current_user(); > + > + if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE))) > + return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); > + return false; > +} > + > #define MAX_RECURSION_LEVEL 4 > > static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) > @@ -1507,6 +1522,9 @@ static int unix_attach_fds(struct scm_cookie *scm, > struct sk_buff *skb) > unsigned char max_level = 0; > int unix_sock_count = 0; > > + if (too_many_unix_fds(current)) > + return -ETOOMANYREFS; > + Ignoring the capabilities, this effectively resolves to: if (current_cred()->user->unix_inflight > rlimit(RLIMIT_NOFILE)) return -ETOOMANYREFS; It limits the number of inflight FDs of the *current* user to its *own* limit. But.. > for (i = scm->fp->count - 1; i >= 0; i--) { > struct sock *sk = unix_get_socket(scm->fp->fp[i]); > > @@ -1528,10 +1546,8 @@ static int unix_attach_fds(struct scm_cookie *scm, > struct sk_buff *skb) > if (!UNIXCB(skb).fp) > return -ENOMEM; > > - if (unix_sock_count) { > - for (i = scm->fp->count - 1; i >= 0; i--) > - unix_inflight(scm->fp->fp[i]); > - } > + for (i = scm->fp->count - 1; i >= 0; i--) > + unix_inflight(scm->fp->fp[i]); > return max_level; > } > > diff --git a/net/unix/garbage.c b/net/unix/garbage.c > index a73a226..8fcdc22 100644 > --- a/net/unix/garbage.c > +++ b/net/unix/garbage.c > @@ -120,11 +120,11 @@ void unix_inflight(struct file *fp) > { > struct sock *s = unix_get_socket(fp); > > + spin_lock(&unix_gc_lock); > + > if (s) { > struct unix_sock *u = unix_sk(s); > > - spin_lock(&unix_gc_lock); > - > if (atomic_long_inc_return(&u->inflight) == 1) { > BUG_ON(!list_empty(&u->link)); > list_add_tail(&u->link, &gc_inflight_list); > @@ -132,25 +132,28 @@ void unix_inflight(struct file *fp) > BUG_ON(list_empty(&u->link)); > } > unix_tot_inflight++; > - spin_unlock(&unix_gc_lock); > } > + fp->f_cred->user->unix_inflight++; ..this modifies the limit of the owner of the file you send. As such, if you only send files that you don't own, you will continuously bump the limit of the file-owner, but never
Re: [PATCH 1/2] ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
Hi On Fri, Jan 22, 2016 at 12:03 AM, John Stultz wrote: > Recently, in commit 37cf4dc3370f > ("time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow") > I forgot to check if the timeval being passed was actually a > timespec (as is signaled with ADJ_NANO). > > This resulted in that patch breaking ADJ_SETOFFSET users who set > ADJ_NANO, by rejecting valid timespecs that were compared with > valid timeval ranges. > > This patch addresses this by checking for the ADJ_NANO flag and > using the timepsec check instead in that case. > > Cc: Sasha Levin > Cc: Richard Cochran > Cc: Thomas Gleixner > Cc: Prarit Bhargava > Cc: Harald Hoyer > Cc: Kay Sievers > Cc: David Herrmann > Reported-by: Harald Hoyer > Reported-by: Kay Sievers > Signed-off-by: John Stultz Thanks for picking this up. Looks good to me: Reviewed-by: David Herrmann Thanks David > --- > kernel/time/ntp.c | 14 -- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c > index 36f2ca0..6df8927 100644 > --- a/kernel/time/ntp.c > +++ b/kernel/time/ntp.c > @@ -685,8 +685,18 @@ int ntp_validate_timex(struct timex *txc) > if (!capable(CAP_SYS_TIME)) > return -EPERM; > > - if (!timeval_inject_offset_valid(&txc->time)) > - return -EINVAL; > + if (txc->modes & ADJ_NANO) { > + struct timespec ts; > + > + ts.tv_sec = txc->time.tv_sec; > + ts.tv_nsec = txc->time.tv_usec; > + if (!timespec_inject_offset_valid(&ts)) > + return -EINVAL; > + > + } else { > + if (!timeval_inject_offset_valid(&txc->time)) > + return -EINVAL; > + } > } > > /* > -- > 1.9.1 >
Re: [PATCH] vgaarb: fix signal handling in vga_get()
Hi On Mon, Dec 14, 2015 at 9:19 AM, Kirill A. Shutemov wrote: > On Thu, Dec 10, 2015 at 11:28:58AM +0100, David Herrmann wrote: >> Hi >> >> On Mon, Nov 30, 2015 at 3:17 AM, Kirill A. Shutemov >> wrote: >> > There are few defects in vga_get() related to signal hadning: >> > >> > - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE >> > case; >> > >> > - if we found pending signal we must remove ourself from wait queue >> > and change task state back to running; >> > >> > - -ERESTARTSYS is more appropriate, I guess. >> > >> > Signed-off-by: Kirill A. Shutemov >> > --- >> > >> > Alex, I try to get KVM with VGA passthrough working properly. I have i915 >> > (HD 4600) on the host and GTX 580 for the guest. The guest GPU is not >> > capabale of EFI, so I have to use x-vga=on. It's kinda work with your >> > patch for i915.enable_hd_vgaarb=1. But guest refuse to initialize the GPU >> > after KVM was not shut down correctly, resulting in host crash like this: >> > >> > BUG: unable to handle kernel paging request at 880870187ed8 >> > IP: [] 0x880870187ed8 >> > PGD 2129067 PUD 800841e3 >> > Oops: 0011 [#1] PREEMPT SMP >> > Modules linked in: iwlmvm iwlwifi >> > CPU: 6 PID: 3983 Comm: qemu-system-x86 Not tainted 4.3.0-gentoo #6 >> > Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, >> > BIOS F5a 06/12/2014 >> > task: 88087a91 ti: 8808632c task.ti: 8808632c >> > RIP: 0010:[] [] 0x880870187ed8 >> > RSP: 0018:8808632c3d08 EFLAGS: 00010006 >> > RAX: 880870187db0 RBX: 70187f58 RCX: >> > RDX: RSI: 0003 RDI: 880870187db0 >> > RBP: 8808632c3d48 R08: R09: >> > R10: 000103c0 R11: 0293 R12: 81ea03c8 >> > R13: 8104c7cb R14: R15: 0003 >> > FS: 7f984f9b2700() GS:88089f38() >> > knlGS: >> > CS: 0010 DS: ES: CR0: 80050033 >> > CR2: 880870187ed8 CR3: 0008645f8000 CR4: 001426e0 >> > Stack: >> > 810cc83d 632c3d28 81ea03c0 >> > 0046 0003 >> > 8808632c3d80 810cca44 88087af63800 0286 >> > Call Trace: >> > [] ? __wake_up_common+0x4d/0x80 >> > [] __wake_up+0x34/0x50 >> > [] __vga_put+0x73/0xd0 >> > [] vga_put+0x54/0x80 >> > [] vfio_pci_vga_rw+0x1d2/0x220 >> > [] vfio_pci_rw+0x33/0x60 >> > [] vfio_pci_write+0x17/0x20 >> > [] vfio_device_fops_write+0x26/0x30 >> > [] __vfs_write+0x23/0xe0 >> > [] ? __vfs_read+0x23/0xd0 >> > [] ? do_vfs_ioctl+0x2b5/0x490 >> > [] vfs_write+0xa4/0x190 >> > [] SyS_pwrite64+0x66/0xa0 >> > [] entry_SYSCALL_64_fastpath+0x12/0x6a >> > Code: 88 ff ff e0 7e 18 70 08 88 ff ff 00 8c 57 76 08 88 ff ff 20 7f 18 70 >> > 08 88 ff ff 08 7f 18 70 08 88 ff ff 94 51 1a 81 ff ff ff ff <09> 00 00 00 >> > 00 00 00 00 01 8c 57 76 08 88 ff ff 00 8c 57 76 08 >> > RIP [] 0x880870187ed8 >> > RSP >> > CR2: 880870187ed8 >> > >> > The patch fixes the crash, but doesn't help with getting GPU in guest >> > working again. >> > >> > Any ideas? >> > >> > --- >> > drivers/gpu/vga/vgaarb.c | 6 -- >> > 1 file changed, 4 insertions(+), 2 deletions(-) >> > >> > diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c >> > index 3166e4bc4eb6..9abcaa53bd25 100644 >> > --- a/drivers/gpu/vga/vgaarb.c >> > +++ b/drivers/gpu/vga/vgaarb.c >> > @@ -395,8 +395,10 @@ int vga_get(struct pci_dev *pdev, unsigned int rsrc, >> > int interruptible) >> > set_current_state(interruptible ? >> > TASK_INTERRUPTIBLE : >> > TASK_UNINTERRUPTIBLE); >> > - if (signal_pending(current)) { >> > - rc = -EINTR; >> > + if (interruptible && signal_pending(current)) { >> > + __set_current_state(TASK_RUNNING); >> > + remove_wait_queue(&vga_wait_queue, &wai
Re: [PATCH] vgaarb: fix signal handling in vga_get()
Hi On Mon, Nov 30, 2015 at 3:17 AM, Kirill A. Shutemov wrote: > There are few defects in vga_get() related to signal hadning: > > - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE > case; > > - if we found pending signal we must remove ourself from wait queue > and change task state back to running; > > - -ERESTARTSYS is more appropriate, I guess. > > Signed-off-by: Kirill A. Shutemov > --- > > Alex, I try to get KVM with VGA passthrough working properly. I have i915 > (HD 4600) on the host and GTX 580 for the guest. The guest GPU is not > capabale of EFI, so I have to use x-vga=on. It's kinda work with your > patch for i915.enable_hd_vgaarb=1. But guest refuse to initialize the GPU > after KVM was not shut down correctly, resulting in host crash like this: > > BUG: unable to handle kernel paging request at 880870187ed8 > IP: [] 0x880870187ed8 > PGD 2129067 PUD 800841e3 > Oops: 0011 [#1] PREEMPT SMP > Modules linked in: iwlmvm iwlwifi > CPU: 6 PID: 3983 Comm: qemu-system-x86 Not tainted 4.3.0-gentoo #6 > Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, BIOS > F5a 06/12/2014 > task: 88087a91 ti: 8808632c task.ti: 8808632c > RIP: 0010:[] [] 0x880870187ed8 > RSP: 0018:8808632c3d08 EFLAGS: 00010006 > RAX: 880870187db0 RBX: 70187f58 RCX: > RDX: RSI: 0003 RDI: 880870187db0 > RBP: 8808632c3d48 R08: R09: > R10: 000103c0 R11: 0293 R12: 81ea03c8 > R13: 8104c7cb R14: R15: 0003 > FS: 7f984f9b2700() GS:88089f38() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 880870187ed8 CR3: 0008645f8000 CR4: 001426e0 > Stack: > 810cc83d 632c3d28 81ea03c0 > 0046 0003 > 8808632c3d80 810cca44 88087af63800 0286 > Call Trace: > [] ? __wake_up_common+0x4d/0x80 > [] __wake_up+0x34/0x50 > [] __vga_put+0x73/0xd0 > [] vga_put+0x54/0x80 > [] vfio_pci_vga_rw+0x1d2/0x220 > [] vfio_pci_rw+0x33/0x60 > [] vfio_pci_write+0x17/0x20 > [] vfio_device_fops_write+0x26/0x30 > [] __vfs_write+0x23/0xe0 > [] ? __vfs_read+0x23/0xd0 > [] ? do_vfs_ioctl+0x2b5/0x490 > [] vfs_write+0xa4/0x190 > [] SyS_pwrite64+0x66/0xa0 > [] entry_SYSCALL_64_fastpath+0x12/0x6a > Code: 88 ff ff e0 7e 18 70 08 88 ff ff 00 8c 57 76 08 88 ff ff 20 7f 18 70 08 > 88 ff ff 08 7f 18 70 08 88 ff ff 94 51 1a 81 ff ff ff ff <09> 00 00 00 00 00 > 00 00 01 8c 57 76 08 88 ff ff 00 8c 57 76 08 > RIP [] 0x880870187ed8 > RSP > CR2: 880870187ed8 > > The patch fixes the crash, but doesn't help with getting GPU in guest > working again. > > Any ideas? > > --- > drivers/gpu/vga/vgaarb.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c > index 3166e4bc4eb6..9abcaa53bd25 100644 > --- a/drivers/gpu/vga/vgaarb.c > +++ b/drivers/gpu/vga/vgaarb.c > @@ -395,8 +395,10 @@ int vga_get(struct pci_dev *pdev, unsigned int rsrc, int > interruptible) > set_current_state(interruptible ? > TASK_INTERRUPTIBLE : > TASK_UNINTERRUPTIBLE); > - if (signal_pending(current)) { > - rc = -EINTR; > + if (interruptible && signal_pending(current)) { > + __set_current_state(TASK_RUNNING); > + remove_wait_queue(&vga_wait_queue, &wait); > + rc = -ERESTARTSYS; > break; All 3 points are valid, and the patch looks good to me: Reviewed-by: David Herrmann However, there seems to be a race between vga_lock and putting the thread asleep. We should fix that as well. See the hunk below (completely untested.. why is VGA still in use? *sigh*). Thanks David diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c index a0b4334..82cf1e3 100644 --- a/drivers/gpu/vga/vgaarb.c +++ b/drivers/gpu/vga/vgaarb.c @@ -359,8 +359,8 @@ static void __vga_put int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible) { struct vga_device *vgadev, *conflict; + DECLARE_WAITQUEUE(wait, current); unsigned long flags; - wait_queue_t wait; int rc = 0; vga_check_first_use(); @@ -371,6 +371,11 @@ int vga_get return 0; for (;;) { + add_wait_queue(&vga_wait_qu
Re: kdbus refactoring?
Hi On Mon, Nov 9, 2015 at 10:14 AM, Peter Zijlstra wrote: > On Mon, Nov 09, 2015 at 09:56:04AM +0100, David Herrmann wrote: > >> The submission and following year of development followed the 'kernel >> way'. > > This is not true; the last complete posting I could find was: > > Date: Mon, 9 Mar 2015 14:09:06 +0100 > Subject: [PATCH v4 00/14] Add kdbus implementation ..which was the last time we proposed it for inclusion. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus refactoring?
Hi On Mon, Nov 9, 2015 at 9:48 AM, Peter Zijlstra wrote: > On Mon, Nov 09, 2015 at 09:34:37AM +0100, David Herrmann wrote: >> Hi >> >> On Mon, Nov 9, 2015 at 1:53 AM, Josh Triplett wrote: >> > Quoting Documentation/development-process/1.Intro: >> [...] >> >> Years of experience with the kernel development community have taught a >> >> clear lesson: kernel code which is designed and developed behind closed >> >> doors invariably has problems which are only revealed when the code is >> >> released into the community. Sometimes these problems are severe, >> >> requiring months or years of effort before the code can be brought up to >> >> the kernel community's standards. >> [...] >> > And I've seen you specifically recommend having such conversations early >> > and often. >> >> I think comparing kdbus to "behind closed doors" development models is >> unfair. We chose to center our development around DBus, not the >> kernel. > > And yet it will be the kernel people you ask to take your code. You > don't see something funny with that? No. >> Anybody who is interested in kdbus discussions could have >> easily joined the DBus and systemd communication channels (and *many* >> people did). I see little reason in cross-posting everything to LKML, >> especially given that our communication is rarely mail-based. > > So you want to develop kernel code, but can't be arsed to do it the > kernel way? Gheez, no wonder this all is going so well.. The submission and following year of development followed the 'kernel way'. I don't see why the design phase needs to follow your style, though. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus refactoring?
Hi On Mon, Nov 9, 2015 at 1:53 AM, Josh Triplett wrote: > Quoting Documentation/development-process/1.Intro: [...] >> Years of experience with the kernel development community have taught a >> clear lesson: kernel code which is designed and developed behind closed >> doors invariably has problems which are only revealed when the code is >> released into the community. Sometimes these problems are severe, >> requiring months or years of effort before the code can be brought up to >> the kernel community's standards. [...] > And I've seen you specifically recommend having such conversations early > and often. I think comparing kdbus to "behind closed doors" development models is unfair. We chose to center our development around DBus, not the kernel. Anybody who is interested in kdbus discussions could have easily joined the DBus and systemd communication channels (and *many* people did). I see little reason in cross-posting everything to LKML, especially given that our communication is rarely mail-based. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] thinkpad_acpi: don't yell on unsupported brightness interfaces
Hi Darren! On Wed, Oct 21, 2015 at 4:44 PM, Darren Hart wrote: > On Wed, Oct 21, 2015 at 12:33:52PM -0200, Henrique de Moraes Holschuh wrote: >> On Wed, Oct 21, 2015, at 08:46, David Herrmann wrote: >> > The thinkpad_acpi driver currently emits error messages on unsupported >> > brightness interfaces, giving the impression that someone will implement >> > those. However, this error is spit out on nearly every thinkpad in >> > production since 2 years now. Furthermore, the backlight interfaces on >> > those devices are supported by the i915 driver just fine. >> > >> > Downgrade the error message to a normal pr_info() and stop telling people >> > to report it to IBM. >> >> IBM? Those reports go directly to me. >> >> > Signed-off-by: David Herrmann >> >> Acked-by: Henrique de Moraes Holschuh > > Thanks, Queued. Any particular reason this didn't show up in: [GIT PULL] platform-drivers-x86 for 4.4-1 Just a kind reminder, in case it slipped through the cracks. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] Input: uinput - add new UINPUT_DEV_SETUP and UI_ABS_SETUP ioctl
Hi On Sun, Oct 25, 2015 at 12:53 AM, Dmitry Torokhov wrote: > Hi Benjamin, > > On Tue, Aug 25, 2015 at 11:12:59AM -0400, Benjamin Tissoires wrote: >> +static int uinput_abs_setup(struct uinput_device *udev, >> + struct uinput_setup __user *arg, size_t size) >> +{ >> + struct uinput_abs_setup setup = {}; >> + struct input_dev *dev; >> + >> + if (size > sizeof(setup)) >> + return -E2BIG; >> + if (udev->state == UIST_CREATED) >> + return -EINVAL; >> + if (copy_from_user(&setup, arg, size)) >> + return -EFAULT; >> + if (setup.code > ABS_MAX) >> + return -ERANGE; >> + >> + dev = udev->dev; >> + >> + input_alloc_absinfo(dev); >> + if (!dev->absinfo) >> + return -ENOMEM; >> + >> + set_bit(setup.code, dev->absbit); >> + dev->absinfo[setup.code] = setup.absinfo; >> + >> + /* >> + * We restore the state to UIST_NEW_DEVICE because the user has to call >> + * UI_DEV_SETUP in the last place before UI_DEV_CREATE to check the >> + * validity of the absbits. >> + */ > > Do we have to be this strict? It seems to me that with the new IOCTLs > you naturally want to use the new ioctl to setup the device, then adjust > various axes and bits and then validate everything. Indeed, we now force the order to be abs-setup first, then device-setup as last step. Appended is a follow-up patch to cleanup ABS handling in uinput. It is untested. Benjamin, care to give it a spin? Thanks David >From 2568f83bcc5c4b8aeb149be2c5fc5a743812f0fe Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Sun, 25 Oct 2015 10:34:13 +0100 Subject: [PATCH] Input: uinput - rework ABS validation Rework the uinput ABS validation to check passed absinfo data immediately, but do ABS initialization as last step in UI_DEV_CREATE. The behavior observed by user-space is not changed, as ABS initialization was never checked for errors. With this in place, the order of device-initialization and abs-configuration is no longer fixed. User-space can initialize the device and afterwards set absinfo just fine. Signed-off-by: David Herrmann --- drivers/input/misc/uinput.c | 89 +++-- 1 file changed, 45 insertions(+), 44 deletions(-) diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c index baac903..1d93037 100644 --- a/drivers/input/misc/uinput.c +++ b/drivers/input/misc/uinput.c @@ -256,13 +256,22 @@ static void uinput_destroy_device(...) static int uinput_create_device(struct uinput_device *udev) { struct input_dev *dev = udev->dev; - int error; + int error, nslot; if (udev->state != UIST_SETUP_COMPLETE) { printk(KERN_DEBUG "%s: write device info first\n", UINPUT_NAME); return -EINVAL; } + if (test_bit(ABS_MT_SLOT, dev->absbit)) { + nslot = input_abs_get_max(dev, ABS_MT_SLOT) + 1; + error = input_mt_init_slots(dev, nslot, 0); + if (error) + goto fail1; + } else if (test_bit(ABS_MT_POSITION_X, dev->absbit)) { + input_set_events_per_packet(dev, 60); + } + if (udev->ff_effects_max) { error = input_ff_create(dev, udev->ff_effects_max); if (error) @@ -308,10 +317,35 @@ static int uinput_open(...) return 0; } +static int uinput_validate_absinfo(struct input_dev *dev, unsigned int code, + const struct input_absinfo *abs) +{ + int min, max; + + min = abs->minimum; + max = abs->maximum; + + if ((min != 0 || max != 0) && max <= min) { + printk(KERN_DEBUG + "%s: invalid abs[%02x] min:%d max:%d\n", + UINPUT_NAME, code, min, max); + return -EINVAL; + } + + if (abs->flat > max - min) { + printk(KERN_DEBUG + "%s: abs_flat #%02x out of range: %d (min:%d/max:%d)\n", + UINPUT_NAME, code, abs->flat, min, max); + return -EINVAL; + } + + return 0; +} + static int uinput_validate_absbits(struct input_dev *dev) { unsigned int cnt; - int nslot; + int error; if (!test_bit(EV_ABS, dev->evbit)) return 0; @@ -321,38 +355,12 @@ static int uinput_validate_absbits(struct input_dev *dev) */ for_each_set_bit(cnt, dev->absbit, ABS_CNT) { - int min, max; - - min = input_abs_get_min(dev, cnt); - max = input_abs_get_max(dev, cnt); - - if ((min != 0 || max != 0) && max <= min) { - printk(KERN_DEBUG - "%s: invalid abs[%02x] min:%d max:%d\n", - UINPUT_NAME, cnt, - input_abs_get_min(dev, cnt), - input_abs_get_max(dev, cnt)); + if (!dev->absinfo) return -EINVAL; - } - - if (input_abs_get_flat(dev, cnt) > -input_abs_get_max(dev, cnt) - input_abs_get_min(dev, cnt)) { - printk(KERN_DEBUG - "%s: abs_flat #%02x out of range: %d " - "(min:%d/max:%d)\n", - UINPUT_NAME, cnt, - input_a
[PATCH] thinkpad_acpi: don't yell on unsupported brightness interfaces
The thinkpad_acpi driver currently emits error messages on unsupported brightness interfaces, giving the impression that someone will implement those. However, this error is spit out on nearly every thinkpad in production since 2 years now. Furthermore, the backlight interfaces on those devices are supported by the i915 driver just fine. Downgrade the error message to a normal pr_info() and stop telling people to report it to IBM. Signed-off-by: David Herrmann --- drivers/platform/x86/thinkpad_acpi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 131dd74..0bed473 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -6459,8 +6459,7 @@ static void __init tpacpi_detect_brightness_capabilities(void) pr_info("detected a 8-level brightness capable ThinkPad\n"); break; default: - pr_err("Unsupported brightness interface, " - "please contact %s\n", TPACPI_MAIL); + pr_info("Unsupported brightness interface\n"); tp_features.bright_unkfw = 1; bright_maxlvl = b - 1; } -- 2.6.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7] Input: Add userio module
Hi On Mon, Oct 5, 2015 at 5:55 PM, wrote: > From: Stephen Chandler Paul > > Debugging input devices, specifically laptop touchpads, can be tricky > without having the physical device handy. Here we try to remedy that > with userio. This module allows an application to connect to a character > device provided by the kernel, and emulate any serio device. In > combination with userspace programs that can record PS/2 devices and > replay them through the /dev/userio device, this allows developers to > debug driver issues on the PS/2 level with devices simply by requesting > a recording from the user experiencing the issue without having to have > the physical hardware in front of them. > > Signed-off-by: Stephen Chandler Paul Apart from the serio-unregistration, all I have left is bike-shedding. Regardless of those changes, this is: Reviewed-by: David Herrmann See below for some details.. > --- > Changes > * Removed useless if (!userio) check in userio_device_write() > > Documentation/input/userio.txt | 70 ++ > MAINTAINERS| 6 + > drivers/input/serio/Kconfig| 11 ++ > drivers/input/serio/Makefile | 1 + > drivers/input/serio/userio.c | 286 > + > include/linux/miscdevice.h | 1 + > include/uapi/linux/userio.h| 44 +++ > 7 files changed, 419 insertions(+) > create mode 100644 Documentation/input/userio.txt > create mode 100644 drivers/input/serio/userio.c > create mode 100644 include/uapi/linux/userio.h > > diff --git a/Documentation/input/userio.txt b/Documentation/input/userio.txt > new file mode 100644 > index 000..0880c0f > --- /dev/null > +++ b/Documentation/input/userio.txt > @@ -0,0 +1,70 @@ > + The userio Protocol > +(c) 2015 Stephen Chandler Paul > + Sponsored by Red Hat > + > + > +1. Introduction > +~~~ > + This module is intended to try to make the lives of input driver developers > +easier by allowing them to test various serio devices (mainly the various > +touchpads found on laptops) without having to have the physical device in > front > +of them. userio accomplishes this by allowing any privileged userspace > program > +to directly interact with the kernel's serio driver and control a virtual > serio > +port from there. > + > +2. Usage overview > +~ > + In order to interact with the userio kernel module, one simply opens the > +/dev/userio character device in their applications. Commands are sent to the > +kernel module by writing to the device, and any data received from the serio > +driver is read as-is from the /dev/userio device. All of the structures and > +macros you need to interact with the device are defined in > and > +. > + > +3. Command Structure > + > + The struct used for sending commands to /dev/userio is as follows: > + > + struct userio_cmd { > + __u8 type; > + __u8 data; > + }; > + > + "type" describes the type of command that is being sent. This can be any > one > +of the USERIO_CMD macros defined in . "data" is the argument > +that goes along with the command. In the event that the command doesn't have > an > +argument, this field can be left untouched and will be ignored by the kernel. > +Each command should be sent by writing the struct directly to the character > +device. In the event that the command you send is invalid, an error will be > +returned by the character device and a more descriptive error will be printed > +to the kernel log. Only one command can be sent at a time, any additional > data > +written to the character device after the initial command will be ignored. > + To close the virtual serio port, just close /dev/userio. > + > +4. Commands > +~~~ > + > +4.1 USERIO_CMD_REGISTER > +~~~ > + Registers the port with the serio driver and begins transmitting data back > and > +forth. Registration can only be performed once a port type is set with > +USERIO_CMD_SET_PORT_TYPE. Has no argument. > + > +4.2 USERIO_CMD_SET_PORT_TYPE > + > + Sets the type of port we're emulating, where "data" is the port type being > +set. Can be any of the macros from . For example: SERIO_8042 > +would set the port type to be a normal PS/2 port. > + > +4.3 USERIO_CMD_SEND_INTERRUPT > +~ > + Sends an interrupt through the virtual serio port to
Re: [PATCH 00/44] kdbus cleanups
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > Hi all, > > This is a set of various kdbus code cleanups. Patches are ordered by > increasing complexity, starting with docs and comments fixes and > one-liners. > > Patch 29 is the revised version of > http://lkml.kernel.org/g/1435497454-10464-6-git-send-email-ser...@s15v.net > > Feel free to ask to change layout of this, split/join, etc if necessary. So I reviewed all of the patches, most of them look good. Some comments: - Please justify your changes in the commit-message. Always. - Please don't split patches based on modified functions. If you fix typos, do them subsystem-wide. If you fix common errors like "don't init 'name' to NULL before calling kdbus_pin_dst()", then please do that for all functions. In other words, group your changes logically, not based on location. - If you do cleanups, explain why you do them. I commented on some of the changes, which IMO reduce readability. Anyway, looks good. Thanks a lot! David > Thanks, Sergei > > Sergei Zviagintsev (44): > Documentation/kdbus: Document new name registry flags > uapi: kdbus.h: Kernel-doc fixes > kdbus: Kernel-docs and comments trivial fixes > kdbus: Update kernel-doc for struct kdbus_pool > kdbus: Add comment on merging free pool slices > kdbus: Fix kernel-doc for struct kdbus_gaps > kdbus: Fix comment on translation of caps between namespaces > kdbus: Rename var in kdbus_meta_export_caps() > kdbus: Remove unused KDBUS_MSG_MAX_SIZE constant > kdbus: Use conditional operator > kdbus: Cosmetic fix of kdbus_name_is_valid() > kdbus: Use conventional list macros in __kdbus_pool_slice_release() > kdbus: Use list_next_entry() in kdbus_queue_entry_unlink() > kdbus: Simplify expression in kdbus_get_memfd() > kdbus: Simplify bitwise expression in kdbus_meta_get_mask() > kdbus: Drop redundant code from kdbus_name_acquire() > kdbus: Drop duplicated code from kdbus_pool_slice_alloc() > kdbus: Add var initialization to kdbus_conn_entry_insert() > kdbus: Drop useless initialization from kdbus_conn_reply() > kdbus: Drop useless initialization from kdbus_cmd_hello() > kdbus: Cleanup tests in kdbus_cmd_send() > kdbus: Cleanup error path in kdbus_staging_new_user() > kdbus: Cleanup kdbus_conn_call() > kdbus: Cleanup kdbus_conn_unicast() > kdbus: Cleanup kdbus_cmd_conn_info() > kdbus: Cleanup kdbus_pin_dst() > kdbus: Cleanup kdbus_conn_new() > kdbus: Cleanup kdbus_queue_entry_new() > kdbus: Improve tests on incrementing quota > kdbus: Cleanup kdbus_meta_proc_mask() > kdbus: Cleanup kdbus_conn_move_messages() > kdbus: Remove duplicated code from kdbus_conn_lock2() > kdbus: Improve kdbus_staging_reserve() > kdbus: Improve kdbus_conn_entry_sync_attach() > kdbus: Drop goto from kdbus_queue_entry_link() > kdbus: Improve kdbus_name_release() > kdbus: Fix error path in kdbus_meta_proc_collect_cgroup() > kdbus: Fix error path in kdbus_user_lookup() > kdbus: Cleanup kdbus_user_lookup() > kdbus: Cleanup kdbus_item_validate_name() > kdbus: Fix memfd install algorithm > kdbus: Check if fd is allocated before trying to free it > kdbus: Give up on failed fd allocation > kdbus: Cleanup kdbus_gaps_install() > > Documentation/kdbus/kdbus.name.xml | 42 +- > include/uapi/linux/kdbus.h | 43 +- > ipc/kdbus/connection.c | 157 > +++-- > ipc/kdbus/connection.h | 19 ++--- > ipc/kdbus/domain.c | 38 + > ipc/kdbus/fs.c | 2 +- > ipc/kdbus/item.c | 26 +++--- > ipc/kdbus/limits.h | 3 - > ipc/kdbus/message.c| 81 +-- > ipc/kdbus/message.h| 9 ++- > ipc/kdbus/metadata.c | 79 ++- > ipc/kdbus/names.c | 32 > ipc/kdbus/node.c | 4 +- > ipc/kdbus/pool.c | 26 +++--- > ipc/kdbus/queue.c | 51 ++-- > ipc/kdbus/queue.h | 2 +- > 16 files changed, 298 insertions(+), 316 deletions(-) > > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 43/44] kdbus: Give up on failed fd allocation
Hi On Thu, Oct 8, 2015 at 1:32 PM, Sergei Zviagintsev wrote: > If we failed to allocate a file descriptor, do not try to do it again on > the next iteration. There are few chances that we will have success, and > we can simplify the algorithm assuming that valid fd numbers are not > mixed with -1 values. Why? This is no a fast-path, and the penalty is accounted on the actual culprit (the caller). I don't see why we should optimize for that case. If the fd-allocation fails, you clearly did something wrong. Thanks David > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/message.c | 42 -- > 1 file changed, 24 insertions(+), 18 deletions(-) > > diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c > index da685049d66c..75e6213e7ed5 100644 > --- a/ipc/kdbus/message.c > +++ b/ipc/kdbus/message.c > @@ -123,7 +123,7 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct > kdbus_pool_slice *slice, > { > bool incomplete_fds = false; > struct kvec kvec; > - size_t i, n_fds; > + size_t i, n_fds, n_memfds; > int ret, *fds; > > if (!gaps) { > @@ -140,25 +140,31 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct > kdbus_pool_slice *slice, > } > > fds = kmalloc_array(n_fds, sizeof(*fds), GFP_TEMPORARY); > - n_fds = 0; > if (!fds) > return -ENOMEM; > > /* 1) allocate fds and copy them over */ > > + n_fds = 0; /* n_fds now tracks the number of allocated fds */ > if (gaps->n_fds > 0) { > for (i = 0; i < gaps->n_fds; ++i) { > int fd; > > fd = get_unused_fd_flags(O_CLOEXEC); > - if (fd < 0) > + if (fd < 0) { > incomplete_fds = true; > + break; > + } > > WARN_ON(!gaps->fd_files[i]); > > - fds[n_fds++] = fd < 0 ? -1 : fd; > + fds[n_fds++] = fd; > } > > + /* If we couldn't allocate a fd, fill the rest with -1 */ > + for ( ; i < gaps->n_fds; ++i) > + fds[i] = -1; > + > /* > * The file-descriptor array can only be present once per > * message. Hence, prepare all fds and then copy them over > with > @@ -175,18 +181,18 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct > kdbus_pool_slice *slice, > goto exit; > } > > - for (i = 0; i < gaps->n_memfds; ++i) { > + n_memfds = 0; > + for (i = 0; i < gaps->n_memfds && !incomplete_fds; ++i) { > int memfd; > > memfd = get_unused_fd_flags(O_CLOEXEC); > if (memfd < 0) { > incomplete_fds = true; > - fds[n_fds++] = -1; > /* memfds are initialized to -1, skip copying it */ > - continue; > + break; > } > > - fds[n_fds++] = memfd; > + fds[gaps->n_fds + n_memfds++] = memfd; > > /* > * memfds have to be copied individually as they each are put > @@ -208,21 +214,21 @@ int kdbus_gaps_install(struct kdbus_gaps *gaps, struct > kdbus_pool_slice *slice, > > /* 2) install fds now that everything was successful */ > > - for (i = 0; i < gaps->n_fds; ++i) > - if (fds[i] >= 0) > - fd_install(fds[i], get_file(gaps->fd_files[i])); > - for (i = 0; i < gaps->n_memfds; ++i) > - if (fds[gaps->n_fds + i] >= 0) > - fd_install(fds[gaps->n_fds + i], > - get_file(gaps->memfd_files[i])); > + for (i = 0; i < n_fds; ++i) > + fd_install(fds[i], get_file(gaps->fd_files[i])); > + for (i = 0; i < n_memfds; ++i) > + fd_install(fds[gaps->n_fds + i], > + get_file(gaps->memfd_files[i])); > > ret = 0; > > exit: > - if (ret < 0) > + if (ret < 0) { > for (i = 0; i < n_fds; ++i) > - if (fds[i] >= 0) > - put_unused_fd(fds[i]); > + put_unused_fd(fds[i]); > + for (i = 0; i < n_memfds; ++i) > + put_unused_fd(fds[gaps->n_fds + i]); > + } > kfree(fds); > *out_incomplete = incomplete_fds; > return ret; > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 38/44] kdbus: Fix error path in kdbus_user_lookup()
Hi On Thu, Oct 8, 2015 at 1:32 PM, Sergei Zviagintsev wrote: > If idr_alloc() fails, we shouldn't call idr_remove() as latter produces > warning when called on non-allocated ids. Split cleanup code into three > parts for differrent cleanup scenarios before and after idr_alloc(). > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/domain.c | 20 ++-- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c > index ac9f760c150d..31cd09fb572f 100644 > --- a/ipc/kdbus/domain.c > +++ b/ipc/kdbus/domain.c > @@ -208,7 +208,7 @@ struct kdbus_user *kdbus_user_lookup(struct kdbus_domain > *domain, kuid_t uid) > u = kzalloc(sizeof(*u), GFP_KERNEL); > if (!u) { > ret = -ENOMEM; > - goto exit; > + goto exit_unlock; > } > > kref_init(&u->kref); > @@ -225,7 +225,7 @@ struct kdbus_user *kdbus_user_lookup(struct kdbus_domain > *domain, kuid_t uid) > ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid), > __kuid_val(uid) + 1, GFP_KERNEL); > if (ret < 0) > - goto exit; > + goto exit_free; > } > } > > @@ -235,19 +235,19 @@ struct kdbus_user *kdbus_user_lookup(struct > kdbus_domain *domain, kuid_t uid) > */ > ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL); > if (ret < 0) > - goto exit; > + goto exit_idr; > Why not simply assign "u->uid = uid;" _after_ doing the idr operations? Thanks David > u->id = ret; > mutex_unlock(&domain->lock); > return u; > > -exit: > - if (u) { > - if (uid_valid(u->uid)) > - idr_remove(&domain->user_idr, __kuid_val(u->uid)); > - kdbus_domain_unref(u->domain); > - kfree(u); > - } > +exit_idr: > + if (uid_valid(u->uid)) > + idr_remove(&domain->user_idr, __kuid_val(u->uid)); > +exit_free: > + kdbus_domain_unref(u->domain); > + kfree(u); > +exit_unlock: > mutex_unlock(&domain->lock); > return ERR_PTR(ret); > } > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 31/44] kdbus: Cleanup kdbus_conn_move_messages()
Hi On Thu, Oct 8, 2015 at 1:32 PM, Sergei Zviagintsev wrote: > - Move `r' and `ret' to scopes where they are used. Drop redundant >initialization of `ret'. > > - Initialize `bus' on declaration. > > - Replace list_for_each_entry_safe() with list_for_each_entry() when >iterating over list of replies. > > - Drop redundant `continue'. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 14 +++--- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index b32b4f981618..6ee688d3de53 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -1293,25 +1293,24 @@ void kdbus_conn_move_messages(struct kdbus_conn > *conn_dst, > u64 name_id) > { > struct kdbus_queue_entry *e, *e_tmp; > - struct kdbus_reply *r, *r_tmp; > - struct kdbus_bus *bus; > + struct kdbus_bus *bus = conn_src->ep->bus; > struct kdbus_conn *c; > LIST_HEAD(msg_list); > - int i, ret = 0; > + int i; > > if (WARN_ON(conn_src == conn_dst)) > return; > > - bus = conn_src->ep->bus; > - > /* lock order: domain -> bus -> ep -> names -> conn */ > down_read(&bus->conn_rwlock); > hash_for_each(bus->conn_hash, i, c, hentry) { > + struct kdbus_reply *r; > + Why make 'r' local? > if (c == conn_src || c == conn_dst) > continue; > > mutex_lock(&c->lock); > - list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) { > + list_for_each_entry(r, &c->reply_list, entry) { Looks good. > if (r->reply_src != conn_src) > continue; > > @@ -1328,6 +1327,8 @@ void kdbus_conn_move_messages(struct kdbus_conn > *conn_dst, > > kdbus_conn_lock2(conn_src, conn_dst); > list_for_each_entry_safe(e, e_tmp, &conn_src->queue.msg_list, entry) { > + int ret; > + Why make it local? > /* filter messages for a specific name */ > if (name_id > 0 && e->dst_name_id != name_id) > continue; > @@ -1343,7 +1344,6 @@ void kdbus_conn_move_messages(struct kdbus_conn > *conn_dst, > if (ret < 0) { > kdbus_conn_lost_message(conn_dst); > kdbus_queue_entry_free(e); > - continue; Looks good. Thanks David > } > } > kdbus_conn_unlock2(conn_src, conn_dst); > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 30/44] kdbus: Cleanup kdbus_meta_proc_mask()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > - Simplify if-statements > >'proc < KDBUS_META_PROC_NORMAL' test only makes sense when we call >kdbus_proc_permission(). Include it into 'prv_pid != req_pid' branch, >remove redundant assignment of `proc' var and reduce its scope. > > - Drop redundant binary operation > >In 'proc < KDBUS_META_PROC_NORMAL' case we firstly do >'wanted &= ~flags' and then 'wanted & flags' in the return statement, >which is the same as 'wanted & 0'. Return 0 instead. > > - Cosmetic cleanup of the return statement. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/metadata.c | 32 > 1 file changed, 12 insertions(+), 20 deletions(-) > > diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c > index 61215a078359..b8d094d9fb56 100644 > --- a/ipc/kdbus/metadata.c > +++ b/ipc/kdbus/metadata.c > @@ -1224,7 +1224,6 @@ static u64 kdbus_meta_proc_mask(struct pid *prv_pid, > u64 wanted) > { > struct pid_namespace *prv_ns, *req_ns; > - unsigned int proc; > > prv_ns = ns_of_pid(prv_pid); > req_ns = ns_of_pid(req_pid); > @@ -1243,30 +1242,23 @@ static u64 kdbus_meta_proc_mask(struct pid *prv_pid, > * provider and requestor are the same. If not, we perform rather > * expensive /proc permission checks. > */ > - if (prv_pid == req_pid) > - proc = KDBUS_META_PROC_NORMAL; > - else > - proc = kdbus_proc_permission(req_ns, req_cred, prv_pid); > - > - /* you need /proc access to read standard process attributes */ > - if (proc < KDBUS_META_PROC_NORMAL) > - wanted &= ~(KDBUS_ATTACH_TID_COMM | > - KDBUS_ATTACH_PID_COMM | > - KDBUS_ATTACH_SECLABEL | > - KDBUS_ATTACH_CMDLINE | > - KDBUS_ATTACH_CGROUP | > - KDBUS_ATTACH_AUDIT | > - KDBUS_ATTACH_CAPS | > - KDBUS_ATTACH_EXE); > + if (prv_pid != req_pid) { > + unsigned int proc = kdbus_proc_permission(req_ns, req_cred, > + prv_pid); Please keep "proc" declared non-local. Also, we don't do direct assignments of non-static data in declarations. Thanks David > + > + /* you need /proc access to read standard process attributes > */ > + if (proc < KDBUS_META_PROC_NORMAL) > + return 0; > + } > > /* clear all non-/proc flags */ > return wanted & (KDBUS_ATTACH_TID_COMM | > KDBUS_ATTACH_PID_COMM | > KDBUS_ATTACH_SECLABEL | > -KDBUS_ATTACH_CMDLINE | > -KDBUS_ATTACH_CGROUP | > -KDBUS_ATTACH_AUDIT | > -KDBUS_ATTACH_CAPS | > +KDBUS_ATTACH_CMDLINE | > +KDBUS_ATTACH_CGROUP | > +KDBUS_ATTACH_AUDIT| > +KDBUS_ATTACH_CAPS | > KDBUS_ATTACH_EXE); > } > > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 26/44] kdbus: Cleanup kdbus_pin_dst()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > - Reduce scope of `owner' var to the block where it is actually used. >Use conditional operator to set its value. > > - Drop initialization of `dst' as it gets value later. > > - Drop separate exit point as we have only one use case of it. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 17 ++--- > 1 file changed, 6 insertions(+), 11 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index 6a73ac3f444d..ace587ee951a 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -1048,10 +1048,8 @@ static int kdbus_pin_dst(struct kdbus_bus *bus, > struct kdbus_conn **out_dst) > { > const struct kdbus_msg *msg = staging->msg; > - struct kdbus_name_owner *owner = NULL; > struct kdbus_name_entry *name = NULL; > - struct kdbus_conn *dst = NULL; > - int ret; > + struct kdbus_conn *dst; > > lockdep_assert_held(&bus->name_registry->rwlock); > > @@ -1061,14 +1059,15 @@ static int kdbus_pin_dst(struct kdbus_bus *bus, > return -ENXIO; > > if (!kdbus_conn_is_ordinary(dst)) { > - ret = -ENXIO; > - goto error; > + kdbus_conn_unref(dst); > + return -ENXIO; Looks good. > } > } else { > + struct kdbus_name_owner *owner; > + I'd prefer if you avoid making this block-local. > name = kdbus_name_lookup_unlocked(bus->name_registry, > staging->dst_name); > - if (name) > - owner = kdbus_name_get_owner(name); > + owner = name ? kdbus_name_get_owner(name) : NULL; Looks good. > if (!owner) > return -ESRCH; > > @@ -1094,10 +1093,6 @@ static int kdbus_pin_dst(struct kdbus_bus *bus, > *out_name = name; > *out_dst = dst; > return 0; > - > -error: > - kdbus_conn_unref(dst); > - return ret; Looks good. Thanks David > } > > static int kdbus_conn_reply(struct kdbus_conn *src, > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 25/44] kdbus: Cleanup kdbus_cmd_conn_info()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > - Move `entry' and `owner' to the scope where they are used. Drop >redundand initialization of `entry'. Use conditional operator to set >the value of `owner'. > > - Set `ret' to zero right after call to kdbus_pool_slice_copy_kvec(), >not in the end of function. > > - Remove redundant goto. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 16 +++- > 1 file changed, 7 insertions(+), 9 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index b3c5f20a57d8..6a73ac3f444d 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -1702,8 +1702,6 @@ int kdbus_cmd_conn_info(struct kdbus_conn *conn, void > __user *argp) > { > struct kdbus_meta_conn *conn_meta = NULL; > struct kdbus_pool_slice *slice = NULL; > - struct kdbus_name_entry *entry = NULL; > - struct kdbus_name_owner *owner = NULL; > struct kdbus_conn *owner_conn = NULL; > struct kdbus_item *meta_items = NULL; > struct kdbus_info info = {}; > @@ -1739,9 +1737,12 @@ int kdbus_cmd_conn_info(struct kdbus_conn *conn, void > __user *argp) > name = argv[1].item ? argv[1].item->str : NULL; > > if (name) { > + struct kdbus_name_entry *entry; > + struct kdbus_name_owner *owner; > + > entry = kdbus_name_lookup_unlocked(bus->name_registry, name); > - if (entry) > - owner = kdbus_name_get_owner(entry); > + owner = entry ? kdbus_name_get_owner(entry) : NULL; > + This looks good to me. > if (!owner || > !kdbus_conn_policy_see_name(conn, current_cred(), name) || > (cmd->id != 0 && owner->conn->id != cmd->id)) { > @@ -1804,17 +1805,14 @@ int kdbus_cmd_conn_info(struct kdbus_conn *conn, void > __user *argp) > ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size); > if (ret < 0) > goto exit; > + ret = 0; > > kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size); > > if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) || > kdbus_member_set_user(&cmd->info_size, argp, > - typeof(*cmd), info_size)) { > + typeof(*cmd), info_size)) > ret = -EFAULT; > - goto exit; > - } > - > - ret = 0; Again, why? Now you have a random "ret = 0;" somewhere in between, instead of directly at the tail of the success-path. Thanks David > > exit: > up_read(&bus->name_registry->rwlock); > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 24/44] kdbus: Cleanup kdbus_conn_unicast()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > Do not initialize `name' and `ret' as values are assigned to them at > the first use by kdbus_pin_dst(). Simplify handling of > kdbus_conn_entry_insert() return value and drop useless goto. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 10 -- > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index db49f282a1bf..b3c5f20a57d8 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -1229,12 +1229,12 @@ static int kdbus_conn_unicast(struct kdbus_conn *src, > struct kdbus_staging *staging) > { > const struct kdbus_msg *msg = staging->msg; > - struct kdbus_name_entry *name = NULL; > + struct kdbus_name_entry *name; > struct kdbus_reply *wait = NULL; > struct kdbus_conn *dst = NULL; > struct kdbus_bus *bus = src->ep->bus; > bool is_signal = (msg->flags & KDBUS_MSG_SIGNAL); > - int ret = 0; > + int ret; > > if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) || > WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY) && > @@ -1245,7 +1245,6 @@ static int kdbus_conn_unicast(struct kdbus_conn *src, > down_read(&bus->name_registry->rwlock); > > /* find and pin destination */ > - If a comment is addressed to a whole following block, we usually put a newline after it. Only if the comment is only addressed at the next code-line, we don't. > ret = kdbus_pin_dst(bus, staging, &name, &dst); > if (ret < 0) > goto exit; > @@ -1276,11 +1275,10 @@ static int kdbus_conn_unicast(struct kdbus_conn *src, > kdbus_bus_eavesdrop(bus, src, staging); > > ret = kdbus_conn_entry_insert(src, dst, staging, wait, name); > - if (ret < 0 && !is_signal) > - goto exit; > > /* signals are treated like broadcasts, recv-errors are ignored */ > - ret = 0; > + if (is_signal) > + ret = 0; Why? Just to reduce the line-count? You break the code-flow here, by making the success-path conditional, instead of the error-path. Thanks David > > exit: > up_read(&bus->name_registry->rwlock); > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 23/44] kdbus: Cleanup kdbus_conn_call()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > Do not initialize `wait' and `name' as values are assigned to them at > first use: `wait' gets its value from kdbus_reply_find(), `name' is set > by kdbus_pin_dst(). > > Remove redundant code. goto isn't required as we reached exit point > already. Setting `ret' to zero is unnecessary because > kdbus_conn_entry_insert() returns 0 on success. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 10 ++ > 1 file changed, 2 insertions(+), 8 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index a4d7414ecaea..db49f282a1bf 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -1159,8 +1159,8 @@ static struct kdbus_reply *kdbus_conn_call(struct > kdbus_conn *src, >ktime_t exp) > { > const struct kdbus_msg *msg = staging->msg; > - struct kdbus_name_entry *name = NULL; > - struct kdbus_reply *wait = NULL; > + struct kdbus_name_entry *name; > + struct kdbus_reply *wait; > struct kdbus_conn *dst = NULL; > struct kdbus_bus *bus = src->ep->bus; > int ret; > @@ -1212,14 +1212,8 @@ static struct kdbus_reply *kdbus_conn_call(struct > kdbus_conn *src, > } > > /* send message */ > - > kdbus_bus_eavesdrop(bus, src, staging); > - > ret = kdbus_conn_entry_insert(src, dst, staging, wait, name); > - if (ret < 0) > - goto exit; > - > - ret = 0; Who says kdbus_conn_entry_insert() returns 0? It might be >0. I'd prefer the explicit check. Thanks David > > exit: > up_read(&bus->name_registry->rwlock); > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 21/44] kdbus: Cleanup tests in kdbus_cmd_send()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > Rearrange tests a little to make them look cleaner. What's wrong with 'else if'? Thanks David > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index 93da7f539f74..a4d7414ecaea 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -1977,7 +1977,7 @@ int kdbus_cmd_send(struct kdbus_conn *conn, struct file > *f, void __user *argp) > ret = kdbus_args_parse(&args, argp, &cmd); > if (ret < 0) > goto exit; > - else if (ret > 0 && !cmd->msg_address) /* negotiation without msg */ > + if (ret > 0 && !cmd->msg_address) /* negotiation without msg */ > goto exit; > > ret2 = kdbus_args_parse_msg(&msg_args, KDBUS_PTR(cmd->msg_address), > @@ -1985,10 +1985,11 @@ int kdbus_cmd_send(struct kdbus_conn *conn, struct > file *f, void __user *argp) > if (ret2 < 0) { /* cannot parse message */ > ret = ret2; > goto exit; > - } else if (ret2 > 0 && !ret) { /* msg-negot implies cmd-negot */ > - ret = -EINVAL; > + } > + if (ret > 0) /* negotiation */ > goto exit; > - } else if (ret > 0) { /* negotiation */ > + if (ret2 > 0) { /* msg-negot implies cmd-negot */ > + ret = -EINVAL; > goto exit; > } > > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 18/44] kdbus: Add var initialization to kdbus_conn_entry_insert()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > Assign zero to `ret' in the beginning of function instead of doing it > in the end. > > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/connection.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c > index 4f3cd370ecd9..185ed3ba1bce 100644 > --- a/ipc/kdbus/connection.c > +++ b/ipc/kdbus/connection.c > @@ -889,7 +889,7 @@ int kdbus_conn_entry_insert(struct kdbus_conn *conn_src, > const struct kdbus_name_entry *name) > { > struct kdbus_queue_entry *entry; > - int ret; > + int ret = 0; > > kdbus_conn_lock2(conn_src, conn_dst); > > @@ -916,8 +916,6 @@ int kdbus_conn_entry_insert(struct kdbus_conn *conn_src, > kdbus_queue_entry_enqueue(entry, reply); > wake_up_interruptible(&conn_dst->wait); > > - ret = 0; > - Not a big fan of this. It makes it less obvious, and this style is wrong in several cases (but not here). We often only check for "ret < 0", but generally want >0 to be turned into 0 on return. It does not matter in this specific case, but I prefer making return codes explicit, rather than relying on a previous initialization to be still valid. What's your rationale here? Thanks David > exit_unlock: > kdbus_conn_unlock2(conn_src, conn_dst); > return ret; > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/44] kdbus: Simplify bitwise expression in kdbus_meta_get_mask()
Hi On Thu, Oct 8, 2015 at 1:31 PM, Sergei Zviagintsev wrote: > Replace the expression with more concise and readable equivalent. It can > be proven by opening parentheses: > > r & ~((p | i) & r) == r & (~(p | i) | ~r) == > (r & ~(p | i)) | (r & ~r) == r & ~(p | i) == r & ~p & ~i But why? The current code follows the description, and does exactly the same. It shows that it merges the "provided" and "implied" masks, and then extracts the flags that are missing compared to the required mask. I cannot follow why your code is more obvious? David > Signed-off-by: Sergei Zviagintsev > --- > ipc/kdbus/metadata.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c > index 788b4d9c7ecb..61215a078359 100644 > --- a/ipc/kdbus/metadata.c > +++ b/ipc/kdbus/metadata.c > @@ -1321,7 +1321,7 @@ static u64 kdbus_meta_get_mask(struct pid *prv_pid, u64 > prv_mask, > * the sender, but still requested by the receiver. If any are left, > * perform rather expensive /proc access checks for them. > */ > - missing = req_mask & ~((prv_mask | impl_mask) & req_mask); > + missing = req_mask & ~prv_mask & ~impl_mask; > if (missing) > proc_mask = kdbus_meta_proc_mask(prv_pid, req_pid, req_cred, > missing); > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/