Re: [PATCH] coresight: etm4x: Add config to exclude kernel mode tracing
On Fri, Jan 15, 2021 at 6:46 AM Sai Prakash Ranjan wrote: > > Hello Mathieu, Suzuki > > On 2020-10-15 21:32, Mathieu Poirier wrote: > > On Thu, Oct 15, 2020 at 06:15:22PM +0530, Sai Prakash Ranjan wrote: > >> On production systems with ETMs enabled, it is preferred to > >> exclude kernel mode(NS EL1) tracing for security concerns and > >> support only userspace(NS EL0) tracing. So provide an option > >> via kconfig to exclude kernel mode tracing if it is required. > >> This config is disabled by default and would not affect the > >> current configuration which has both kernel and userspace > >> tracing enabled by default. > >> > > > > One requires root access (or be part of a special trace group) to be > > able to use > > the cs_etm PMU. With this kind of elevated access restricting tracing > > at EL1 > > provides little in terms of security. > > > > Apart from the VM usecase discussed, I am told there are other > security concerns here regarding need to exclude kernel mode tracing > even for the privileged users/root. One such case being the ability > to analyze cryptographic code execution since ETMs can record all > branch instructions including timestamps in the kernel and there may > be other cases as well which I may not be aware of and hence have > added Denis and Mattias. Please let us know if you have any questions > further regarding this not being a security concern. Well, the idea that root privileges != full control over the kernel isn't new and at the very least since lockdown became part of mainline [1] no longer an esoteric edge case. Regarding the use case Sai hints at (namely protection of secrets in the kernel), Matthew Garret actually has some more thoughts about confidentiality mode for lockdown for secret protection [2]. And thus, unless someone can make a compelling case that instruction-level tracing will not leak secrets held by the kernel, I think an option for the kernel to prevent itself from being traced (even by root) is valuable. Finally, to sketch a practical use case scenario: Consider a system where disk contents are encrypted and the encryption key is set up by the user when mounting the file system. From that point on the encryption key resides in the kernel. It seems reasonable to expect that the disk encryption key be protected from exfiltration even if the system later suffers a root compromise (or even against insiders that have root access), at least as long as the attacker doesn't manage to compromise the kernel. [1] https://lwn.net/Articles/796866/ [2] https://mjg59.dreamwidth.org/55105.html > > After this discussion, I would like to post a v2 based on Suzuki's > feedback earlier. @Suzuki, I have a common config for ETM3 and ETM4 > but couldn't get much idea on how to implement it for Intel PTs, if > you have any suggestions there, please do share or we can have this > only for Coresight ETMs. > > Thanks, > Sai > > -- > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a > member > of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 4.9 30/47] ANDROID: binder: remove waitqueue when thread exits.
(resend, apologies for accidental HTML reply) On Sun, Oct 6, 2019 at 11:24 AM Greg Kroah-Hartman wrote: > > On Sun, Oct 06, 2019 at 10:32:02AM -0700, Eric Biggers wrote: > > On Sun, Oct 06, 2019 at 07:21:17PM +0200, Greg Kroah-Hartman wrote: > > > From: Martijn Coenen > > > > > > commit f5cb779ba16334b45ba8946d6bfa6d9834d1527f upstream. > > > > > > binder_poll() passes the thread->wait waitqueue that > > > can be slept on for work. When a thread that uses > > > epoll explicitly exits using BINDER_THREAD_EXIT, > > > the waitqueue is freed, but it is never removed > > > from the corresponding epoll data structure. When > > > the process subsequently exits, the epoll cleanup > > > code tries to access the waitlist, which results in > > > a use-after-free. > > > > > > Prevent this by using POLLFREE when the thread exits. > > > > > > Signed-off-by: Martijn Coenen > > > Reported-by: syzbot > > > Cc: stable # 4.14 > > > [backport BINDER_LOOPER_STATE_POLL logic as well] > > > Signed-off-by: Mattias Nissler > > > Signed-off-by: Greg Kroah-Hartman > > > --- > > > drivers/android/binder.c | 17 - > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > --- a/drivers/android/binder.c > > > +++ b/drivers/android/binder.c > > > @@ -334,7 +334,8 @@ enum { > > > BINDER_LOOPER_STATE_EXITED = 0x04, > > > BINDER_LOOPER_STATE_INVALID = 0x08, > > > BINDER_LOOPER_STATE_WAITING = 0x10, > > > - BINDER_LOOPER_STATE_NEED_RETURN = 0x20 > > > + BINDER_LOOPER_STATE_NEED_RETURN = 0x20, > > > + BINDER_LOOPER_STATE_POLL= 0x40, > > > }; > > > > > > struct binder_thread { > > > @@ -2628,6 +2629,18 @@ static int binder_free_thread(struct bin > > > } else > > > BUG(); > > > } > > > + > > > + /* > > > +* If this thread used poll, make sure we remove the waitqueue > > > +* from any epoll data structures holding it with POLLFREE. > > > +* waitqueue_active() is safe to use here because we're holding > > > +* the inner lock. > > > +*/ > > > + if ((thread->looper & BINDER_LOOPER_STATE_POLL) && > > > + waitqueue_active(>wait)) { > > > + wake_up_poll(>wait, POLLHUP | POLLFREE); > > > + } > > > + > > > if (send_reply) > > > binder_send_failed_reply(send_reply, BR_DEAD_REPLY); > > > binder_release_work(>todo); > > > @@ -2651,6 +2664,8 @@ static unsigned int binder_poll(struct f > > > return POLLERR; > > > } > > > > > > + thread->looper |= BINDER_LOOPER_STATE_POLL; > > > + > > > wait_for_proc_work = thread->transaction_stack == NULL && > > > list_empty(>todo) && thread->return_error == BR_OK; > > > > > > > Are you sure this backport is correct, given that in 4.9, binder_poll() > > sometimes uses proc->wait instead of thread->wait?: Jann's PoC calls the BINDER_THREAD_EXIT ioctl to free the binder_thread which will then cause the UAF, and this is cut off by the patch. IIUC, you are worried about a similar AUF on the proc->wait access. I am not 100% sure, but I think the binder_proc lifetime matches the corresponding struct file instance, so it shouldn't be possible to get the binder_proc deallocated while still being able to access it via filp->private_data. > > > > wait_for_proc_work = thread->transaction_stack == NULL && > > list_empty(>todo) && thread->return_error == BR_OK; > > > > binder_unlock(__func__); > > > > if (wait_for_proc_work) { > > if (binder_has_proc_work(proc, thread)) > > return POLLIN; > > poll_wait(filp, >wait, wait); > > if (binder_has_proc_work(proc, thread)) > > return POLLIN; > > } else { > > if (binder_has_thread_work(thread)) > > return POLLIN; > > poll_wait(filp, >wait, wait); > > if (binder_has_thread_work(thread)) > > return POLLIN; > > } > > return 0; > > I _think_ the backport is correct, and I know someone has verified that > the 4.4.y backport works properly and I don't see much difference here > from that version. > > But I will defer to Todd and Martijn here, as they know this code _WAY_ > better than I do. The codebase has changed a lot from 4.9.y to 4.14.y > so it makes it hard to do equal comparisons simply. > > Todd and Martijn, thoughts? > > thanks, > > greg k-h
Re: [PATCH v2] Add a "nosymlinks" mount option.
On Mon, Nov 21, 2016 at 6:10 PM, James Bottomley <james.bottom...@hansenpartnership.com> wrote: > On Wed, 2016-11-16 at 13:18 -0800, Mattias Nissler wrote: >> I understand that silence suggests there's little interest, but >> here's some new information I discovered today that may justify to >> reconsider the patch: >> >> The BSDs already have exactly what I propose, the mount option is >> called "nosymfollow" there: >> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2 >> f48865f18d6/sys/kern/vfs_lookup.c#L939 >> >> There's also some evidence on the net that people have been using >> said nosymfollow mount option to mitigate symlink attacks. > > I've got to say that just because BSD does this doesn't make it a good > idea. The problem with disabling symlinks is that they're a core part > of unix filesystem semantics and simply disabling them breaks various > setups in various ways. Agreed. The proposed mount option is not going to mix well with cases such as shared object symlinks commonly found in /usr/lib/ > The breakage depends on the user, so if you > come from a Windows background, where symlinks basically don't exist, > you're likely fine, but if you come from a UNIX background, chances are > you use them pretty extensively. Obviously, I'm a UNIX background > person and I can think of all the nasty ways you'll break my setup ... Note that I'm not saying you should be applying this mount option broadly, just that it is useful for certain mounts where there are no reasons for symlinks to be present. Point in case: we already have noexec, but obviously people don't mount their root fs noexec as that would prevent the system from booting ;-) > > If a root privileged programme is trying to write to a file, it's not > unreasonable to expect the programme itself to take simple precautions > (like setuid to the user it's trying to write), so I'd really rather > declare binaries that don't do this as broken and fix them. Thinking > you've plugged a hole like this simply by disabling symlinks is a false > sense of security because there are a variety of other ways I could > trick the programme into writing where it shouldn't. Disabling > symlinks is fixing a symptom, it's really the cause that should be > fixed. Your points are generally valid. Yes, I'm addressing the symptom of incautious access to untrusted file systems. I actually agree that a better solution would be for privileged code to be aware of the risks of file system access and to apply caution where necessary. I also agree that symlinks are not the only issue with untrusted file systems, so applying this mount option is not a panacea. I should note that I'm actively working on hardening some of the more exposed contexts in Chrome OS beyond the symlink traversal issue. However, I still believe that blocking symlink traversal in the kernel is a useful thing to do in our use case. Rationale: 1. On a typical system, there's a ton of code executing as root or with elevated privileges. It's not practical to audit all that code for symlink traversal issues. Traditional systems just assume that the root file system is trusted and that code running as root only deals with root-owned files, which alleviates the problem to a point where it becomes manageable. However, on Chrome OS we have the ambition to not blindly trust the writable file system, so we have to worry about *any* file access in privileged contexts, not just the cases where privileged processes access user-owned file system locations (allow me to add that history has shown that this alone has been a constant source of bugs to the point that it is now generally considered an anti-pattern). 2. It is sometimes hard to make code sufficiently cautious, for example in init scripts. As long as you're using shell, you want to use features such as shell redirection. To cover these, you'd ideally patch the shell to take care upon opening files for redirection, but given the prevalence of file access and the myriad of tools invoked by shell this is bound to be a losing battle. Another option is to add shell code along the lines of this: test "$(readlink -f $PATH)" = "$PATH" (note that O_NOFOLLOW is not enough as that'll only affect the last path component). That's tedious and error-prone to maintain, and also comes with a TOCTOU issue. 3. Assuming for a moment that we can fix all existing issues, it's still not realistic to keep the code clean moving forward. IMHO, the root of the problem is that developers just don't have symlinks on their radar when writing file access code. "I'll just write this data to this file, and I'll hardcode the path, so what can go wrong?" is probably a realistic level of caution you can expect. The fact that the presence of symlinks can redirect the operation to a
Re: [PATCH v2] Add a "nosymlinks" mount option.
On Mon, Nov 21, 2016 at 6:10 PM, James Bottomley wrote: > On Wed, 2016-11-16 at 13:18 -0800, Mattias Nissler wrote: >> I understand that silence suggests there's little interest, but >> here's some new information I discovered today that may justify to >> reconsider the patch: >> >> The BSDs already have exactly what I propose, the mount option is >> called "nosymfollow" there: >> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2 >> f48865f18d6/sys/kern/vfs_lookup.c#L939 >> >> There's also some evidence on the net that people have been using >> said nosymfollow mount option to mitigate symlink attacks. > > I've got to say that just because BSD does this doesn't make it a good > idea. The problem with disabling symlinks is that they're a core part > of unix filesystem semantics and simply disabling them breaks various > setups in various ways. Agreed. The proposed mount option is not going to mix well with cases such as shared object symlinks commonly found in /usr/lib/ > The breakage depends on the user, so if you > come from a Windows background, where symlinks basically don't exist, > you're likely fine, but if you come from a UNIX background, chances are > you use them pretty extensively. Obviously, I'm a UNIX background > person and I can think of all the nasty ways you'll break my setup ... Note that I'm not saying you should be applying this mount option broadly, just that it is useful for certain mounts where there are no reasons for symlinks to be present. Point in case: we already have noexec, but obviously people don't mount their root fs noexec as that would prevent the system from booting ;-) > > If a root privileged programme is trying to write to a file, it's not > unreasonable to expect the programme itself to take simple precautions > (like setuid to the user it's trying to write), so I'd really rather > declare binaries that don't do this as broken and fix them. Thinking > you've plugged a hole like this simply by disabling symlinks is a false > sense of security because there are a variety of other ways I could > trick the programme into writing where it shouldn't. Disabling > symlinks is fixing a symptom, it's really the cause that should be > fixed. Your points are generally valid. Yes, I'm addressing the symptom of incautious access to untrusted file systems. I actually agree that a better solution would be for privileged code to be aware of the risks of file system access and to apply caution where necessary. I also agree that symlinks are not the only issue with untrusted file systems, so applying this mount option is not a panacea. I should note that I'm actively working on hardening some of the more exposed contexts in Chrome OS beyond the symlink traversal issue. However, I still believe that blocking symlink traversal in the kernel is a useful thing to do in our use case. Rationale: 1. On a typical system, there's a ton of code executing as root or with elevated privileges. It's not practical to audit all that code for symlink traversal issues. Traditional systems just assume that the root file system is trusted and that code running as root only deals with root-owned files, which alleviates the problem to a point where it becomes manageable. However, on Chrome OS we have the ambition to not blindly trust the writable file system, so we have to worry about *any* file access in privileged contexts, not just the cases where privileged processes access user-owned file system locations (allow me to add that history has shown that this alone has been a constant source of bugs to the point that it is now generally considered an anti-pattern). 2. It is sometimes hard to make code sufficiently cautious, for example in init scripts. As long as you're using shell, you want to use features such as shell redirection. To cover these, you'd ideally patch the shell to take care upon opening files for redirection, but given the prevalence of file access and the myriad of tools invoked by shell this is bound to be a losing battle. Another option is to add shell code along the lines of this: test "$(readlink -f $PATH)" = "$PATH" (note that O_NOFOLLOW is not enough as that'll only affect the last path component). That's tedious and error-prone to maintain, and also comes with a TOCTOU issue. 3. Assuming for a moment that we can fix all existing issues, it's still not realistic to keep the code clean moving forward. IMHO, the root of the problem is that developers just don't have symlinks on their radar when writing file access code. "I'll just write this data to this file, and I'll hardcode the path, so what can go wrong?" is probably a realistic level of caution you can expect. The fact that the presence of symlinks can redirect the operation to an entirely different file isn't obvious at all wh
Re: [PATCH v2] Add a "nosymlinks" mount option.
On Thu, Nov 17, 2016 at 1:43 PM, Austin S. Hemmelgarn <ahferro...@gmail.com> wrote: > On 2016-11-16 16:18, Mattias Nissler wrote: >> >> I understand that silence suggests there's little interest, but here's >> some new information I discovered today that may justify to reconsider >> the patch: >> >> The BSDs already have exactly what I propose, the mount option is >> called "nosymfollow" there: >> >> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939 >> >> There's also some evidence on the net that people have been using said >> nosymfollow mount option to mitigate symlink attacks. > > I will comment that I do have interest in this (to the point that if this > doesn't go into mainline, I'll probably add it to my local set of patches), > though I'm not a maintainer or reviewer for the relevant area, so there's > not much I can do except say that there is interest from at least one person > who didn't work on the patch. > > That said, it would be nice if we matched the BSD option name (we do on > pretty much every other core VFS level mount option). Agreed. I'm happy to send an updated patch with adjusted naming. > >> >> On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler <mniss...@chromium.org> >> wrote: >>> >>> Friendly ping - does this version of the patch have any chance on >>> getting included in mainline? >>> >>> On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mniss...@chromium.org> >>> wrote: >>>> >>>> For mounts that have the new "nosymlinks" option, don't follow >>>> symlinks when resolving paths. The new option is similar in spirit to >>>> the existing "nodev", "noexec", and "nosuid" options. >>>> >>>> Note that symlinks may still be created on mounts where the >>>> "nosymlinks" option is present. readlink() remains functional, so user >>>> space code that is aware of symlinks can still choose to follow them >>>> explicitly. >>>> >>>> Setting the "nosymlinks" mount option helps prevent privileged writers >>>> from modifying files unintentionally in case there is an unexpected >>>> link along the accessed path. The "nosymlinks" option is thus useful >>>> as a defensive measure for systems that need to deal with untrusted >>>> file systems in privileged contexts. >>>> >>>> Signed-off-by: Mattias Nissler <mniss...@chromium.org> >>>> --- >>>> fs/namei.c | 3 +++ >>>> fs/namespace.c | 9 ++--- >>>> fs/proc_namespace.c | 1 + >>>> fs/statfs.c | 2 ++ >>>> include/linux/mount.h | 3 ++- >>>> include/linux/statfs.h | 1 + >>>> include/uapi/linux/fs.h | 1 + >>>> 7 files changed, 16 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/fs/namei.c b/fs/namei.c >>>> index 5b4eed2..4cddcf3 100644 >>>> --- a/fs/namei.c >>>> +++ b/fs/namei.c >>>> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) >>>> touch_atime(>link); >>>> } >>>> >>>> + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) >>>> + return ERR_PTR(-EACCES); >>>> + >>>> error = security_inode_follow_link(dentry, inode, >>>>nd->flags & LOOKUP_RCU); >>>> if (unlikely(error)) >>>> diff --git a/fs/namespace.c b/fs/namespace.c >>>> index e6c234b..deec84e 100644 >>>> --- a/fs/namespace.c >>>> +++ b/fs/namespace.c >>>> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char >>>> __user *dir_name, >>>> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); >>>> if (flags & MS_RDONLY) >>>> mnt_flags |= MNT_READONLY; >>>> + if (flags & MS_NOSYMLINKS) >>>> + mnt_flags |= MNT_NOSYMLINKS; >>>> >>>> /* The default atime for remount is preservation */ >>>> if ((flags & MS_REMOUNT) && >>>> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char >>>> __user *dir_name, >>>> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; >>>&
Re: [PATCH v2] Add a "nosymlinks" mount option.
On Thu, Nov 17, 2016 at 1:43 PM, Austin S. Hemmelgarn wrote: > On 2016-11-16 16:18, Mattias Nissler wrote: >> >> I understand that silence suggests there's little interest, but here's >> some new information I discovered today that may justify to reconsider >> the patch: >> >> The BSDs already have exactly what I propose, the mount option is >> called "nosymfollow" there: >> >> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939 >> >> There's also some evidence on the net that people have been using said >> nosymfollow mount option to mitigate symlink attacks. > > I will comment that I do have interest in this (to the point that if this > doesn't go into mainline, I'll probably add it to my local set of patches), > though I'm not a maintainer or reviewer for the relevant area, so there's > not much I can do except say that there is interest from at least one person > who didn't work on the patch. > > That said, it would be nice if we matched the BSD option name (we do on > pretty much every other core VFS level mount option). Agreed. I'm happy to send an updated patch with adjusted naming. > >> >> On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler >> wrote: >>> >>> Friendly ping - does this version of the patch have any chance on >>> getting included in mainline? >>> >>> On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler >>> wrote: >>>> >>>> For mounts that have the new "nosymlinks" option, don't follow >>>> symlinks when resolving paths. The new option is similar in spirit to >>>> the existing "nodev", "noexec", and "nosuid" options. >>>> >>>> Note that symlinks may still be created on mounts where the >>>> "nosymlinks" option is present. readlink() remains functional, so user >>>> space code that is aware of symlinks can still choose to follow them >>>> explicitly. >>>> >>>> Setting the "nosymlinks" mount option helps prevent privileged writers >>>> from modifying files unintentionally in case there is an unexpected >>>> link along the accessed path. The "nosymlinks" option is thus useful >>>> as a defensive measure for systems that need to deal with untrusted >>>> file systems in privileged contexts. >>>> >>>> Signed-off-by: Mattias Nissler >>>> --- >>>> fs/namei.c | 3 +++ >>>> fs/namespace.c | 9 ++--- >>>> fs/proc_namespace.c | 1 + >>>> fs/statfs.c | 2 ++ >>>> include/linux/mount.h | 3 ++- >>>> include/linux/statfs.h | 1 + >>>> include/uapi/linux/fs.h | 1 + >>>> 7 files changed, 16 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/fs/namei.c b/fs/namei.c >>>> index 5b4eed2..4cddcf3 100644 >>>> --- a/fs/namei.c >>>> +++ b/fs/namei.c >>>> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) >>>> touch_atime(>link); >>>> } >>>> >>>> + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) >>>> + return ERR_PTR(-EACCES); >>>> + >>>> error = security_inode_follow_link(dentry, inode, >>>>nd->flags & LOOKUP_RCU); >>>> if (unlikely(error)) >>>> diff --git a/fs/namespace.c b/fs/namespace.c >>>> index e6c234b..deec84e 100644 >>>> --- a/fs/namespace.c >>>> +++ b/fs/namespace.c >>>> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char >>>> __user *dir_name, >>>> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); >>>> if (flags & MS_RDONLY) >>>> mnt_flags |= MNT_READONLY; >>>> + if (flags & MS_NOSYMLINKS) >>>> + mnt_flags |= MNT_NOSYMLINKS; >>>> >>>> /* The default atime for remount is preservation */ >>>> if ((flags & MS_REMOUNT) && >>>> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char >>>> __user *dir_name, >>>> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; >>>> } >>>> >>>> - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE
[PATCH v3] Add a "nosymfollow" mount option.
For mounts that have the new "nosymfollow" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options. Various BSD variants have been supporting the "nosymfollow" mount option for a long time with equivalent implementations. Note that symlinks may still be created on file systems mounted with the "nosymfollow" option present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Setting the "nosymfollow" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymfollow" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. Signed-off-by: Mattias Nissler <mniss...@chromium.org> --- Changes since v2: - Updated the option name to align with BSD naming. I have also uploaded the corresponding util-linux patch to make the "mount" command understand the new option here: https://gist.github.com/anonymous/317207ae499389235258f0d52ab22c03 fs/namei.c | 3 +++ fs/namespace.c | 9 ++--- fs/proc_namespace.c | 1 + fs/statfs.c | 2 ++ include/linux/mount.h | 3 ++- include/linux/statfs.h | 1 + include/uapi/linux/fs.h | 1 + 7 files changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 5b4eed2..4751e7f 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) touch_atime(>link); } + if (nd->path.mnt->mnt_flags & MNT_NOSYMFOLLOW) + return ERR_PTR(-EACCES); + error = security_inode_follow_link(dentry, inode, nd->flags & LOOKUP_RCU); if (unlikely(error)) diff --git a/fs/namespace.c b/fs/namespace.c index e6c234b..2bf244e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOSYMFOLLOW) + mnt_flags |= MNT_NOSYMFOLLOW; /* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; } - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_NOREMOTELOCK); + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMFOLLOW | + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | + MS_NOREMOTELOCK); if (flags & MS_REMOUNT) retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3f1190d..366149d 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOSYMFOLLOW, ",nosymfollow" }, { 0, NULL } }; const struct proc_fs_info *fs_infop; diff --git a/fs/statfs.c b/fs/statfs.c index 083dc0a..cfd4da8 100644 --- a/fs/statfs.c +++ b/fs/statfs.c @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags) flags |= ST_NODIRATIME; if (mnt_flags & MNT_RELATIME) flags |= ST_RELATIME; + if (mnt_flags & MNT_NOSYMFOLLOW) + flags |= ST_NOSYMFOLLOW; return flags; } diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..c9e8d7b 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -28,6 +28,7 @@ struct mnt_namespace; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40/* does the user want this to be r/o? */ +#define MNT_NOSYMFOLLOW0x80 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -44,7 +45,7 @@ struct mnt_namespace; #define MNT_SHARED_MASK(MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \ | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \ -| MNT_READONLY) +
[PATCH v3] Add a "nosymfollow" mount option.
For mounts that have the new "nosymfollow" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options. Various BSD variants have been supporting the "nosymfollow" mount option for a long time with equivalent implementations. Note that symlinks may still be created on file systems mounted with the "nosymfollow" option present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Setting the "nosymfollow" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymfollow" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. Signed-off-by: Mattias Nissler --- Changes since v2: - Updated the option name to align with BSD naming. I have also uploaded the corresponding util-linux patch to make the "mount" command understand the new option here: https://gist.github.com/anonymous/317207ae499389235258f0d52ab22c03 fs/namei.c | 3 +++ fs/namespace.c | 9 ++--- fs/proc_namespace.c | 1 + fs/statfs.c | 2 ++ include/linux/mount.h | 3 ++- include/linux/statfs.h | 1 + include/uapi/linux/fs.h | 1 + 7 files changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 5b4eed2..4751e7f 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) touch_atime(>link); } + if (nd->path.mnt->mnt_flags & MNT_NOSYMFOLLOW) + return ERR_PTR(-EACCES); + error = security_inode_follow_link(dentry, inode, nd->flags & LOOKUP_RCU); if (unlikely(error)) diff --git a/fs/namespace.c b/fs/namespace.c index e6c234b..2bf244e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOSYMFOLLOW) + mnt_flags |= MNT_NOSYMFOLLOW; /* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; } - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_NOREMOTELOCK); + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMFOLLOW | + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | + MS_NOREMOTELOCK); if (flags & MS_REMOUNT) retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3f1190d..366149d 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOSYMFOLLOW, ",nosymfollow" }, { 0, NULL } }; const struct proc_fs_info *fs_infop; diff --git a/fs/statfs.c b/fs/statfs.c index 083dc0a..cfd4da8 100644 --- a/fs/statfs.c +++ b/fs/statfs.c @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags) flags |= ST_NODIRATIME; if (mnt_flags & MNT_RELATIME) flags |= ST_RELATIME; + if (mnt_flags & MNT_NOSYMFOLLOW) + flags |= ST_NOSYMFOLLOW; return flags; } diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..c9e8d7b 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -28,6 +28,7 @@ struct mnt_namespace; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40/* does the user want this to be r/o? */ +#define MNT_NOSYMFOLLOW0x80 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -44,7 +45,7 @@ struct mnt_namespace; #define MNT_SHARED_MASK(MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \ | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \ -| MNT_READONLY) +| MNT_READONLY | MNT_NOSYMFOL
Re: [PATCH v2] Add a "nosymlinks" mount option.
I understand that silence suggests there's little interest, but here's some new information I discovered today that may justify to reconsider the patch: The BSDs already have exactly what I propose, the mount option is called "nosymfollow" there: https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939 There's also some evidence on the net that people have been using said nosymfollow mount option to mitigate symlink attacks. On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler <mniss...@chromium.org> wrote: > Friendly ping - does this version of the patch have any chance on > getting included in mainline? > > On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mniss...@chromium.org> > wrote: >> For mounts that have the new "nosymlinks" option, don't follow >> symlinks when resolving paths. The new option is similar in spirit to >> the existing "nodev", "noexec", and "nosuid" options. >> >> Note that symlinks may still be created on mounts where the >> "nosymlinks" option is present. readlink() remains functional, so user >> space code that is aware of symlinks can still choose to follow them >> explicitly. >> >> Setting the "nosymlinks" mount option helps prevent privileged writers >> from modifying files unintentionally in case there is an unexpected >> link along the accessed path. The "nosymlinks" option is thus useful >> as a defensive measure for systems that need to deal with untrusted >> file systems in privileged contexts. >> >> Signed-off-by: Mattias Nissler <mniss...@chromium.org> >> --- >> fs/namei.c | 3 +++ >> fs/namespace.c | 9 ++--- >> fs/proc_namespace.c | 1 + >> fs/statfs.c | 2 ++ >> include/linux/mount.h | 3 ++- >> include/linux/statfs.h | 1 + >> include/uapi/linux/fs.h | 1 + >> 7 files changed, 16 insertions(+), 4 deletions(-) >> >> diff --git a/fs/namei.c b/fs/namei.c >> index 5b4eed2..4cddcf3 100644 >> --- a/fs/namei.c >> +++ b/fs/namei.c >> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) >> touch_atime(>link); >> } >> >> + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) >> + return ERR_PTR(-EACCES); >> + >> error = security_inode_follow_link(dentry, inode, >>nd->flags & LOOKUP_RCU); >> if (unlikely(error)) >> diff --git a/fs/namespace.c b/fs/namespace.c >> index e6c234b..deec84e 100644 >> --- a/fs/namespace.c >> +++ b/fs/namespace.c >> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user >> *dir_name, >> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); >> if (flags & MS_RDONLY) >> mnt_flags |= MNT_READONLY; >> + if (flags & MS_NOSYMLINKS) >> + mnt_flags |= MNT_NOSYMLINKS; >> >> /* The default atime for remount is preservation */ >> if ((flags & MS_REMOUNT) && >> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user >> *dir_name, >> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; >> } >> >> - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | >> - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | >> - MS_STRICTATIME | MS_NOREMOTELOCK); >> + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS | >> + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | >> + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | >> + MS_NOREMOTELOCK); >> >> if (flags & MS_REMOUNT) >> retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, >> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c >> index 3f1190d..a1949d9 100644 >> --- a/fs/proc_namespace.c >> +++ b/fs/proc_namespace.c >> @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct >> vfsmount *mnt) >> { MNT_NOATIME, ",noatime" }, >> { MNT_NODIRATIME, ",nodiratime" }, >> { MNT_RELATIME, ",relatime" }, >> + { MNT_NOSYMLINKS, ",nosymlinks" }, >> { 0, NULL } >> }; >> const struct proc_fs_info *fs_infop; >> diff --git a/fs/statf
Re: [PATCH v2] Add a "nosymlinks" mount option.
I understand that silence suggests there's little interest, but here's some new information I discovered today that may justify to reconsider the patch: The BSDs already have exactly what I propose, the mount option is called "nosymfollow" there: https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939 There's also some evidence on the net that people have been using said nosymfollow mount option to mitigate symlink attacks. On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler wrote: > Friendly ping - does this version of the patch have any chance on > getting included in mainline? > > On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler > wrote: >> For mounts that have the new "nosymlinks" option, don't follow >> symlinks when resolving paths. The new option is similar in spirit to >> the existing "nodev", "noexec", and "nosuid" options. >> >> Note that symlinks may still be created on mounts where the >> "nosymlinks" option is present. readlink() remains functional, so user >> space code that is aware of symlinks can still choose to follow them >> explicitly. >> >> Setting the "nosymlinks" mount option helps prevent privileged writers >> from modifying files unintentionally in case there is an unexpected >> link along the accessed path. The "nosymlinks" option is thus useful >> as a defensive measure for systems that need to deal with untrusted >> file systems in privileged contexts. >> >> Signed-off-by: Mattias Nissler >> --- >> fs/namei.c | 3 +++ >> fs/namespace.c | 9 ++--- >> fs/proc_namespace.c | 1 + >> fs/statfs.c | 2 ++ >> include/linux/mount.h | 3 ++- >> include/linux/statfs.h | 1 + >> include/uapi/linux/fs.h | 1 + >> 7 files changed, 16 insertions(+), 4 deletions(-) >> >> diff --git a/fs/namei.c b/fs/namei.c >> index 5b4eed2..4cddcf3 100644 >> --- a/fs/namei.c >> +++ b/fs/namei.c >> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) >> touch_atime(>link); >> } >> >> + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) >> + return ERR_PTR(-EACCES); >> + >> error = security_inode_follow_link(dentry, inode, >>nd->flags & LOOKUP_RCU); >> if (unlikely(error)) >> diff --git a/fs/namespace.c b/fs/namespace.c >> index e6c234b..deec84e 100644 >> --- a/fs/namespace.c >> +++ b/fs/namespace.c >> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user >> *dir_name, >> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); >> if (flags & MS_RDONLY) >> mnt_flags |= MNT_READONLY; >> + if (flags & MS_NOSYMLINKS) >> + mnt_flags |= MNT_NOSYMLINKS; >> >> /* The default atime for remount is preservation */ >> if ((flags & MS_REMOUNT) && >> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user >> *dir_name, >> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; >> } >> >> - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | >> - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | >> - MS_STRICTATIME | MS_NOREMOTELOCK); >> + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS | >> + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | >> + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | >> + MS_NOREMOTELOCK); >> >> if (flags & MS_REMOUNT) >> retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, >> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c >> index 3f1190d..a1949d9 100644 >> --- a/fs/proc_namespace.c >> +++ b/fs/proc_namespace.c >> @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct >> vfsmount *mnt) >> { MNT_NOATIME, ",noatime" }, >> { MNT_NODIRATIME, ",nodiratime" }, >> { MNT_RELATIME, ",relatime" }, >> + { MNT_NOSYMLINKS, ",nosymlinks" }, >> { 0, NULL } >> }; >> const struct proc_fs_info *fs_infop; >> diff --git a/fs/statfs.c b/fs/statfs.c >> index 083dc0a..7ff7c32 100644 >> --- a/fs/statfs.c
Re: [PATCH v2] Add a "nosymlinks" mount option.
Friendly ping - does this version of the patch have any chance on getting included in mainline? On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mniss...@chromium.org> wrote: > For mounts that have the new "nosymlinks" option, don't follow > symlinks when resolving paths. The new option is similar in spirit to > the existing "nodev", "noexec", and "nosuid" options. > > Note that symlinks may still be created on mounts where the > "nosymlinks" option is present. readlink() remains functional, so user > space code that is aware of symlinks can still choose to follow them > explicitly. > > Setting the "nosymlinks" mount option helps prevent privileged writers > from modifying files unintentionally in case there is an unexpected > link along the accessed path. The "nosymlinks" option is thus useful > as a defensive measure for systems that need to deal with untrusted > file systems in privileged contexts. > > Signed-off-by: Mattias Nissler <mniss...@chromium.org> > --- > fs/namei.c | 3 +++ > fs/namespace.c | 9 ++--- > fs/proc_namespace.c | 1 + > fs/statfs.c | 2 ++ > include/linux/mount.h | 3 ++- > include/linux/statfs.h | 1 + > include/uapi/linux/fs.h | 1 + > 7 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 5b4eed2..4cddcf3 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) > touch_atime(>link); > } > > + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) > + return ERR_PTR(-EACCES); > + > error = security_inode_follow_link(dentry, inode, >nd->flags & LOOKUP_RCU); > if (unlikely(error)) > diff --git a/fs/namespace.c b/fs/namespace.c > index e6c234b..deec84e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user > *dir_name, > mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); > if (flags & MS_RDONLY) > mnt_flags |= MNT_READONLY; > + if (flags & MS_NOSYMLINKS) > + mnt_flags |= MNT_NOSYMLINKS; > > /* The default atime for remount is preservation */ > if ((flags & MS_REMOUNT) && > @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user > *dir_name, > mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; > } > > - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | > - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | > - MS_STRICTATIME | MS_NOREMOTELOCK); > + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS | > + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | > + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | > + MS_NOREMOTELOCK); > > if (flags & MS_REMOUNT) > retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c > index 3f1190d..a1949d9 100644 > --- a/fs/proc_namespace.c > +++ b/fs/proc_namespace.c > @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct > vfsmount *mnt) > { MNT_NOATIME, ",noatime" }, > { MNT_NODIRATIME, ",nodiratime" }, > { MNT_RELATIME, ",relatime" }, > + { MNT_NOSYMLINKS, ",nosymlinks" }, > { 0, NULL } > }; > const struct proc_fs_info *fs_infop; > diff --git a/fs/statfs.c b/fs/statfs.c > index 083dc0a..7ff7c32 100644 > --- a/fs/statfs.c > +++ b/fs/statfs.c > @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags) > flags |= ST_NODIRATIME; > if (mnt_flags & MNT_RELATIME) > flags |= ST_RELATIME; > + if (mnt_flags & MNT_NOSYMLINKS) > + flags |= ST_NOSYMLINKS; > return flags; > } > > diff --git a/include/linux/mount.h b/include/linux/mount.h > index 1172cce..5e302f4 100644 > --- a/include/linux/mount.h > +++ b/include/linux/mount.h > @@ -28,6 +28,7 @@ struct mnt_namespace; > #define MNT_NODIRATIME 0x10 > #define MNT_RELATIME 0x20 > #define MNT_READONLY 0x40/* does the user want this to be r/o? */ > +#define MNT_NOSYMLINKS 0x80 > > #define MNT_SHRINKABLE 0x100 > #define MNT_WRITE_HOLD 0x200 > @@ -44,7 +45,7 @@ struct mnt_namespace; &
Re: [PATCH v2] Add a "nosymlinks" mount option.
Friendly ping - does this version of the patch have any chance on getting included in mainline? On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler wrote: > For mounts that have the new "nosymlinks" option, don't follow > symlinks when resolving paths. The new option is similar in spirit to > the existing "nodev", "noexec", and "nosuid" options. > > Note that symlinks may still be created on mounts where the > "nosymlinks" option is present. readlink() remains functional, so user > space code that is aware of symlinks can still choose to follow them > explicitly. > > Setting the "nosymlinks" mount option helps prevent privileged writers > from modifying files unintentionally in case there is an unexpected > link along the accessed path. The "nosymlinks" option is thus useful > as a defensive measure for systems that need to deal with untrusted > file systems in privileged contexts. > > Signed-off-by: Mattias Nissler > --- > fs/namei.c | 3 +++ > fs/namespace.c | 9 ++--- > fs/proc_namespace.c | 1 + > fs/statfs.c | 2 ++ > include/linux/mount.h | 3 ++- > include/linux/statfs.h | 1 + > include/uapi/linux/fs.h | 1 + > 7 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 5b4eed2..4cddcf3 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) > touch_atime(>link); > } > > + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) > + return ERR_PTR(-EACCES); > + > error = security_inode_follow_link(dentry, inode, >nd->flags & LOOKUP_RCU); > if (unlikely(error)) > diff --git a/fs/namespace.c b/fs/namespace.c > index e6c234b..deec84e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user > *dir_name, > mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); > if (flags & MS_RDONLY) > mnt_flags |= MNT_READONLY; > + if (flags & MS_NOSYMLINKS) > + mnt_flags |= MNT_NOSYMLINKS; > > /* The default atime for remount is preservation */ > if ((flags & MS_REMOUNT) && > @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user > *dir_name, > mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; > } > > - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | > - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | > - MS_STRICTATIME | MS_NOREMOTELOCK); > + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS | > + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | > + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | > + MS_NOREMOTELOCK); > > if (flags & MS_REMOUNT) > retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c > index 3f1190d..a1949d9 100644 > --- a/fs/proc_namespace.c > +++ b/fs/proc_namespace.c > @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct > vfsmount *mnt) > { MNT_NOATIME, ",noatime" }, > { MNT_NODIRATIME, ",nodiratime" }, > { MNT_RELATIME, ",relatime" }, > + { MNT_NOSYMLINKS, ",nosymlinks" }, > { 0, NULL } > }; > const struct proc_fs_info *fs_infop; > diff --git a/fs/statfs.c b/fs/statfs.c > index 083dc0a..7ff7c32 100644 > --- a/fs/statfs.c > +++ b/fs/statfs.c > @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags) > flags |= ST_NODIRATIME; > if (mnt_flags & MNT_RELATIME) > flags |= ST_RELATIME; > + if (mnt_flags & MNT_NOSYMLINKS) > + flags |= ST_NOSYMLINKS; > return flags; > } > > diff --git a/include/linux/mount.h b/include/linux/mount.h > index 1172cce..5e302f4 100644 > --- a/include/linux/mount.h > +++ b/include/linux/mount.h > @@ -28,6 +28,7 @@ struct mnt_namespace; > #define MNT_NODIRATIME 0x10 > #define MNT_RELATIME 0x20 > #define MNT_READONLY 0x40/* does the user want this to be r/o? */ > +#define MNT_NOSYMLINKS 0x80 > > #define MNT_SHRINKABLE 0x100 > #define MNT_WRITE_HOLD 0x200 > @@ -44,7 +45,7 @@ struct mnt_namespace; > #define MNT_SHARED_MASK(MNT_UNBINDABLE) > #
[PATCH v2] Add a "nosymlinks" mount option.
For mounts that have the new "nosymlinks" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options. Note that symlinks may still be created on mounts where the "nosymlinks" option is present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Setting the "nosymlinks" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymlinks" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. Signed-off-by: Mattias Nissler <mniss...@chromium.org> --- fs/namei.c | 3 +++ fs/namespace.c | 9 ++--- fs/proc_namespace.c | 1 + fs/statfs.c | 2 ++ include/linux/mount.h | 3 ++- include/linux/statfs.h | 1 + include/uapi/linux/fs.h | 1 + 7 files changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 5b4eed2..4cddcf3 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) touch_atime(>link); } + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) + return ERR_PTR(-EACCES); + error = security_inode_follow_link(dentry, inode, nd->flags & LOOKUP_RCU); if (unlikely(error)) diff --git a/fs/namespace.c b/fs/namespace.c index e6c234b..deec84e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOSYMLINKS) + mnt_flags |= MNT_NOSYMLINKS; /* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; } - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_NOREMOTELOCK); + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS | + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | + MS_NOREMOTELOCK); if (flags & MS_REMOUNT) retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3f1190d..a1949d9 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOSYMLINKS, ",nosymlinks" }, { 0, NULL } }; const struct proc_fs_info *fs_infop; diff --git a/fs/statfs.c b/fs/statfs.c index 083dc0a..7ff7c32 100644 --- a/fs/statfs.c +++ b/fs/statfs.c @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags) flags |= ST_NODIRATIME; if (mnt_flags & MNT_RELATIME) flags |= ST_RELATIME; + if (mnt_flags & MNT_NOSYMLINKS) + flags |= ST_NOSYMLINKS; return flags; } diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..5e302f4 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -28,6 +28,7 @@ struct mnt_namespace; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40/* does the user want this to be r/o? */ +#define MNT_NOSYMLINKS 0x80 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -44,7 +45,7 @@ struct mnt_namespace; #define MNT_SHARED_MASK(MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \ | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \ -| MNT_READONLY) +| MNT_READONLY | MNT_NOSYMLINKS) #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME ) #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \ diff --git a/include/linux/statfs.h b/include/linux/statfs.h index 0166d32..994b059 100644 --- a/include/linux/statfs.h +++ b/include/linux/statfs.h @@ -39,5 +39,6 @@ struct kstatfs { #define ST_NOATIME 0x0400 /* do not update access times
[PATCH v2] Add a "nosymlinks" mount option.
For mounts that have the new "nosymlinks" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options. Note that symlinks may still be created on mounts where the "nosymlinks" option is present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Setting the "nosymlinks" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymlinks" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. Signed-off-by: Mattias Nissler --- fs/namei.c | 3 +++ fs/namespace.c | 9 ++--- fs/proc_namespace.c | 1 + fs/statfs.c | 2 ++ include/linux/mount.h | 3 ++- include/linux/statfs.h | 1 + include/uapi/linux/fs.h | 1 + 7 files changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 5b4eed2..4cddcf3 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd) touch_atime(>link); } + if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS) + return ERR_PTR(-EACCES); + error = security_inode_follow_link(dentry, inode, nd->flags & LOOKUP_RCU); if (unlikely(error)) diff --git a/fs/namespace.c b/fs/namespace.c index e6c234b..deec84e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOSYMLINKS) + mnt_flags |= MNT_NOSYMLINKS; /* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; } - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_NOREMOTELOCK); + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS | + MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | + MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME | + MS_NOREMOTELOCK); if (flags & MS_REMOUNT) retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3f1190d..a1949d9 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOSYMLINKS, ",nosymlinks" }, { 0, NULL } }; const struct proc_fs_info *fs_infop; diff --git a/fs/statfs.c b/fs/statfs.c index 083dc0a..7ff7c32 100644 --- a/fs/statfs.c +++ b/fs/statfs.c @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags) flags |= ST_NODIRATIME; if (mnt_flags & MNT_RELATIME) flags |= ST_RELATIME; + if (mnt_flags & MNT_NOSYMLINKS) + flags |= ST_NOSYMLINKS; return flags; } diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..5e302f4 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -28,6 +28,7 @@ struct mnt_namespace; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40/* does the user want this to be r/o? */ +#define MNT_NOSYMLINKS 0x80 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -44,7 +45,7 @@ struct mnt_namespace; #define MNT_SHARED_MASK(MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \ | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \ -| MNT_READONLY) +| MNT_READONLY | MNT_NOSYMLINKS) #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME ) #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \ diff --git a/include/linux/statfs.h b/include/linux/statfs.h index 0166d32..994b059 100644 --- a/include/linux/statfs.h +++ b/include/linux/statfs.h @@ -39,5 +39,6 @@ struct kstatfs { #define ST_NOATIME 0x0400 /* do not update access times */ #define ST_NODIRATIME 0x08
Re: [RFC] [PATCH] Add a "nolinks" mount option.
On Tue, Oct 18, 2016 at 5:14 PM, Colin Walters <walt...@verbum.org> wrote: > > On Mon, Oct 17, 2016, at 09:02 AM, Mattias Nissler wrote: > > OK, no more feedback thus far. Is there generally any interest in a > > mount option to avoid path name aliasing resulting in target file > > confusion? Perhaps a version that only disables symlinks instead of > > also hard-disabling files hard-linked to multiple locations (those are > > much lower risk for the situation I care about)? > > So the situation here is a (privileged) process that is trying to read/write > to a filesystem tree writable by other processes that are in a separate > security domain? More or less. The scenario is that of an attacker gaining root access, followed by a reboot. The dm-verity-protected root FS raises the bar for re-exploiting the system after reboot, but the fact that we'll consume data from the writable file system during boot can (and has been) used to regain code execution. You may argue that "all is lost" after the initial root exploit, but we think there's value in hardening the system to the point where getting a *persistent* exploit (i.e. retaining control across reboots) is significantly harder. This increases our chances to successfully auto-update devices even after a root exploit and thus retroactively fix them. > > That's a classic situation that requires extreme care, and I am doubtful > that symlinks are the only issue you're facing. For example, if this > process is also *parsing* any data there, there's another whole source > of risk. Agreed. Essentially, the entire writable file system has to be regarded as untrusted input. Every process that stores or reads data needs to do so cautiously. I'm aware this is quite a rabbit hole and difficult to harden. In particular it'll require attention and changes across a number of areas. The fact that we've seen multiple exploits that rely on target file confusion by creating symlinks suggests that there's value in blocking this as an attack vector though (in particular given that we have only few legit uses of symlinks that are relatively easy to clean up). > > I suspect for you it wouldn't be too hard to have a "follow untrusted > path" helper function, it's possible to implement in userspace safely > with O_NOFOLLOW etc. Note that O_NOFOLLOW only affects the final path component. If there's a symlink in any of the parent directories, that'll still be traversed even with O_NOFOLLOW. This situation is less risky as an attacker will have to deal with the restriction of a fixed filename in the last component, but might still be exploitable. > > Regardless too, it sounds like what you want more is a > "same filesystem" traversal (stat and compare devices). > > Or does it even need to handle full traversal? Would it have mitigated > the security issue to fstat() any files you opened and verified they > were from the writable partition you expected? These are all good ideas, and in fact I'm already looking into making safe helpers to use when dealing with files from the writable file system. There is an idea to even go a step further and mount the writable file system in a separate mount namespace (to avoid accidental access from the rest of the system). Then, the helper tools would take care of performing file access in said mount namespace and can prevent malicious symlinks by canonicalizing the requested path and bailing out if it doesn't match the requested path. The difficulty lies in applying these measures of precaution system-wide. This affects most init scripts and daemons, and everything else that keeps state on the writable file system. Some of the affected code we have written ourselves so is relatively easy to fix, but we're also running a number of third party packages for which things are more complicated. Going through with a fine comb and auditing all file access is possible in theory, but hardly in practice. So we'll have to make do with partial auditing and implementing mitigations that reduce risk for the rest of the code, which led to the idea of killing the symlink attack vector systematically. FWIW, I plan to send an updated patch later that only disables symlinks for consideration.
Re: [RFC] [PATCH] Add a "nolinks" mount option.
On Tue, Oct 18, 2016 at 5:14 PM, Colin Walters wrote: > > On Mon, Oct 17, 2016, at 09:02 AM, Mattias Nissler wrote: > > OK, no more feedback thus far. Is there generally any interest in a > > mount option to avoid path name aliasing resulting in target file > > confusion? Perhaps a version that only disables symlinks instead of > > also hard-disabling files hard-linked to multiple locations (those are > > much lower risk for the situation I care about)? > > So the situation here is a (privileged) process that is trying to read/write > to a filesystem tree writable by other processes that are in a separate > security domain? More or less. The scenario is that of an attacker gaining root access, followed by a reboot. The dm-verity-protected root FS raises the bar for re-exploiting the system after reboot, but the fact that we'll consume data from the writable file system during boot can (and has been) used to regain code execution. You may argue that "all is lost" after the initial root exploit, but we think there's value in hardening the system to the point where getting a *persistent* exploit (i.e. retaining control across reboots) is significantly harder. This increases our chances to successfully auto-update devices even after a root exploit and thus retroactively fix them. > > That's a classic situation that requires extreme care, and I am doubtful > that symlinks are the only issue you're facing. For example, if this > process is also *parsing* any data there, there's another whole source > of risk. Agreed. Essentially, the entire writable file system has to be regarded as untrusted input. Every process that stores or reads data needs to do so cautiously. I'm aware this is quite a rabbit hole and difficult to harden. In particular it'll require attention and changes across a number of areas. The fact that we've seen multiple exploits that rely on target file confusion by creating symlinks suggests that there's value in blocking this as an attack vector though (in particular given that we have only few legit uses of symlinks that are relatively easy to clean up). > > I suspect for you it wouldn't be too hard to have a "follow untrusted > path" helper function, it's possible to implement in userspace safely > with O_NOFOLLOW etc. Note that O_NOFOLLOW only affects the final path component. If there's a symlink in any of the parent directories, that'll still be traversed even with O_NOFOLLOW. This situation is less risky as an attacker will have to deal with the restriction of a fixed filename in the last component, but might still be exploitable. > > Regardless too, it sounds like what you want more is a > "same filesystem" traversal (stat and compare devices). > > Or does it even need to handle full traversal? Would it have mitigated > the security issue to fstat() any files you opened and verified they > were from the writable partition you expected? These are all good ideas, and in fact I'm already looking into making safe helpers to use when dealing with files from the writable file system. There is an idea to even go a step further and mount the writable file system in a separate mount namespace (to avoid accidental access from the rest of the system). Then, the helper tools would take care of performing file access in said mount namespace and can prevent malicious symlinks by canonicalizing the requested path and bailing out if it doesn't match the requested path. The difficulty lies in applying these measures of precaution system-wide. This affects most init scripts and daemons, and everything else that keeps state on the writable file system. Some of the affected code we have written ourselves so is relatively easy to fix, but we're also running a number of third party packages for which things are more complicated. Going through with a fine comb and auditing all file access is possible in theory, but hardly in practice. So we'll have to make do with partial auditing and implementing mitigations that reduce risk for the rest of the code, which led to the idea of killing the symlink attack vector systematically. FWIW, I plan to send an updated patch later that only disables symlinks for consideration.
Re: [RFC] [PATCH] Add a "nolinks" mount option.
OK, no more feedback thus far. Is there generally any interest in a mount option to avoid path name aliasing resulting in target file confusion? Perhaps a version that only disables symlinks instead of also hard-disabling files hard-linked to multiple locations (those are much lower risk for the situation I care about)? If there is interest, I'm happy to iterate the patch until it's accepted. If there's no interest, that's fine too - I'll then likely resort to moving the restrictions desired for Chrome OS into an LSM we compile into our kernels. On Fri, Oct 14, 2016 at 6:22 PM, Mattias Nissler <mniss...@chromium.org> wrote: > Forgot to mention: I realize my motivation is very specific to Chrome > OS, however the nolinks option seemed useful also as a mitigation to > generic privilege escalation symlink attacks, for cases where > disabling symlinks/hardlinks is acceptable. > > On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler <mniss...@chromium.org> > wrote: >> On Fri, Oct 14, 2016 at 5:00 PM, Al Viro <v...@zeniv.linux.org.uk> wrote: >>> >>> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote: >>> > > Setting the "nolinks" mount option helps prevent privileged writers >>> > > from modifying files unintentionally in case there is an unexpected >>> > > link along the accessed path. The "nolinks" option is thus useful as a >>> > > defensive measure against persistent exploits (i.e. a system getting >>> > > re-exploited after a reboot) for systems that employ a read-only or >>> > > dm-verity-protected rootfs. These systems prevent non-legit binaries >>> > > from running after reboot. However, legit code typically still reads >>> > > from and writes to a writable file system previously under full >>> > > control of the attacker, who can place symlinks to trick file writes >>> > > after reboot to target a file of their choice. "nolinks" fundamentally >>> > > prevents this. >>> > >>> > Which parts of the tree would be on that "protected" rootfs and which >>> > would >>> > you mount with that option? Description above is rather vague and I'm >>> > not convinced that it actually buys you anything. Details, please... >> >> Apologies for the vague description, I'm happy to explain in detail. >> >> In case of Chrome OS, we have all binaries on a dm-verity rootfs, so >> an attacker can't modify any binaries. After reboot, everything except >> the rootfs is mounted noexec, so there's no way to re-gain code >> execution after reboot by modifying existing binaries or dropping new >> ones. >> >> We've seen multiple exploits now where the attacker worked around >> these limitations in two steps: >> >> 1. Before reboot, the attacker sets up symlinks on the writeable file >> system (called "stateful" file system), which are later accessed by >> legit boot code (such as init scripts) after reboot. For example, an >> init script that copies file A to B can be abused by an attacker by >> symlinking or hardlinking B to a location C of their choice, and >> placing desired data to be written to C in A. That gives the attacker >> a primitive to write data of their choice to a path of their choice >> after reboot. Note that this primitive may target locations _outside_ >> the stateful file system the attacker previously had control of. >> Particularly of interest are targets on /sys, but also tmpfs on /run >> etc. >> >> 2. The second step for a successful attack is finding some legit code >> invoked in the boot flow that has a vulnerability exploitable by >> feeding it unexpected data. As an example, there are Linux userspace >> utilities that read config from /run which may contain shell commands >> the the utility executes, through which the attacker can gain code >> execution again. >> >> The purpose of the proposed patch is to raise the bar for the first >> step of the attack: Writing arbitrary files after reboot. I'm >> intending to mount the stateful file system with the nolinks option >> (or otherwise prevent symlink traversal). This will help make sure >> that any legit writes taking place during boot in init scripts etc. go >> to the files intended by the developer, and can't be redirected by an >> attacker. >> >> Does this make more sense to you? >> >>> >>> >>> PS: what the hell do restrictions on _following_ symlinks have to _creating_ >>> hardlinks? I'm trying to imagine a threat model where both would
Re: [RFC] [PATCH] Add a "nolinks" mount option.
OK, no more feedback thus far. Is there generally any interest in a mount option to avoid path name aliasing resulting in target file confusion? Perhaps a version that only disables symlinks instead of also hard-disabling files hard-linked to multiple locations (those are much lower risk for the situation I care about)? If there is interest, I'm happy to iterate the patch until it's accepted. If there's no interest, that's fine too - I'll then likely resort to moving the restrictions desired for Chrome OS into an LSM we compile into our kernels. On Fri, Oct 14, 2016 at 6:22 PM, Mattias Nissler wrote: > Forgot to mention: I realize my motivation is very specific to Chrome > OS, however the nolinks option seemed useful also as a mitigation to > generic privilege escalation symlink attacks, for cases where > disabling symlinks/hardlinks is acceptable. > > On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler > wrote: >> On Fri, Oct 14, 2016 at 5:00 PM, Al Viro wrote: >>> >>> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote: >>> > > Setting the "nolinks" mount option helps prevent privileged writers >>> > > from modifying files unintentionally in case there is an unexpected >>> > > link along the accessed path. The "nolinks" option is thus useful as a >>> > > defensive measure against persistent exploits (i.e. a system getting >>> > > re-exploited after a reboot) for systems that employ a read-only or >>> > > dm-verity-protected rootfs. These systems prevent non-legit binaries >>> > > from running after reboot. However, legit code typically still reads >>> > > from and writes to a writable file system previously under full >>> > > control of the attacker, who can place symlinks to trick file writes >>> > > after reboot to target a file of their choice. "nolinks" fundamentally >>> > > prevents this. >>> > >>> > Which parts of the tree would be on that "protected" rootfs and which >>> > would >>> > you mount with that option? Description above is rather vague and I'm >>> > not convinced that it actually buys you anything. Details, please... >> >> Apologies for the vague description, I'm happy to explain in detail. >> >> In case of Chrome OS, we have all binaries on a dm-verity rootfs, so >> an attacker can't modify any binaries. After reboot, everything except >> the rootfs is mounted noexec, so there's no way to re-gain code >> execution after reboot by modifying existing binaries or dropping new >> ones. >> >> We've seen multiple exploits now where the attacker worked around >> these limitations in two steps: >> >> 1. Before reboot, the attacker sets up symlinks on the writeable file >> system (called "stateful" file system), which are later accessed by >> legit boot code (such as init scripts) after reboot. For example, an >> init script that copies file A to B can be abused by an attacker by >> symlinking or hardlinking B to a location C of their choice, and >> placing desired data to be written to C in A. That gives the attacker >> a primitive to write data of their choice to a path of their choice >> after reboot. Note that this primitive may target locations _outside_ >> the stateful file system the attacker previously had control of. >> Particularly of interest are targets on /sys, but also tmpfs on /run >> etc. >> >> 2. The second step for a successful attack is finding some legit code >> invoked in the boot flow that has a vulnerability exploitable by >> feeding it unexpected data. As an example, there are Linux userspace >> utilities that read config from /run which may contain shell commands >> the the utility executes, through which the attacker can gain code >> execution again. >> >> The purpose of the proposed patch is to raise the bar for the first >> step of the attack: Writing arbitrary files after reboot. I'm >> intending to mount the stateful file system with the nolinks option >> (or otherwise prevent symlink traversal). This will help make sure >> that any legit writes taking place during boot in init scripts etc. go >> to the files intended by the developer, and can't be redirected by an >> attacker. >> >> Does this make more sense to you? >> >>> >>> >>> PS: what the hell do restrictions on _following_ symlinks have to _creating_ >>> hardlinks? I'm trying to imagine a threat model where both would apply or >>> anything else beyond the word "link" they would have in c
Re: [RFC] [PATCH] Add a "nolinks" mount option.
Forgot to mention: I realize my motivation is very specific to Chrome OS, however the nolinks option seemed useful also as a mitigation to generic privilege escalation symlink attacks, for cases where disabling symlinks/hardlinks is acceptable. On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler <mniss...@chromium.org> wrote: > On Fri, Oct 14, 2016 at 5:00 PM, Al Viro <v...@zeniv.linux.org.uk> wrote: >> >> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote: >> > > Setting the "nolinks" mount option helps prevent privileged writers >> > > from modifying files unintentionally in case there is an unexpected >> > > link along the accessed path. The "nolinks" option is thus useful as a >> > > defensive measure against persistent exploits (i.e. a system getting >> > > re-exploited after a reboot) for systems that employ a read-only or >> > > dm-verity-protected rootfs. These systems prevent non-legit binaries >> > > from running after reboot. However, legit code typically still reads >> > > from and writes to a writable file system previously under full >> > > control of the attacker, who can place symlinks to trick file writes >> > > after reboot to target a file of their choice. "nolinks" fundamentally >> > > prevents this. >> > >> > Which parts of the tree would be on that "protected" rootfs and which would >> > you mount with that option? Description above is rather vague and I'm >> > not convinced that it actually buys you anything. Details, please... > > Apologies for the vague description, I'm happy to explain in detail. > > In case of Chrome OS, we have all binaries on a dm-verity rootfs, so > an attacker can't modify any binaries. After reboot, everything except > the rootfs is mounted noexec, so there's no way to re-gain code > execution after reboot by modifying existing binaries or dropping new > ones. > > We've seen multiple exploits now where the attacker worked around > these limitations in two steps: > > 1. Before reboot, the attacker sets up symlinks on the writeable file > system (called "stateful" file system), which are later accessed by > legit boot code (such as init scripts) after reboot. For example, an > init script that copies file A to B can be abused by an attacker by > symlinking or hardlinking B to a location C of their choice, and > placing desired data to be written to C in A. That gives the attacker > a primitive to write data of their choice to a path of their choice > after reboot. Note that this primitive may target locations _outside_ > the stateful file system the attacker previously had control of. > Particularly of interest are targets on /sys, but also tmpfs on /run > etc. > > 2. The second step for a successful attack is finding some legit code > invoked in the boot flow that has a vulnerability exploitable by > feeding it unexpected data. As an example, there are Linux userspace > utilities that read config from /run which may contain shell commands > the the utility executes, through which the attacker can gain code > execution again. > > The purpose of the proposed patch is to raise the bar for the first > step of the attack: Writing arbitrary files after reboot. I'm > intending to mount the stateful file system with the nolinks option > (or otherwise prevent symlink traversal). This will help make sure > that any legit writes taking place during boot in init scripts etc. go > to the files intended by the developer, and can't be redirected by an > attacker. > > Does this make more sense to you? > >> >> >> PS: what the hell do restrictions on _following_ symlinks have to _creating_ >> hardlinks? I'm trying to imagine a threat model where both would apply or >> anything else beyond the word "link" they would have in common... > > The restriction is not on _creating_ hard links, but _opening_ > hardlinks. The commonality is in the confusion between the file you're > meaning to write vs. the file you actually end up writing to, which > stems from the fact that as things stand a file can be accessible on > other paths than its canonical one. For Chrome OS, I'd like to get to > a point where most privileged code can only access a file via its > canonical name (bind mounts are an OK exception as they're not > persistent, so out of reach for manipulation). > >> >> The one you've described above might have something to do with the first >> one (modulo missing description of the setup you have in mind), but it >> clearly has nothing to do with the second - attackers could've created >> whatever they wanted while the fs had been under their control, after all. >> Doesn't make sense...
Re: [RFC] [PATCH] Add a "nolinks" mount option.
Forgot to mention: I realize my motivation is very specific to Chrome OS, however the nolinks option seemed useful also as a mitigation to generic privilege escalation symlink attacks, for cases where disabling symlinks/hardlinks is acceptable. On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler wrote: > On Fri, Oct 14, 2016 at 5:00 PM, Al Viro wrote: >> >> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote: >> > > Setting the "nolinks" mount option helps prevent privileged writers >> > > from modifying files unintentionally in case there is an unexpected >> > > link along the accessed path. The "nolinks" option is thus useful as a >> > > defensive measure against persistent exploits (i.e. a system getting >> > > re-exploited after a reboot) for systems that employ a read-only or >> > > dm-verity-protected rootfs. These systems prevent non-legit binaries >> > > from running after reboot. However, legit code typically still reads >> > > from and writes to a writable file system previously under full >> > > control of the attacker, who can place symlinks to trick file writes >> > > after reboot to target a file of their choice. "nolinks" fundamentally >> > > prevents this. >> > >> > Which parts of the tree would be on that "protected" rootfs and which would >> > you mount with that option? Description above is rather vague and I'm >> > not convinced that it actually buys you anything. Details, please... > > Apologies for the vague description, I'm happy to explain in detail. > > In case of Chrome OS, we have all binaries on a dm-verity rootfs, so > an attacker can't modify any binaries. After reboot, everything except > the rootfs is mounted noexec, so there's no way to re-gain code > execution after reboot by modifying existing binaries or dropping new > ones. > > We've seen multiple exploits now where the attacker worked around > these limitations in two steps: > > 1. Before reboot, the attacker sets up symlinks on the writeable file > system (called "stateful" file system), which are later accessed by > legit boot code (such as init scripts) after reboot. For example, an > init script that copies file A to B can be abused by an attacker by > symlinking or hardlinking B to a location C of their choice, and > placing desired data to be written to C in A. That gives the attacker > a primitive to write data of their choice to a path of their choice > after reboot. Note that this primitive may target locations _outside_ > the stateful file system the attacker previously had control of. > Particularly of interest are targets on /sys, but also tmpfs on /run > etc. > > 2. The second step for a successful attack is finding some legit code > invoked in the boot flow that has a vulnerability exploitable by > feeding it unexpected data. As an example, there are Linux userspace > utilities that read config from /run which may contain shell commands > the the utility executes, through which the attacker can gain code > execution again. > > The purpose of the proposed patch is to raise the bar for the first > step of the attack: Writing arbitrary files after reboot. I'm > intending to mount the stateful file system with the nolinks option > (or otherwise prevent symlink traversal). This will help make sure > that any legit writes taking place during boot in init scripts etc. go > to the files intended by the developer, and can't be redirected by an > attacker. > > Does this make more sense to you? > >> >> >> PS: what the hell do restrictions on _following_ symlinks have to _creating_ >> hardlinks? I'm trying to imagine a threat model where both would apply or >> anything else beyond the word "link" they would have in common... > > The restriction is not on _creating_ hard links, but _opening_ > hardlinks. The commonality is in the confusion between the file you're > meaning to write vs. the file you actually end up writing to, which > stems from the fact that as things stand a file can be accessible on > other paths than its canonical one. For Chrome OS, I'd like to get to > a point where most privileged code can only access a file via its > canonical name (bind mounts are an OK exception as they're not > persistent, so out of reach for manipulation). > >> >> The one you've described above might have something to do with the first >> one (modulo missing description of the setup you have in mind), but it >> clearly has nothing to do with the second - attackers could've created >> whatever they wanted while the fs had been under their control, after all. >> Doesn't make sense...
Re: [RFC] [PATCH] Add a "nolinks" mount option.
On Fri, Oct 14, 2016 at 5:00 PM, Al Virowrote: > > On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote: > > > Setting the "nolinks" mount option helps prevent privileged writers > > > from modifying files unintentionally in case there is an unexpected > > > link along the accessed path. The "nolinks" option is thus useful as a > > > defensive measure against persistent exploits (i.e. a system getting > > > re-exploited after a reboot) for systems that employ a read-only or > > > dm-verity-protected rootfs. These systems prevent non-legit binaries > > > from running after reboot. However, legit code typically still reads > > > from and writes to a writable file system previously under full > > > control of the attacker, who can place symlinks to trick file writes > > > after reboot to target a file of their choice. "nolinks" fundamentally > > > prevents this. > > > > Which parts of the tree would be on that "protected" rootfs and which would > > you mount with that option? Description above is rather vague and I'm > > not convinced that it actually buys you anything. Details, please... Apologies for the vague description, I'm happy to explain in detail. In case of Chrome OS, we have all binaries on a dm-verity rootfs, so an attacker can't modify any binaries. After reboot, everything except the rootfs is mounted noexec, so there's no way to re-gain code execution after reboot by modifying existing binaries or dropping new ones. We've seen multiple exploits now where the attacker worked around these limitations in two steps: 1. Before reboot, the attacker sets up symlinks on the writeable file system (called "stateful" file system), which are later accessed by legit boot code (such as init scripts) after reboot. For example, an init script that copies file A to B can be abused by an attacker by symlinking or hardlinking B to a location C of their choice, and placing desired data to be written to C in A. That gives the attacker a primitive to write data of their choice to a path of their choice after reboot. Note that this primitive may target locations _outside_ the stateful file system the attacker previously had control of. Particularly of interest are targets on /sys, but also tmpfs on /run etc. 2. The second step for a successful attack is finding some legit code invoked in the boot flow that has a vulnerability exploitable by feeding it unexpected data. As an example, there are Linux userspace utilities that read config from /run which may contain shell commands the the utility executes, through which the attacker can gain code execution again. The purpose of the proposed patch is to raise the bar for the first step of the attack: Writing arbitrary files after reboot. I'm intending to mount the stateful file system with the nolinks option (or otherwise prevent symlink traversal). This will help make sure that any legit writes taking place during boot in init scripts etc. go to the files intended by the developer, and can't be redirected by an attacker. Does this make more sense to you? > > > PS: what the hell do restrictions on _following_ symlinks have to _creating_ > hardlinks? I'm trying to imagine a threat model where both would apply or > anything else beyond the word "link" they would have in common... The restriction is not on _creating_ hard links, but _opening_ hardlinks. The commonality is in the confusion between the file you're meaning to write vs. the file you actually end up writing to, which stems from the fact that as things stand a file can be accessible on other paths than its canonical one. For Chrome OS, I'd like to get to a point where most privileged code can only access a file via its canonical name (bind mounts are an OK exception as they're not persistent, so out of reach for manipulation). > > The one you've described above might have something to do with the first > one (modulo missing description of the setup you have in mind), but it > clearly has nothing to do with the second - attackers could've created > whatever they wanted while the fs had been under their control, after all. > Doesn't make sense...
Re: [RFC] [PATCH] Add a "nolinks" mount option.
On Fri, Oct 14, 2016 at 5:00 PM, Al Viro wrote: > > On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote: > > > Setting the "nolinks" mount option helps prevent privileged writers > > > from modifying files unintentionally in case there is an unexpected > > > link along the accessed path. The "nolinks" option is thus useful as a > > > defensive measure against persistent exploits (i.e. a system getting > > > re-exploited after a reboot) for systems that employ a read-only or > > > dm-verity-protected rootfs. These systems prevent non-legit binaries > > > from running after reboot. However, legit code typically still reads > > > from and writes to a writable file system previously under full > > > control of the attacker, who can place symlinks to trick file writes > > > after reboot to target a file of their choice. "nolinks" fundamentally > > > prevents this. > > > > Which parts of the tree would be on that "protected" rootfs and which would > > you mount with that option? Description above is rather vague and I'm > > not convinced that it actually buys you anything. Details, please... Apologies for the vague description, I'm happy to explain in detail. In case of Chrome OS, we have all binaries on a dm-verity rootfs, so an attacker can't modify any binaries. After reboot, everything except the rootfs is mounted noexec, so there's no way to re-gain code execution after reboot by modifying existing binaries or dropping new ones. We've seen multiple exploits now where the attacker worked around these limitations in two steps: 1. Before reboot, the attacker sets up symlinks on the writeable file system (called "stateful" file system), which are later accessed by legit boot code (such as init scripts) after reboot. For example, an init script that copies file A to B can be abused by an attacker by symlinking or hardlinking B to a location C of their choice, and placing desired data to be written to C in A. That gives the attacker a primitive to write data of their choice to a path of their choice after reboot. Note that this primitive may target locations _outside_ the stateful file system the attacker previously had control of. Particularly of interest are targets on /sys, but also tmpfs on /run etc. 2. The second step for a successful attack is finding some legit code invoked in the boot flow that has a vulnerability exploitable by feeding it unexpected data. As an example, there are Linux userspace utilities that read config from /run which may contain shell commands the the utility executes, through which the attacker can gain code execution again. The purpose of the proposed patch is to raise the bar for the first step of the attack: Writing arbitrary files after reboot. I'm intending to mount the stateful file system with the nolinks option (or otherwise prevent symlink traversal). This will help make sure that any legit writes taking place during boot in init scripts etc. go to the files intended by the developer, and can't be redirected by an attacker. Does this make more sense to you? > > > PS: what the hell do restrictions on _following_ symlinks have to _creating_ > hardlinks? I'm trying to imagine a threat model where both would apply or > anything else beyond the word "link" they would have in common... The restriction is not on _creating_ hard links, but _opening_ hardlinks. The commonality is in the confusion between the file you're meaning to write vs. the file you actually end up writing to, which stems from the fact that as things stand a file can be accessible on other paths than its canonical one. For Chrome OS, I'd like to get to a point where most privileged code can only access a file via its canonical name (bind mounts are an OK exception as they're not persistent, so out of reach for manipulation). > > The one you've described above might have something to do with the first > one (modulo missing description of the setup you have in mind), but it > clearly has nothing to do with the second - attackers could've created > whatever they wanted while the fs had been under their control, after all. > Doesn't make sense...
[RFC] [PATCH] Add a "nolinks" mount option.
For mounts that have the new "nolinks" option, don't follow symlinks and reject to open files with a hard link count larger than one. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options. Note that symlinks and hard links may still be created on mounts where the "nolinks" option is present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Similarly, hard-linked files can be identified from userspace using stat() output while the "nolinks" option is set. Setting the "nolinks" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nolinks" option is thus useful as a defensive measure against persistent exploits (i.e. a system getting re-exploited after a reboot) for systems that employ a read-only or dm-verity-protected rootfs. These systems prevent non-legit binaries from running after reboot. However, legit code typically still reads from and writes to a writable file system previously under full control of the attacker, who can place symlinks to trick file writes after reboot to target a file of their choice. "nolinks" fundamentally prevents this. Signed-off-by: Mattias Nissler <mniss...@chromium.org> --- fs/namei.c | 9 - fs/namespace.c | 8 +--- fs/proc_namespace.c | 1 + include/linux/mount.h | 3 ++- include/uapi/linux/fs.h | 1 + 5 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index a7f601c..f152687 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1021,6 +1021,10 @@ const char *get_link(struct nameidata *nd) touch_atime(>link); } + error = -EACCES; + if (nd->path.mnt->mnt_flags & MNT_NOLINKS) + return ERR_PTR(error); + error = security_inode_follow_link(dentry, inode, nd->flags & LOOKUP_RCU); if (unlikely(error)) @@ -2919,7 +2923,10 @@ static int may_open(struct path *path, int acc_mode, int flag) case S_IFIFO: case S_IFSOCK: flag &= ~O_TRUNC; - break; + /*FALLTHRU*/ + default: + if ((path->mnt->mnt_flags & MNT_NOLINKS) && inode->i_nlink > 1) + return -EACCES; } error = inode_permission(inode, MAY_OPEN | acc_mode); diff --git a/fs/namespace.c b/fs/namespace.c index 58aca9c..c421fbb 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOLINKS) + mnt_flags |= MNT_NOLINKS; /* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && @@ -2741,9 +2743,9 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; } - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_NOREMOTELOCK); + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOLINKS | MS_ACTIVE | + MS_BORN | MS_NOATIME | MS_NODIRATIME | MS_RELATIME | + MS_KERNMOUNT | MS_STRICTATIME | MS_NOREMOTELOCK); if (flags & MS_REMOUNT) retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3f1190d..b5d9d35 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOLINKS, ",nolinks" }, { 0, NULL } }; const struct proc_fs_info *fs_infop; diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..df4eb6a 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -28,6 +28,7 @@ struct mnt_namespace; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40/* does the user want this to be r/o? */ +#define MNT_NOLINKS0x80 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -44,7 +45,7 @@ struct mnt_namespace; #define MNT_SHARED_MASK(MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
[RFC] [PATCH] Add a "nolinks" mount option.
For mounts that have the new "nolinks" option, don't follow symlinks and reject to open files with a hard link count larger than one. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options. Note that symlinks and hard links may still be created on mounts where the "nolinks" option is present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly. Similarly, hard-linked files can be identified from userspace using stat() output while the "nolinks" option is set. Setting the "nolinks" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nolinks" option is thus useful as a defensive measure against persistent exploits (i.e. a system getting re-exploited after a reboot) for systems that employ a read-only or dm-verity-protected rootfs. These systems prevent non-legit binaries from running after reboot. However, legit code typically still reads from and writes to a writable file system previously under full control of the attacker, who can place symlinks to trick file writes after reboot to target a file of their choice. "nolinks" fundamentally prevents this. Signed-off-by: Mattias Nissler --- fs/namei.c | 9 - fs/namespace.c | 8 +--- fs/proc_namespace.c | 1 + include/linux/mount.h | 3 ++- include/uapi/linux/fs.h | 1 + 5 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index a7f601c..f152687 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1021,6 +1021,10 @@ const char *get_link(struct nameidata *nd) touch_atime(>link); } + error = -EACCES; + if (nd->path.mnt->mnt_flags & MNT_NOLINKS) + return ERR_PTR(error); + error = security_inode_follow_link(dentry, inode, nd->flags & LOOKUP_RCU); if (unlikely(error)) @@ -2919,7 +2923,10 @@ static int may_open(struct path *path, int acc_mode, int flag) case S_IFIFO: case S_IFSOCK: flag &= ~O_TRUNC; - break; + /*FALLTHRU*/ + default: + if ((path->mnt->mnt_flags & MNT_NOLINKS) && inode->i_nlink > 1) + return -EACCES; } error = inode_permission(inode, MAY_OPEN | acc_mode); diff --git a/fs/namespace.c b/fs/namespace.c index 58aca9c..c421fbb 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOLINKS) + mnt_flags |= MNT_NOLINKS; /* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && @@ -2741,9 +2743,9 @@ long do_mount(const char *dev_name, const char __user *dir_name, mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK; } - flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | - MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_NOREMOTELOCK); + flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOLINKS | MS_ACTIVE | + MS_BORN | MS_NOATIME | MS_NODIRATIME | MS_RELATIME | + MS_KERNMOUNT | MS_STRICTATIME | MS_NOREMOTELOCK); if (flags & MS_REMOUNT) retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3f1190d..b5d9d35 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOLINKS, ",nolinks" }, { 0, NULL } }; const struct proc_fs_info *fs_infop; diff --git a/include/linux/mount.h b/include/linux/mount.h index 1172cce..df4eb6a 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -28,6 +28,7 @@ struct mnt_namespace; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40/* does the user want this to be r/o? */ +#define MNT_NOLINKS0x80 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -44,7 +45,7 @@ struct mnt_namespace; #define MNT_SHARED_MASK(MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \ | MNT_NOATIME |
Re: debugfs_create_{u,x,s}{8,16,32,64}
On Wed, Aug 13, 2014 at 11:57 PM, Greg KH wrote: > > On Wed, Aug 13, 2014 at 02:57:05PM -0700, Matt Longnecker wrote: > > Greg, > > > > Back in December 2007 Mattias Nissler proposed a patch defining > > debugfs_create_s32 and friends. In it he reworked the already existent u and > > x versions. The subject of Mattias's mail was "[PATCH] debugfs: Revamp > > debugfs_create_{u,x,s}{8,16,32,64} to support signed integers". > > > > As I'm writing a new driver related to thermal management, I'm finding the > > absence of debugfs_create_s32 to be a stumbling block. I see at least two > > other folks who have stumbled on the absence of debugfs_create_s32 in the > > past few years. > > > > Why didn't we merge Mattias's patch? > > I can't remember why I didn't merge a patch from last week, let alone > from 2007, are you crazy? :) Needless to say, I don't remember details either. It might just have been a case of not adding further debugfs API as long as there's little demand to justify the addition. The latter may have changed in the last 7 years (Matt hints at other potential consumers?), but I personally can't tell as I haven't done any kernel work in more recent times. > > > > Would you entertain a similar patch today? > > I always entertain all patches, you never need to ask if you are allowed > to send a patch, that's like asking "Am I allowed to ask a question?"... > > greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: debugfs_create_{u,x,s}{8,16,32,64}
On Wed, Aug 13, 2014 at 11:57 PM, Greg KH gre...@linuxfoundation.org wrote: On Wed, Aug 13, 2014 at 02:57:05PM -0700, Matt Longnecker wrote: Greg, Back in December 2007 Mattias Nissler proposed a patch defining debugfs_create_s32 and friends. In it he reworked the already existent u and x versions. The subject of Mattias's mail was [PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers. As I'm writing a new driver related to thermal management, I'm finding the absence of debugfs_create_s32 to be a stumbling block. I see at least two other folks who have stumbled on the absence of debugfs_create_s32 in the past few years. Why didn't we merge Mattias's patch? I can't remember why I didn't merge a patch from last week, let alone from 2007, are you crazy? :) Needless to say, I don't remember details either. It might just have been a case of not adding further debugfs API as long as there's little demand to justify the addition. The latter may have changed in the last 7 years (Matt hints at other potential consumers?), but I personally can't tell as I haven't done any kernel work in more recent times. Would you entertain a similar patch today? I always entertain all patches, you never need to ask if you are allowed to send a patch, that's like asking Am I allowed to ask a question?... greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Silent crash witk 2-6.24-rc8-git4
On Thu, 2008-01-24 at 00:03 +, Chris Clayton wrote: > Thanks for the reply, John. > > On Wednesday 23 January 2008, John W. Linville wrote: > > On Wed, Jan 23, 2008 at 04:42:58PM +, Chris Clayton wrote: > > > On Wednesday 23 January 2008, Chris Clayton wrote: > > > > Hi again. > > > > > > > > On Tuesday 22 January 2008, Chris Clayton wrote: > > > > > Hi, > > > > > > and here's another, slightly more complete, one, this time from > > > 2.6.24-rc8-git6 > > > (the none I sent earlier was from -git4, by the way) > > > > > > Call Trace: > > > [] __update_rq_clock+0x1a/0xf0 > > > [] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib] > > > [] rt61pci_txdone+0x153/0x1f0 [rt61pci] > > > [] rt61pci_interrupt+0x9d/0xb0 [rt61pci] > > > > I suspect this could relate to this commit: > > > > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9 > > Author: Mattias Nissler <[EMAIL PROTECTED]> > > Date: Mon Nov 12 15:03:12 2007 +0100 > > > > rt2x00: Allow rt61 to catch up after a missing tx report > > > > Could you revert that patch, rebuild, and try to recreate the problem? > > > > I've reverted the patch by hand - the original patch on linux-wireless > doesn't match the present state of the code > in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, > however. > > Problem now is that we are back where we were a few weeks ago because my > wireless connection dies > almost immediately after I start the download and from dmesg I see: > > phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the > non-full queue 2. > Please file bug report to http://rt2x00.serialmonkey.com. As the patches author, I'm not very surprised :-) The patch just adds additional processing that should have happened in a different call of the interrupt handler. I don't think it can be related at all to the trouble you're seeing. When I find time, I'll try the Linus tree on my rt61 to see what happens. Mattias -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Silent crash witk 2-6.24-rc8-git4
On Thu, 2008-01-24 at 00:03 +, Chris Clayton wrote: Thanks for the reply, John. On Wednesday 23 January 2008, John W. Linville wrote: On Wed, Jan 23, 2008 at 04:42:58PM +, Chris Clayton wrote: On Wednesday 23 January 2008, Chris Clayton wrote: Hi again. On Tuesday 22 January 2008, Chris Clayton wrote: Hi, and here's another, slightly more complete, one, this time from 2.6.24-rc8-git6 (the none I sent earlier was from -git4, by the way) Call Trace: [c011440a] __update_rq_clock+0x1a/0xf0 [e08ff59d] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib] [e0909573] rt61pci_txdone+0x153/0x1f0 [rt61pci] [e09096ad] rt61pci_interrupt+0x9d/0xb0 [rt61pci] I suspect this could relate to this commit: commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9 Author: Mattias Nissler [EMAIL PROTECTED] Date: Mon Nov 12 15:03:12 2007 +0100 rt2x00: Allow rt61 to catch up after a missing tx report Could you revert that patch, rebuild, and try to recreate the problem? I've reverted the patch by hand - the original patch on linux-wireless doesn't match the present state of the code in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, however. Problem now is that we are back where we were a few weeks ago because my wireless connection dies almost immediately after I start the download and from dmesg I see: phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 2. Please file bug report to http://rt2x00.serialmonkey.com. As the patches author, I'm not very surprised :-) The patch just adds additional processing that should have happened in a different call of the interrupt handler. I don't think it can be related at all to the trouble you're seeing. When I find time, I'll try the Linus tree on my rt61 to see what happens. Mattias -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Spammed logs again rt2500pci 2.6.24-rc8
Hi, we currently think this is due to a race condition in the packet queue code. Ivo is currently reworking the packet queues, which I hope will resolve this situation. Mattias On Thu, 2008-01-17 at 22:57 +0100, Alejandro Riveira Fernández wrote: > Running > > Linux Varda 2.6.24-rc8 #1 SMP PREEMPT Wed Jan 16 15:09:43 CET 2008 x86_64 > GNU/Linux > > my logs are spmmed with: > > > [20126.343240] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20126.343242] Please file bug report to http://rt2x00.serialmonkey.com. > [20126.848277] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20126.848280] Please file bug report to http://rt2x00.serialmonkey.com. > [20126.851693] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20126.851696] Please file bug report to http://rt2x00.serialmonkey.com. > [20126.851854] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20126.851856] Please file bug report to http://rt2x00.serialmonkey.com. > [20126.852082] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20126.852084] Please file bug report to http://rt2x00.serialmonkey.com. > [20126.852224] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20126.852226] Please file bug report to http://rt2x00.serialmonkey.com. > [20127.349691] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20127.349694] Please file bug report to http://rt2x00.serialmonkey.com. > [20127.352425] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20127.352428] Please file bug report to http://rt2x00.serialmonkey.com. > [20127.853345] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20127.853348] Please file bug report to http://rt2x00.serialmonkey.com. > [20127.853699] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20127.853701] Please file bug report to http://rt2x00.serialmonkey.com. > [20128.858569] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20128.858572] Please file bug report to http://rt2x00.serialmonkey.com. > [20131.875550] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20131.875554] Please file bug report to http://rt2x00.serialmonkey.com. > [20154.693853] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20154.693856] Please file bug report to http://rt2x00.serialmonkey.com. > [20185.527974] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20185.527977] Please file bug report to http://rt2x00.serialmonkey.com. > [20189.812334] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20189.812337] Please file bug report to http://rt2x00.serialmonkey.com. > [20189.812374] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20189.812376] Please file bug report to http://rt2x00.serialmonkey.com. > [20189.812507] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20189.812509] Please file bug report to http://rt2x00.serialmonkey.com. > [20189.813100] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20189.813102] Please file bug report to http://rt2x00.serialmonkey.com. > [20189.813474] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20189.813476] Please file bug report to http://rt2x00.serialmonkey.com. > [20189.813909] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20189.813911] Please file bug report to http://rt2x00.serialmonkey.com. > [20212.242858] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20212.242861] Please file bug report to http://rt2x00.serialmonkey.com. > [20212.243587] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free > entry in the non-full queue 1. > [20212.243589] Please file bug report to http://rt2x00.serialmonkey.com. > > > As the last time the thing is triggered with a p2p app making a lot of > connections (this time not bt but mldonkey) > Back to top > View user's profile Send private message > - > To unsubscribe from this list: send the line "unsubscribe linux-wireless" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to
Re: Spammed logs again rt2500pci 2.6.24-rc8
Hi, we currently think this is due to a race condition in the packet queue code. Ivo is currently reworking the packet queues, which I hope will resolve this situation. Mattias On Thu, 2008-01-17 at 22:57 +0100, Alejandro Riveira Fernández wrote: Running Linux Varda 2.6.24-rc8 #1 SMP PREEMPT Wed Jan 16 15:09:43 CET 2008 x86_64 GNU/Linux my logs are spmmed with: [20126.343240] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20126.343242] Please file bug report to http://rt2x00.serialmonkey.com. [20126.848277] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20126.848280] Please file bug report to http://rt2x00.serialmonkey.com. [20126.851693] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20126.851696] Please file bug report to http://rt2x00.serialmonkey.com. [20126.851854] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20126.851856] Please file bug report to http://rt2x00.serialmonkey.com. [20126.852082] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20126.852084] Please file bug report to http://rt2x00.serialmonkey.com. [20126.852224] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20126.852226] Please file bug report to http://rt2x00.serialmonkey.com. [20127.349691] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20127.349694] Please file bug report to http://rt2x00.serialmonkey.com. [20127.352425] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20127.352428] Please file bug report to http://rt2x00.serialmonkey.com. [20127.853345] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20127.853348] Please file bug report to http://rt2x00.serialmonkey.com. [20127.853699] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20127.853701] Please file bug report to http://rt2x00.serialmonkey.com. [20128.858569] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20128.858572] Please file bug report to http://rt2x00.serialmonkey.com. [20131.875550] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20131.875554] Please file bug report to http://rt2x00.serialmonkey.com. [20154.693853] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20154.693856] Please file bug report to http://rt2x00.serialmonkey.com. [20185.527974] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20185.527977] Please file bug report to http://rt2x00.serialmonkey.com. [20189.812334] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20189.812337] Please file bug report to http://rt2x00.serialmonkey.com. [20189.812374] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20189.812376] Please file bug report to http://rt2x00.serialmonkey.com. [20189.812507] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20189.812509] Please file bug report to http://rt2x00.serialmonkey.com. [20189.813100] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20189.813102] Please file bug report to http://rt2x00.serialmonkey.com. [20189.813474] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20189.813476] Please file bug report to http://rt2x00.serialmonkey.com. [20189.813909] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20189.813911] Please file bug report to http://rt2x00.serialmonkey.com. [20212.242858] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20212.242861] Please file bug report to http://rt2x00.serialmonkey.com. [20212.243587] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the non-full queue 1. [20212.243589] Please file bug report to http://rt2x00.serialmonkey.com. As the last time the thing is triggered with a p2p app making a lot of connections (this time not bt but mldonkey) Back to top View user's profile Send private message - To unsubscribe from this list: send the line unsubscribe linux-wireless in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers
On Sun, 2007-12-16 at 09:05 -0800, Greg KH wrote: > On Sun, Dec 16, 2007 at 05:37:59PM +0100, Mattias Nissler wrote: > > This makes debugfs use its own file_operations for the value accessor files > > created by debugfs_create_XXX. Having that, we can also have proper versions > > for signed integers. > > Why not tweak the "SIMPLE_ATTRIBUTE" code to support this instead? That > way debugfs and all other filesystems could also use these attributes? I expected that question ;-) Actually, I had a version that did this. But it ended up being ugly, cause SIMPLE_ATTRIBUTEs expect to access the values via get/set functions using u64. Here is what I did and why I didn't like it: * I made the set/get function interface more generic (as it is in the debugfs patch), i.e. getting passed the buf, so they have to decode/encode whatever they like into the buf instead of working with u64 * This means either all code using SIMPLE_ATTRIBUTES must be changed, or I need to add another wrapper macro that produces suitable get/set functions from the u64 versions it gets passed. * Changing all SIMPLE_ATTRIBUTE users is not what I want, cause moving the csnprintf()/strtoull() calls out of the SIMPLE_ATTRIBUTE code means it becomes less simple to use it, right? * The wrapper is actually possible, but it suffers from cases where somebody passes NULL for a get or set function (which is actually expected, compare the code in libfs.c) It is possible to hack around this, but it's ugly. Then I thought, well, SIMPLE_ATTRIBUTEs are made to work with a set/get function pair. But debugfs expects a pointer to an actual variable anyway, so just bypass the SIMPLE_ATTRIBUTE code and make debugfs use it's own file_operations (most of them shamelessly stolen from the SIMPLE_ATTRIBUTE code, I admit). Mattias -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers
This makes debugfs use its own file_operations for the value accessor files created by debugfs_create_XXX. Having that, we can also have proper versions for signed integers. Signed-off-by: Mattias Nissler <[EMAIL PROTECTED]> --- When writing some debugfs code for mac80211 I wanted to have access to some s32 variables. I thought it's better to make debugfs support signed integers instead of adding the functionality locally. I assume generic versions will be useful for other people as well. Please CC me for replies. fs/debugfs/file.c | 291 +++ include/linux/debugfs.h | 45 +++ 2 files changed, 287 insertions(+), 49 deletions(-) diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index fa6b7f7..8808bc0 100644 --- a/fs/debugfs/file.c +++ b/fs/debugfs/file.c @@ -56,18 +56,121 @@ const struct inode_operations debugfs_link_operations = { .follow_link= debugfs_follow_link, }; -static void debugfs_u8_set(void *data, u64 val) +/* Simple numeric attributes */ +struct debugfs_num_attr { - *(u8 *)data = val; + void (*get)(void *, char *); + void (*set)(void *, char *); + char get_buf[24]; + char set_buf[24]; + void *pnum; + struct mutex mutex; +}; + +static int debugfs_num_attr_open(struct inode *inode, struct file *file, +void (*get)(void *, char*), +void (*set)(void *, char*)) +{ + struct debugfs_num_attr *attr; + + attr = kmalloc(sizeof(*attr), GFP_KERNEL); + if (!attr) + return -ENOMEM; + + attr->get = get; + attr->set = set; + attr->pnum = inode->i_private; + mutex_init(>mutex); + + file->private_data = attr; + + return nonseekable_open(inode, file); +} + +static int debugfs_num_attr_close(struct inode *inode, struct file *file) +{ + kfree(file->private_data); + + return 0; +} + +static ssize_t debugfs_num_attr_read(struct file *file, char __user *buf, +size_t len, loff_t *ppos) +{ + struct debugfs_num_attr *attr; + size_t size; + ssize_t ret; + + attr = file->private_data; + + mutex_lock(>mutex); + if (!*ppos) { + /* first read */ + attr->get(attr->pnum, attr->get_buf); + attr->get_buf[sizeof(attr->get_buf) - 1] = '\0'; + } + + size = strlen(attr->get_buf); + ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size); + mutex_unlock(>mutex); + + return ret; } -static u64 debugfs_u8_get(void *data) + +static ssize_t debugfs_num_attr_write(struct file *file, const char __user *buf, + size_t len, loff_t *ppos) { - return *(u8 *)data; + struct debugfs_num_attr *attr; + size_t size; + ssize_t ret; + + attr = file->private_data; + + mutex_lock(>mutex); + ret = -EFAULT; + size = min(sizeof(attr->set_buf) - 1, len); + if (copy_from_user(attr->set_buf, buf, size)) + goto out; + + ret = len; /* claim we got the whole input */ + attr->set_buf[size] = '\0'; + attr->set(attr->pnum, attr->set_buf); +out: + mutex_unlock(>mutex); + return ret; } -DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, "%llu\n"); + +#define DEFINE_NUM_ATTR(__fops, __type, __format) \ +static void __fops ## _get(void *data, char *buf) \ +{ \ + scnprintf(buf, 24, __format, *(__type *) data); \ +} \ +static void __fops ## _set(void *data, char *buf) \ +{ \ + sscanf(buf, __format, (__type *) data); \ +} \ +static int __fops ## _open(struct inode *inode, struct file *file) \ +{ \ + return debugfs_num_attr_open(inode, file, __fops ## _get, \ +__fops ## _set); \ +} \ +static struct file_operations __fops = { \ + .owner = THIS_MODULE, \ + .open= __fops ## _open, \ + .release = debugfs_num_attr_close, \ + .read= debugfs_num_attr_read, \ + .write = debugfs_num_attr_write, \ +}; + +DE
[PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers
This makes debugfs use its own file_operations for the value accessor files created by debugfs_create_XXX. Having that, we can also have proper versions for signed integers. Signed-off-by: Mattias Nissler [EMAIL PROTECTED] --- When writing some debugfs code for mac80211 I wanted to have access to some s32 variables. I thought it's better to make debugfs support signed integers instead of adding the functionality locally. I assume generic versions will be useful for other people as well. Please CC me for replies. fs/debugfs/file.c | 291 +++ include/linux/debugfs.h | 45 +++ 2 files changed, 287 insertions(+), 49 deletions(-) diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index fa6b7f7..8808bc0 100644 --- a/fs/debugfs/file.c +++ b/fs/debugfs/file.c @@ -56,18 +56,121 @@ const struct inode_operations debugfs_link_operations = { .follow_link= debugfs_follow_link, }; -static void debugfs_u8_set(void *data, u64 val) +/* Simple numeric attributes */ +struct debugfs_num_attr { - *(u8 *)data = val; + void (*get)(void *, char *); + void (*set)(void *, char *); + char get_buf[24]; + char set_buf[24]; + void *pnum; + struct mutex mutex; +}; + +static int debugfs_num_attr_open(struct inode *inode, struct file *file, +void (*get)(void *, char*), +void (*set)(void *, char*)) +{ + struct debugfs_num_attr *attr; + + attr = kmalloc(sizeof(*attr), GFP_KERNEL); + if (!attr) + return -ENOMEM; + + attr-get = get; + attr-set = set; + attr-pnum = inode-i_private; + mutex_init(attr-mutex); + + file-private_data = attr; + + return nonseekable_open(inode, file); +} + +static int debugfs_num_attr_close(struct inode *inode, struct file *file) +{ + kfree(file-private_data); + + return 0; +} + +static ssize_t debugfs_num_attr_read(struct file *file, char __user *buf, +size_t len, loff_t *ppos) +{ + struct debugfs_num_attr *attr; + size_t size; + ssize_t ret; + + attr = file-private_data; + + mutex_lock(attr-mutex); + if (!*ppos) { + /* first read */ + attr-get(attr-pnum, attr-get_buf); + attr-get_buf[sizeof(attr-get_buf) - 1] = '\0'; + } + + size = strlen(attr-get_buf); + ret = simple_read_from_buffer(buf, len, ppos, attr-get_buf, size); + mutex_unlock(attr-mutex); + + return ret; } -static u64 debugfs_u8_get(void *data) + +static ssize_t debugfs_num_attr_write(struct file *file, const char __user *buf, + size_t len, loff_t *ppos) { - return *(u8 *)data; + struct debugfs_num_attr *attr; + size_t size; + ssize_t ret; + + attr = file-private_data; + + mutex_lock(attr-mutex); + ret = -EFAULT; + size = min(sizeof(attr-set_buf) - 1, len); + if (copy_from_user(attr-set_buf, buf, size)) + goto out; + + ret = len; /* claim we got the whole input */ + attr-set_buf[size] = '\0'; + attr-set(attr-pnum, attr-set_buf); +out: + mutex_unlock(attr-mutex); + return ret; } -DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, %llu\n); + +#define DEFINE_NUM_ATTR(__fops, __type, __format) \ +static void __fops ## _get(void *data, char *buf) \ +{ \ + scnprintf(buf, 24, __format, *(__type *) data); \ +} \ +static void __fops ## _set(void *data, char *buf) \ +{ \ + sscanf(buf, __format, (__type *) data); \ +} \ +static int __fops ## _open(struct inode *inode, struct file *file) \ +{ \ + return debugfs_num_attr_open(inode, file, __fops ## _get, \ +__fops ## _set); \ +} \ +static struct file_operations __fops = { \ + .owner = THIS_MODULE, \ + .open= __fops ## _open, \ + .release = debugfs_num_attr_close, \ + .read= debugfs_num_attr_read, \ + .write = debugfs_num_attr_write, \ +}; + +DEFINE_NUM_ATTR(fops_u8, u8, %hhu\n) +DEFINE_NUM_ATTR(fops_u16, u16, %hu\n) +DEFINE_NUM_ATTR(fops_u32, u32
Re: [PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers
On Sun, 2007-12-16 at 09:05 -0800, Greg KH wrote: On Sun, Dec 16, 2007 at 05:37:59PM +0100, Mattias Nissler wrote: This makes debugfs use its own file_operations for the value accessor files created by debugfs_create_XXX. Having that, we can also have proper versions for signed integers. Why not tweak the SIMPLE_ATTRIBUTE code to support this instead? That way debugfs and all other filesystems could also use these attributes? I expected that question ;-) Actually, I had a version that did this. But it ended up being ugly, cause SIMPLE_ATTRIBUTEs expect to access the values via get/set functions using u64. Here is what I did and why I didn't like it: * I made the set/get function interface more generic (as it is in the debugfs patch), i.e. getting passed the buf, so they have to decode/encode whatever they like into the buf instead of working with u64 * This means either all code using SIMPLE_ATTRIBUTES must be changed, or I need to add another wrapper macro that produces suitable get/set functions from the u64 versions it gets passed. * Changing all SIMPLE_ATTRIBUTE users is not what I want, cause moving the csnprintf()/strtoull() calls out of the SIMPLE_ATTRIBUTE code means it becomes less simple to use it, right? * The wrapper is actually possible, but it suffers from cases where somebody passes NULL for a get or set function (which is actually expected, compare the code in libfs.c) It is possible to hack around this, but it's ugly. Then I thought, well, SIMPLE_ATTRIBUTEs are made to work with a set/get function pair. But debugfs expects a pointer to an actual variable anyway, so just bypass the SIMPLE_ATTRIBUTE code and make debugfs use it's own file_operations (most of them shamelessly stolen from the SIMPLE_ATTRIBUTE code, I admit). Mattias -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/