Re: [PATCH] coresight: etm4x: Add config to exclude kernel mode tracing

2021-01-18 Thread Mattias Nissler
On Fri, Jan 15, 2021 at 6:46 AM Sai Prakash Ranjan
 wrote:
>
> Hello Mathieu, Suzuki
>
> On 2020-10-15 21:32, Mathieu Poirier wrote:
> > On Thu, Oct 15, 2020 at 06:15:22PM +0530, Sai Prakash Ranjan wrote:
> >> On production systems with ETMs enabled, it is preferred to
> >> exclude kernel mode(NS EL1) tracing for security concerns and
> >> support only userspace(NS EL0) tracing. So provide an option
> >> via kconfig to exclude kernel mode tracing if it is required.
> >> This config is disabled by default and would not affect the
> >> current configuration which has both kernel and userspace
> >> tracing enabled by default.
> >>
> >
> > One requires root access (or be part of a special trace group) to be
> > able to use
> > the cs_etm PMU.  With this kind of elevated access restricting tracing
> > at EL1
> > provides little in terms of security.
> >
>
> Apart from the VM usecase discussed, I am told there are other
> security concerns here regarding need to exclude kernel mode tracing
> even for the privileged users/root. One such case being the ability
> to analyze cryptographic code execution since ETMs can record all
> branch instructions including timestamps in the kernel and there may
> be other cases as well which I may not be aware of and hence have
> added Denis and Mattias. Please let us know if you have any questions
> further regarding this not being a security concern.

Well, the idea that root privileges != full control over the kernel
isn't new and at the very least since lockdown became part of mainline
[1] no longer an esoteric edge case. Regarding the use case Sai hints
at (namely protection of secrets in the kernel), Matthew Garret
actually has some more thoughts about confidentiality mode for
lockdown for secret protection [2]. And thus, unless someone can make
a compelling case that instruction-level tracing will not leak secrets
held by the kernel, I think an option for the kernel to prevent itself
from being traced (even by root) is valuable.

Finally, to sketch a practical use case scenario: Consider a system
where disk contents are encrypted and the encryption key is set up by
the user when mounting the file system. From that point on the
encryption key resides in the kernel. It seems reasonable to expect
that the disk encryption key be protected from exfiltration even if
the system later suffers a root compromise (or even against insiders
that have root access), at least as long as the attacker doesn't
manage to compromise the kernel.

[1] https://lwn.net/Articles/796866/
[2] https://mjg59.dreamwidth.org/55105.html

>
> After this discussion, I would like to post a v2 based on Suzuki's
> feedback earlier. @Suzuki, I have a common config for ETM3 and ETM4
> but couldn't get much idea on how to implement it for Intel PTs, if
> you have any suggestions there, please do share or we can have this
> only for Coresight ETMs.
>
> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 4.9 30/47] ANDROID: binder: remove waitqueue when thread exits.

2019-10-07 Thread Mattias Nissler
(resend, apologies for accidental HTML reply)

On Sun, Oct 6, 2019 at 11:24 AM Greg Kroah-Hartman
 wrote:
>
> On Sun, Oct 06, 2019 at 10:32:02AM -0700, Eric Biggers wrote:
> > On Sun, Oct 06, 2019 at 07:21:17PM +0200, Greg Kroah-Hartman wrote:
> > > From: Martijn Coenen 
> > >
> > > commit f5cb779ba16334b45ba8946d6bfa6d9834d1527f upstream.
> > >
> > > binder_poll() passes the thread->wait waitqueue that
> > > can be slept on for work. When a thread that uses
> > > epoll explicitly exits using BINDER_THREAD_EXIT,
> > > the waitqueue is freed, but it is never removed
> > > from the corresponding epoll data structure. When
> > > the process subsequently exits, the epoll cleanup
> > > code tries to access the waitlist, which results in
> > > a use-after-free.
> > >
> > > Prevent this by using POLLFREE when the thread exits.
> > >
> > > Signed-off-by: Martijn Coenen 
> > > Reported-by: syzbot 
> > > Cc: stable  # 4.14
> > > [backport BINDER_LOOPER_STATE_POLL logic as well]
> > > Signed-off-by: Mattias Nissler 
> > > Signed-off-by: Greg Kroah-Hartman 
> > > ---
> > >  drivers/android/binder.c |   17 -
> > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > >
> > > --- a/drivers/android/binder.c
> > > +++ b/drivers/android/binder.c
> > > @@ -334,7 +334,8 @@ enum {
> > > BINDER_LOOPER_STATE_EXITED  = 0x04,
> > > BINDER_LOOPER_STATE_INVALID = 0x08,
> > > BINDER_LOOPER_STATE_WAITING = 0x10,
> > > -   BINDER_LOOPER_STATE_NEED_RETURN = 0x20
> > > +   BINDER_LOOPER_STATE_NEED_RETURN = 0x20,
> > > +   BINDER_LOOPER_STATE_POLL= 0x40,
> > >  };
> > >
> > >  struct binder_thread {
> > > @@ -2628,6 +2629,18 @@ static int binder_free_thread(struct bin
> > > } else
> > > BUG();
> > > }
> > > +
> > > +   /*
> > > +* If this thread used poll, make sure we remove the waitqueue
> > > +* from any epoll data structures holding it with POLLFREE.
> > > +* waitqueue_active() is safe to use here because we're holding
> > > +* the inner lock.
> > > +*/
> > > +   if ((thread->looper & BINDER_LOOPER_STATE_POLL) &&
> > > +   waitqueue_active(>wait)) {
> > > +   wake_up_poll(>wait, POLLHUP | POLLFREE);
> > > +   }
> > > +
> > > if (send_reply)
> > > binder_send_failed_reply(send_reply, BR_DEAD_REPLY);
> > > binder_release_work(>todo);
> > > @@ -2651,6 +2664,8 @@ static unsigned int binder_poll(struct f
> > > return POLLERR;
> > > }
> > >
> > > +   thread->looper |= BINDER_LOOPER_STATE_POLL;
> > > +
> > > wait_for_proc_work = thread->transaction_stack == NULL &&
> > > list_empty(>todo) && thread->return_error == BR_OK;
> > >
> >
> > Are you sure this backport is correct, given that in 4.9, binder_poll()
> > sometimes uses proc->wait instead of thread->wait?:

Jann's PoC calls the BINDER_THREAD_EXIT ioctl to free the
binder_thread which will then cause the UAF, and this is cut off by
the patch. IIUC, you are worried about a similar AUF on the proc->wait
access. I am not 100% sure, but I think the binder_proc lifetime
matches the corresponding struct file instance, so it shouldn't be
possible to get the binder_proc deallocated while still being able to
access it via filp->private_data.

> >
> > wait_for_proc_work = thread->transaction_stack == NULL &&
> > list_empty(>todo) && thread->return_error == BR_OK;
> >
> > binder_unlock(__func__);
> >
> > if (wait_for_proc_work) {
> > if (binder_has_proc_work(proc, thread))
> > return POLLIN;
> > poll_wait(filp, >wait, wait);
> > if (binder_has_proc_work(proc, thread))
> > return POLLIN;
> > } else {
> > if (binder_has_thread_work(thread))
> > return POLLIN;
> > poll_wait(filp, >wait, wait);
> > if (binder_has_thread_work(thread))
> > return POLLIN;
> > }
> > return 0;
>
> I _think_ the backport is correct, and I know someone has verified that
> the 4.4.y backport works properly and I don't see much difference here
> from that version.
>
> But I will defer to Todd and Martijn here, as they know this code _WAY_
> better than I do.  The codebase has changed a lot from 4.9.y to 4.14.y
> so it makes it hard to do equal comparisons simply.
>
> Todd and Martijn, thoughts?
>
> thanks,
>
> greg k-h


Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-11-22 Thread Mattias Nissler
On Mon, Nov 21, 2016 at 6:10 PM, James Bottomley
<james.bottom...@hansenpartnership.com> wrote:
> On Wed, 2016-11-16 at 13:18 -0800, Mattias Nissler wrote:
>> I understand that silence suggests there's little interest, but
>> here's some new information I discovered today that may justify to
>> reconsider the patch:
>>
>> The BSDs already have exactly what I propose, the mount option is
>> called "nosymfollow" there:
>> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2
>> f48865f18d6/sys/kern/vfs_lookup.c#L939
>>
>> There's also some evidence on the net that people have been using
>> said nosymfollow mount option to mitigate symlink attacks.
>
> I've got to say that just because BSD does this doesn't make it a good
> idea.  The problem with disabling symlinks is that they're a core part
> of unix filesystem semantics and simply disabling them breaks various
> setups in various ways.

Agreed. The proposed mount option is not going to mix well with cases
such as shared object symlinks commonly found in /usr/lib/

> The breakage depends on the user, so if you
> come from a Windows background, where symlinks basically don't exist,
> you're likely fine, but if you come from a UNIX background, chances are
> you use them pretty extensively.  Obviously, I'm a UNIX background
> person and I can think of all the nasty ways you'll break my setup ...

Note that I'm not saying you should be applying this mount option
broadly, just that it is useful for certain mounts where there are no
reasons for symlinks to be present. Point in case: we already have
noexec, but obviously people don't mount their root fs noexec as that
would prevent the system from booting ;-)

>
> If a root privileged programme is trying to write to a file, it's not
> unreasonable to expect the programme itself to take simple precautions
> (like setuid to the user it's trying to write), so I'd really rather
> declare binaries that don't do this as broken and fix them.  Thinking
> you've plugged a hole like this simply by disabling symlinks is a false
> sense of security because there are a variety of other ways I could
> trick the programme into writing where it shouldn't.  Disabling
> symlinks is fixing a symptom, it's really the cause that should be
> fixed.

Your points are generally valid. Yes, I'm addressing the symptom of
incautious access to untrusted file systems. I actually agree that a
better solution would be for privileged code to be aware of the risks
of file system access and to apply caution where necessary. I also
agree that symlinks are not the only issue with untrusted file
systems, so applying this mount option is not a panacea. I should note
that I'm actively working on hardening some of the more exposed
contexts in Chrome OS beyond the symlink traversal issue.

However, I still believe that blocking symlink traversal in the kernel
is a useful thing to do in our use case. Rationale:

1. On a typical system, there's a ton of code executing as root or
with elevated privileges. It's not practical to audit all that code
for symlink traversal issues. Traditional systems just assume that the
root file system is trusted and that code running as root only deals
with root-owned files, which alleviates the problem to a point where
it becomes manageable. However, on Chrome OS we have the ambition to
not blindly trust the writable file system, so we have to worry about
*any* file access in privileged contexts, not just the cases where
privileged processes access user-owned file system locations (allow me
to add that history has shown that this alone has been a constant
source of bugs to the point that it is now generally considered an
anti-pattern).

2. It is sometimes hard to make code sufficiently cautious, for
example in init scripts. As long as you're using shell, you want to
use features such as shell redirection. To cover these, you'd ideally
patch the shell to take care upon opening files for redirection, but
given the prevalence of file access and the myriad of tools invoked by
shell this is bound to be a losing battle. Another option is to add
shell code along the lines of this: test "$(readlink -f $PATH)" =
"$PATH" (note that O_NOFOLLOW is not enough as that'll only affect the
last path component). That's tedious and error-prone to maintain, and
also comes with a TOCTOU issue.

3. Assuming for a moment that we can fix all existing issues, it's
still not realistic to keep the code clean moving forward. IMHO, the
root of the problem is that developers just don't have symlinks on
their radar when writing file access code. "I'll just write this data
to this file, and I'll hardcode the path, so what can go wrong?" is
probably a realistic level of caution you can expect. The fact that
the presence of symlinks can redirect the operation to a

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-11-22 Thread Mattias Nissler
On Mon, Nov 21, 2016 at 6:10 PM, James Bottomley
 wrote:
> On Wed, 2016-11-16 at 13:18 -0800, Mattias Nissler wrote:
>> I understand that silence suggests there's little interest, but
>> here's some new information I discovered today that may justify to
>> reconsider the patch:
>>
>> The BSDs already have exactly what I propose, the mount option is
>> called "nosymfollow" there:
>> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2
>> f48865f18d6/sys/kern/vfs_lookup.c#L939
>>
>> There's also some evidence on the net that people have been using
>> said nosymfollow mount option to mitigate symlink attacks.
>
> I've got to say that just because BSD does this doesn't make it a good
> idea.  The problem with disabling symlinks is that they're a core part
> of unix filesystem semantics and simply disabling them breaks various
> setups in various ways.

Agreed. The proposed mount option is not going to mix well with cases
such as shared object symlinks commonly found in /usr/lib/

> The breakage depends on the user, so if you
> come from a Windows background, where symlinks basically don't exist,
> you're likely fine, but if you come from a UNIX background, chances are
> you use them pretty extensively.  Obviously, I'm a UNIX background
> person and I can think of all the nasty ways you'll break my setup ...

Note that I'm not saying you should be applying this mount option
broadly, just that it is useful for certain mounts where there are no
reasons for symlinks to be present. Point in case: we already have
noexec, but obviously people don't mount their root fs noexec as that
would prevent the system from booting ;-)

>
> If a root privileged programme is trying to write to a file, it's not
> unreasonable to expect the programme itself to take simple precautions
> (like setuid to the user it's trying to write), so I'd really rather
> declare binaries that don't do this as broken and fix them.  Thinking
> you've plugged a hole like this simply by disabling symlinks is a false
> sense of security because there are a variety of other ways I could
> trick the programme into writing where it shouldn't.  Disabling
> symlinks is fixing a symptom, it's really the cause that should be
> fixed.

Your points are generally valid. Yes, I'm addressing the symptom of
incautious access to untrusted file systems. I actually agree that a
better solution would be for privileged code to be aware of the risks
of file system access and to apply caution where necessary. I also
agree that symlinks are not the only issue with untrusted file
systems, so applying this mount option is not a panacea. I should note
that I'm actively working on hardening some of the more exposed
contexts in Chrome OS beyond the symlink traversal issue.

However, I still believe that blocking symlink traversal in the kernel
is a useful thing to do in our use case. Rationale:

1. On a typical system, there's a ton of code executing as root or
with elevated privileges. It's not practical to audit all that code
for symlink traversal issues. Traditional systems just assume that the
root file system is trusted and that code running as root only deals
with root-owned files, which alleviates the problem to a point where
it becomes manageable. However, on Chrome OS we have the ambition to
not blindly trust the writable file system, so we have to worry about
*any* file access in privileged contexts, not just the cases where
privileged processes access user-owned file system locations (allow me
to add that history has shown that this alone has been a constant
source of bugs to the point that it is now generally considered an
anti-pattern).

2. It is sometimes hard to make code sufficiently cautious, for
example in init scripts. As long as you're using shell, you want to
use features such as shell redirection. To cover these, you'd ideally
patch the shell to take care upon opening files for redirection, but
given the prevalence of file access and the myriad of tools invoked by
shell this is bound to be a losing battle. Another option is to add
shell code along the lines of this: test "$(readlink -f $PATH)" =
"$PATH" (note that O_NOFOLLOW is not enough as that'll only affect the
last path component). That's tedious and error-prone to maintain, and
also comes with a TOCTOU issue.

3. Assuming for a moment that we can fix all existing issues, it's
still not realistic to keep the code clean moving forward. IMHO, the
root of the problem is that developers just don't have symlinks on
their radar when writing file access code. "I'll just write this data
to this file, and I'll hardcode the path, so what can go wrong?" is
probably a realistic level of caution you can expect. The fact that
the presence of symlinks can redirect the operation to an entirely
different file isn't obvious at all wh

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-11-17 Thread Mattias Nissler
On Thu, Nov 17, 2016 at 1:43 PM, Austin S. Hemmelgarn
<ahferro...@gmail.com> wrote:
> On 2016-11-16 16:18, Mattias Nissler wrote:
>>
>> I understand that silence suggests there's little interest, but here's
>> some new information I discovered today that may justify to reconsider
>> the patch:
>>
>> The BSDs already have exactly what I propose, the mount option is
>> called "nosymfollow" there:
>>
>> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939
>>
>> There's also some evidence on the net that people have been using said
>> nosymfollow mount option to mitigate symlink attacks.
>
> I will comment that I do have interest in this (to the point that if this
> doesn't go into mainline, I'll probably add it to my local set of patches),
> though I'm not a maintainer or reviewer for the relevant area, so there's
> not much I can do except say that there is interest from at least one person
> who didn't work on the patch.
>
> That said, it would be nice if we matched the BSD option name (we do on
> pretty much every other core VFS level mount option).

Agreed. I'm happy to send an updated patch with adjusted naming.

>
>>
>> On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler <mniss...@chromium.org>
>> wrote:
>>>
>>> Friendly ping - does this version of the patch have any chance on
>>> getting included in mainline?
>>>
>>> On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mniss...@chromium.org>
>>> wrote:
>>>>
>>>> For mounts that have the new "nosymlinks" option, don't follow
>>>> symlinks when resolving paths. The new option is similar in spirit to
>>>> the existing "nodev", "noexec", and "nosuid" options.
>>>>
>>>> Note that symlinks may still be created on mounts where the
>>>> "nosymlinks" option is present. readlink() remains functional, so user
>>>> space code that is aware of symlinks can still choose to follow them
>>>> explicitly.
>>>>
>>>> Setting the "nosymlinks" mount option helps prevent privileged writers
>>>> from modifying files unintentionally in case there is an unexpected
>>>> link along the accessed path. The "nosymlinks" option is thus useful
>>>> as a defensive measure for systems that need to deal with untrusted
>>>> file systems in privileged contexts.
>>>>
>>>> Signed-off-by: Mattias Nissler <mniss...@chromium.org>
>>>> ---
>>>>  fs/namei.c  | 3 +++
>>>>  fs/namespace.c  | 9 ++---
>>>>  fs/proc_namespace.c | 1 +
>>>>  fs/statfs.c | 2 ++
>>>>  include/linux/mount.h   | 3 ++-
>>>>  include/linux/statfs.h  | 1 +
>>>>  include/uapi/linux/fs.h | 1 +
>>>>  7 files changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/namei.c b/fs/namei.c
>>>> index 5b4eed2..4cddcf3 100644
>>>> --- a/fs/namei.c
>>>> +++ b/fs/namei.c
>>>> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
>>>> touch_atime(>link);
>>>> }
>>>>
>>>> +   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
>>>> +   return ERR_PTR(-EACCES);
>>>> +
>>>> error = security_inode_follow_link(dentry, inode,
>>>>nd->flags & LOOKUP_RCU);
>>>> if (unlikely(error))
>>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>>> index e6c234b..deec84e 100644
>>>> --- a/fs/namespace.c
>>>> +++ b/fs/namespace.c
>>>> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char
>>>> __user *dir_name,
>>>> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
>>>> if (flags & MS_RDONLY)
>>>> mnt_flags |= MNT_READONLY;
>>>> +   if (flags & MS_NOSYMLINKS)
>>>> +   mnt_flags |= MNT_NOSYMLINKS;
>>>>
>>>> /* The default atime for remount is preservation */
>>>> if ((flags & MS_REMOUNT) &&
>>>> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char
>>>> __user *dir_name,
>>>> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
>>>&

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-11-17 Thread Mattias Nissler
On Thu, Nov 17, 2016 at 1:43 PM, Austin S. Hemmelgarn
 wrote:
> On 2016-11-16 16:18, Mattias Nissler wrote:
>>
>> I understand that silence suggests there's little interest, but here's
>> some new information I discovered today that may justify to reconsider
>> the patch:
>>
>> The BSDs already have exactly what I propose, the mount option is
>> called "nosymfollow" there:
>>
>> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939
>>
>> There's also some evidence on the net that people have been using said
>> nosymfollow mount option to mitigate symlink attacks.
>
> I will comment that I do have interest in this (to the point that if this
> doesn't go into mainline, I'll probably add it to my local set of patches),
> though I'm not a maintainer or reviewer for the relevant area, so there's
> not much I can do except say that there is interest from at least one person
> who didn't work on the patch.
>
> That said, it would be nice if we matched the BSD option name (we do on
> pretty much every other core VFS level mount option).

Agreed. I'm happy to send an updated patch with adjusted naming.

>
>>
>> On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler 
>> wrote:
>>>
>>> Friendly ping - does this version of the patch have any chance on
>>> getting included in mainline?
>>>
>>> On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler 
>>> wrote:
>>>>
>>>> For mounts that have the new "nosymlinks" option, don't follow
>>>> symlinks when resolving paths. The new option is similar in spirit to
>>>> the existing "nodev", "noexec", and "nosuid" options.
>>>>
>>>> Note that symlinks may still be created on mounts where the
>>>> "nosymlinks" option is present. readlink() remains functional, so user
>>>> space code that is aware of symlinks can still choose to follow them
>>>> explicitly.
>>>>
>>>> Setting the "nosymlinks" mount option helps prevent privileged writers
>>>> from modifying files unintentionally in case there is an unexpected
>>>> link along the accessed path. The "nosymlinks" option is thus useful
>>>> as a defensive measure for systems that need to deal with untrusted
>>>> file systems in privileged contexts.
>>>>
>>>> Signed-off-by: Mattias Nissler 
>>>> ---
>>>>  fs/namei.c  | 3 +++
>>>>  fs/namespace.c  | 9 ++---
>>>>  fs/proc_namespace.c | 1 +
>>>>  fs/statfs.c | 2 ++
>>>>  include/linux/mount.h   | 3 ++-
>>>>  include/linux/statfs.h  | 1 +
>>>>  include/uapi/linux/fs.h | 1 +
>>>>  7 files changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/namei.c b/fs/namei.c
>>>> index 5b4eed2..4cddcf3 100644
>>>> --- a/fs/namei.c
>>>> +++ b/fs/namei.c
>>>> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
>>>> touch_atime(>link);
>>>> }
>>>>
>>>> +   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
>>>> +   return ERR_PTR(-EACCES);
>>>> +
>>>> error = security_inode_follow_link(dentry, inode,
>>>>nd->flags & LOOKUP_RCU);
>>>> if (unlikely(error))
>>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>>> index e6c234b..deec84e 100644
>>>> --- a/fs/namespace.c
>>>> +++ b/fs/namespace.c
>>>> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char
>>>> __user *dir_name,
>>>> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
>>>> if (flags & MS_RDONLY)
>>>> mnt_flags |= MNT_READONLY;
>>>> +   if (flags & MS_NOSYMLINKS)
>>>> +   mnt_flags |= MNT_NOSYMLINKS;
>>>>
>>>> /* The default atime for remount is preservation */
>>>> if ((flags & MS_REMOUNT) &&
>>>> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char
>>>> __user *dir_name,
>>>> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
>>>> }
>>>>
>>>> -   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE

[PATCH v3] Add a "nosymfollow" mount option.

2016-11-17 Thread Mattias Nissler
For mounts that have the new "nosymfollow" option, don't follow
symlinks when resolving paths. The new option is similar in spirit to
the existing "nodev", "noexec", and "nosuid" options. Various BSD
variants have been supporting the "nosymfollow" mount option for a
long time with equivalent implementations.

Note that symlinks may still be created on file systems mounted with
the "nosymfollow" option present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly.

Setting the "nosymfollow" mount option helps prevent privileged
writers from modifying files unintentionally in case there is an
unexpected link along the accessed path. The "nosymfollow" option is
thus useful as a defensive measure for systems that need to deal with
untrusted file systems in privileged contexts.

Signed-off-by: Mattias Nissler <mniss...@chromium.org>
---
Changes since v2:
  - Updated the option name to align with BSD naming.

I have also uploaded the corresponding util-linux patch to make the
"mount" command understand the new option here:
https://gist.github.com/anonymous/317207ae499389235258f0d52ab22c03

 fs/namei.c  | 3 +++
 fs/namespace.c  | 9 ++---
 fs/proc_namespace.c | 1 +
 fs/statfs.c | 2 ++
 include/linux/mount.h   | 3 ++-
 include/linux/statfs.h  | 1 +
 include/uapi/linux/fs.h | 1 +
 7 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5b4eed2..4751e7f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
touch_atime(>link);
}
 
+   if (nd->path.mnt->mnt_flags & MNT_NOSYMFOLLOW)
+   return ERR_PTR(-EACCES);
+
error = security_inode_follow_link(dentry, inode,
   nd->flags & LOOKUP_RCU);
if (unlikely(error))
diff --git a/fs/namespace.c b/fs/namespace.c
index e6c234b..2bf244e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+   if (flags & MS_NOSYMFOLLOW)
+   mnt_flags |= MNT_NOSYMFOLLOW;
 
/* The default atime for remount is preservation */
if ((flags & MS_REMOUNT) &&
@@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
}
 
-   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-  MS_STRICTATIME | MS_NOREMOTELOCK);
+   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMFOLLOW |
+  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
+  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
+  MS_NOREMOTELOCK);
 
if (flags & MS_REMOUNT)
retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3f1190d..366149d 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount 
*mnt)
{ MNT_NOATIME, ",noatime" },
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
+   { MNT_NOSYMFOLLOW, ",nosymfollow" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/fs/statfs.c b/fs/statfs.c
index 083dc0a..cfd4da8 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_NODIRATIME;
if (mnt_flags & MNT_RELATIME)
flags |= ST_RELATIME;
+   if (mnt_flags & MNT_NOSYMFOLLOW)
+   flags |= ST_NOSYMFOLLOW;
return flags;
 }
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..c9e8d7b 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
 #define MNT_READONLY   0x40/* does the user want this to be r/o? */
+#define MNT_NOSYMFOLLOW0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -44,7 +45,7 @@ struct mnt_namespace;
 #define MNT_SHARED_MASK(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
-| MNT_READONLY)
+  

[PATCH v3] Add a "nosymfollow" mount option.

2016-11-17 Thread Mattias Nissler
For mounts that have the new "nosymfollow" option, don't follow
symlinks when resolving paths. The new option is similar in spirit to
the existing "nodev", "noexec", and "nosuid" options. Various BSD
variants have been supporting the "nosymfollow" mount option for a
long time with equivalent implementations.

Note that symlinks may still be created on file systems mounted with
the "nosymfollow" option present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly.

Setting the "nosymfollow" mount option helps prevent privileged
writers from modifying files unintentionally in case there is an
unexpected link along the accessed path. The "nosymfollow" option is
thus useful as a defensive measure for systems that need to deal with
untrusted file systems in privileged contexts.

Signed-off-by: Mattias Nissler 
---
Changes since v2:
  - Updated the option name to align with BSD naming.

I have also uploaded the corresponding util-linux patch to make the
"mount" command understand the new option here:
https://gist.github.com/anonymous/317207ae499389235258f0d52ab22c03

 fs/namei.c  | 3 +++
 fs/namespace.c  | 9 ++---
 fs/proc_namespace.c | 1 +
 fs/statfs.c | 2 ++
 include/linux/mount.h   | 3 ++-
 include/linux/statfs.h  | 1 +
 include/uapi/linux/fs.h | 1 +
 7 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5b4eed2..4751e7f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
touch_atime(>link);
}
 
+   if (nd->path.mnt->mnt_flags & MNT_NOSYMFOLLOW)
+   return ERR_PTR(-EACCES);
+
error = security_inode_follow_link(dentry, inode,
   nd->flags & LOOKUP_RCU);
if (unlikely(error))
diff --git a/fs/namespace.c b/fs/namespace.c
index e6c234b..2bf244e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+   if (flags & MS_NOSYMFOLLOW)
+   mnt_flags |= MNT_NOSYMFOLLOW;
 
/* The default atime for remount is preservation */
if ((flags & MS_REMOUNT) &&
@@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
}
 
-   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-  MS_STRICTATIME | MS_NOREMOTELOCK);
+   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMFOLLOW |
+  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
+  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
+  MS_NOREMOTELOCK);
 
if (flags & MS_REMOUNT)
retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3f1190d..366149d 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount 
*mnt)
{ MNT_NOATIME, ",noatime" },
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
+   { MNT_NOSYMFOLLOW, ",nosymfollow" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/fs/statfs.c b/fs/statfs.c
index 083dc0a..cfd4da8 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_NODIRATIME;
if (mnt_flags & MNT_RELATIME)
flags |= ST_RELATIME;
+   if (mnt_flags & MNT_NOSYMFOLLOW)
+   flags |= ST_NOSYMFOLLOW;
return flags;
 }
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..c9e8d7b 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
 #define MNT_READONLY   0x40/* does the user want this to be r/o? */
+#define MNT_NOSYMFOLLOW0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -44,7 +45,7 @@ struct mnt_namespace;
 #define MNT_SHARED_MASK(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
-| MNT_READONLY)
+| MNT_READONLY | MNT_NOSYMFOL

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-11-16 Thread Mattias Nissler
I understand that silence suggests there's little interest, but here's
some new information I discovered today that may justify to reconsider
the patch:

The BSDs already have exactly what I propose, the mount option is
called "nosymfollow" there:
https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939

There's also some evidence on the net that people have been using said
nosymfollow mount option to mitigate symlink attacks.

On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler <mniss...@chromium.org> wrote:
> Friendly ping - does this version of the patch have any chance on
> getting included in mainline?
>
> On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mniss...@chromium.org> 
> wrote:
>> For mounts that have the new "nosymlinks" option, don't follow
>> symlinks when resolving paths. The new option is similar in spirit to
>> the existing "nodev", "noexec", and "nosuid" options.
>>
>> Note that symlinks may still be created on mounts where the
>> "nosymlinks" option is present. readlink() remains functional, so user
>> space code that is aware of symlinks can still choose to follow them
>> explicitly.
>>
>> Setting the "nosymlinks" mount option helps prevent privileged writers
>> from modifying files unintentionally in case there is an unexpected
>> link along the accessed path. The "nosymlinks" option is thus useful
>> as a defensive measure for systems that need to deal with untrusted
>> file systems in privileged contexts.
>>
>> Signed-off-by: Mattias Nissler <mniss...@chromium.org>
>> ---
>>  fs/namei.c  | 3 +++
>>  fs/namespace.c  | 9 ++---
>>  fs/proc_namespace.c | 1 +
>>  fs/statfs.c | 2 ++
>>  include/linux/mount.h   | 3 ++-
>>  include/linux/statfs.h  | 1 +
>>  include/uapi/linux/fs.h | 1 +
>>  7 files changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 5b4eed2..4cddcf3 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
>> touch_atime(>link);
>> }
>>
>> +   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
>> +   return ERR_PTR(-EACCES);
>> +
>> error = security_inode_follow_link(dentry, inode,
>>nd->flags & LOOKUP_RCU);
>> if (unlikely(error))
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index e6c234b..deec84e 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
>> *dir_name,
>> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
>> if (flags & MS_RDONLY)
>> mnt_flags |= MNT_READONLY;
>> +   if (flags & MS_NOSYMLINKS)
>> +   mnt_flags |= MNT_NOSYMLINKS;
>>
>> /* The default atime for remount is preservation */
>> if ((flags & MS_REMOUNT) &&
>> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
>> *dir_name,
>> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
>> }
>>
>> -   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
>> -  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
>> -  MS_STRICTATIME | MS_NOREMOTELOCK);
>> +   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
>> +  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
>> +  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
>> +  MS_NOREMOTELOCK);
>>
>> if (flags & MS_REMOUNT)
>> retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
>> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
>> index 3f1190d..a1949d9 100644
>> --- a/fs/proc_namespace.c
>> +++ b/fs/proc_namespace.c
>> @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct 
>> vfsmount *mnt)
>> { MNT_NOATIME, ",noatime" },
>> { MNT_NODIRATIME, ",nodiratime" },
>> { MNT_RELATIME, ",relatime" },
>> +   { MNT_NOSYMLINKS, ",nosymlinks" },
>> { 0, NULL }
>> };
>> const struct proc_fs_info *fs_infop;
>> diff --git a/fs/statf

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-11-16 Thread Mattias Nissler
I understand that silence suggests there's little interest, but here's
some new information I discovered today that may justify to reconsider
the patch:

The BSDs already have exactly what I propose, the mount option is
called "nosymfollow" there:
https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939

There's also some evidence on the net that people have been using said
nosymfollow mount option to mitigate symlink attacks.

On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler  wrote:
> Friendly ping - does this version of the patch have any chance on
> getting included in mainline?
>
> On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler  
> wrote:
>> For mounts that have the new "nosymlinks" option, don't follow
>> symlinks when resolving paths. The new option is similar in spirit to
>> the existing "nodev", "noexec", and "nosuid" options.
>>
>> Note that symlinks may still be created on mounts where the
>> "nosymlinks" option is present. readlink() remains functional, so user
>> space code that is aware of symlinks can still choose to follow them
>> explicitly.
>>
>> Setting the "nosymlinks" mount option helps prevent privileged writers
>> from modifying files unintentionally in case there is an unexpected
>> link along the accessed path. The "nosymlinks" option is thus useful
>> as a defensive measure for systems that need to deal with untrusted
>> file systems in privileged contexts.
>>
>> Signed-off-by: Mattias Nissler 
>> ---
>>  fs/namei.c  | 3 +++
>>  fs/namespace.c  | 9 ++---
>>  fs/proc_namespace.c | 1 +
>>  fs/statfs.c | 2 ++
>>  include/linux/mount.h   | 3 ++-
>>  include/linux/statfs.h  | 1 +
>>  include/uapi/linux/fs.h | 1 +
>>  7 files changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 5b4eed2..4cddcf3 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
>> touch_atime(>link);
>> }
>>
>> +   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
>> +   return ERR_PTR(-EACCES);
>> +
>> error = security_inode_follow_link(dentry, inode,
>>nd->flags & LOOKUP_RCU);
>> if (unlikely(error))
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index e6c234b..deec84e 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
>> *dir_name,
>> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
>> if (flags & MS_RDONLY)
>> mnt_flags |= MNT_READONLY;
>> +   if (flags & MS_NOSYMLINKS)
>> +   mnt_flags |= MNT_NOSYMLINKS;
>>
>> /* The default atime for remount is preservation */
>> if ((flags & MS_REMOUNT) &&
>> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
>> *dir_name,
>> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
>> }
>>
>> -   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
>> -  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
>> -  MS_STRICTATIME | MS_NOREMOTELOCK);
>> +   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
>> +  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
>> +  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
>> +  MS_NOREMOTELOCK);
>>
>> if (flags & MS_REMOUNT)
>> retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
>> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
>> index 3f1190d..a1949d9 100644
>> --- a/fs/proc_namespace.c
>> +++ b/fs/proc_namespace.c
>> @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct 
>> vfsmount *mnt)
>> { MNT_NOATIME, ",noatime" },
>> { MNT_NODIRATIME, ",nodiratime" },
>> { MNT_RELATIME, ",relatime" },
>> +   { MNT_NOSYMLINKS, ",nosymlinks" },
>> { 0, NULL }
>> };
>> const struct proc_fs_info *fs_infop;
>> diff --git a/fs/statfs.c b/fs/statfs.c
>> index 083dc0a..7ff7c32 100644
>> --- a/fs/statfs.c

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-10-24 Thread Mattias Nissler
Friendly ping - does this version of the patch have any chance on
getting included in mainline?

On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mniss...@chromium.org> wrote:
> For mounts that have the new "nosymlinks" option, don't follow
> symlinks when resolving paths. The new option is similar in spirit to
> the existing "nodev", "noexec", and "nosuid" options.
>
> Note that symlinks may still be created on mounts where the
> "nosymlinks" option is present. readlink() remains functional, so user
> space code that is aware of symlinks can still choose to follow them
> explicitly.
>
> Setting the "nosymlinks" mount option helps prevent privileged writers
> from modifying files unintentionally in case there is an unexpected
> link along the accessed path. The "nosymlinks" option is thus useful
> as a defensive measure for systems that need to deal with untrusted
> file systems in privileged contexts.
>
> Signed-off-by: Mattias Nissler <mniss...@chromium.org>
> ---
>  fs/namei.c  | 3 +++
>  fs/namespace.c  | 9 ++---
>  fs/proc_namespace.c | 1 +
>  fs/statfs.c | 2 ++
>  include/linux/mount.h   | 3 ++-
>  include/linux/statfs.h  | 1 +
>  include/uapi/linux/fs.h | 1 +
>  7 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 5b4eed2..4cddcf3 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
> touch_atime(>link);
> }
>
> +   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
> +   return ERR_PTR(-EACCES);
> +
> error = security_inode_follow_link(dentry, inode,
>nd->flags & LOOKUP_RCU);
> if (unlikely(error))
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e6c234b..deec84e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
> *dir_name,
> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
> if (flags & MS_RDONLY)
> mnt_flags |= MNT_READONLY;
> +   if (flags & MS_NOSYMLINKS)
> +   mnt_flags |= MNT_NOSYMLINKS;
>
> /* The default atime for remount is preservation */
> if ((flags & MS_REMOUNT) &&
> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
> *dir_name,
> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
> }
>
> -   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
> -  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
> -  MS_STRICTATIME | MS_NOREMOTELOCK);
> +   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
> +  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
> +  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
> +  MS_NOREMOTELOCK);
>
> if (flags & MS_REMOUNT)
> retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> index 3f1190d..a1949d9 100644
> --- a/fs/proc_namespace.c
> +++ b/fs/proc_namespace.c
> @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct 
> vfsmount *mnt)
> { MNT_NOATIME, ",noatime" },
> { MNT_NODIRATIME, ",nodiratime" },
> { MNT_RELATIME, ",relatime" },
> +   { MNT_NOSYMLINKS, ",nosymlinks" },
> { 0, NULL }
> };
> const struct proc_fs_info *fs_infop;
> diff --git a/fs/statfs.c b/fs/statfs.c
> index 083dc0a..7ff7c32 100644
> --- a/fs/statfs.c
> +++ b/fs/statfs.c
> @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
> flags |= ST_NODIRATIME;
> if (mnt_flags & MNT_RELATIME)
> flags |= ST_RELATIME;
> +   if (mnt_flags & MNT_NOSYMLINKS)
> +   flags |= ST_NOSYMLINKS;
> return flags;
>  }
>
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index 1172cce..5e302f4 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -28,6 +28,7 @@ struct mnt_namespace;
>  #define MNT_NODIRATIME 0x10
>  #define MNT_RELATIME   0x20
>  #define MNT_READONLY   0x40/* does the user want this to be r/o? */
> +#define MNT_NOSYMLINKS 0x80
>
>  #define MNT_SHRINKABLE 0x100
>  #define MNT_WRITE_HOLD 0x200
> @@ -44,7 +45,7 @@ struct mnt_namespace;
&

Re: [PATCH v2] Add a "nosymlinks" mount option.

2016-10-24 Thread Mattias Nissler
Friendly ping - does this version of the patch have any chance on
getting included in mainline?

On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler  wrote:
> For mounts that have the new "nosymlinks" option, don't follow
> symlinks when resolving paths. The new option is similar in spirit to
> the existing "nodev", "noexec", and "nosuid" options.
>
> Note that symlinks may still be created on mounts where the
> "nosymlinks" option is present. readlink() remains functional, so user
> space code that is aware of symlinks can still choose to follow them
> explicitly.
>
> Setting the "nosymlinks" mount option helps prevent privileged writers
> from modifying files unintentionally in case there is an unexpected
> link along the accessed path. The "nosymlinks" option is thus useful
> as a defensive measure for systems that need to deal with untrusted
> file systems in privileged contexts.
>
> Signed-off-by: Mattias Nissler 
> ---
>  fs/namei.c  | 3 +++
>  fs/namespace.c  | 9 ++---
>  fs/proc_namespace.c | 1 +
>  fs/statfs.c | 2 ++
>  include/linux/mount.h   | 3 ++-
>  include/linux/statfs.h  | 1 +
>  include/uapi/linux/fs.h | 1 +
>  7 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 5b4eed2..4cddcf3 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
> touch_atime(>link);
> }
>
> +   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
> +   return ERR_PTR(-EACCES);
> +
> error = security_inode_follow_link(dentry, inode,
>nd->flags & LOOKUP_RCU);
> if (unlikely(error))
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e6c234b..deec84e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
> *dir_name,
> mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
> if (flags & MS_RDONLY)
> mnt_flags |= MNT_READONLY;
> +   if (flags & MS_NOSYMLINKS)
> +   mnt_flags |= MNT_NOSYMLINKS;
>
> /* The default atime for remount is preservation */
> if ((flags & MS_REMOUNT) &&
> @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
> *dir_name,
> mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
> }
>
> -   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
> -  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
> -  MS_STRICTATIME | MS_NOREMOTELOCK);
> +   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
> +  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
> +  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
> +  MS_NOREMOTELOCK);
>
> if (flags & MS_REMOUNT)
> retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
> diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> index 3f1190d..a1949d9 100644
> --- a/fs/proc_namespace.c
> +++ b/fs/proc_namespace.c
> @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct 
> vfsmount *mnt)
> { MNT_NOATIME, ",noatime" },
> { MNT_NODIRATIME, ",nodiratime" },
> { MNT_RELATIME, ",relatime" },
> +   { MNT_NOSYMLINKS, ",nosymlinks" },
> { 0, NULL }
> };
> const struct proc_fs_info *fs_infop;
> diff --git a/fs/statfs.c b/fs/statfs.c
> index 083dc0a..7ff7c32 100644
> --- a/fs/statfs.c
> +++ b/fs/statfs.c
> @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
> flags |= ST_NODIRATIME;
> if (mnt_flags & MNT_RELATIME)
> flags |= ST_RELATIME;
> +   if (mnt_flags & MNT_NOSYMLINKS)
> +   flags |= ST_NOSYMLINKS;
> return flags;
>  }
>
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index 1172cce..5e302f4 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -28,6 +28,7 @@ struct mnt_namespace;
>  #define MNT_NODIRATIME 0x10
>  #define MNT_RELATIME   0x20
>  #define MNT_READONLY   0x40/* does the user want this to be r/o? */
> +#define MNT_NOSYMLINKS 0x80
>
>  #define MNT_SHRINKABLE 0x100
>  #define MNT_WRITE_HOLD 0x200
> @@ -44,7 +45,7 @@ struct mnt_namespace;
>  #define MNT_SHARED_MASK(MNT_UNBINDABLE)
>  #

[PATCH v2] Add a "nosymlinks" mount option.

2016-10-19 Thread Mattias Nissler
For mounts that have the new "nosymlinks" option, don't follow
symlinks when resolving paths. The new option is similar in spirit to
the existing "nodev", "noexec", and "nosuid" options.

Note that symlinks may still be created on mounts where the
"nosymlinks" option is present. readlink() remains functional, so user
space code that is aware of symlinks can still choose to follow them
explicitly.

Setting the "nosymlinks" mount option helps prevent privileged writers
from modifying files unintentionally in case there is an unexpected
link along the accessed path. The "nosymlinks" option is thus useful
as a defensive measure for systems that need to deal with untrusted
file systems in privileged contexts.

Signed-off-by: Mattias Nissler <mniss...@chromium.org>
---
 fs/namei.c  | 3 +++
 fs/namespace.c  | 9 ++---
 fs/proc_namespace.c | 1 +
 fs/statfs.c | 2 ++
 include/linux/mount.h   | 3 ++-
 include/linux/statfs.h  | 1 +
 include/uapi/linux/fs.h | 1 +
 7 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5b4eed2..4cddcf3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
touch_atime(>link);
}
 
+   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
+   return ERR_PTR(-EACCES);
+
error = security_inode_follow_link(dentry, inode,
   nd->flags & LOOKUP_RCU);
if (unlikely(error))
diff --git a/fs/namespace.c b/fs/namespace.c
index e6c234b..deec84e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+   if (flags & MS_NOSYMLINKS)
+   mnt_flags |= MNT_NOSYMLINKS;
 
/* The default atime for remount is preservation */
if ((flags & MS_REMOUNT) &&
@@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
}
 
-   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-  MS_STRICTATIME | MS_NOREMOTELOCK);
+   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
+  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
+  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
+  MS_NOREMOTELOCK);
 
if (flags & MS_REMOUNT)
retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3f1190d..a1949d9 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount 
*mnt)
{ MNT_NOATIME, ",noatime" },
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
+   { MNT_NOSYMLINKS, ",nosymlinks" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/fs/statfs.c b/fs/statfs.c
index 083dc0a..7ff7c32 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_NODIRATIME;
if (mnt_flags & MNT_RELATIME)
flags |= ST_RELATIME;
+   if (mnt_flags & MNT_NOSYMLINKS)
+   flags |= ST_NOSYMLINKS;
return flags;
 }
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..5e302f4 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
 #define MNT_READONLY   0x40/* does the user want this to be r/o? */
+#define MNT_NOSYMLINKS 0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -44,7 +45,7 @@ struct mnt_namespace;
 #define MNT_SHARED_MASK(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
-| MNT_READONLY)
+| MNT_READONLY | MNT_NOSYMLINKS)
 #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
 
 #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 0166d32..994b059 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -39,5 +39,6 @@ struct kstatfs {
 #define ST_NOATIME 0x0400  /* do not update access times

[PATCH v2] Add a "nosymlinks" mount option.

2016-10-19 Thread Mattias Nissler
For mounts that have the new "nosymlinks" option, don't follow
symlinks when resolving paths. The new option is similar in spirit to
the existing "nodev", "noexec", and "nosuid" options.

Note that symlinks may still be created on mounts where the
"nosymlinks" option is present. readlink() remains functional, so user
space code that is aware of symlinks can still choose to follow them
explicitly.

Setting the "nosymlinks" mount option helps prevent privileged writers
from modifying files unintentionally in case there is an unexpected
link along the accessed path. The "nosymlinks" option is thus useful
as a defensive measure for systems that need to deal with untrusted
file systems in privileged contexts.

Signed-off-by: Mattias Nissler 
---
 fs/namei.c  | 3 +++
 fs/namespace.c  | 9 ++---
 fs/proc_namespace.c | 1 +
 fs/statfs.c | 2 ++
 include/linux/mount.h   | 3 ++-
 include/linux/statfs.h  | 1 +
 include/uapi/linux/fs.h | 1 +
 7 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5b4eed2..4cddcf3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
touch_atime(>link);
}
 
+   if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
+   return ERR_PTR(-EACCES);
+
error = security_inode_follow_link(dentry, inode,
   nd->flags & LOOKUP_RCU);
if (unlikely(error))
diff --git a/fs/namespace.c b/fs/namespace.c
index e6c234b..deec84e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+   if (flags & MS_NOSYMLINKS)
+   mnt_flags |= MNT_NOSYMLINKS;
 
/* The default atime for remount is preservation */
if ((flags & MS_REMOUNT) &&
@@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
}
 
-   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-  MS_STRICTATIME | MS_NOREMOTELOCK);
+   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
+  MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
+  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
+  MS_NOREMOTELOCK);
 
if (flags & MS_REMOUNT)
retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3f1190d..a1949d9 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount 
*mnt)
{ MNT_NOATIME, ",noatime" },
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
+   { MNT_NOSYMLINKS, ",nosymlinks" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/fs/statfs.c b/fs/statfs.c
index 083dc0a..7ff7c32 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_NODIRATIME;
if (mnt_flags & MNT_RELATIME)
flags |= ST_RELATIME;
+   if (mnt_flags & MNT_NOSYMLINKS)
+   flags |= ST_NOSYMLINKS;
return flags;
 }
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..5e302f4 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
 #define MNT_READONLY   0x40/* does the user want this to be r/o? */
+#define MNT_NOSYMLINKS 0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -44,7 +45,7 @@ struct mnt_namespace;
 #define MNT_SHARED_MASK(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
-| MNT_READONLY)
+| MNT_READONLY | MNT_NOSYMLINKS)
 #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
 
 #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 0166d32..994b059 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -39,5 +39,6 @@ struct kstatfs {
 #define ST_NOATIME 0x0400  /* do not update access times */
 #define ST_NODIRATIME  0x08

Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-19 Thread Mattias Nissler
On Tue, Oct 18, 2016 at 5:14 PM, Colin Walters <walt...@verbum.org> wrote:
>
> On Mon, Oct 17, 2016, at 09:02 AM, Mattias Nissler wrote:
> > OK, no more feedback thus far. Is there generally any interest in a
> > mount option to avoid path name aliasing resulting in target file
> > confusion? Perhaps a version that only disables symlinks instead of
> > also hard-disabling files hard-linked to multiple locations (those are
> > much lower risk for the situation I care about)?
>
> So the situation here is a (privileged) process that is trying to read/write
> to a filesystem tree writable by other processes that are in a separate
> security domain?

More or less. The scenario is that of an attacker gaining root access,
followed by a reboot. The dm-verity-protected root FS raises the bar
for re-exploiting the system after reboot, but the fact that we'll
consume data from the writable file system during boot can (and has
been) used to regain code execution.

You may argue that "all is lost" after the initial root exploit, but
we think there's value in hardening the system to the point where
getting a *persistent* exploit (i.e. retaining control across reboots)
is significantly harder. This increases our chances to successfully
auto-update devices even after a root exploit and thus retroactively
fix them.

>
> That's a classic situation that requires extreme care, and I am doubtful
> that symlinks are the only issue you're facing.  For example, if this
> process is also *parsing* any data there, there's another whole source
> of risk.

Agreed. Essentially, the entire writable file system has to be
regarded as untrusted input. Every process that stores or reads data
needs to do so cautiously. I'm aware this is quite a rabbit hole and
difficult to harden. In particular it'll require attention and changes
across a number of areas. The fact that we've seen multiple exploits
that rely on target file confusion by creating symlinks suggests that
there's value in blocking this as an attack vector though (in
particular given that we have only few legit uses of symlinks that are
relatively easy to clean up).

>
> I suspect for you it wouldn't be too hard to have a "follow untrusted
> path" helper function, it's possible to implement in userspace safely
> with O_NOFOLLOW etc.

Note that O_NOFOLLOW only affects the final path component. If there's
a symlink in any of the parent directories, that'll still be traversed
even with O_NOFOLLOW. This situation is less risky as an attacker will
have to deal with the restriction of a fixed filename in the last
component, but might still be exploitable.

>
> Regardless too, it sounds like what you want more is a
> "same filesystem" traversal (stat and compare devices).
>
> Or does it even need to handle full traversal?  Would it have mitigated
> the security issue to fstat() any files you opened and verified they
> were from the writable partition you expected?

These are all good ideas, and in fact I'm already looking into making
safe helpers to use when dealing with files from the writable file
system. There is an idea to even go a step further and mount the
writable file system in a separate mount namespace (to avoid
accidental access from the rest of the system). Then, the helper tools
would take care of performing file access in said mount namespace and
can prevent malicious symlinks by canonicalizing the requested path
and bailing out if it doesn't match the requested path.

The difficulty lies in applying these measures of precaution
system-wide. This affects most init scripts and daemons, and
everything else that keeps state on the writable file system. Some of
the affected code we have written ourselves so is relatively easy to
fix, but we're also running a number of third party packages for which
things are more complicated. Going through with a fine comb and
auditing all file access is possible in theory, but hardly in
practice. So we'll have to make do with partial auditing and
implementing mitigations that reduce risk for the rest of the code,
which led to the idea of killing the symlink attack vector
systematically.

FWIW, I plan to send an updated patch later that only disables
symlinks for consideration.


Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-19 Thread Mattias Nissler
On Tue, Oct 18, 2016 at 5:14 PM, Colin Walters  wrote:
>
> On Mon, Oct 17, 2016, at 09:02 AM, Mattias Nissler wrote:
> > OK, no more feedback thus far. Is there generally any interest in a
> > mount option to avoid path name aliasing resulting in target file
> > confusion? Perhaps a version that only disables symlinks instead of
> > also hard-disabling files hard-linked to multiple locations (those are
> > much lower risk for the situation I care about)?
>
> So the situation here is a (privileged) process that is trying to read/write
> to a filesystem tree writable by other processes that are in a separate
> security domain?

More or less. The scenario is that of an attacker gaining root access,
followed by a reboot. The dm-verity-protected root FS raises the bar
for re-exploiting the system after reboot, but the fact that we'll
consume data from the writable file system during boot can (and has
been) used to regain code execution.

You may argue that "all is lost" after the initial root exploit, but
we think there's value in hardening the system to the point where
getting a *persistent* exploit (i.e. retaining control across reboots)
is significantly harder. This increases our chances to successfully
auto-update devices even after a root exploit and thus retroactively
fix them.

>
> That's a classic situation that requires extreme care, and I am doubtful
> that symlinks are the only issue you're facing.  For example, if this
> process is also *parsing* any data there, there's another whole source
> of risk.

Agreed. Essentially, the entire writable file system has to be
regarded as untrusted input. Every process that stores or reads data
needs to do so cautiously. I'm aware this is quite a rabbit hole and
difficult to harden. In particular it'll require attention and changes
across a number of areas. The fact that we've seen multiple exploits
that rely on target file confusion by creating symlinks suggests that
there's value in blocking this as an attack vector though (in
particular given that we have only few legit uses of symlinks that are
relatively easy to clean up).

>
> I suspect for you it wouldn't be too hard to have a "follow untrusted
> path" helper function, it's possible to implement in userspace safely
> with O_NOFOLLOW etc.

Note that O_NOFOLLOW only affects the final path component. If there's
a symlink in any of the parent directories, that'll still be traversed
even with O_NOFOLLOW. This situation is less risky as an attacker will
have to deal with the restriction of a fixed filename in the last
component, but might still be exploitable.

>
> Regardless too, it sounds like what you want more is a
> "same filesystem" traversal (stat and compare devices).
>
> Or does it even need to handle full traversal?  Would it have mitigated
> the security issue to fstat() any files you opened and verified they
> were from the writable partition you expected?

These are all good ideas, and in fact I'm already looking into making
safe helpers to use when dealing with files from the writable file
system. There is an idea to even go a step further and mount the
writable file system in a separate mount namespace (to avoid
accidental access from the rest of the system). Then, the helper tools
would take care of performing file access in said mount namespace and
can prevent malicious symlinks by canonicalizing the requested path
and bailing out if it doesn't match the requested path.

The difficulty lies in applying these measures of precaution
system-wide. This affects most init scripts and daemons, and
everything else that keeps state on the writable file system. Some of
the affected code we have written ourselves so is relatively easy to
fix, but we're also running a number of third party packages for which
things are more complicated. Going through with a fine comb and
auditing all file access is possible in theory, but hardly in
practice. So we'll have to make do with partial auditing and
implementing mitigations that reduce risk for the rest of the code,
which led to the idea of killing the symlink attack vector
systematically.

FWIW, I plan to send an updated patch later that only disables
symlinks for consideration.


Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-17 Thread Mattias Nissler
OK, no more feedback thus far. Is there generally any interest in a
mount option to avoid path name aliasing resulting in target file
confusion? Perhaps a version that only disables symlinks instead of
also hard-disabling files hard-linked to multiple locations (those are
much lower risk for the situation I care about)?

If there is interest, I'm happy to iterate the patch until it's
accepted. If there's no interest, that's fine too - I'll then likely
resort to moving the restrictions desired for Chrome OS into an LSM we
compile into our kernels.

On Fri, Oct 14, 2016 at 6:22 PM, Mattias Nissler <mniss...@chromium.org> wrote:
> Forgot to mention: I realize my motivation is very specific to Chrome
> OS, however the nolinks option seemed useful also as a mitigation to
> generic privilege escalation symlink attacks, for cases where
> disabling symlinks/hardlinks is acceptable.
>
> On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler <mniss...@chromium.org> 
> wrote:
>> On Fri, Oct 14, 2016 at 5:00 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>>>
>>> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
>>> > > Setting the "nolinks" mount option helps prevent privileged writers
>>> > > from modifying files unintentionally in case there is an unexpected
>>> > > link along the accessed path. The "nolinks" option is thus useful as a
>>> > > defensive measure against persistent exploits (i.e. a system getting
>>> > > re-exploited after a reboot) for systems that employ a read-only or
>>> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
>>> > > from running after reboot. However, legit code typically still reads
>>> > > from and writes to a writable file system previously under full
>>> > > control of the attacker, who can place symlinks to trick file writes
>>> > > after reboot to target a file of their choice. "nolinks" fundamentally
>>> > > prevents this.
>>> >
>>> > Which parts of the tree would be on that "protected" rootfs and which 
>>> > would
>>> > you mount with that option?  Description above is rather vague and I'm
>>> > not convinced that it actually buys you anything.  Details, please...
>>
>> Apologies for the vague description, I'm happy to explain in detail.
>>
>> In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
>> an attacker can't modify any binaries. After reboot, everything except
>> the rootfs is mounted noexec, so there's no way to re-gain code
>> execution after reboot by modifying existing binaries or dropping new
>> ones.
>>
>> We've seen multiple exploits now where the attacker worked around
>> these limitations in two steps:
>>
>> 1. Before reboot, the attacker sets up symlinks on the writeable file
>> system (called "stateful" file system), which are later accessed by
>> legit boot code (such as init scripts) after reboot. For example, an
>> init script that copies file A to B can be abused by an attacker by
>> symlinking or hardlinking B to a location C of their choice, and
>> placing desired data to be written to C in A. That gives the attacker
>> a primitive to write data of their choice to a path of their choice
>> after reboot. Note that this primitive may target locations _outside_
>> the stateful file system the attacker previously had control of.
>> Particularly of interest are targets on /sys, but also tmpfs on /run
>> etc.
>>
>> 2. The second step for a successful attack is finding some legit code
>> invoked in the boot flow that has a vulnerability exploitable by
>> feeding it unexpected data. As an example, there are Linux userspace
>> utilities that read config from /run which may contain shell commands
>> the the utility executes, through which the attacker can gain code
>> execution again.
>>
>> The purpose of the proposed patch is to raise the bar for the first
>> step of the attack: Writing arbitrary files after reboot. I'm
>> intending to mount the stateful file system with the nolinks option
>> (or otherwise prevent symlink traversal). This will help make sure
>> that any legit writes taking place during boot in init scripts etc. go
>> to the files intended by the developer, and can't be redirected by an
>> attacker.
>>
>> Does this make more sense to you?
>>
>>>
>>>
>>> PS: what the hell do restrictions on _following_ symlinks have to _creating_
>>> hardlinks?  I'm trying to imagine a threat model where both would 

Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-17 Thread Mattias Nissler
OK, no more feedback thus far. Is there generally any interest in a
mount option to avoid path name aliasing resulting in target file
confusion? Perhaps a version that only disables symlinks instead of
also hard-disabling files hard-linked to multiple locations (those are
much lower risk for the situation I care about)?

If there is interest, I'm happy to iterate the patch until it's
accepted. If there's no interest, that's fine too - I'll then likely
resort to moving the restrictions desired for Chrome OS into an LSM we
compile into our kernels.

On Fri, Oct 14, 2016 at 6:22 PM, Mattias Nissler  wrote:
> Forgot to mention: I realize my motivation is very specific to Chrome
> OS, however the nolinks option seemed useful also as a mitigation to
> generic privilege escalation symlink attacks, for cases where
> disabling symlinks/hardlinks is acceptable.
>
> On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler  
> wrote:
>> On Fri, Oct 14, 2016 at 5:00 PM, Al Viro  wrote:
>>>
>>> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
>>> > > Setting the "nolinks" mount option helps prevent privileged writers
>>> > > from modifying files unintentionally in case there is an unexpected
>>> > > link along the accessed path. The "nolinks" option is thus useful as a
>>> > > defensive measure against persistent exploits (i.e. a system getting
>>> > > re-exploited after a reboot) for systems that employ a read-only or
>>> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
>>> > > from running after reboot. However, legit code typically still reads
>>> > > from and writes to a writable file system previously under full
>>> > > control of the attacker, who can place symlinks to trick file writes
>>> > > after reboot to target a file of their choice. "nolinks" fundamentally
>>> > > prevents this.
>>> >
>>> > Which parts of the tree would be on that "protected" rootfs and which 
>>> > would
>>> > you mount with that option?  Description above is rather vague and I'm
>>> > not convinced that it actually buys you anything.  Details, please...
>>
>> Apologies for the vague description, I'm happy to explain in detail.
>>
>> In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
>> an attacker can't modify any binaries. After reboot, everything except
>> the rootfs is mounted noexec, so there's no way to re-gain code
>> execution after reboot by modifying existing binaries or dropping new
>> ones.
>>
>> We've seen multiple exploits now where the attacker worked around
>> these limitations in two steps:
>>
>> 1. Before reboot, the attacker sets up symlinks on the writeable file
>> system (called "stateful" file system), which are later accessed by
>> legit boot code (such as init scripts) after reboot. For example, an
>> init script that copies file A to B can be abused by an attacker by
>> symlinking or hardlinking B to a location C of their choice, and
>> placing desired data to be written to C in A. That gives the attacker
>> a primitive to write data of their choice to a path of their choice
>> after reboot. Note that this primitive may target locations _outside_
>> the stateful file system the attacker previously had control of.
>> Particularly of interest are targets on /sys, but also tmpfs on /run
>> etc.
>>
>> 2. The second step for a successful attack is finding some legit code
>> invoked in the boot flow that has a vulnerability exploitable by
>> feeding it unexpected data. As an example, there are Linux userspace
>> utilities that read config from /run which may contain shell commands
>> the the utility executes, through which the attacker can gain code
>> execution again.
>>
>> The purpose of the proposed patch is to raise the bar for the first
>> step of the attack: Writing arbitrary files after reboot. I'm
>> intending to mount the stateful file system with the nolinks option
>> (or otherwise prevent symlink traversal). This will help make sure
>> that any legit writes taking place during boot in init scripts etc. go
>> to the files intended by the developer, and can't be redirected by an
>> attacker.
>>
>> Does this make more sense to you?
>>
>>>
>>>
>>> PS: what the hell do restrictions on _following_ symlinks have to _creating_
>>> hardlinks?  I'm trying to imagine a threat model where both would apply or
>>> anything else beyond the word "link" they would have in c

Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-14 Thread Mattias Nissler
Forgot to mention: I realize my motivation is very specific to Chrome
OS, however the nolinks option seemed useful also as a mitigation to
generic privilege escalation symlink attacks, for cases where
disabling symlinks/hardlinks is acceptable.

On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler <mniss...@chromium.org> wrote:
> On Fri, Oct 14, 2016 at 5:00 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>>
>> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
>> > > Setting the "nolinks" mount option helps prevent privileged writers
>> > > from modifying files unintentionally in case there is an unexpected
>> > > link along the accessed path. The "nolinks" option is thus useful as a
>> > > defensive measure against persistent exploits (i.e. a system getting
>> > > re-exploited after a reboot) for systems that employ a read-only or
>> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
>> > > from running after reboot. However, legit code typically still reads
>> > > from and writes to a writable file system previously under full
>> > > control of the attacker, who can place symlinks to trick file writes
>> > > after reboot to target a file of their choice. "nolinks" fundamentally
>> > > prevents this.
>> >
>> > Which parts of the tree would be on that "protected" rootfs and which would
>> > you mount with that option?  Description above is rather vague and I'm
>> > not convinced that it actually buys you anything.  Details, please...
>
> Apologies for the vague description, I'm happy to explain in detail.
>
> In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
> an attacker can't modify any binaries. After reboot, everything except
> the rootfs is mounted noexec, so there's no way to re-gain code
> execution after reboot by modifying existing binaries or dropping new
> ones.
>
> We've seen multiple exploits now where the attacker worked around
> these limitations in two steps:
>
> 1. Before reboot, the attacker sets up symlinks on the writeable file
> system (called "stateful" file system), which are later accessed by
> legit boot code (such as init scripts) after reboot. For example, an
> init script that copies file A to B can be abused by an attacker by
> symlinking or hardlinking B to a location C of their choice, and
> placing desired data to be written to C in A. That gives the attacker
> a primitive to write data of their choice to a path of their choice
> after reboot. Note that this primitive may target locations _outside_
> the stateful file system the attacker previously had control of.
> Particularly of interest are targets on /sys, but also tmpfs on /run
> etc.
>
> 2. The second step for a successful attack is finding some legit code
> invoked in the boot flow that has a vulnerability exploitable by
> feeding it unexpected data. As an example, there are Linux userspace
> utilities that read config from /run which may contain shell commands
> the the utility executes, through which the attacker can gain code
> execution again.
>
> The purpose of the proposed patch is to raise the bar for the first
> step of the attack: Writing arbitrary files after reboot. I'm
> intending to mount the stateful file system with the nolinks option
> (or otherwise prevent symlink traversal). This will help make sure
> that any legit writes taking place during boot in init scripts etc. go
> to the files intended by the developer, and can't be redirected by an
> attacker.
>
> Does this make more sense to you?
>
>>
>>
>> PS: what the hell do restrictions on _following_ symlinks have to _creating_
>> hardlinks?  I'm trying to imagine a threat model where both would apply or
>> anything else beyond the word "link" they would have in common...
>
> The restriction is not on _creating_ hard links, but _opening_
> hardlinks. The commonality is in the confusion between the file you're
> meaning to write vs. the file you actually end up writing to, which
> stems from the fact that as things stand a file can be accessible on
> other paths than its canonical one. For Chrome OS, I'd like to get to
> a point where most privileged code can only access a file via its
> canonical name (bind mounts are an OK exception as they're not
> persistent, so out of reach for manipulation).
>
>>
>> The one you've described above might have something to do with the first
>> one (modulo missing description of the setup you have in mind), but it
>> clearly has nothing to do with the second - attackers could've created
>> whatever they wanted while the fs had been under their control, after all.
>> Doesn't make sense...


Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-14 Thread Mattias Nissler
Forgot to mention: I realize my motivation is very specific to Chrome
OS, however the nolinks option seemed useful also as a mitigation to
generic privilege escalation symlink attacks, for cases where
disabling symlinks/hardlinks is acceptable.

On Fri, Oct 14, 2016 at 5:50 PM, Mattias Nissler  wrote:
> On Fri, Oct 14, 2016 at 5:00 PM, Al Viro  wrote:
>>
>> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
>> > > Setting the "nolinks" mount option helps prevent privileged writers
>> > > from modifying files unintentionally in case there is an unexpected
>> > > link along the accessed path. The "nolinks" option is thus useful as a
>> > > defensive measure against persistent exploits (i.e. a system getting
>> > > re-exploited after a reboot) for systems that employ a read-only or
>> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
>> > > from running after reboot. However, legit code typically still reads
>> > > from and writes to a writable file system previously under full
>> > > control of the attacker, who can place symlinks to trick file writes
>> > > after reboot to target a file of their choice. "nolinks" fundamentally
>> > > prevents this.
>> >
>> > Which parts of the tree would be on that "protected" rootfs and which would
>> > you mount with that option?  Description above is rather vague and I'm
>> > not convinced that it actually buys you anything.  Details, please...
>
> Apologies for the vague description, I'm happy to explain in detail.
>
> In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
> an attacker can't modify any binaries. After reboot, everything except
> the rootfs is mounted noexec, so there's no way to re-gain code
> execution after reboot by modifying existing binaries or dropping new
> ones.
>
> We've seen multiple exploits now where the attacker worked around
> these limitations in two steps:
>
> 1. Before reboot, the attacker sets up symlinks on the writeable file
> system (called "stateful" file system), which are later accessed by
> legit boot code (such as init scripts) after reboot. For example, an
> init script that copies file A to B can be abused by an attacker by
> symlinking or hardlinking B to a location C of their choice, and
> placing desired data to be written to C in A. That gives the attacker
> a primitive to write data of their choice to a path of their choice
> after reboot. Note that this primitive may target locations _outside_
> the stateful file system the attacker previously had control of.
> Particularly of interest are targets on /sys, but also tmpfs on /run
> etc.
>
> 2. The second step for a successful attack is finding some legit code
> invoked in the boot flow that has a vulnerability exploitable by
> feeding it unexpected data. As an example, there are Linux userspace
> utilities that read config from /run which may contain shell commands
> the the utility executes, through which the attacker can gain code
> execution again.
>
> The purpose of the proposed patch is to raise the bar for the first
> step of the attack: Writing arbitrary files after reboot. I'm
> intending to mount the stateful file system with the nolinks option
> (or otherwise prevent symlink traversal). This will help make sure
> that any legit writes taking place during boot in init scripts etc. go
> to the files intended by the developer, and can't be redirected by an
> attacker.
>
> Does this make more sense to you?
>
>>
>>
>> PS: what the hell do restrictions on _following_ symlinks have to _creating_
>> hardlinks?  I'm trying to imagine a threat model where both would apply or
>> anything else beyond the word "link" they would have in common...
>
> The restriction is not on _creating_ hard links, but _opening_
> hardlinks. The commonality is in the confusion between the file you're
> meaning to write vs. the file you actually end up writing to, which
> stems from the fact that as things stand a file can be accessible on
> other paths than its canonical one. For Chrome OS, I'd like to get to
> a point where most privileged code can only access a file via its
> canonical name (bind mounts are an OK exception as they're not
> persistent, so out of reach for manipulation).
>
>>
>> The one you've described above might have something to do with the first
>> one (modulo missing description of the setup you have in mind), but it
>> clearly has nothing to do with the second - attackers could've created
>> whatever they wanted while the fs had been under their control, after all.
>> Doesn't make sense...


Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-14 Thread Mattias Nissler
On Fri, Oct 14, 2016 at 5:00 PM, Al Viro  wrote:
>
> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
> > > Setting the "nolinks" mount option helps prevent privileged writers
> > > from modifying files unintentionally in case there is an unexpected
> > > link along the accessed path. The "nolinks" option is thus useful as a
> > > defensive measure against persistent exploits (i.e. a system getting
> > > re-exploited after a reboot) for systems that employ a read-only or
> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
> > > from running after reboot. However, legit code typically still reads
> > > from and writes to a writable file system previously under full
> > > control of the attacker, who can place symlinks to trick file writes
> > > after reboot to target a file of their choice. "nolinks" fundamentally
> > > prevents this.
> >
> > Which parts of the tree would be on that "protected" rootfs and which would
> > you mount with that option?  Description above is rather vague and I'm
> > not convinced that it actually buys you anything.  Details, please...

Apologies for the vague description, I'm happy to explain in detail.

In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
an attacker can't modify any binaries. After reboot, everything except
the rootfs is mounted noexec, so there's no way to re-gain code
execution after reboot by modifying existing binaries or dropping new
ones.

We've seen multiple exploits now where the attacker worked around
these limitations in two steps:

1. Before reboot, the attacker sets up symlinks on the writeable file
system (called "stateful" file system), which are later accessed by
legit boot code (such as init scripts) after reboot. For example, an
init script that copies file A to B can be abused by an attacker by
symlinking or hardlinking B to a location C of their choice, and
placing desired data to be written to C in A. That gives the attacker
a primitive to write data of their choice to a path of their choice
after reboot. Note that this primitive may target locations _outside_
the stateful file system the attacker previously had control of.
Particularly of interest are targets on /sys, but also tmpfs on /run
etc.

2. The second step for a successful attack is finding some legit code
invoked in the boot flow that has a vulnerability exploitable by
feeding it unexpected data. As an example, there are Linux userspace
utilities that read config from /run which may contain shell commands
the the utility executes, through which the attacker can gain code
execution again.

The purpose of the proposed patch is to raise the bar for the first
step of the attack: Writing arbitrary files after reboot. I'm
intending to mount the stateful file system with the nolinks option
(or otherwise prevent symlink traversal). This will help make sure
that any legit writes taking place during boot in init scripts etc. go
to the files intended by the developer, and can't be redirected by an
attacker.

Does this make more sense to you?

>
>
> PS: what the hell do restrictions on _following_ symlinks have to _creating_
> hardlinks?  I'm trying to imagine a threat model where both would apply or
> anything else beyond the word "link" they would have in common...

The restriction is not on _creating_ hard links, but _opening_
hardlinks. The commonality is in the confusion between the file you're
meaning to write vs. the file you actually end up writing to, which
stems from the fact that as things stand a file can be accessible on
other paths than its canonical one. For Chrome OS, I'd like to get to
a point where most privileged code can only access a file via its
canonical name (bind mounts are an OK exception as they're not
persistent, so out of reach for manipulation).

>
> The one you've described above might have something to do with the first
> one (modulo missing description of the setup you have in mind), but it
> clearly has nothing to do with the second - attackers could've created
> whatever they wanted while the fs had been under their control, after all.
> Doesn't make sense...


Re: [RFC] [PATCH] Add a "nolinks" mount option.

2016-10-14 Thread Mattias Nissler
On Fri, Oct 14, 2016 at 5:00 PM, Al Viro  wrote:
>
> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
> > > Setting the "nolinks" mount option helps prevent privileged writers
> > > from modifying files unintentionally in case there is an unexpected
> > > link along the accessed path. The "nolinks" option is thus useful as a
> > > defensive measure against persistent exploits (i.e. a system getting
> > > re-exploited after a reboot) for systems that employ a read-only or
> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
> > > from running after reboot. However, legit code typically still reads
> > > from and writes to a writable file system previously under full
> > > control of the attacker, who can place symlinks to trick file writes
> > > after reboot to target a file of their choice. "nolinks" fundamentally
> > > prevents this.
> >
> > Which parts of the tree would be on that "protected" rootfs and which would
> > you mount with that option?  Description above is rather vague and I'm
> > not convinced that it actually buys you anything.  Details, please...

Apologies for the vague description, I'm happy to explain in detail.

In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
an attacker can't modify any binaries. After reboot, everything except
the rootfs is mounted noexec, so there's no way to re-gain code
execution after reboot by modifying existing binaries or dropping new
ones.

We've seen multiple exploits now where the attacker worked around
these limitations in two steps:

1. Before reboot, the attacker sets up symlinks on the writeable file
system (called "stateful" file system), which are later accessed by
legit boot code (such as init scripts) after reboot. For example, an
init script that copies file A to B can be abused by an attacker by
symlinking or hardlinking B to a location C of their choice, and
placing desired data to be written to C in A. That gives the attacker
a primitive to write data of their choice to a path of their choice
after reboot. Note that this primitive may target locations _outside_
the stateful file system the attacker previously had control of.
Particularly of interest are targets on /sys, but also tmpfs on /run
etc.

2. The second step for a successful attack is finding some legit code
invoked in the boot flow that has a vulnerability exploitable by
feeding it unexpected data. As an example, there are Linux userspace
utilities that read config from /run which may contain shell commands
the the utility executes, through which the attacker can gain code
execution again.

The purpose of the proposed patch is to raise the bar for the first
step of the attack: Writing arbitrary files after reboot. I'm
intending to mount the stateful file system with the nolinks option
(or otherwise prevent symlink traversal). This will help make sure
that any legit writes taking place during boot in init scripts etc. go
to the files intended by the developer, and can't be redirected by an
attacker.

Does this make more sense to you?

>
>
> PS: what the hell do restrictions on _following_ symlinks have to _creating_
> hardlinks?  I'm trying to imagine a threat model where both would apply or
> anything else beyond the word "link" they would have in common...

The restriction is not on _creating_ hard links, but _opening_
hardlinks. The commonality is in the confusion between the file you're
meaning to write vs. the file you actually end up writing to, which
stems from the fact that as things stand a file can be accessible on
other paths than its canonical one. For Chrome OS, I'd like to get to
a point where most privileged code can only access a file via its
canonical name (bind mounts are an OK exception as they're not
persistent, so out of reach for manipulation).

>
> The one you've described above might have something to do with the first
> one (modulo missing description of the setup you have in mind), but it
> clearly has nothing to do with the second - attackers could've created
> whatever they wanted while the fs had been under their control, after all.
> Doesn't make sense...


[RFC] [PATCH] Add a "nolinks" mount option.

2016-10-14 Thread Mattias Nissler
For mounts that have the new "nolinks" option, don't follow symlinks
and reject to open files with a hard link count larger than one. The
new option is similar in spirit to the existing "nodev", "noexec", and
"nosuid" options.

Note that symlinks and hard links may still be created on mounts where
the "nolinks" option is present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly. Similarly, hard-linked files can be identified from
userspace using stat() output while the "nolinks" option is set.

Setting the "nolinks" mount option helps prevent privileged writers
from modifying files unintentionally in case there is an unexpected
link along the accessed path. The "nolinks" option is thus useful as a
defensive measure against persistent exploits (i.e. a system getting
re-exploited after a reboot) for systems that employ a read-only or
dm-verity-protected rootfs. These systems prevent non-legit binaries
from running after reboot. However, legit code typically still reads
from and writes to a writable file system previously under full
control of the attacker, who can place symlinks to trick file writes
after reboot to target a file of their choice. "nolinks" fundamentally
prevents this.

Signed-off-by: Mattias Nissler <mniss...@chromium.org>
---
 fs/namei.c  | 9 -
 fs/namespace.c  | 8 +---
 fs/proc_namespace.c | 1 +
 include/linux/mount.h   | 3 ++-
 include/uapi/linux/fs.h | 1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7f601c..f152687 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1021,6 +1021,10 @@ const char *get_link(struct nameidata *nd)
touch_atime(>link);
}
 
+   error = -EACCES;
+   if (nd->path.mnt->mnt_flags & MNT_NOLINKS)
+   return ERR_PTR(error);
+
error = security_inode_follow_link(dentry, inode,
   nd->flags & LOOKUP_RCU);
if (unlikely(error))
@@ -2919,7 +2923,10 @@ static int may_open(struct path *path, int acc_mode, int 
flag)
case S_IFIFO:
case S_IFSOCK:
flag &= ~O_TRUNC;
-   break;
+   /*FALLTHRU*/
+   default:
+   if ((path->mnt->mnt_flags & MNT_NOLINKS) && inode->i_nlink > 1)
+   return -EACCES;
}
 
error = inode_permission(inode, MAY_OPEN | acc_mode);
diff --git a/fs/namespace.c b/fs/namespace.c
index 58aca9c..c421fbb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+   if (flags & MS_NOLINKS)
+   mnt_flags |= MNT_NOLINKS;
 
/* The default atime for remount is preservation */
if ((flags & MS_REMOUNT) &&
@@ -2741,9 +2743,9 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
}
 
-   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-  MS_STRICTATIME | MS_NOREMOTELOCK);
+   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOLINKS | MS_ACTIVE |
+  MS_BORN | MS_NOATIME | MS_NODIRATIME | MS_RELATIME |
+  MS_KERNMOUNT | MS_STRICTATIME | MS_NOREMOTELOCK);
 
if (flags & MS_REMOUNT)
retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3f1190d..b5d9d35 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount 
*mnt)
{ MNT_NOATIME, ",noatime" },
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
+   { MNT_NOLINKS, ",nolinks" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..df4eb6a 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
 #define MNT_READONLY   0x40/* does the user want this to be r/o? */
+#define MNT_NOLINKS0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -44,7 +45,7 @@ struct mnt_namespace;
 #define MNT_SHARED_MASK(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
   

[RFC] [PATCH] Add a "nolinks" mount option.

2016-10-14 Thread Mattias Nissler
For mounts that have the new "nolinks" option, don't follow symlinks
and reject to open files with a hard link count larger than one. The
new option is similar in spirit to the existing "nodev", "noexec", and
"nosuid" options.

Note that symlinks and hard links may still be created on mounts where
the "nolinks" option is present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly. Similarly, hard-linked files can be identified from
userspace using stat() output while the "nolinks" option is set.

Setting the "nolinks" mount option helps prevent privileged writers
from modifying files unintentionally in case there is an unexpected
link along the accessed path. The "nolinks" option is thus useful as a
defensive measure against persistent exploits (i.e. a system getting
re-exploited after a reboot) for systems that employ a read-only or
dm-verity-protected rootfs. These systems prevent non-legit binaries
from running after reboot. However, legit code typically still reads
from and writes to a writable file system previously under full
control of the attacker, who can place symlinks to trick file writes
after reboot to target a file of their choice. "nolinks" fundamentally
prevents this.

Signed-off-by: Mattias Nissler 
---
 fs/namei.c  | 9 -
 fs/namespace.c  | 8 +---
 fs/proc_namespace.c | 1 +
 include/linux/mount.h   | 3 ++-
 include/uapi/linux/fs.h | 1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7f601c..f152687 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1021,6 +1021,10 @@ const char *get_link(struct nameidata *nd)
touch_atime(>link);
}
 
+   error = -EACCES;
+   if (nd->path.mnt->mnt_flags & MNT_NOLINKS)
+   return ERR_PTR(error);
+
error = security_inode_follow_link(dentry, inode,
   nd->flags & LOOKUP_RCU);
if (unlikely(error))
@@ -2919,7 +2923,10 @@ static int may_open(struct path *path, int acc_mode, int 
flag)
case S_IFIFO:
case S_IFSOCK:
flag &= ~O_TRUNC;
-   break;
+   /*FALLTHRU*/
+   default:
+   if ((path->mnt->mnt_flags & MNT_NOLINKS) && inode->i_nlink > 1)
+   return -EACCES;
}
 
error = inode_permission(inode, MAY_OPEN | acc_mode);
diff --git a/fs/namespace.c b/fs/namespace.c
index 58aca9c..c421fbb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+   if (flags & MS_NOLINKS)
+   mnt_flags |= MNT_NOLINKS;
 
/* The default atime for remount is preservation */
if ((flags & MS_REMOUNT) &&
@@ -2741,9 +2743,9 @@ long do_mount(const char *dev_name, const char __user 
*dir_name,
mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
}
 
-   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
-  MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-  MS_STRICTATIME | MS_NOREMOTELOCK);
+   flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOLINKS | MS_ACTIVE |
+  MS_BORN | MS_NOATIME | MS_NODIRATIME | MS_RELATIME |
+  MS_KERNMOUNT | MS_STRICTATIME | MS_NOREMOTELOCK);
 
if (flags & MS_REMOUNT)
retval = do_remount(, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3f1190d..b5d9d35 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount 
*mnt)
{ MNT_NOATIME, ",noatime" },
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
+   { MNT_NOLINKS, ",nolinks" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..df4eb6a 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
 #define MNT_READONLY   0x40/* does the user want this to be r/o? */
+#define MNT_NOLINKS0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -44,7 +45,7 @@ struct mnt_namespace;
 #define MNT_SHARED_MASK(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 | MNT_NOATIME | 

Re: debugfs_create_{u,x,s}{8,16,32,64}

2014-08-14 Thread Mattias Nissler
On Wed, Aug 13, 2014 at 11:57 PM, Greg KH  wrote:
>
> On Wed, Aug 13, 2014 at 02:57:05PM -0700, Matt Longnecker wrote:
> > Greg,
> >
> > Back in December 2007 Mattias Nissler proposed a patch defining
> > debugfs_create_s32 and friends. In it he reworked the already existent u and
> > x versions. The subject of Mattias's mail was "[PATCH] debugfs: Revamp
> > debugfs_create_{u,x,s}{8,16,32,64} to support signed integers".
> >
> > As I'm writing a new driver related to thermal management, I'm finding the
> > absence of debugfs_create_s32 to be a stumbling block. I see at least two
> > other folks who have stumbled on the absence of debugfs_create_s32 in the
> > past few years.
> >
> > Why didn't we merge Mattias's patch?
>
> I can't remember why I didn't merge a patch from last week, let alone
> from 2007, are you crazy?  :)

Needless to say, I don't remember details either. It might just have
been a case of not adding further debugfs API as long as there's
little demand to justify the addition. The latter may have changed in
the last 7 years (Matt hints at other potential consumers?), but I
personally can't tell as I haven't done any kernel work in more recent
times.

>
>
> > Would you entertain a similar patch today?
>
> I always entertain all patches, you never need to ask if you are allowed
> to send a patch, that's like asking "Am I allowed to ask a question?"...
>
> greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: debugfs_create_{u,x,s}{8,16,32,64}

2014-08-14 Thread Mattias Nissler
On Wed, Aug 13, 2014 at 11:57 PM, Greg KH gre...@linuxfoundation.org wrote:

 On Wed, Aug 13, 2014 at 02:57:05PM -0700, Matt Longnecker wrote:
  Greg,
 
  Back in December 2007 Mattias Nissler proposed a patch defining
  debugfs_create_s32 and friends. In it he reworked the already existent u and
  x versions. The subject of Mattias's mail was [PATCH] debugfs: Revamp
  debugfs_create_{u,x,s}{8,16,32,64} to support signed integers.
 
  As I'm writing a new driver related to thermal management, I'm finding the
  absence of debugfs_create_s32 to be a stumbling block. I see at least two
  other folks who have stumbled on the absence of debugfs_create_s32 in the
  past few years.
 
  Why didn't we merge Mattias's patch?

 I can't remember why I didn't merge a patch from last week, let alone
 from 2007, are you crazy?  :)

Needless to say, I don't remember details either. It might just have
been a case of not adding further debugfs API as long as there's
little demand to justify the addition. The latter may have changed in
the last 7 years (Matt hints at other potential consumers?), but I
personally can't tell as I haven't done any kernel work in more recent
times.



  Would you entertain a similar patch today?

 I always entertain all patches, you never need to ask if you are allowed
 to send a patch, that's like asking Am I allowed to ask a question?...

 greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Silent crash witk 2-6.24-rc8-git4

2008-01-24 Thread Mattias Nissler

On Thu, 2008-01-24 at 00:03 +, Chris Clayton wrote:
> Thanks for the reply, John.
> 
> On Wednesday 23 January 2008, John W. Linville wrote:
> > On Wed, Jan 23, 2008 at 04:42:58PM +, Chris Clayton wrote:
> > > On Wednesday 23 January 2008, Chris Clayton wrote:
> > > > Hi again.
> > > > 
> > > > On Tuesday 22 January 2008, Chris Clayton wrote:
> > > > > Hi,
> > > 
> > > and here's another, slightly more complete, one, this time from 
> > > 2.6.24-rc8-git6
> > > (the none I sent earlier was from -git4, by the way)
> > > 
> > > Call Trace:
> > > [] __update_rq_clock+0x1a/0xf0
> > > [] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
> > > [] rt61pci_txdone+0x153/0x1f0 [rt61pci]
> > > [] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
> > 
> > I suspect this could relate to this commit:
> > 
> > commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
> > Author: Mattias Nissler <[EMAIL PROTECTED]>
> > Date:   Mon Nov 12 15:03:12 2007 +0100
> > 
> > rt2x00: Allow rt61 to catch up after a missing tx report
> > 
> > Could you revert that patch, rebuild, and try to recreate the problem?
> > 
> 
> I've reverted the patch by hand - the original patch on linux-wireless 
> doesn't match the present state of the code
> in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, 
> however.
> 
> Problem now is that we are back where we were a few weeks ago because my 
> wireless connection dies
> almost immediately after I start the download and from dmesg I see:
> 
> phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the 
> non-full queue 2.
> Please file bug report to http://rt2x00.serialmonkey.com.

As the patches author, I'm not very surprised :-) The patch just adds
additional processing that should have happened in a different call of
the interrupt handler. I don't think it can be related at all to the
trouble you're seeing.

When I find time, I'll try the Linus tree on my rt61 to see what
happens.

Mattias

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Silent crash witk 2-6.24-rc8-git4

2008-01-24 Thread Mattias Nissler

On Thu, 2008-01-24 at 00:03 +, Chris Clayton wrote:
 Thanks for the reply, John.
 
 On Wednesday 23 January 2008, John W. Linville wrote:
  On Wed, Jan 23, 2008 at 04:42:58PM +, Chris Clayton wrote:
   On Wednesday 23 January 2008, Chris Clayton wrote:
Hi again.

On Tuesday 22 January 2008, Chris Clayton wrote:
 Hi,
   
   and here's another, slightly more complete, one, this time from 
   2.6.24-rc8-git6
   (the none I sent earlier was from -git4, by the way)
   
   Call Trace:
   [c011440a] __update_rq_clock+0x1a/0xf0
   [e08ff59d] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
   [e0909573] rt61pci_txdone+0x153/0x1f0 [rt61pci]
   [e09096ad] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
  
  I suspect this could relate to this commit:
  
  commit 62bc060b8ed5fcdafd87da5ab17bdd59a39ebcc9
  Author: Mattias Nissler [EMAIL PROTECTED]
  Date:   Mon Nov 12 15:03:12 2007 +0100
  
  rt2x00: Allow rt61 to catch up after a missing tx report
  
  Could you revert that patch, rebuild, and try to recreate the problem?
  
 
 I've reverted the patch by hand - the original patch on linux-wireless 
 doesn't match the present state of the code
 in rt61pci.c and I'm not a git user. I'm pretty sure I've got it right, 
 however.
 
 Problem now is that we are back where we were a few weeks ago because my 
 wireless connection dies
 almost immediately after I start the download and from dmesg I see:
 
 phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free entry in the 
 non-full queue 2.
 Please file bug report to http://rt2x00.serialmonkey.com.

As the patches author, I'm not very surprised :-) The patch just adds
additional processing that should have happened in a different call of
the interrupt handler. I don't think it can be related at all to the
trouble you're seeing.

When I find time, I'll try the Linus tree on my rt61 to see what
happens.

Mattias

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Spammed logs again rt2500pci 2.6.24-rc8

2008-01-18 Thread Mattias Nissler
Hi,

we currently think this is due to a race condition in the packet queue
code. Ivo is currently reworking the packet queues, which I hope will
resolve this situation.

Mattias

On Thu, 2008-01-17 at 22:57 +0100, Alejandro Riveira Fernández wrote:
> Running
> 
> Linux Varda 2.6.24-rc8 #1 SMP PREEMPT Wed Jan 16 15:09:43 CET 2008 x86_64 
> GNU/Linux
> 
> my logs are spmmed with:
> 
> 
> [20126.343240] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20126.343242] Please file bug report to http://rt2x00.serialmonkey.com.
> [20126.848277] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20126.848280] Please file bug report to http://rt2x00.serialmonkey.com.
> [20126.851693] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20126.851696] Please file bug report to http://rt2x00.serialmonkey.com.
> [20126.851854] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20126.851856] Please file bug report to http://rt2x00.serialmonkey.com.
> [20126.852082] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20126.852084] Please file bug report to http://rt2x00.serialmonkey.com.
> [20126.852224] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20126.852226] Please file bug report to http://rt2x00.serialmonkey.com.
> [20127.349691] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20127.349694] Please file bug report to http://rt2x00.serialmonkey.com.
> [20127.352425] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20127.352428] Please file bug report to http://rt2x00.serialmonkey.com.
> [20127.853345] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20127.853348] Please file bug report to http://rt2x00.serialmonkey.com.
> [20127.853699] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20127.853701] Please file bug report to http://rt2x00.serialmonkey.com.
> [20128.858569] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20128.858572] Please file bug report to http://rt2x00.serialmonkey.com.
> [20131.875550] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20131.875554] Please file bug report to http://rt2x00.serialmonkey.com.
> [20154.693853] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20154.693856] Please file bug report to http://rt2x00.serialmonkey.com.
> [20185.527974] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20185.527977] Please file bug report to http://rt2x00.serialmonkey.com.
> [20189.812334] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20189.812337] Please file bug report to http://rt2x00.serialmonkey.com.
> [20189.812374] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20189.812376] Please file bug report to http://rt2x00.serialmonkey.com.
> [20189.812507] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20189.812509] Please file bug report to http://rt2x00.serialmonkey.com.
> [20189.813100] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20189.813102] Please file bug report to http://rt2x00.serialmonkey.com.
> [20189.813474] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20189.813476] Please file bug report to http://rt2x00.serialmonkey.com.
> [20189.813909] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20189.813911] Please file bug report to http://rt2x00.serialmonkey.com.
> [20212.242858] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20212.242861] Please file bug report to http://rt2x00.serialmonkey.com.
> [20212.243587] phy0 -> rt2x00pci_write_tx_data: Error - Arrived at non-free 
> entry in the non-full queue 1.
> [20212.243589] Please file bug report to http://rt2x00.serialmonkey.com.
> 
> 
> As the last time the thing is triggered with a p2p app making a lot of 
> connections (this time not bt but mldonkey)
> Back to top   
> View user's profile Send private message  
> -
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to 

Re: Spammed logs again rt2500pci 2.6.24-rc8

2008-01-18 Thread Mattias Nissler
Hi,

we currently think this is due to a race condition in the packet queue
code. Ivo is currently reworking the packet queues, which I hope will
resolve this situation.

Mattias

On Thu, 2008-01-17 at 22:57 +0100, Alejandro Riveira Fernández wrote:
 Running
 
 Linux Varda 2.6.24-rc8 #1 SMP PREEMPT Wed Jan 16 15:09:43 CET 2008 x86_64 
 GNU/Linux
 
 my logs are spmmed with:
 
 
 [20126.343240] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20126.343242] Please file bug report to http://rt2x00.serialmonkey.com.
 [20126.848277] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20126.848280] Please file bug report to http://rt2x00.serialmonkey.com.
 [20126.851693] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20126.851696] Please file bug report to http://rt2x00.serialmonkey.com.
 [20126.851854] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20126.851856] Please file bug report to http://rt2x00.serialmonkey.com.
 [20126.852082] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20126.852084] Please file bug report to http://rt2x00.serialmonkey.com.
 [20126.852224] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20126.852226] Please file bug report to http://rt2x00.serialmonkey.com.
 [20127.349691] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20127.349694] Please file bug report to http://rt2x00.serialmonkey.com.
 [20127.352425] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20127.352428] Please file bug report to http://rt2x00.serialmonkey.com.
 [20127.853345] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20127.853348] Please file bug report to http://rt2x00.serialmonkey.com.
 [20127.853699] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20127.853701] Please file bug report to http://rt2x00.serialmonkey.com.
 [20128.858569] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20128.858572] Please file bug report to http://rt2x00.serialmonkey.com.
 [20131.875550] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20131.875554] Please file bug report to http://rt2x00.serialmonkey.com.
 [20154.693853] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20154.693856] Please file bug report to http://rt2x00.serialmonkey.com.
 [20185.527974] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20185.527977] Please file bug report to http://rt2x00.serialmonkey.com.
 [20189.812334] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20189.812337] Please file bug report to http://rt2x00.serialmonkey.com.
 [20189.812374] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20189.812376] Please file bug report to http://rt2x00.serialmonkey.com.
 [20189.812507] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20189.812509] Please file bug report to http://rt2x00.serialmonkey.com.
 [20189.813100] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20189.813102] Please file bug report to http://rt2x00.serialmonkey.com.
 [20189.813474] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20189.813476] Please file bug report to http://rt2x00.serialmonkey.com.
 [20189.813909] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20189.813911] Please file bug report to http://rt2x00.serialmonkey.com.
 [20212.242858] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20212.242861] Please file bug report to http://rt2x00.serialmonkey.com.
 [20212.243587] phy0 - rt2x00pci_write_tx_data: Error - Arrived at non-free 
 entry in the non-full queue 1.
 [20212.243589] Please file bug report to http://rt2x00.serialmonkey.com.
 
 
 As the last time the thing is triggered with a p2p app making a lot of 
 connections (this time not bt but mldonkey)
 Back to top   
 View user's profile Send private message  
 -
 To unsubscribe from this list: send the line unsubscribe linux-wireless in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: [PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers

2007-12-16 Thread Mattias Nissler

On Sun, 2007-12-16 at 09:05 -0800, Greg KH wrote:
> On Sun, Dec 16, 2007 at 05:37:59PM +0100, Mattias Nissler wrote:
> > This makes debugfs use its own file_operations for the value accessor files
> > created by debugfs_create_XXX. Having that, we can also have proper versions
> > for signed integers.
> 
> Why not tweak the "SIMPLE_ATTRIBUTE" code to support this instead?  That
> way debugfs and all other filesystems could also use these attributes?

I expected that question ;-) Actually, I had a version that did this.
But it ended up being ugly, cause SIMPLE_ATTRIBUTEs expect to access the
values via get/set functions using u64. Here is what I did and why I
didn't like it:

* I made the set/get function interface more generic (as it is in the
debugfs patch), i.e. getting passed the buf, so they have to
decode/encode whatever they like into the buf instead of working with
u64
* This means either all code using SIMPLE_ATTRIBUTES must be changed, or
I need to add another wrapper macro that produces suitable get/set
functions from the u64 versions it gets passed.
* Changing all SIMPLE_ATTRIBUTE users is not what I want, cause moving
the csnprintf()/strtoull() calls out of the SIMPLE_ATTRIBUTE code means
it becomes less simple to use it, right?
* The wrapper is actually possible, but it suffers from cases where
somebody passes NULL for a get or set function (which is actually
expected, compare the code in libfs.c) It is possible to hack around
this, but it's ugly.

Then I thought, well, SIMPLE_ATTRIBUTEs are made to work with a set/get
function pair. But debugfs expects a pointer to an actual variable
anyway, so just bypass the SIMPLE_ATTRIBUTE code and make debugfs use
it's own file_operations (most of them shamelessly stolen from the
SIMPLE_ATTRIBUTE code, I admit).

Mattias

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers

2007-12-16 Thread Mattias Nissler
This makes debugfs use its own file_operations for the value accessor files
created by debugfs_create_XXX. Having that, we can also have proper versions
for signed integers.

Signed-off-by: Mattias Nissler <[EMAIL PROTECTED]>
---

When writing some debugfs code for mac80211 I wanted to have access to
some s32 variables. I thought it's better to make debugfs support signed
integers instead of adding the functionality locally. I assume generic
versions will be useful for other people as well.

Please CC me for replies.


 fs/debugfs/file.c   |  291 +++
 include/linux/debugfs.h |   45 +++
 2 files changed, 287 insertions(+), 49 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index fa6b7f7..8808bc0 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -56,18 +56,121 @@ const struct inode_operations debugfs_link_operations = {
.follow_link= debugfs_follow_link,
 };
 
-static void debugfs_u8_set(void *data, u64 val)
+/* Simple numeric attributes */
+struct debugfs_num_attr
 {
-   *(u8 *)data = val;
+   void (*get)(void *, char *);
+   void (*set)(void *, char *);
+   char get_buf[24];
+   char set_buf[24];
+   void *pnum;
+   struct mutex mutex;
+};
+
+static int debugfs_num_attr_open(struct inode *inode, struct file *file,
+void (*get)(void *, char*),
+void (*set)(void *, char*))
+{
+   struct debugfs_num_attr *attr;
+
+   attr = kmalloc(sizeof(*attr), GFP_KERNEL);
+   if (!attr)
+   return -ENOMEM;
+
+   attr->get = get;
+   attr->set = set;
+   attr->pnum = inode->i_private;
+   mutex_init(>mutex);
+
+   file->private_data = attr;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_num_attr_close(struct inode *inode, struct file *file)
+{
+   kfree(file->private_data);
+
+   return 0;
+}
+
+static ssize_t debugfs_num_attr_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_num_attr *attr;
+   size_t size;
+   ssize_t ret;
+
+   attr = file->private_data;
+
+   mutex_lock(>mutex);
+   if (!*ppos) {
+   /* first read */
+   attr->get(attr->pnum, attr->get_buf);
+   attr->get_buf[sizeof(attr->get_buf) - 1] = '\0';
+   }
+
+   size = strlen(attr->get_buf);
+   ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size);
+   mutex_unlock(>mutex);
+
+   return ret;
 }
-static u64 debugfs_u8_get(void *data)
+
+static ssize_t debugfs_num_attr_write(struct file *file, const char __user 
*buf,
+ size_t len, loff_t *ppos)
 {
-   return *(u8 *)data;
+   struct debugfs_num_attr *attr;
+   size_t size;
+   ssize_t ret;
+
+   attr = file->private_data;
+
+   mutex_lock(>mutex);
+   ret = -EFAULT;
+   size = min(sizeof(attr->set_buf) - 1, len);
+   if (copy_from_user(attr->set_buf, buf, size))
+   goto out;
+
+   ret = len; /* claim we got the whole input */
+   attr->set_buf[size] = '\0';
+   attr->set(attr->pnum, attr->set_buf);
+out:
+   mutex_unlock(>mutex);
+   return ret;
 }
-DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, "%llu\n");
+
+#define DEFINE_NUM_ATTR(__fops, __type, __format)  \
+static void __fops ## _get(void *data, char *buf)  \
+{  \
+   scnprintf(buf, 24, __format, *(__type *) data); \
+}  \
+static void __fops ## _set(void *data, char *buf)  \
+{  \
+   sscanf(buf, __format, (__type *) data); \
+}  \
+static int __fops ## _open(struct inode *inode, struct file *file) \
+{  \
+   return debugfs_num_attr_open(inode, file, __fops ## _get,   \
+__fops ## _set);   \
+}  \
+static struct file_operations __fops = {   \
+   .owner   = THIS_MODULE, \
+   .open= __fops ## _open, \
+   .release = debugfs_num_attr_close,  \
+   .read= debugfs_num_attr_read,   \
+   .write   = debugfs_num_attr_write,  \
+};
+
+DE

[PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers

2007-12-16 Thread Mattias Nissler
This makes debugfs use its own file_operations for the value accessor files
created by debugfs_create_XXX. Having that, we can also have proper versions
for signed integers.

Signed-off-by: Mattias Nissler [EMAIL PROTECTED]
---

When writing some debugfs code for mac80211 I wanted to have access to
some s32 variables. I thought it's better to make debugfs support signed
integers instead of adding the functionality locally. I assume generic
versions will be useful for other people as well.

Please CC me for replies.


 fs/debugfs/file.c   |  291 +++
 include/linux/debugfs.h |   45 +++
 2 files changed, 287 insertions(+), 49 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index fa6b7f7..8808bc0 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -56,18 +56,121 @@ const struct inode_operations debugfs_link_operations = {
.follow_link= debugfs_follow_link,
 };
 
-static void debugfs_u8_set(void *data, u64 val)
+/* Simple numeric attributes */
+struct debugfs_num_attr
 {
-   *(u8 *)data = val;
+   void (*get)(void *, char *);
+   void (*set)(void *, char *);
+   char get_buf[24];
+   char set_buf[24];
+   void *pnum;
+   struct mutex mutex;
+};
+
+static int debugfs_num_attr_open(struct inode *inode, struct file *file,
+void (*get)(void *, char*),
+void (*set)(void *, char*))
+{
+   struct debugfs_num_attr *attr;
+
+   attr = kmalloc(sizeof(*attr), GFP_KERNEL);
+   if (!attr)
+   return -ENOMEM;
+
+   attr-get = get;
+   attr-set = set;
+   attr-pnum = inode-i_private;
+   mutex_init(attr-mutex);
+
+   file-private_data = attr;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_num_attr_close(struct inode *inode, struct file *file)
+{
+   kfree(file-private_data);
+
+   return 0;
+}
+
+static ssize_t debugfs_num_attr_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_num_attr *attr;
+   size_t size;
+   ssize_t ret;
+
+   attr = file-private_data;
+
+   mutex_lock(attr-mutex);
+   if (!*ppos) {
+   /* first read */
+   attr-get(attr-pnum, attr-get_buf);
+   attr-get_buf[sizeof(attr-get_buf) - 1] = '\0';
+   }
+
+   size = strlen(attr-get_buf);
+   ret = simple_read_from_buffer(buf, len, ppos, attr-get_buf, size);
+   mutex_unlock(attr-mutex);
+
+   return ret;
 }
-static u64 debugfs_u8_get(void *data)
+
+static ssize_t debugfs_num_attr_write(struct file *file, const char __user 
*buf,
+ size_t len, loff_t *ppos)
 {
-   return *(u8 *)data;
+   struct debugfs_num_attr *attr;
+   size_t size;
+   ssize_t ret;
+
+   attr = file-private_data;
+
+   mutex_lock(attr-mutex);
+   ret = -EFAULT;
+   size = min(sizeof(attr-set_buf) - 1, len);
+   if (copy_from_user(attr-set_buf, buf, size))
+   goto out;
+
+   ret = len; /* claim we got the whole input */
+   attr-set_buf[size] = '\0';
+   attr-set(attr-pnum, attr-set_buf);
+out:
+   mutex_unlock(attr-mutex);
+   return ret;
 }
-DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, %llu\n);
+
+#define DEFINE_NUM_ATTR(__fops, __type, __format)  \
+static void __fops ## _get(void *data, char *buf)  \
+{  \
+   scnprintf(buf, 24, __format, *(__type *) data); \
+}  \
+static void __fops ## _set(void *data, char *buf)  \
+{  \
+   sscanf(buf, __format, (__type *) data); \
+}  \
+static int __fops ## _open(struct inode *inode, struct file *file) \
+{  \
+   return debugfs_num_attr_open(inode, file, __fops ## _get,   \
+__fops ## _set);   \
+}  \
+static struct file_operations __fops = {   \
+   .owner   = THIS_MODULE, \
+   .open= __fops ## _open, \
+   .release = debugfs_num_attr_close,  \
+   .read= debugfs_num_attr_read,   \
+   .write   = debugfs_num_attr_write,  \
+};
+
+DEFINE_NUM_ATTR(fops_u8, u8, %hhu\n)
+DEFINE_NUM_ATTR(fops_u16, u16, %hu\n)
+DEFINE_NUM_ATTR(fops_u32, u32

Re: [PATCH] debugfs: Revamp debugfs_create_{u,x,s}{8,16,32,64} to support signed integers

2007-12-16 Thread Mattias Nissler

On Sun, 2007-12-16 at 09:05 -0800, Greg KH wrote:
 On Sun, Dec 16, 2007 at 05:37:59PM +0100, Mattias Nissler wrote:
  This makes debugfs use its own file_operations for the value accessor files
  created by debugfs_create_XXX. Having that, we can also have proper versions
  for signed integers.
 
 Why not tweak the SIMPLE_ATTRIBUTE code to support this instead?  That
 way debugfs and all other filesystems could also use these attributes?

I expected that question ;-) Actually, I had a version that did this.
But it ended up being ugly, cause SIMPLE_ATTRIBUTEs expect to access the
values via get/set functions using u64. Here is what I did and why I
didn't like it:

* I made the set/get function interface more generic (as it is in the
debugfs patch), i.e. getting passed the buf, so they have to
decode/encode whatever they like into the buf instead of working with
u64
* This means either all code using SIMPLE_ATTRIBUTES must be changed, or
I need to add another wrapper macro that produces suitable get/set
functions from the u64 versions it gets passed.
* Changing all SIMPLE_ATTRIBUTE users is not what I want, cause moving
the csnprintf()/strtoull() calls out of the SIMPLE_ATTRIBUTE code means
it becomes less simple to use it, right?
* The wrapper is actually possible, but it suffers from cases where
somebody passes NULL for a get or set function (which is actually
expected, compare the code in libfs.c) It is possible to hack around
this, but it's ugly.

Then I thought, well, SIMPLE_ATTRIBUTEs are made to work with a set/get
function pair. But debugfs expects a pointer to an actual variable
anyway, so just bypass the SIMPLE_ATTRIBUTE code and make debugfs use
it's own file_operations (most of them shamelessly stolen from the
SIMPLE_ATTRIBUTE code, I admit).

Mattias

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/