from:"Ondrej Mosnacek"

[PATCH] selinux: fix NULL dereference in policydb_destroy()

2019-03-17 Thread Ondrej Mosnacek

The conversion to kvmalloc() forgot to account for the possibility that
p->type_attr_map_array might be null in policydb_destroy().

Fix this by destroying its contents only if it is not NULL.

Also make sure ebitmap_init() is called on all entries before
policydb_destroy() can be called. Right now this is a no-op, because
both kvcalloc() and ebitmap_init() just zero out the whole struct, but
let's rather not rely on a specific implementation.

Reported-by: syzbot+a57b2aff60832666f...@syzkaller.appspotmail.com
Fixes: acdf52d97f82 ("selinux: convert to kvmalloc")
Signed-off-by: Ondrej Mosnacek 
---
 security/selinux/ss/policydb.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

NOTE: This applies directly on top of current Linus' tree, since the
problematic commit is not present in the selinux/stable-5.1 branch.

diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index 6b576e588725..daecdfb15a9c 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -828,9 +828,11 @@ void policydb_destroy(struct policydb *p)
hashtab_map(p->range_tr, range_tr_destroy, NULL);
hashtab_destroy(p->range_tr);
 
-   for (i = 0; i < p->p_types.nprim; i++)
-   ebitmap_destroy(&p->type_attr_map_array[i]);
-   kvfree(p->type_attr_map_array);
+   if (p->type_attr_map_array) {
+   for (i = 0; i < p->p_types.nprim; i++)
+   ebitmap_destroy(&p->type_attr_map_array[i]);
+   kvfree(p->type_attr_map_array);
+   }
 
ebitmap_destroy(&p->filename_trans_ttypes);
ebitmap_destroy(&p->policycaps);
@@ -2496,10 +2498,13 @@ int policydb_read(struct policydb *p, void *fp)
if (!p->type_attr_map_array)
goto bad;
 
+   /* just in case ebitmap_init() becomes more than just a memset(0): */
+   for (i = 0; i < p->p_types.nprim; i++)
+   ebitmap_init(&p->type_attr_map_array[i]);
+
for (i = 0; i < p->p_types.nprim; i++) {
struct ebitmap *e = &p->type_attr_map_array[i];
 
-   ebitmap_init(e);
if (p->policyvers >= POLICYDB_VERSION_AVTAB) {
rc = ebitmap_read(e, fp);
if (rc)
-- 
2.20.1

Re: [PATCH] ntp: Allow TAI-UTC offset to be set to zero

2019-04-17 Thread Ondrej Mosnacek

On Wed, Apr 17, 2019 at 10:48 AM Miroslav Lichvar  wrote:
> The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock.
> It is typically set by NTP/PTP implementations and it is automatically
> updated by the kernel on leap seconds. The initial value is zero (which
> applications may interpret as unknown), but this value cannot be set by
> adjtimex. This limitation seems to go back to the original "nanokernel"
> implementation by David Mills.
>
> Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
> order to allow setting it back to the initial value.
>
> Cc: Thomas Gleixner 
> Cc: John Stultz 
> Cc: Richard Cochran 
> Cc: Prarit Bhargava 
> Suggested-by: Ondrej Mosnacek 
> Signed-off-by: Miroslav Lichvar 

Thanks for sending the patch! Maybe you (or the committer) could
consider adding:

Fixes: 153b5d054ac2 ("ntp: support for TAI")

so that it gets to the stable kernels as well.

> ---
>  kernel/time/ntp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 92a90014a925..f43d47c8c3b6 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -690,7 +690,7 @@ static inline void process_adjtimex_modes(const struct 
> __kernel_timex *txc,
> time_constant = max(time_constant, 0l);
> }
>
> -   if (txc->modes & ADJ_TAI && txc->constant > 0)
> +   if (txc->modes & ADJ_TAI && txc->constant >= 0)
> *time_tai = txc->constant;
>
> if (txc->modes & ADJ_OFFSET)
> --
> 2.17.2

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH] ntp: Allow TAI-UTC offset to be set to zero

2019-04-18 Thread Ondrej Mosnacek

On Thu, Apr 18, 2019 at 11:07 AM Thomas Gleixner  wrote:
> On Thu, 18 Apr 2019, Miroslav Lichvar wrote:
> > On Wed, Apr 17, 2019 at 11:00:23AM +0200, Ondrej Mosnacek wrote:
> > > On Wed, Apr 17, 2019 at 10:48 AM Miroslav Lichvar  
> > > wrote:
> > > > Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
> > > > order to allow setting it back to the initial value.
> >
> > > Thanks for sending the patch! Maybe you (or the committer) could
> > > consider adding:
> > >
> > > Fixes: 153b5d054ac2 ("ntp: support for TAI")
> >
> > To me the change looks more like an extension of the API, rather than
> > a bug fix, so I'd leave that up to the committer.
>
> I don't see why we need to backport that all the way, but I'm happy to add
> the tag if there is some really good reason.

OK, I can live without it, it's just that it kind of breaks the tests
for the recent time auditing patches [1] (I try to reset to previous
values when triggering events). I worked around it for now by
resetting the value to 1 when original value was 0... I don't know if
it can lead to some issues or not, but it would be nice if I could
reset to the actual original value...

[1] 
https://github.com/linux-audit/audit-testsuite/pull/82/files#diff-555d96083b536100d9b7e3eea56feadbR76

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [kernfs] e19dfdc83b: BUG:KASAN:global-out-of-bounds_in_s

2019-03-26 Thread Ondrej Mosnacek

On Mon, Mar 25, 2019 at 6:06 PM Ondrej Mosnacek  wrote:
> On Mon, Mar 25, 2019 at 4:17 PM Paul Moore  wrote:
> > Ondrej, please look into this.
> >
> > You've looked at this code more recently than I have, but it looks
> > like there might be an issue with __kernfs_iattrs() returning a
> > pointer to a kernfs_iattrs object without taking a kernfs reference
> > (kernfs_get(kn)).  Although I would be a little surprised if this was
> > the problem as I think it would cause a number of issues beyond just
> > this one ... ?
>
> I think this is actually because of how xattr_full_name() reconstructs
> the full name from the xattr suffix. It assumes that the suffix was
> obtained from the full name by just taking a pointer inside it, but in
> kernfs_security_xattr_get/set() I pass the suffix directly... I'm
> surprised that this didn't fail spectacularly earlier during testing.
> Maybe the newer GCC does some clever merging of the string constants,
> so that XATTR_SELINUX_SUFFIX actually ends up as a substring of
> XATTR_NAME_SELINUX? (That would be one hell of a "lucky" coincidence
> :)
>
> I'll post a patch that converts kernfs_security_xattr_get/set() to
> take the full name and hopefully that will fix the problem. I'll see
> if I can run the reproducer locally tomorrow...

I managed to reproduce the KASAN warning in my kernel testing
environment by simply enabling CONFIG_KASAN and running the cgroupfs
issue reproducer from the original patchset. With the patch I posted I
no longer get the warning, so I believe it really fixes the problem.

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: general protection fault in selinux_socket_recvmsg

2020-05-23 Thread Ondrej Mosnacek

dc00 RBX:  RCX: 4000
> RDX: 0003 RSI: 83543bb9 RDI: 0018
> RBP: dc00 R08: 88809f45a180 R09: 
> R10:  R11:  R12: dc00
> R13: c900019d7d78 R14: 1000 R15: 4000
> FS:  7f5ffb311700() GS:8880ae60() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f5ffb2efe78 CR3: a33c1000 CR4: 001406f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
>

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH] twist: allow converting pr_devel()/pr_debug() into printk(KERN_DEBUG)

2020-05-24 Thread Ondrej Mosnacek

On Sun, May 24, 2020 at 7:38 PM Joe Perches  wrote:
> On Sun, 2020-05-24 at 23:50 +0900, Tetsuo Handa wrote:
> > syzbot found a NULL pointer dereference bug inside mptcp_recvmsg() due to
> > ssock == NULL, but this bug manifested inside selinux_socket_recvmsg()
> > because pr_debug() was no-op [1].
> >
> >   pr_debug("fallback-read subflow=%p",
> >mptcp_subflow_ctx(ssock->sk));
> >   copied = sock_recvmsg(ssock, msg, flags);
>
> > Since console loglevel used by syzkaller will not print KERN_DEBUG
> > messages to consoles, always evaluating pr_devel()/pr_debug() messages
> > will not cause too much console output. Thus, let's allow fuzzers to
> > always evaluate pr_devel()/pr_debug() messages.
>
> While I think this is rather unnecessary,
> what about dev_dbg/netdev_dbg/netif_dbg et al ?

I'm also not sure if this is really worth it... It would help localize
the bug in this specific case, but there is nothing systematic about
it. Are there that many debug print statements that dereference
pointers that are later passed to functions, but not dereferenced
otherwise? Maybe yes, but it seems to be quite an optimistic
assumption... I don't consider it such a big problem that a bug in
function X only manifests itself deeper in the callchain. There will
always be such bugs, no matter how many moles you whack.

That said, I'm not strongly opposed to the change either, I just
wanted to state my opinion in case my reply to the syzbot report [1]
gave the impression that I considered the "misattribution" as
something that needs to be fixed :)

[1] 
https://lore.kernel.org/selinux/CAFqZXNvf+oJs9u4H97u7=jtl2wo_hkf4nzdzjld7tnc_j0k...@mail.gmail.com/

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH 1/2] audit: show user land backtrace as part of audit context messages

2021-02-02 Thread Ondrej Mosnacek

On Tue, Feb 2, 2021 at 10:46 PM Paul Moore  wrote:
> On Tue, Feb 2, 2021 at 4:44 PM Daniel Walker (danielwa)
>  wrote:
> > On Tue, Feb 02, 2021 at 04:35:42PM -0500, Paul Moore wrote:
> > > On Tue, Feb 2, 2021 at 4:29 PM Daniel Walker  wrote:
> > > > From: Victor Kamensky 
> > > >
> > > > To efficiently find out where SELinux AVC denial is comming from
> > > > take backtrace of user land process and display it as type=UBACKTRACE
> > > > message that comes as audit context for SELinux AVC and other audit
> > > > messages ...
> > >
> > > Have you tried the new perf tracepoint for SELinux AVC decisions that
> > > trigger an audit event?  It's a new feature for v5.10 and looks to
> > > accomplish most of what you are looking for with this patch.
> > >
> > > * https://www.paul-moore.com/blog/d/2020/12/linux_v510.html
> >
> > We haven't tried it, but I can look into it. We're not using v5.10 
> > extensively
> > yet.
>
> Let us know if that works for you, and if it doesn't, let us know what
> might be missing.  I hate seeing the kernel grow multiple features
> which do the same thing.

I agree - I played around with this new tracepoint recently and you
can use it to achieve what you want quite easily:

# collect traces for denials (just interrupt/kill the sleep process
when done) - will create a perf.data file you can analyze later
perf record -a -e avc:selinux_audited -g --call-graph=dwarf sleep infinity
# dump all collected backtraces from the perf.data file
perf script

It's a bit complicated if you want to have it running in the
background permanently as a service (you need to tell perf to dump the
recorded data before you can read it), but perf already has some
command-line options to help with that use case.

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH 7/7 v2] tracing: Do not create tracefs files if tracefs lockdown is in effect

2021-04-13 Thread Ondrej Mosnacek

On Sat, Oct 12, 2019 at 2:59 AM Steven Rostedt  wrote:
> From: "Steven Rostedt (VMware)" 
>
> If on boot up, lockdown is activated for tracefs, don't even bother creating
> the files. This can also prevent instances from being created if lockdown is
> in effect.
>
> Link: 
> http://lkml.kernel.org/r/CAHk-=whC6Ji=fwnjh2+es4b15tnbss4vpvtvbowcy1jjeg_...@mail.gmail.com
>
> Suggested-by: Linus Torvalds 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  fs/tracefs/inode.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index eeeae0475da9..0caa151cae4e 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -390,6 +391,9 @@ struct dentry *tracefs_create_file(const char *name, 
> umode_t mode,
> struct dentry *dentry;
> struct inode *inode;
>
> +   if (security_locked_down(LOCKDOWN_TRACEFS))
> +   return NULL;
> +
> if (!(mode & S_IFMT))
> mode |= S_IFREG;
> BUG_ON(!S_ISREG(mode));
> --
> 2.23.0

Hi all,

sorry for coming back to an old thread, but it turns out that this
patch doesn't play well with SELinux's implementation of the
security_locked_down() hook, which was added a few months later (so
not your fault :) in commit 59438b46471a ("security,lockdown,selinux:
implement SELinux lockdown").

What SELinux does is it checks if the current task's creds are allowed
the lockdown::integrity or lockdown::confidentiality permission in the
policy whenever security_locked_down() is called. The idea is to be
able to control at SELinux domain level which tasks can do these
sensitive operations (when the kernel is not actually locked down by
the Lockdown LSM).

With this patch + the SELinux lockdown mechanism in use, when a
userspace task loads a module that creates some tracefs nodes in its
initcall SELinux will check if the task has the
lockdown::confidentiality permission and if not, will report denials
in audit log and prevent the tracefs entries from being created. But
that is not a very logical behavior, since the task loading the module
is itself not (explicitly) doing anything that would breach
confidentiality. It just indirectly causes some tracefs nodes to be
created, but doesn't actually use them at that point.

Since it seems the other patches also added security_locked_down()
calls to the tracefs nodes' open functions, I guess reverting this
patch could be an acceptable way to fix this problem (please correct
me if there is something that this call catches, which the other ones
don't). However, even then I can understand that you (or someone else)
might want to keep this as an optimization, in which case we could
instead do this:
1. Add a new hook security_locked_down_permanently() (the name is open
for discussion), which would be intended for situations when we want
to avoid doing some pointless work when the kernel is in a "hard"
lockdown that can't be taken back (except perhaps in some rescue
scenario...).
2. This hook would be backed by the same implementation as
security_locked_down() in the Lockdown LSM and left unimplemented by
SELinux.
3. tracefs_create_file() would call this hook instead of security_locked_down().

This way it would work as before relative to the standard lockdown via
the Lockdown LSM and would be simply ignored by SELinux. I went over
all the security_locked_down() call in the kernel and I think this
alternative hook could also fit better in arch/powerpc/xmon/xmon.c,
where it seems to be called from interrupt context (so task creds are
irrelevant, anyway...) and mainly causes some values to be redacted.
(I also found a couple minor issues with how the hook is used in other
places, for which I plan to send patches later.)

Thoughts?

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [BUG] Oops in sidtab_context_to_sid

2021-04-03 Thread Ondrej Mosnacek

On Sat, Apr 3, 2021 at 4:33 PM Paul Moore  wrote:
> On Fri, Apr 2, 2021 at 6:35 PM Vijay Balakrishna
>  wrote:
> >
> > Seeing oops in 5.4.83 sidtab_context_to_sid().  I checked with Tyler 
> > (copied),  he said it might be
> >
> > https://lore.kernel.org/selinux/CAFqZXNu8s5edDbSZuSutetTsy58i08vPuP2h-n9=kt34hcp...@mail.gmail.com/
> >
> > Ondrej, can you confirm?  Unfortunately, we don't have a on demand repro.
>
> I'm guessing this may be the problem that Tyler reported earlier and
> which appeared to be fixed by the patch below:
>
> https://lore.kernel.org/selinux/20210318215303.2578052-3-omosn...@redhat.com

Nope, if that's really 5.4.83 with no extra backports, then it can't
be this issue as it has been introduced only in v5.10.

Looking at the code in 5.4.83, my initial guess is that it could be a
memory ordering race between
sidtab_reverse_lookup()/sidtab_rcache_push() and
sidtab_rcache_search(). I think the sidtab_rcache_push() call at
security/selinux/ss/security.c:326 should in fact be after the
smp_store_release() call. Note that the sidtab_rcache_*() functions
have been replaced in commit 66f8e2f03c02 ("selinux: sidtab reverse
lookup hash table") with a different mechanism, which AFAICT doesn't
have the same issue.

If that's really it, it will likely be *very* hard to reproduce, so
you may be unable to verify the fix.

-- 
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [BUG] Race between policy reload sidtab conversion and live conversion

2021-02-24 Thread Ondrej Mosnacek

On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks
 wrote:
> On 2021-02-23 15:50:56, Tyler Hicks wrote:
> > On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > > I'm seeing a race during policy load while the "regular" sidtab
> > > conversion is happening and a live conversion starts to take place in
> > > sidtab_context_to_sid().
> > >
> > > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > > then another policy gets loaded ~2-3s into boot. That second policy load
> > > is what hits the race condition situation because the sidtab is only
> > > partially populated and there's a decent amount of filesystem operations
> > > happening, at the same time, which are triggering live conversions.
>
> Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
> change here:
>
>  https://lore.kernel.org/selinux/20210212185930.130477-3-omosn...@redhat.com/
>
> I'll put these changes through a validation run (the only place that I
> can seem to reproduce this crash) and see how it looks.

Hm... I think there is actually another race condition introduced by
the switch from rwlock to RCU [1]... Judging from the call trace you
may be hitting that.

Basically, before the switch the sidtab swapover worked like this:
1. Start live conversion of new entries.
2. Convert existing entries.
[Still only the old sidtab is visible to readers here.]
3. Swap sidtab under write lock.
4. Now only the new sidtab is visible to readers, so the old one can
be destroyed.

After the switch to RCU, we now have:
1. Start live conversion of new entries.
2. Convert existing entries.
3. RCU-assign the new policy pointer to selinux_state.
[!!! Now actually both old and new sidtab may be referenced by
readers, since there is no synchronization barrier previously provided
by the write lock.]
4. Wait for synchronize_rcu() to return.
5. Now only the new sidtab is visible to readers, so the old one can
be destroyed.

So the race can happen between 3. and 5., if one thread already sees
the new sidtab and adds a new entry there, and a second thread still
has the reference to the old sidtab and also tires to add a new entry;
live-converting to the new sidtab, which it doesn't expect to change
by itself. Unfortunately I failed to realize this when reviewing the
patch :/

I think the only two options to fix it are A) switching back to
read-write lock (the easy and safe way; undoing the performance
benefits of [1]), or B) implementing a safe two-way live conversion of
new sidtab entries, so that both tables are kept in sync while they
are both available (more complicated and with possible tricky
implications of different interpretations of contexts by the two
policies).

[1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU")

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

[PATCH] perf/core: fix unconditional security_locked_down() call

2021-02-24 Thread Ondrej Mosnacek

Currently, the lockdown state is queried unconditionally, even though
its result is used only if the PERF_SAMPLE_REGS_INTR bit is set in
attr.sample_type. While that doesn't matter in case of the Lockdown LSM,
it causes trouble with the SELinux's lockdown hook implementation.

SELinux implements the locked_down hook with a check whether the current
task's type has the corresponding "lockdown" class permission
("integrity" or "confidentiality") allowed in the policy. This means
that calling the hook when the access control decision would be ignored
generates a bogus permission check and audit record.

Fix this by checking sample_type first and only calling the hook when
its result would be honored.

Fixes: b0c8fdc7fdb7 ("lockdown: Lock down perf when in confidentiality mode")
Signed-off-by: Ondrej Mosnacek 
---
 kernel/events/core.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 129dee540a8b..0f857307e9bd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11796,12 +11796,12 @@ SYSCALL_DEFINE5(perf_event_open,
return err;
}
 
-   err = security_locked_down(LOCKDOWN_PERF);
-   if (err && (attr.sample_type & PERF_SAMPLE_REGS_INTR))
-   /* REGS_INTR can leak data, lockdown must prevent this */
-   return err;
-
-   err = 0;
+   /* REGS_INTR can leak data, lockdown must prevent this */
+   if (attr.sample_type & PERF_SAMPLE_REGS_INTR) {
+   err = security_locked_down(LOCKDOWN_PERF);
+   if (err)
+   return err;
+   }
 
/*
 * In cgroup mode, the pid argument is used to pass the fd
-- 
2.29.2

Re: [BUG] Race between policy reload sidtab conversion and live conversion

2021-02-25 Thread Ondrej Mosnacek

On Wed, Feb 24, 2021 at 3:43 PM Tyler Hicks  wrote:
> On 2021-02-24 10:33:46, Ondrej Mosnacek wrote:
> > On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks
> >  wrote:
> > > On 2021-02-23 15:50:56, Tyler Hicks wrote:
> > > > On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > > > > I'm seeing a race during policy load while the "regular" sidtab
> > > > > conversion is happening and a live conversion starts to take place in
> > > > > sidtab_context_to_sid().
> > > > >
> > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > > > > then another policy gets loaded ~2-3s into boot. That second policy 
> > > > > load
> > > > > is what hits the race condition situation because the sidtab is only
> > > > > partially populated and there's a decent amount of filesystem 
> > > > > operations
> > > > > happening, at the same time, which are triggering live conversions.
> > >
> > > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
> > > change here:
> > >
> > >  
> > > https://lore.kernel.org/selinux/20210212185930.130477-3-omosn...@redhat.com/
> > >
> > > I'll put these changes through a validation run (the only place that I
> > > can seem to reproduce this crash) and see how it looks.
> >
> > Hm... I think there is actually another race condition introduced by
> > the switch from rwlock to RCU [1]... Judging from the call trace you
> > may be hitting that.
>
> I believe your patches above fixed the race I was seeing. I was able to
> make it through a full validation run without any crashes. Without those
> patches applied, I would see several crashes resulting from this race
> over the course of a validation run.

Hm... okay so probably you were indeed running into that bug. I tried
to reproduce the other race (I added a BUG_ON to help detect it), but
wasn't able to reproduce it with my (pretty aggressive) stress test. I
only managed to trigger it by adding a conditional delay in the right
place. So I now know the second bug is really there, though it' seems
to be very unlikely to be hit in practice (might be more likely on
systems with many CPU cores, though). The first bug, OTOH, is
triggered almost instantly by my stress test.

Unless someone objects, I'll start working on a patch to switch back
to read-write lock for now. If all goes well, I'll send it sometime
next week.

>
> I'll continue to test with your changes and let you know if I end up
> running into the other race you spotted.

Thanks, but given the results of my testing it's probably not worth trying :)

>
> Tyler
>
> >
> > Basically, before the switch the sidtab swapover worked like this:
> > 1. Start live conversion of new entries.
> > 2. Convert existing entries.
> > [Still only the old sidtab is visible to readers here.]
> > 3. Swap sidtab under write lock.
> > 4. Now only the new sidtab is visible to readers, so the old one can
> > be destroyed.
> >
> > After the switch to RCU, we now have:
> > 1. Start live conversion of new entries.
> > 2. Convert existing entries.
> > 3. RCU-assign the new policy pointer to selinux_state.
> > [!!! Now actually both old and new sidtab may be referenced by
> > readers, since there is no synchronization barrier previously provided
> > by the write lock.]
> > 4. Wait for synchronize_rcu() to return.
> > 5. Now only the new sidtab is visible to readers, so the old one can
> > be destroyed.
> >
> > So the race can happen between 3. and 5., if one thread already sees
> > the new sidtab and adds a new entry there, and a second thread still
> > has the reference to the old sidtab and also tires to add a new entry;
> > live-converting to the new sidtab, which it doesn't expect to change
> > by itself. Unfortunately I failed to realize this when reviewing the
> > patch :/
> >
> > I think the only two options to fix it are A) switching back to
> > read-write lock (the easy and safe way; undoing the performance
> > benefits of [1]), or B) implementing a safe two-way live conversion of
> > new sidtab entries, so that both tables are kept in sync while they
> > are both available (more complicated and with possible tricky
> > implications of different interpretations of contexts by the two
> > policies).
> >
> > [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU")
> >
> > --
> > Ondrej Mosnacek
> > Software Engineer, Linux Security - SELinux kernel
> > Red Hat, Inc.
> >
>


--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH bpf] security: Fix hook iteration for secid_to_secctx

2020-06-19 Thread Ondrej Mosnacek

On Wed, May 20, 2020 at 2:56 PM KP Singh  wrote:
> From: KP Singh 
>
> secid_to_secctx is not stackable, and since the BPF LSM registers this
> hook by default, the call_int_hook logic is not suitable which
> "bails-on-fail" and casues issues when other LSMs register this hook and
> eventually breaks Audit.
>
> In order to fix this, directly iterate over the security hooks instead
> of using call_int_hook as suggested in:
>
> https: 
> //lore.kernel.org/bpf/9d0eb6c6-803a-ff3a-5603-9ad6d9edf...@schaufler-ca.com/#t
>
> Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
> Fixes: 625236ba3832 ("security: Fix the default value of secid_to_secctx hook"
> Reported-by: Alexei Starovoitov 
> Signed-off-by: KP Singh 
[...]

Sorry for being late to the party, but doesn't this (and the
associated default return value patch) just paper over a bigger
problem? What if I have only the BPF LSM enabled and I attach a BPF
program to this hook that just returns 0? Doesn't that allow anything
privileged enough to do this to force the kernel to try and send
memory from uninitialized pointers to userspace and/or copy such
memory around and/or free uninitialized pointers?

Why on earth does the BPF LSM directly expose *all* of the hooks, even
those that are not being used for any security decisions (and are
"useful" in this context only for borking the kernel...)? Feel free to
prove me wrong, but this lazy approach of "let's just take all the
hooks as they are and stick BPF programs to them" doesn't seem like a
good choice... IMHO you should either limit the set of hooks that can
be attached to only those that aren't used to return back values via
pointers, or (if you really really need to do some state
updates/logging in those hooks) use wrapper functions that will call
the BPF progs via a simplified interface so that they cannot cause
unsafe behavior.

--
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH bpf] security: Fix hook iteration for secid_to_secctx

2020-06-19 Thread Ondrej Mosnacek

On Fri, Jun 19, 2020 at 3:13 PM KP Singh  wrote:
> Hi,
>
> On Fri, Jun 19, 2020 at 2:49 PM Ondrej Mosnacek  wrote:
> >
> > On Wed, May 20, 2020 at 2:56 PM KP Singh  wrote:
> > > From: KP Singh 
> > >
> > > secid_to_secctx is not stackable, and since the BPF LSM registers this
> > > hook by default, the call_int_hook logic is not suitable which
> > > "bails-on-fail" and casues issues when other LSMs register this hook and
> > > eventually breaks Audit.
> > >
> > > In order to fix this, directly iterate over the security hooks instead
> > > of using call_int_hook as suggested in:
> > >
> > > https: 
> > > //lore.kernel.org/bpf/9d0eb6c6-803a-ff3a-5603-9ad6d9edf...@schaufler-ca.com/#t
> > >
> > > Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks")
> > > Fixes: 625236ba3832 ("security: Fix the default value of secid_to_secctx 
> > > hook"
> > > Reported-by: Alexei Starovoitov 
> > > Signed-off-by: KP Singh 
> > [...]
> >
> > Sorry for being late to the party, but doesn't this (and the
> > associated default return value patch) just paper over a bigger
> > problem? What if I have only the BPF LSM enabled and I attach a BPF
> > program to this hook that just returns 0? Doesn't that allow anything
> > privileged enough to do this to force the kernel to try and send
> > memory from uninitialized pointers to userspace and/or copy such
> > memory around and/or free uninitialized pointers?
> >
> > Why on earth does the BPF LSM directly expose *all* of the hooks, even
> > those that are not being used for any security decisions (and are
> > "useful" in this context only for borking the kernel...)? Feel free to
> > prove me wrong, but this lazy approach of "let's just take all the
> > hooks as they are and stick BPF programs to them" doesn't seem like a
>
> The plan was definitely to not hook everywhere but only call the hooks
> that do have a BPF program registered. This was one of the versions
> we proposed in the initial patches where the call to the BPF LSM was
> guarded by a static key with it being enabled only when there's a
> BPF program attached to the hook.
>
> https://lore.kernel.org/bpf/20200220175250.10795-5-kpsi...@chromium.org/
>
> However, this special-cased BPF in the LSM framework, and, was met
> with opposition. Our plan is to still achieve this, but we want to do this
> with DEFINE_STATIC_CALL patches:
>
> https://lore.kernel.org/lkml/cover.1547073843.git.jpoim...@redhat.com
>
> Using these, only can we enable the call into the hook based on whether
> a program is attached, we can also eliminate the indirect call overhead which
> currently affects the "slow" way which was decided in the discussion:
>
> https://lore.kernel.org/bpf/202002241136.C4F9F7DFF@keescook/

Perhaps you are misunderstanding me... I don't have a problem with BPF
LSM registering callbacks for all the hooks. My point is about what
you can trigger once you attach programs to certain hooks. All the
above seem to be just optimizations/implementation details that do not
affect the problem I'm pointing to.

>
> > good choice... IMHO you should either limit the set of hooks that can
> > be attached to only those that aren't used to return back values via
>
> I am not sure if limiting the hooks is required here once we have
> the ability to call into BPF only when a program is attached. If the
> the user provides a BPF program, deliberately returns 0 (or any
> other value) then it is working as intended. Even if we limit this in the
> bpf LSM, deliberate privileged users can still achieve this with
> other means.

The point is that for this particular hook (secid_to_secctx) and a
couple others, the consequences of having control over the return
value are more serious than with other hooks. For most hooks, the
implementation usually just returns 0 (OK), -EACCESS (access denied)
or -E... (error) and the caller either continues as normal or handles
the error. But here if you return 0, you signal that you have
initialized the pointer and size to valid values. So suddenly the BPF
prog doesn't just control allow/deny decisions, but can now easily
trigger kernel panic. And when you look at the semantics of the hook,
you will realize that it doesn't really make sense to implement it via
BPF, since it can never populate the output values and the only
meaningful implementation would be to just return -EOPNOTSUPP.

Maybe I have it all wrong, but isn't the whole point of BPF programs
to provide a tight sandbox where you can only implement pure input ->
output functions + read/modify some internal state? Is it really
"working as intended" if you can crash the kernel by attaching a
simple BPF program to a certain hook? I mean yes, you can make the
system pretty much unusable already using the classic hooks by simply
returning -EACCESS for everything, but IMO that's quite different from
causing the kernel to do an invalid memory access.

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [RESEND][RFC PATCH 0/6] Fork brute force attack mitigation (fbfam)

2020-09-12 Thread Ondrej Mosnacek

On Sat, Sep 12, 2020 at 4:51 PM Mel Gorman  wrote:
> On Sat, Sep 12, 2020 at 11:36:52AM +0200, John Wood wrote:
> > On Sat, Sep 12, 2020 at 12:56:18AM -0700, Kees Cook wrote:
> > > On Sat, Sep 12, 2020 at 10:03:23AM +1000, James Morris wrote:
> > > > On Thu, 10 Sep 2020, Kees Cook wrote:
> > > >
> > > > > [kees: re-sending this series on behalf of John Wood 
> > > > > 
> > > > >  also visible at https://github.com/johwood/linux fbfam]
> > > > >
> > > > > From: John Wood 
> > > >
> > > > Why are you resending this? The author of the code needs to be able to
> > > > send and receive emails directly as part of development and maintenance.
> >
> > I tried to send the full patch serie by myself but my email got blocked. 
> > After
> > get support from my email provider it told to me that my account is young,
> > and due to its spam policie I am not allow, for now, to send a big amount
> > of mails in a short period. They also informed me that soon I will be able
> > to send more mails. The quantity increase with the age of the account.
> >
>
> If you're using "git send-email" then specify --confirm=always and
> either manually send a mail every few seconds or use an expect script
> like
>
> #!/bin/bash
> EXPECT_SCRIPT=
> function cleanup() {
> if [ "$EXPECT_SCRIPT" != "" ]; then
> rm $EXPECT_SCRIPT
> fi
> }
> trap cleanup EXIT
>
> EXPECT_SCRIPT=`mktemp`
> cat > $EXPECT_SCRIPT < spawn sh ./SEND
> expect {
> "Send this email"   { sleep 10; exp_send y\\r; exp_continue }
> }
> EOF
>
> expect -f $EXPECT_SCRIPT
> exit $?
>
> This will work if your provider limits the rate mails are sent rather
> than the total amount.

...or you could keep it simple and just pass "--batch-size 1
--relogin-delay 10" to git send-email ;)

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_add_mnt_opt( )

2019-06-07 Thread Ondrej Mosnacek

On Thu, Jun 6, 2019 at 11:23 AM Gen Zhang  wrote:
> In selinux_add_mnt_opt(), 'val' is allocated by kmemdup_nul(). It returns
> NULL when fails. So 'val' should be checked. And 'mnt_opts' should be
> freed when error.
>
> Signed-off-by: Gen Zhang 
> Fixes: 757cbe597fe8 ("LSM: new method: ->sb_add_mnt_opt()")
> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..4e4c1c6 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1052,15 +1052,23 @@ static int selinux_add_mnt_opt(const char *option, 
> const char *val, int len,
> if (token == Opt_error)
> return -EINVAL;
>
> -   if (token != Opt_seclabel)
> -   val = kmemdup_nul(val, len, GFP_KERNEL);
> +   if (token != Opt_seclabel) {
> +   val = kmemdup_nul(val, len, GFP_KERNEL);
> +   if (!val) {
> +   rc = -ENOMEM;
> +   goto free_opt;
> +   }
> +   }
> rc = selinux_add_opt(token, val, mnt_opts);
> if (unlikely(rc)) {
> kfree(val);
> -   if (*mnt_opts) {
> -   selinux_free_mnt_opts(*mnt_opts);
> -   *mnt_opts = NULL;
> -   }
> +   goto free_opt;
> +   }
> +   return rc;

At this point rc is guaranteed to be 0, so you can just 'return 0' for
clarity. Also, I visually prefer an empty line between a return
statement and a goto label, but I'm not sure what is the
general/maintainer's preference.

Also, you should drop the "lsm: " from the subject - it is redundant
and doesn't follow the SELinux convention. See `git log --oneline --
security/selinux/` for what the subjects usually look like - mostly we
just go with "selinux: " (or "LSM: " when
the changes affect the shared LSM interface).

> +free_opt:
> +   if (*mnt_opts) {
> +   selinux_free_mnt_opts(*mnt_opts);
> +   *mnt_opts = NULL;
> }
> return rc;
>  }

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v4] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_o pts()

2019-06-07 Thread Ondrej Mosnacek

On Thu, Jun 6, 2019 at 10:55 AM Gen Zhang  wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> should be freed when error.
>
> Signed-off-by: Gen Zhang 
> Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")

My comments about the subject and an empty line before label apply
here as well, but Paul can fix both easily when applying, so:

Reviewed-by: Ondrej Mosnacek 

> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..13479cd 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2616,10 +2616,11 @@ static int selinux_sb_eat_lsm_opts(char *options, 
> void **mnt_opts)
> char *from = options;
> char *to = options;
> bool first = true;
> +   int rc;
>
> while (1) {
> int len = opt_len(from);
> -   int token, rc;
> +   int token;
> char *arg = NULL;
>
> token = match_opt_prefix(from, len, &arg);
> @@ -2635,15 +2636,15 @@ static int selinux_sb_eat_lsm_opts(char *options, 
> void **mnt_opts)
> *q++ = c;
> }
> arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> +   if (!arg) {
> +   rc = -ENOMEM;
> +   goto free_opt;
> +   }
> }
> rc = selinux_add_opt(token, arg, mnt_opts);
> if (unlikely(rc)) {
> kfree(arg);
> -   if (*mnt_opts) {
> -   selinux_free_mnt_opts(*mnt_opts);
> -   *mnt_opts = NULL;
> -   }
> -   return rc;
> +   goto free_opt;
> }
> } else {
> if (!first) {   // copy with preceding comma
> @@ -2661,6 +2662,12 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> }
> *to = '\0';
> return 0;
> +free_opt:
> +   if (*mnt_opts) {
> +   selinux_free_mnt_opts(*mnt_opts);
> +   *mnt_opts = NULL;
> +   }
> +   return rc;
>  }
>
>  static int selinux_sb_remount(struct super_block *sb, void *mnt_opts)

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-06-03 Thread Ondrej Mosnacek

On Sat, Jun 1, 2019 at 4:15 AM Gen Zhang  wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> should be freed when error.
>
> Signed-off-by: Gen Zhang 
> Reviewed-by: Ondrej Mosnacek 

It looks like you're new to the kernel development community, so let
me give you a bit of friendly advice for the future :)

You don't need to repost the patch when people give you
Acked-by/Reviewed-by/Tested-by (unless there is a different reason to
respin/repost the patches). The maintainer goes over the replies when
applying the final patch and adds Acked-by/Reviewed-by/... on his/her
own.

If you *do* need to respin a path for which you have received A/R/T,
then you need to distinguish between two cases:
1. Only trivial changes to the patch (only fixed typos, edited commit
message, removed empty line, etc. - for example, v1 -> v2 of this
patch falls into this category) - in this case you can collect the
A/R/T yourself and add them to the new version. This saves the
maintainer and the reviewers from redundant work, since the patch is
still semantically the same and the A/R/T from the last version still
apply.
2. Non-trivial changes to the patch (as is the case for this patch) -
in this case your patch needs to be reviewed again and you should
disregard all A/R/T from the previous version. You can easily piss
someone off if you add their Reviewed-by to a patch they haven't
actually reviewed, so be careful ;-)

(Someone please correct me if I got it wrong - this is what I gathered
so far from my experience.)

Good luck in your future work!

> Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..f329fc0 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2616,6 +2616,7 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> char *from = options;
> char *to = options;
> bool first = true;
> +   int ret;
>
> while (1) {
> int len = opt_len(from);
> @@ -2635,15 +2636,16 @@ static int selinux_sb_eat_lsm_opts(char *options, 
> void **mnt_opts)
> *q++ = c;
> }
> arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> +   if (!arg) {
> +   ret = -ENOMEM;
> +   goto free_opt;
> +   }
> }
> rc = selinux_add_opt(token, arg, mnt_opts);
> if (unlikely(rc)) {
> +   ret = rc;
> kfree(arg);
> -   if (*mnt_opts) {
> -   selinux_free_mnt_opts(*mnt_opts);
> -   *mnt_opts = NULL;
> -   }
> -   return rc;
> +   goto free_opt;
> }
> } else {
> if (!first) {   // copy with preceding comma
> @@ -2661,6 +2663,12 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> }
> *to = '\0';
> return 0;
> +free_opt:
> +   if (*mnt_opts) {
> +   selinux_free_mnt_opts(*mnt_opts);
> +   *mnt_opts = NULL;
> +   }
> +   return ret;
>  }
>
>  static int selinux_sb_remount(struct super_block *sb, void *mnt_opts)

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-06-03 Thread Ondrej Mosnacek

On Sat, Jun 1, 2019 at 4:15 AM Gen Zhang  wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> should be freed when error.
>
> Signed-off-by: Gen Zhang 
> Reviewed-by: Ondrej Mosnacek 
> Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..f329fc0 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2616,6 +2616,7 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> char *from = options;
> char *to = options;
> bool first = true;
> +   int ret;

I'd suggest just moving the declaration of 'rc' here and simply reuse
that variable. Otherwise the patch looks good to me.

>
> while (1) {
> int len = opt_len(from);
> @@ -2635,15 +2636,16 @@ static int selinux_sb_eat_lsm_opts(char *options, 
> void **mnt_opts)
> *q++ = c;
> }
> arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> +   if (!arg) {
> +   ret = -ENOMEM;
> +   goto free_opt;
> +   }
> }
> rc = selinux_add_opt(token, arg, mnt_opts);
> if (unlikely(rc)) {
> +   ret = rc;
> kfree(arg);
> -   if (*mnt_opts) {
> -   selinux_free_mnt_opts(*mnt_opts);
> -   *mnt_opts = NULL;
> -   }
> -   return rc;
> +   goto free_opt;
> }
> } else {
> if (!first) {   // copy with preceding comma
> @@ -2661,6 +2663,12 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> }
> *to = '\0';
> return 0;
> +free_opt:
> +   if (*mnt_opts) {
> +   selinux_free_mnt_opts(*mnt_opts);
> +   *mnt_opts = NULL;
> +   }
> +   return ret;
>  }
>
>  static int selinux_sb_remount(struct super_block *sb, void *mnt_opts)

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [BUG] Race between policy reload sidtab conversion and live conversion

2021-02-26 Thread Ondrej Mosnacek

On Fri, Feb 26, 2021 at 2:07 AM Paul Moore  wrote:
> On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek  wrote:
> > After the switch to RCU, we now have:
> > 1. Start live conversion of new entries.
> > 2. Convert existing entries.
> > 3. RCU-assign the new policy pointer to selinux_state.
> > [!!! Now actually both old and new sidtab may be referenced by
> > readers, since there is no synchronization barrier previously provided
> > by the write lock.]
> > 4. Wait for synchronize_rcu() to return.
> > 5. Now only the new sidtab is visible to readers, so the old one can
> > be destroyed.
> >
> > So the race can happen between 3. and 5., if one thread already sees
> > the new sidtab and adds a new entry there, and a second thread still
> > has the reference to the old sidtab and also tires to add a new entry;
> > live-converting to the new sidtab, which it doesn't expect to change
> > by itself. Unfortunately I failed to realize this when reviewing the
> > patch :/
>
> It is possible I'm not fully understanding the problem and/or missing
> an important detail - it is rather tricky code, and RCU can be very
> hard to reason at times - but I think we may be able to solve this
> with some lock fixes inside sidtab_context_to_sid().  Let me try to
> explain to see if we are on the same page here ...
>
> The problem is when we have two (or more) threads trying to
> add/convert the same context into a sid; the task with new_sidtab is
> looking to add a new sidtab entry, while the task with old_sidtab is
> looking to convert an entry in old_sidtab into a new entry in
> new_sidtab.  Boom.
>
> Looking at the code in sidtab_context_to_sid(), when we have two
> sidtabs that are currently active (old_sidtab->convert pointer is
> valid) and a task with old_sidtab attempts to add a new entry to both
> sidtabs it first adds it to the old sidtab then it also adds it to the
> new sidtab.  I believe the problem is that in this case while the task
> grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which
> allows it to race with tasks that already see only new_sidtab.  I
> think adding code to sidtab_context_to_sid() which grabs the
> new_sidtab->lock when adding entries to the new_sidtab *should* solve
> the problem.
>
> Did I miss something important? ;)

Sadly, yes :) Consider this scenario (assuming we fix the locking at
sidtab level):

If it happens that a new SID (x) is added via the new sidtab and then
another one (y) via the old sidtab, to avoid clash of SIDs, we would
need to leave a "hole" in the old sidtab for SID x. And this will
cause trouble if the thread that has just added SID y, then tries to
translate the context string corresponding to SID x (without re-taking
the RCU read lock and refreshing the policy pointer). Even if we
handle skipping the "holes" in the old sidtab safely, the translation
would then end up adding a duplicate SID entry for the context already
represented by SID x - which is not a state we want to end up in.

This is why I said that to fully fix this, we'd need to have a
both-ways live conversion in place. (And that already starts to feel
like too much hacking for something that should probably go to
stable@...)

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [BUG] Race between policy reload sidtab conversion and live conversion

2021-02-26 Thread Ondrej Mosnacek

On Fri, Feb 26, 2021 at 5:08 AM Hillf Danton  wrote:
> On Thu, 25 Feb 2021 20:06:45 -0500 Paul Moore wrote:
> > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek  wrote:
> > > After the switch to RCU, we now have:
> > > 1. Start live conversion of new entries.
> > > 2. Convert existing entries.
> > > 3. RCU-assign the new policy pointer to selinux_state.
> > > [!!! Now actually both old and new sidtab may be referenced by
> > > readers, since there is no synchronization barrier previously provided
> > > by the write lock.]
> > > 4. Wait for synchronize_rcu() to return.
> > > 5. Now only the new sidtab is visible to readers, so the old one can
> > > be destroyed.
> > >
> > > So the race can happen between 3. and 5., if one thread already sees
> > > the new sidtab and adds a new entry there, and a second thread still
> > > has the reference to the old sidtab and also tires to add a new entry;
> > > live-converting to the new sidtab, which it doesn't expect to change
> > > by itself. Unfortunately I failed to realize this when reviewing the
> > > patch :/
> >
> > It is possible I'm not fully understanding the problem and/or missing
> > an important detail - it is rather tricky code, and RCU can be very
> > hard to reason at times - but I think we may be able to solve this
> > with some lock fixes inside sidtab_context_to_sid().  Let me try to
> > explain to see if we are on the same page here ...
> >
> > The problem is when we have two (or more) threads trying to
> > add/convert the same context into a sid; the task with new_sidtab is
> > looking to add a new sidtab entry, while the task with old_sidtab is
> > looking to convert an entry in old_sidtab into a new entry in
> > new_sidtab.  Boom.
> >
> > Looking at the code in sidtab_context_to_sid(), when we have two
> > sidtabs that are currently active (old_sidtab->convert pointer is
> > valid) and a task with old_sidtab attempts to add a new entry to both
> > sidtabs it first adds it to the old sidtab then it also adds it to the
> > new sidtab.  I believe the problem is that in this case while the task
> > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which
> > allows it to race with tasks that already see only new_sidtab.  I
> > think adding code to sidtab_context_to_sid() which grabs the
> > new_sidtab->lock when adding entries to the new_sidtab *should* solve
> > the problem.
> >
> > Did I miss something important? ;)
>
> If the convert pointer can be derefered without lock, we can opt to
> convert context after building sidtab with the risk of AB BA deadlock
> cut. Below is the minimum change I can think of along your direction.

We could fix this a bit more easily by just having a shared spinlock
for both (well, *all*) sidtabs. Yes, we'd need to have it all the way
up in selinux_state and pass it through to sidtab_init(), but IMHO
that's less bad than trying to get it right with two locks.

-- 
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [BUG] Race between policy reload sidtab conversion and live conversion

2021-03-01 Thread Ondrej Mosnacek

On Sun, Feb 28, 2021 at 8:21 PM Paul Moore  wrote:
> On Fri, Feb 26, 2021 at 6:12 AM Ondrej Mosnacek  wrote:
> > On Fri, Feb 26, 2021 at 2:07 AM Paul Moore  wrote:
> > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek  
> > > wrote:
> > > > After the switch to RCU, we now have:
> > > > 1. Start live conversion of new entries.
> > > > 2. Convert existing entries.
> > > > 3. RCU-assign the new policy pointer to selinux_state.
> > > > [!!! Now actually both old and new sidtab may be referenced by
> > > > readers, since there is no synchronization barrier previously provided
> > > > by the write lock.]
> > > > 4. Wait for synchronize_rcu() to return.
> > > > 5. Now only the new sidtab is visible to readers, so the old one can
> > > > be destroyed.
> > > >
> > > > So the race can happen between 3. and 5., if one thread already sees
> > > > the new sidtab and adds a new entry there, and a second thread still
> > > > has the reference to the old sidtab and also tires to add a new entry;
> > > > live-converting to the new sidtab, which it doesn't expect to change
> > > > by itself. Unfortunately I failed to realize this when reviewing the
> > > > patch :/
> > >
> > > It is possible I'm not fully understanding the problem and/or missing
> > > an important detail - it is rather tricky code, and RCU can be very
> > > hard to reason at times - but I think we may be able to solve this
> > > with some lock fixes inside sidtab_context_to_sid().  Let me try to
> > > explain to see if we are on the same page here ...
> > >
> > > The problem is when we have two (or more) threads trying to
> > > add/convert the same context into a sid; the task with new_sidtab is
> > > looking to add a new sidtab entry, while the task with old_sidtab is
> > > looking to convert an entry in old_sidtab into a new entry in
> > > new_sidtab.  Boom.
> > >
> > > Looking at the code in sidtab_context_to_sid(), when we have two
> > > sidtabs that are currently active (old_sidtab->convert pointer is
> > > valid) and a task with old_sidtab attempts to add a new entry to both
> > > sidtabs it first adds it to the old sidtab then it also adds it to the
> > > new sidtab.  I believe the problem is that in this case while the task
> > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which
> > > allows it to race with tasks that already see only new_sidtab.  I
> > > think adding code to sidtab_context_to_sid() which grabs the
> > > new_sidtab->lock when adding entries to the new_sidtab *should* solve
> > > the problem.
> > >
> > > Did I miss something important? ;)
> >
> > Sadly, yes :) Consider this scenario (assuming we fix the locking at
> > sidtab level):
> >
> > If it happens that a new SID (x) is added via the new sidtab and then
> > another one (y) via the old sidtab, to avoid clash of SIDs, we would
> > need to leave a "hole" in the old sidtab for SID x. And this will
> > cause trouble if the thread that has just added SID y, then tries to
> > translate the context string corresponding to SID x (without re-taking
> > the RCU read lock and refreshing the policy pointer). Even if we
> > handle skipping the "holes" in the old sidtab safely, the translation
> > would then end up adding a duplicate SID entry for the context already
> > represented by SID x - which is not a state we want to end up in.
>
> Ah, yes, you're right.  I was only thinking about the problem of
> adding an entry to the old sidtab, and not the (much more likely case)
> of an entry being added to the new sidtab.  Bummer.
>
> Thinking aloud for a moment - what if we simply refused to add new
> sidtab entries if the task's sidtab pointer is "old"?  Common sense
> would tell us that this scenario should be very rare at present, and I
> believe the testing mentioned in this thread adds some weight to that
> claim.  After all, this only affects tasks which entered into their
> RCU protected session prior to the policy load RCU sync *AND* are
> attempting to add a new entry to the sidtab.  That *has* to be a
> really low percentage, especially on a system that has been up and
> running for some time.  My gut feeling is this should be safe as well;
> all of the calling code should have the necessary error handling in
> place as there are plenty of reasons why we could normally fail to add
> an entry to the sidtab; memory allocation f

Re: [PATCH v2 02/15] crypto: morus - fix handling chunked inputs

2019-02-05 Thread Ondrej Mosnacek

On Fri, Feb 1, 2019 at 8:52 AM Eric Biggers  wrote:
> From: Eric Biggers 
>
> The generic MORUS implementations all fail the improved AEAD tests
> because they produce the wrong result with some data layouts.  The issue
> is that they assume that if the skcipher_walk API gives 'nbytes' not
> aligned to the walksize (a.k.a. walk.stride), then it is the end of the
> data.  In fact, this can happen before the end.  Fix them.
>
> Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations")
> Cc:  # v4.18+
> Cc: Ondrej Mosnacek 
> Signed-off-by: Eric Biggers 

Reviewed-by: Ondrej Mosnacek 


> ---
>  crypto/morus1280.c | 13 +++--
>  crypto/morus640.c  | 13 +++--
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/crypto/morus1280.c b/crypto/morus1280.c
> index 78ba09db7328c..0747732d5b78a 100644
> --- a/crypto/morus1280.c
> +++ b/crypto/morus1280.c
> @@ -362,18 +362,19 @@ static void crypto_morus1280_process_crypt(struct 
> morus1280_state *state,
>const struct morus1280_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *dst;
> -   const u8 *src;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, walk.nbytes);
> +   if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }
>  }
>
> diff --git a/crypto/morus640.c b/crypto/morus640.c
> index 5cf530139b271..1617a1eb8be13 100644
> --- a/crypto/morus640.c
> +++ b/crypto/morus640.c
> @@ -361,18 +361,19 @@ static void crypto_morus640_process_crypt(struct 
> morus640_state *state,
>   const struct morus640_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *dst;
> -   const u8 *src;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, walk.nbytes);
> +   if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }
>  }
>
> --
> 2.20.1
>


--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v2 01/15] crypto: aegis - fix handling chunked inputs

2019-02-05 Thread Ondrej Mosnacek

On Fri, Feb 1, 2019 at 8:52 AM Eric Biggers  wrote:
> From: Eric Biggers 
>
> The generic AEGIS implementations all fail the improved AEAD tests
> because they produce the wrong result with some data layouts.  The issue
> is that they assume that if the skcipher_walk API gives 'nbytes' not
> aligned to the walksize (a.k.a. walk.stride), then it is the end of the
> data.  In fact, this can happen before the end.  Fix them.
>
> Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations")
> Cc:  # v4.18+
> Cc: Ondrej Mosnacek 
> Signed-off-by: Eric Biggers 

Reviewed-by: Ondrej Mosnacek 

> ---
>  crypto/aegis128.c  | 14 +++---
>  crypto/aegis128l.c | 14 +++---
>  crypto/aegis256.c  | 14 +++---
>  3 files changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/crypto/aegis128.c b/crypto/aegis128.c
> index 96e078a8a00a1..3718a83413032 100644
> --- a/crypto/aegis128.c
> +++ b/crypto/aegis128.c
> @@ -286,19 +286,19 @@ static void crypto_aegis128_process_crypt(struct 
> aegis_state *state,
>   const struct aegis128_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *src, *dst;
> -   unsigned int chunksize;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> -   chunksize = walk.nbytes;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, chunksize);
> +   if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }
>  }
>
> diff --git a/crypto/aegis128l.c b/crypto/aegis128l.c
> index a210e779b911b..275a8616d71bd 100644
> --- a/crypto/aegis128l.c
> +++ b/crypto/aegis128l.c
> @@ -349,19 +349,19 @@ static void crypto_aegis128l_process_crypt(struct 
> aegis_state *state,
>const struct aegis128l_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *src, *dst;
> -   unsigned int chunksize;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> -   chunksize = walk.nbytes;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, chunksize);
> +   if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }
>  }
>
> diff --git a/crypto/aegis256.c b/crypto/aegis256.c
> index 49882a28e93e9..ecd6b7f34a2d2 100644
> --- a/crypto/aegis256.c
> +++ b/crypto/aegis256.c
> @@ -299,19 +299,19 @@ static void crypto_aegis256_process_crypt(struct 
> aegis_state *state,
>   const struct aegis256_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *src, *dst;
> -   unsigned int chunksize;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> -   chunksize = walk.nbytes;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, chunksize);
> +       if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }
>  }
>
> --
> 2.20.1
>


-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v2 03/15] crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP

2019-02-05 Thread Ondrej Mosnacek

On Fri, Feb 1, 2019 at 8:52 AM Eric Biggers  wrote:
> From: Eric Biggers 
>
> The x86 AEGIS implementations all fail the improved AEAD tests because
> they produce the wrong result with some data layouts.  The issue is that
> they assume that if the skcipher_walk API gives 'nbytes' not aligned to
> the walksize (a.k.a. walk.stride), then it is the end of the data.  In
> fact, this can happen before the end.
>
> Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
> incorrectly sleep in the skcipher_walk_*() functions while preemption
> has been disabled by kernel_fpu_begin().
>
> Fix these bugs.
>
> Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations")
> Cc:  # v4.18+
> Cc: Ondrej Mosnacek 
> Signed-off-by: Eric Biggers 

Reviewed-by: Ondrej Mosnacek 

> ---
>  arch/x86/crypto/aegis128-aesni-glue.c  | 38 ++
>  arch/x86/crypto/aegis128l-aesni-glue.c | 38 ++
>  arch/x86/crypto/aegis256-aesni-glue.c  | 38 ++
>  3 files changed, 45 insertions(+), 69 deletions(-)
>
> diff --git a/arch/x86/crypto/aegis128-aesni-glue.c 
> b/arch/x86/crypto/aegis128-aesni-glue.c
> index 2a356b948720e..3ea71b8718135 100644
> --- a/arch/x86/crypto/aegis128-aesni-glue.c
> +++ b/arch/x86/crypto/aegis128-aesni-glue.c
> @@ -119,31 +119,20 @@ static void crypto_aegis128_aesni_process_ad(
>  }
>
>  static void crypto_aegis128_aesni_process_crypt(
> -   struct aegis_state *state, struct aead_request *req,
> +   struct aegis_state *state, struct skcipher_walk *walk,
> const struct aegis_crypt_ops *ops)
>  {
> -   struct skcipher_walk walk;
> -   u8 *src, *dst;
> -   unsigned int chunksize, base;
> -
> -   ops->skcipher_walk_init(&walk, req, false);
> -
> -   while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> -   chunksize = walk.nbytes;
> -
> -   ops->crypt_blocks(state, chunksize, src, dst);
> -
> -   base = chunksize & ~(AEGIS128_BLOCK_SIZE - 1);
> -   src += base;
> -   dst += base;
> -   chunksize &= AEGIS128_BLOCK_SIZE - 1;
> -
> -   if (chunksize > 0)
> -   ops->crypt_tail(state, chunksize, src, dst);
> +   while (walk->nbytes >= AEGIS128_BLOCK_SIZE) {
> +   ops->crypt_blocks(state,
> + round_down(walk->nbytes, 
> AEGIS128_BLOCK_SIZE),
> + walk->src.virt.addr, walk->dst.virt.addr);
> +   skcipher_walk_done(walk, walk->nbytes % AEGIS128_BLOCK_SIZE);
> +   }
>
> -   skcipher_walk_done(&walk, 0);
> +   if (walk->nbytes) {
> +   ops->crypt_tail(state, walk->nbytes, walk->src.virt.addr,
> +   walk->dst.virt.addr);
> +   skcipher_walk_done(walk, 0);
> }
>  }
>
> @@ -186,13 +175,16 @@ static void crypto_aegis128_aesni_crypt(struct 
> aead_request *req,
>  {
> struct crypto_aead *tfm = crypto_aead_reqtfm(req);
> struct aegis_ctx *ctx = crypto_aegis128_aesni_ctx(tfm);
> +   struct skcipher_walk walk;
> struct aegis_state state;
>
> +   ops->skcipher_walk_init(&walk, req, true);
> +
> kernel_fpu_begin();
>
> crypto_aegis128_aesni_init(&state, ctx->key.bytes, req->iv);
> crypto_aegis128_aesni_process_ad(&state, req->src, req->assoclen);
> -   crypto_aegis128_aesni_process_crypt(&state, req, ops);
> +   crypto_aegis128_aesni_process_crypt(&state, &walk, ops);
> crypto_aegis128_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
>
> kernel_fpu_end();
> diff --git a/arch/x86/crypto/aegis128l-aesni-glue.c 
> b/arch/x86/crypto/aegis128l-aesni-glue.c
> index dbe8bb980da15..1b1b39c66c5e2 100644
> --- a/arch/x86/crypto/aegis128l-aesni-glue.c
> +++ b/arch/x86/crypto/aegis128l-aesni-glue.c
> @@ -119,31 +119,20 @@ static void crypto_aegis128l_aesni_process_ad(
>  }
>
>  static void crypto_aegis128l_aesni_process_crypt(
> -   struct aegis_state *state, struct aead_request *req,
> +   struct aegis_state *state, struct skcipher_walk *walk,
> const struct aegis_crypt_ops *ops)
>  {
> -   struct skcipher_walk walk;
> -   u8 *src, *dst;
> -   unsigned int chunksize, base;
> -
> -   ops->skcipher_walk_init(&walk, req, false);
> -
> -   while (walk.nbytes) {
>

Re: [PATCH v2 04/15] crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP

2019-02-05 Thread Ondrej Mosnacek

On Fri, Feb 1, 2019 at 8:52 AM Eric Biggers  wrote:
> From: Eric Biggers 
>
> The x86 MORUS implementations all fail the improved AEAD tests because
> they produce the wrong result with some data layouts.  The issue is that
> they assume that if the skcipher_walk API gives 'nbytes' not aligned to
> the walksize (a.k.a. walk.stride), then it is the end of the data.  In
> fact, this can happen before the end.
>
> Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
> incorrectly sleep in the skcipher_walk_*() functions while preemption
> has been disabled by kernel_fpu_begin().
>
> Fix these bugs.
>
> Fixes: 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS")
> Cc:  # v4.18+
> Cc: Ondrej Mosnacek 
> Signed-off-by: Eric Biggers 

Reviewed-by: Ondrej Mosnacek 

> ---
>  arch/x86/crypto/morus1280_glue.c | 40 +---
>  arch/x86/crypto/morus640_glue.c  | 39 ---
>  2 files changed, 31 insertions(+), 48 deletions(-)
>
> diff --git a/arch/x86/crypto/morus1280_glue.c 
> b/arch/x86/crypto/morus1280_glue.c
> index 0dccdda1eb3a1..7e600f8bcdad8 100644
> --- a/arch/x86/crypto/morus1280_glue.c
> +++ b/arch/x86/crypto/morus1280_glue.c
> @@ -85,31 +85,20 @@ static void crypto_morus1280_glue_process_ad(
>
>  static void crypto_morus1280_glue_process_crypt(struct morus1280_state 
> *state,
> struct morus1280_ops ops,
> -   struct aead_request *req)
> +   struct skcipher_walk *walk)
>  {
> -   struct skcipher_walk walk;
> -   u8 *cursor_src, *cursor_dst;
> -   unsigned int chunksize, base;
> -
> -   ops.skcipher_walk_init(&walk, req, false);
> -
> -   while (walk.nbytes) {
> -   cursor_src = walk.src.virt.addr;
> -   cursor_dst = walk.dst.virt.addr;
> -   chunksize = walk.nbytes;
> -
> -   ops.crypt_blocks(state, cursor_src, cursor_dst, chunksize);
> -
> -   base = chunksize & ~(MORUS1280_BLOCK_SIZE - 1);
> -   cursor_src += base;
> -   cursor_dst += base;
> -   chunksize &= MORUS1280_BLOCK_SIZE - 1;
> -
> -   if (chunksize > 0)
> -   ops.crypt_tail(state, cursor_src, cursor_dst,
> -  chunksize);
> +   while (walk->nbytes >= MORUS1280_BLOCK_SIZE) {
> +   ops.crypt_blocks(state, walk->src.virt.addr,
> +walk->dst.virt.addr,
> +round_down(walk->nbytes,
> +   MORUS1280_BLOCK_SIZE));
> +   skcipher_walk_done(walk, walk->nbytes % MORUS1280_BLOCK_SIZE);
> +   }
>
> -   skcipher_walk_done(&walk, 0);
> +   if (walk->nbytes) {
> +   ops.crypt_tail(state, walk->src.virt.addr, 
> walk->dst.virt.addr,
> +  walk->nbytes);
> +   skcipher_walk_done(walk, 0);
> }
>  }
>
> @@ -147,12 +136,15 @@ static void crypto_morus1280_glue_crypt(struct 
> aead_request *req,
> struct crypto_aead *tfm = crypto_aead_reqtfm(req);
> struct morus1280_ctx *ctx = crypto_aead_ctx(tfm);
> struct morus1280_state state;
> +   struct skcipher_walk walk;
> +
> +   ops.skcipher_walk_init(&walk, req, true);
>
> kernel_fpu_begin();
>
> ctx->ops->init(&state, &ctx->key, req->iv);
> crypto_morus1280_glue_process_ad(&state, ctx->ops, req->src, 
> req->assoclen);
> -   crypto_morus1280_glue_process_crypt(&state, ops, req);
> +   crypto_morus1280_glue_process_crypt(&state, ops, &walk);
> ctx->ops->final(&state, tag_xor, req->assoclen, cryptlen);
>
> kernel_fpu_end();
> diff --git a/arch/x86/crypto/morus640_glue.c b/arch/x86/crypto/morus640_glue.c
> index 7b58fe4d9bd1a..cb3a817320160 100644
> --- a/arch/x86/crypto/morus640_glue.c
> +++ b/arch/x86/crypto/morus640_glue.c
> @@ -85,31 +85,19 @@ static void crypto_morus640_glue_process_ad(
>
>  static void crypto_morus640_glue_process_crypt(struct morus640_state *state,
>struct morus640_ops ops,
> -  struct aead_request *req)
> +  struct skcipher_walk *walk)
>  {
> -   struct skcipher_walk walk;
> -   u8 *cursor_src, *cursor_dst;
> -   uns

[PATCH] sysfs: remove unused include of kernfs-internal.h

2019-02-04 Thread Ondrej Mosnacek

This include is not needed (fs/sysfs/file.c builds just fine without
it). Remove it.

Cc: Tejun Heo 
Signed-off-by: Ondrej Mosnacek 
---
 fs/sysfs/file.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 52d9235e0291..130fc6fbcc03 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -17,7 +17,6 @@
 #include 
 
 #include "sysfs.h"
-#include "../kernfs/kernfs-internal.h"
 
 /*
  * Determine ktype->sysfs_ops for the given kernfs_node.  This function
-- 
2.20.1

[PATCH] selinux: check length properly in SCTP bind hook

2018-11-13 Thread Ondrej Mosnacek

selinux_sctp_bind_connect() must verify if the address buffer has
sufficient length before accessing the 'sa_family' field. See
__sctp_connect() for a similar check.

The length of the whole address ('len') is already checked in the
callees.

Reported-by: Qian Cai 
Fixes: d452930fd3b9 ("selinux: Add SCTP support")
Cc:  # 4.17+
Cc: Richard Haines 
Signed-off-by: Ondrej Mosnacek 
---
Hi,

On Mon, Nov 12, 2018 at 8:39 PM Qian Cai  wrote:
> Running the trinity fuzzer on the latest mainline (rc2) generates this,
> 
> [15029.879626] BUG: KASAN: slab-out-of-bounds in 
> selinux_sctp_bind_connect+0x60/0x150
> [15029.887275] Read of size 2 at addr 801ec53c5080 by task 
> trinity-main/18081
> [15029.887294] 
> [15029.887304] CPU: 28 PID: 18081 Comm: trinity-main Tainted: GW  OE  
>4.20.0-rc2+ #15
> [15029.887311] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 
> 06/01/2018
> [15000.084786] [15029.887320] Call trace:
> [15029.915511]  dump_backtrace+0x0/0x2c8
> [15029.920046]  show_stack+0x24/0x30
> [15029.923367]  dump_stack+0x118/0x19c
> [15029.927539]  print_address_description+0x68/0x2a0
> [15029.932245]  kasan_report+0x1b4/0x348
> [15029.938760]  __asan_load2+0x7c/0xa0
> [15029.945098]  selinux_sctp_bind_connect+0x60/0x150
> 
> [15029.950571]  security_sctp_bind_connect+0x58/0x90
> [15029.955493]  __sctp_setsockopt_connectx+0x68/0x128 [sctp]
> [15029.960943]  sctp_setsockopt+0x764/0x2928 [sctp]
> [15029.965564]  sock_common_setsockopt+0x6c/0x80
> [15029.969923]  __arm64_sys_setsockopt+0x13c/0x1f0
> [15029.974456]  el0_svc_handler+0xd4/0x198
> [15029.978293]  el0_svc+0x8/0xc
> [15029.981174] 
> [15029.982667] Allocated by task 18081:
> [15029.986245]  kasan_kmalloc.part.1+0x40/0x108
> [15029.990517]  kasan_kmalloc+0xb4/0xc8
> [15029.994094]  __kmalloc_node+0x1c4/0x638
> [15029.997933]  kvmalloc_node+0x98/0xa8
> [15030.001511]  vmemdup_user+0x34/0x128
> [15030.005137]  __sctp_setsockopt_connectx+0x44/0x128 [sctp]
> [15030.010586]  sctp_setsockopt+0x764/0x2928 [sctp]
> [15030.015205]  sock_common_setsockopt+0x6c/0x80
> [15030.019564]  __arm64_sys_setsockopt+0x13c/0x1f0
> [15030.024096]  el0_svc_handler+0xd4/0x198
> [15030.027933]  el0_svc+0x8/0xc
> [15030.030814] 
> [15030.032306] Freed by task 3025:
> [15030.035451]  __kasan_slab_free+0x114/0x228
> [15030.039548]  kasan_slab_free+0x10/0x18
> [15030.043299]  kfree+0x114/0x408
> [15030.046357]  selinux_sk_free_security+0x38/0x48
> [15030.050888]  security_sk_free+0x3c/0x58
> [15030.054727]  __sk_destruct+0x3e8/0x570
> [15030.058478]  sk_destruct+0x4c/0x58
> [15030.061881]  __sk_free+0x68/0x138
> [15030.065197]  sk_free+0x3c/0x48
> [15030.068255]  unix_release_sock+0x4a8/0x550
> [15030.072353]  unix_release+0x34/0x50
> [15030.075843]  __sock_release+0x74/0x110
> [15030.079593]  sock_close+0x24/0x38
> [15030.082913]  __fput+0x1b8/0x368
> [15030.086055]  fput+0x20/0x30
> [15030.089199]  task_work_run+0x14c/0x1a8
> [15030.092951]  do_notify_resume+0x1e4/0x200
> [15030.096961]  work_pending+0x8/0x14
> [15030.100363] 
> [15030.101856] The buggy address belongs to the object at 801ec53c5080
> [15030.101856]  which belongs to the cache kmalloc-128 of size 128
> [15030.114376] The buggy address is located 0 bytes inside of
> [15030.114376]  128-byte region [801ec53c5080, 801ec53c5100)
> [15030.125939] The buggy address belongs to the page:
> [15030.130732] page:7fe007b14f00 count:1 mapcount:0 
> mapping:8016c0010480 index:0x0
> [15030.138738] flags: 0x5f000200(slab)
> [15030.142926] raw: 5f000200 7fe007980608 801ec000fd60 
> 8016c0010480
> [15030.150671] raw:  00660066 0001 
> 
> [15030.158413] page dumped because: kasan: bad access detected
> [15030.163984] 
> [15030.165476] Memory state around the buggy address:
> [15030.170269]  801ec53c4f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [15030.177491]  801ec53c5000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [15030.184714] >801ec53c5080: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [15030.191934]^
> [15030.195164]  801ec53c5100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [15030.202386]  801ec53c5180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [15030.209607] 
> ==
> [15030.216828] Disabling lock debugging due to kernel taint

I think I found the cause - Qian, can you test this patch if it fixes
the problem?

 security/selinux/hooks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 7

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-11-29 Thread Ondrej Mosnacek

On Wed, Nov 28, 2018 at 10:52 PM Paul Moore  wrote:
> On Tue, Nov 27, 2018 at 6:50 AM Stephen Rothwell  
> wrote:
> > Hi Ondrej,
> >
> > On Tue, 27 Nov 2018 09:53:32 +0100 Ondrej Mosnacek  
> > wrote:
> > >
> > > Hm... seems that there was some massive overhaul in the VFS code right
> > > at the wrong moment... There are new hooks for mounting now and the
> >
> > The mount changes have been in linux-next since before the last
> > release ...
> >
> > > code that our commit changes is now here:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/tree/security/selinux/hooks.c?h=for-next#n3131
> > >
> > > It seems that the logic is still the same, just now our patch (or the
> > > VFS one) needs to be updated to change the above line as such
> > > (untested pseudo-patch):
> > >
> > > - if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> > > + if (fc->purpose == 
> > > (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))
> >
> > OK, so from tomorrow I will use that merge resolution.  Someone needs
> > to remember to tell Linus about this when the latter of the vfs and
> > selinux trees reach him.
>
> I will, or at least I'll do my best to remember; since we only have a
> few more week until the merge window I like my odds.  FWIW, I
> typically do a test merge on top of Linus' tree before sending the
> SELinux PR just to verify that everything is relatively clean and
> there are no surprises.
>
> Ondrej, please work with David Howells to ensure that submounts are
> handled correctly in his mount rework.

OK, I will verify that the SELinux submount fix rebased on top of
vfs/work.mount in the way I suggested above passes the same testing
(seliinux-testsuite + NFS crossmnt reproducer). I am now building two
kernels (vfs/work.mount with and without the fix) to test. Let me know
if there is anything more to do.

Thanks,

--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-11-30 Thread Ondrej Mosnacek

On Fri, Nov 30, 2018 at 4:10 PM David Howells  wrote:
> Ondrej Mosnacek  wrote:
>
> > - if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> > + if (fc->purpose == (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))
>
> It's not a bitmask, so you can't do that.  You'd need to do:
>
> if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT ||
> fc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
>
> or use a switch.

Damn, you're right... I should have noticed that there is '==' instead of '&' :/

Thanks,

--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-12-01 Thread Ondrej Mosnacek

On Thu, Nov 29, 2018 at 11:07 AM Ondrej Mosnacek  wrote:
> On Wed, Nov 28, 2018 at 10:52 PM Paul Moore  wrote:
> > On Tue, Nov 27, 2018 at 6:50 AM Stephen Rothwell  
> > wrote:
> > > Hi Ondrej,
> > >
> > > On Tue, 27 Nov 2018 09:53:32 +0100 Ondrej Mosnacek  
> > > wrote:
> > > >
> > > > Hm... seems that there was some massive overhaul in the VFS code right
> > > > at the wrong moment... There are new hooks for mounting now and the
> > >
> > > The mount changes have been in linux-next since before the last
> > > release ...
> > >
> > > > code that our commit changes is now here:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/tree/security/selinux/hooks.c?h=for-next#n3131
> > > >
> > > > It seems that the logic is still the same, just now our patch (or the
> > > > VFS one) needs to be updated to change the above line as such
> > > > (untested pseudo-patch):
> > > >
> > > > - if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> > > > + if (fc->purpose == 
> > > > (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))
> > >
> > > OK, so from tomorrow I will use that merge resolution.  Someone needs
> > > to remember to tell Linus about this when the latter of the vfs and
> > > selinux trees reach him.
> >
> > I will, or at least I'll do my best to remember; since we only have a
> > few more week until the merge window I like my odds.  FWIW, I
> > typically do a test merge on top of Linus' tree before sending the
> > SELinux PR just to verify that everything is relatively clean and
> > there are no surprises.
> >
> > Ondrej, please work with David Howells to ensure that submounts are
> > handled correctly in his mount rework.
>
> OK, I will verify that the SELinux submount fix rebased on top of
> vfs/work.mount in the way I suggested above passes the same testing
> (seliinux-testsuite + NFS crossmnt reproducer). I am now building two
> kernels (vfs/work.mount with and without the fix) to test. Let me know
> if there is anything more to do.

I tested the proposed patch ([1]; fixed as per correction from David
Howells) applied on top of patches v4.19-rc3..vfs/work.mount applied
on top of the 4.19.5-300 Fedora 29 kernel.

However, the submount test was still failing, so I looked again at the
list of the possible 'purpose' values and it turns out the value used
by NFS et al. is actually FS_CONTEXT_FOR_ROOT_MOUNT (it is actually
documented nicely in Documentation/filesystems/mount_api.txt). So I'll
need to build a new test kernel with updated patch ([2]) and retest...

BTW, the vfs/work.mount changes alone seem to cause some overlay test
failures (I didn't test a clean 4.19.5 so it may be due to some stable
patch as well):

Test Summary Report
---
overlay/test  (Wstat: 3072 Tests: 119 Failed: 12)
 Failed tests:  66, 74, 76-77, 79, 87, 95, 103, 108, 110-111
   117
 Non-zero exit status: 12

The failing tests are all in the context mount section, but I don't
think this is (directly) related to [3] because there are much more
tests failing and the kernel I was testing didn't include the
problematic OverlayFS patch. Perhaps the VFS patches somehow broke the
parsing of the context= mount option?

[1] 
https://gitlab.com/omos/linux-public/commit/fe5478717ddde92e3ea599e14051ad57522fdf47
[2] 
https://gitlab.com/omos/linux-public/commit/f5c58adc7babd62e4bfe8cda799459d263dc5186
[3] https://github.com/SELinuxProject/selinux-kernel/issues/43

--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-12-02 Thread Ondrej Mosnacek

On Sat, Dec 1, 2018 at 10:32 PM Ondrej Mosnacek  wrote:
> On Thu, Nov 29, 2018 at 11:07 AM Ondrej Mosnacek  wrote:
> > On Wed, Nov 28, 2018 at 10:52 PM Paul Moore  wrote:
> > > On Tue, Nov 27, 2018 at 6:50 AM Stephen Rothwell  
> > > wrote:
> > > > Hi Ondrej,
> > > >
> > > > On Tue, 27 Nov 2018 09:53:32 +0100 Ondrej Mosnacek 
> > > >  wrote:
> > > > >
> > > > > Hm... seems that there was some massive overhaul in the VFS code right
> > > > > at the wrong moment... There are new hooks for mounting now and the
> > > >
> > > > The mount changes have been in linux-next since before the last
> > > > release ...
> > > >
> > > > > code that our commit changes is now here:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/tree/security/selinux/hooks.c?h=for-next#n3131
> > > > >
> > > > > It seems that the logic is still the same, just now our patch (or the
> > > > > VFS one) needs to be updated to change the above line as such
> > > > > (untested pseudo-patch):
> > > > >
> > > > > - if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> > > > > + if (fc->purpose == 
> > > > > (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))
> > > >
> > > > OK, so from tomorrow I will use that merge resolution.  Someone needs
> > > > to remember to tell Linus about this when the latter of the vfs and
> > > > selinux trees reach him.
> > >
> > > I will, or at least I'll do my best to remember; since we only have a
> > > few more week until the merge window I like my odds.  FWIW, I
> > > typically do a test merge on top of Linus' tree before sending the
> > > SELinux PR just to verify that everything is relatively clean and
> > > there are no surprises.
> > >
> > > Ondrej, please work with David Howells to ensure that submounts are
> > > handled correctly in his mount rework.
> >
> > OK, I will verify that the SELinux submount fix rebased on top of
> > vfs/work.mount in the way I suggested above passes the same testing
> > (seliinux-testsuite + NFS crossmnt reproducer). I am now building two
> > kernels (vfs/work.mount with and without the fix) to test. Let me know
> > if there is anything more to do.
>
> I tested the proposed patch ([1]; fixed as per correction from David
> Howells) applied on top of patches v4.19-rc3..vfs/work.mount applied
> on top of the 4.19.5-300 Fedora 29 kernel.
>
> However, the submount test was still failing, so I looked again at the
> list of the possible 'purpose' values and it turns out the value used
> by NFS et al. is actually FS_CONTEXT_FOR_ROOT_MOUNT (it is actually
> documented nicely in Documentation/filesystems/mount_api.txt). So I'll
> need to build a new test kernel with updated patch ([2]) and retest...

Unfortunately, even with FS_CONTEXT_FOR_ROOT_MOUNT the submount test
is still failing. (Actually, re-reading the description for
FS_CONTEXT_FOR_ROOT_MOUNT it seems it is not actually related to
submounts, although we should probably still skip the check for it,
too.) I will need to dig deeper into this on Monday...

FWIW, the generated AVC denial is still the same so it must be failing
the check in that particular hook (the FILESYSTEM__MOUNT permission is
checked only in that single place):

type=AVC msg=audit(02.12.2018 01:55:03.626:604) : avc:  denied  {
mount } for  pid=4036 comm=cat name=/ dev="0:48" ino=2
scontext=unconfined_u:unconfined_r:test_readnfs_t:s0-s0:c0.c1023
tcontext=system_u:object_r:nfs_t:s0 tclass=filesystem permissive=0

>
> BTW, the vfs/work.mount changes alone seem to cause some overlay test
> failures (I didn't test a clean 4.19.5 so it may be due to some stable
> patch as well):
>
> Test Summary Report
> ---
> overlay/test  (Wstat: 3072 Tests: 119 Failed: 12)
>  Failed tests:  66, 74, 76-77, 79, 87, 95, 103, 108, 110-111
>117
>  Non-zero exit status: 12
>
> The failing tests are all in the context mount section, but I don't
> think this is (directly) related to [3] because there are much more
> tests failing and the kernel I was testing didn't include the
> problematic OverlayFS patch. Perhaps the VFS patches somehow broke the
> parsing of the context= mount option?
>
> [1] 
> https://gitlab.com/omos/linux-public/commit/fe5478717ddde92e3ea599e14051ad57522fdf47
> [2] 
> https://gitlab.com/omos/linux-public/commit/f5c58adc7babd62e4bfe8cda799459d263dc5186
> [3] https://github.com/SELinuxProject/selinux-kernel/issues/43
>
> --
> Ondrej Mosnacek 
> Associate Software Engineer, Security Technologies
> Red Hat, Inc.

-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-12-03 Thread Ondrej Mosnacek

On Sun, Dec 2, 2018 at 10:13 AM Ondrej Mosnacek  wrote:
> On Sat, Dec 1, 2018 at 10:32 PM Ondrej Mosnacek  wrote:
> > On Thu, Nov 29, 2018 at 11:07 AM Ondrej Mosnacek  
> > wrote:
> > > On Wed, Nov 28, 2018 at 10:52 PM Paul Moore  wrote:
> > > > On Tue, Nov 27, 2018 at 6:50 AM Stephen Rothwell 
> > > >  wrote:
> > > > > Hi Ondrej,
> > > > >
> > > > > On Tue, 27 Nov 2018 09:53:32 +0100 Ondrej Mosnacek 
> > > > >  wrote:
> > > > > >
> > > > > > Hm... seems that there was some massive overhaul in the VFS code 
> > > > > > right
> > > > > > at the wrong moment... There are new hooks for mounting now and the
> > > > >
> > > > > The mount changes have been in linux-next since before the last
> > > > > release ...
> > > > >
> > > > > > code that our commit changes is now here:
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/tree/security/selinux/hooks.c?h=for-next#n3131
> > > > > >
> > > > > > It seems that the logic is still the same, just now our patch (or 
> > > > > > the
> > > > > > VFS one) needs to be updated to change the above line as such
> > > > > > (untested pseudo-patch):
> > > > > >
> > > > > > - if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> > > > > > + if (fc->purpose == 
> > > > > > (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))
> > > > >
> > > > > OK, so from tomorrow I will use that merge resolution.  Someone needs
> > > > > to remember to tell Linus about this when the latter of the vfs and
> > > > > selinux trees reach him.
> > > >
> > > > I will, or at least I'll do my best to remember; since we only have a
> > > > few more week until the merge window I like my odds.  FWIW, I
> > > > typically do a test merge on top of Linus' tree before sending the
> > > > SELinux PR just to verify that everything is relatively clean and
> > > > there are no surprises.
> > > >
> > > > Ondrej, please work with David Howells to ensure that submounts are
> > > > handled correctly in his mount rework.
> > >
> > > OK, I will verify that the SELinux submount fix rebased on top of
> > > vfs/work.mount in the way I suggested above passes the same testing
> > > (seliinux-testsuite + NFS crossmnt reproducer). I am now building two
> > > kernels (vfs/work.mount with and without the fix) to test. Let me know
> > > if there is anything more to do.
> >
> > I tested the proposed patch ([1]; fixed as per correction from David
> > Howells) applied on top of patches v4.19-rc3..vfs/work.mount applied
> > on top of the 4.19.5-300 Fedora 29 kernel.
> >
> > However, the submount test was still failing, so I looked again at the
> > list of the possible 'purpose' values and it turns out the value used
> > by NFS et al. is actually FS_CONTEXT_FOR_ROOT_MOUNT (it is actually
> > documented nicely in Documentation/filesystems/mount_api.txt). So I'll
> > need to build a new test kernel with updated patch ([2]) and retest...
>
> Unfortunately, even with FS_CONTEXT_FOR_ROOT_MOUNT the submount test
> is still failing. (Actually, re-reading the description for
> FS_CONTEXT_FOR_ROOT_MOUNT it seems it is not actually related to
> submounts, although we should probably still skip the check for it,
> too.) I will need to dig deeper into this on Monday...

I think I figured out what's the problem. NFS still creates the
submount via the old vfs_submount() call, which calls
vfs_kern_mount(), which creates an fs_context with
FS_CONTEXT_FOR_USER_MOUNT because FS_CONTEXT_FOR_SUBMOUNT needs the
mountpoint dentry reference and there is currently no way to pass that
to vfs_kern_mount(). This is further complicated by the fact that
vfs_submount() accepts only a const reference to the mountpoint, while
vfs_new_fs_context() expects a non-const one...

I think all users of the old vfs_submount call should be converted to
the new API before the VFS changes are merged into mainline, otherwise
they will break the SELinux submount fix. We could work around it in
the SELinux hook by checking the fc->sb_flags[_mask] for SB_SUBMOUNT,
but I guess that would be a hack.

>
> FWIW, the generated AVC denial is still the same so it must be failing
> the check in that particular hook (the FILESYSTEM__MOUNT permission is
> checke

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-12-05 Thread Ondrej Mosnacek

On Mon, Dec 3, 2018 at 10:56 PM Al Viro  wrote:
> On Mon, Dec 03, 2018 at 11:12:59AM +0100, Ondrej Mosnacek wrote:
>
> > I think I figured out what's the problem. NFS still creates the
> > submount via the old vfs_submount() call, which calls
> > vfs_kern_mount(), which creates an fs_context with
> > FS_CONTEXT_FOR_USER_MOUNT because FS_CONTEXT_FOR_SUBMOUNT needs the
> > mountpoint dentry reference and there is currently no way to pass that
> > to vfs_kern_mount(). This is further complicated by the fact that
> > vfs_submount() accepts only a const reference to the mountpoint, while
> > vfs_new_fs_context() expects a non-const one...
> >
> > I think all users of the old vfs_submount call should be converted to
> > the new API before the VFS changes are merged into mainline, otherwise
> > they will break the SELinux submount fix. We could work around it in
> > the SELinux hook by checking the fc->sb_flags[_mask] for SB_SUBMOUNT,
> > but I guess that would be a hack.
>
> Could you take a look at vfs.git#Q28?  There's still a massive reshuffling
> going on, so there will be more branches; this one is the latest at the
> moment.

I just tested the Q28 branch rebased onto a recent Fedora rawhide
kernel (4.20.0-0.rc5.git0.1) and that code seems to be working fine.
The submount test failed with Q28 and succeeds with Q28+fix, as
expected. Also, the overlay tests failures are gone now (except for
the 4 known failures from GH issue #43, since I had to rebase onto
4.20-rcX).

This is the commit that I used as the SELinux submount fix:
https://gitlab.com/omos/linux-public/commit/47922f9c70a83008388b836c285f94c40da1af2b

Kernel builds:
Unfixed Q28:  
https://copr.fedorainfracloud.org/coprs/omos/kernel-testing/build/833311/
Fixed Q28:  
https://copr.fedorainfracloud.org/coprs/omos/kernel-testing/build/833312/

Selinux-testsuite reports:
=== Q28 ===
Test Summary Report
---
overlay/test  (Wstat: 1024 Tests: 119 Failed: 4)
  Failed tests:  81, 83, 107, 112
  Non-zero exit status: 4
submount/test (Wstat: 256 Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
Files=54, Tests=615, 117 wallclock secs ( 0.20 usr  0.04 sys +  1.64
cusr  1.29 csys =  3.17 CPU)
Result: FAIL
Failed 2/54 test programs. 5/615 subtests failed.

=== Q28 + FIX ===
Test Summary Report
---
overlay/test  (Wstat: 1024 Tests: 119 Failed: 4)
  Failed tests:  81, 83, 107, 112
  Non-zero exit status: 4
Files=54, Tests=615, 117 wallclock secs ( 0.22 usr  0.05 sys +  1.54
cusr  1.37 csys =  3.18 CPU)
Result: FAIL
Failed 1/54 test programs. 4/615 subtests failed.

--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-11-27 Thread Ondrej Mosnacek

On Tue, Nov 27, 2018 at 1:52 AM Stephen Rothwell  wrote:
> Hi Paul,
>
> Today's linux-next merge of the selinux tree got a conflict in:
>
>   security/selinux/hooks.c
>
> between commit:
>
>   0472421f47a9 ("vfs: Remove unused code after filesystem context changes")
>
> from the vfs tree and commit:
>
>   2cbdcb882f97 ("selinux: always allow mounting submounts")
>
> from the selinux tree.
>
> I fixed it up (the former removed the function updated by the latter -
> I am not sure if there are further changes necessary) and can carry the
> fix as necessary. This is now fixed as far as linux-next is concerned,
> but any non trivial conflicts should be mentioned to your upstream
> maintainer when your tree is submitted for merging.  You may also want
> to consider cooperating with the maintainer of the conflicting tree to
> minimise any particularly complex conflicts.

Hm... seems that there was some massive overhaul in the VFS code right
at the wrong moment... There are new hooks for mounting now and the
code that our commit changes is now here:

https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/tree/security/selinux/hooks.c?h=for-next#n3131

It seems that the logic is still the same, just now our patch (or the
VFS one) needs to be updated to change the above line as such
(untested pseudo-patch):

- if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
+ if (fc->purpose == (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))

Thanks for the heads up, Stephen!

-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: linux-next: manual merge of the selinux tree with the vfs tree

2018-11-27 Thread Ondrej Mosnacek

On Tue, Nov 27, 2018 at 9:53 AM Ondrej Mosnacek  wrote:
> On Tue, Nov 27, 2018 at 1:52 AM Stephen Rothwell  
> wrote:
> > Hi Paul,
> >
> > Today's linux-next merge of the selinux tree got a conflict in:
> >
> >   security/selinux/hooks.c
> >
> > between commit:
> >
> >   0472421f47a9 ("vfs: Remove unused code after filesystem context changes")
> >
> > from the vfs tree and commit:
> >
> >   2cbdcb882f97 ("selinux: always allow mounting submounts")
> >
> > from the selinux tree.
> >
> > I fixed it up (the former removed the function updated by the latter -
> > I am not sure if there are further changes necessary) and can carry the
> > fix as necessary. This is now fixed as far as linux-next is concerned,
> > but any non trivial conflicts should be mentioned to your upstream
> > maintainer when your tree is submitted for merging.  You may also want
> > to consider cooperating with the maintainer of the conflicting tree to
> > minimise any particularly complex conflicts.
>
> Hm... seems that there was some massive overhaul in the VFS code right
> at the wrong moment... There are new hooks for mounting now and the
> code that our commit changes is now here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/tree/security/selinux/hooks.c?h=for-next#n3131

For convenience, here are direct links to the most important -next VFS
commits that are related:

https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=c87c47c34750e9ee1ff0345593f3cbf6726b9d4e
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=4786c3427b2517ee9c685f95bf5b3185e332e64d
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=37744f3d21f8dbf6bb65e1ecef38c2cf9503d202
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=0472421f47a97be4b741d55ffd18f68ed9ba7cea

>
> It seems that the logic is still the same, just now our patch (or the
> VFS one) needs to be updated to change the above line as such
> (untested pseudo-patch):
>
> - if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> + if (fc->purpose == (FS_CONTEXT_FOR_KERNEL_MOUNT|FS_CONTEXT_FOR_SUBMOUNT))
>
> Thanks for the heads up, Stephen!
>
> --
> Ondrej Mosnacek 
> Associate Software Engineer, Security Technologies
> Red Hat, Inc.

-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

[RFC PATCH ghak10 v6 0/2] audit: Log changes that can affect the system clock

2019-03-07 Thread Ondrej Mosnacek

This patchset implements auditing of (syscall-triggered) changes that
can modify or indirectly affect the system clock. Some of these
changes can already be detected by simply logging relevant syscalls,
but this has some disadvantages:
  a) It is usually not possible to find out from the syscall records
 the amount by which the time was shifted.
  b) Syscalls like adjtimex(2) or clock_adjtime(2) can be used also
 for read-only operations, which might flood the audit log with
 false positives. (Note that these patches don't solve this
 problem yet due to the limitations of current record filtering
 capabilities.)

The main motivation is to provide better reliability of timestamps
on the system as mandated by the FPT_STM.1 security functional
requirement from Common Criteria. This requirement apparently demands
that it is possible to reconstruct from audit trail the old and new
values of the time when it is adjusted (see [1]).

The current version of the patchset logs the following changes:
  - direct setting of system time to a given value
  - direct injection of timekeeping offset
  - adjustment of timekeeping's TAI offset
  - NTP value adjustments:
- time_offset
- time_freq
- time_status
- time_adjust
- tick_usec

Changes to the following NTP values are not logged, as they are not
important for security:
  - time_maxerror
  - time_esterror
  - time_constant

Audit kernel GitHub issue: https://github.com/linux-audit/audit-kernel/issues/10
Audit kernel RFE page: 
https://github.com/linux-audit/audit-kernel/wiki/RFE-More-detailed-auditing-of-changes-to-system-clock

Testing: Passed audit-testuite; functional tests TBD

Changes in v6:
  - Reorganized the patches to group changes by record type, not
kernel subsytem, as suggested in earlier discussions
  - Added checks to ignore no-change events (new value == old value)
  - Added TIME_INJOFFSET logging also to do_settimeofday64() to cover
syscalls such as settimeofday(2), stime(2), clock_settime(2)
  - Created an RFE page on audit-kernel GitHub
TODO:
  - tests for audit-testsuite

v5: https://www.redhat.com/archives/linux-audit/2018-August/msg00039.html
Changes in v5:
  - Dropped logging of some less important changes and update commit messages
  - No longer mark the patchset as RFC

v4: https://www.redhat.com/archives/linux-audit/2018-August/msg00023.html
Changes in v4:
  - Squashed first two patches into one
  - Renamed ADJNTPVAL's "type" field to "op" to align with audit record
conventions
  - Minor commit message editing
  - Cc timekeeping/NTP people for feedback

v3: https://www.redhat.com/archives/linux-audit/2018-July/msg1.html
Changes in v3:
  - Switched to separate records for each variable
  - Both old and new value is now reported for each change
  - Injecting offset is reported via a separate record (since this
offset consists of two values and is added directly to the clock,
i.e. it doesn't make sense to log old and new value)
  - Added example records produced by chronyd -q (see the commit message
of the last patch)

v2: https://www.redhat.com/archives/linux-audit/2018-June/msg00114.html
Changes in v2:
  - The audit_adjtime() function has been modified to only log those
fields that contain values that are actually used, resulting in more
compact records.
  - The audit_adjtime() call has been moved to do_adjtimex() in
timekeeping.c
  - Added an additional patch (for review) that simplifies the detection
if the syscall is read-only.

v1: https://www.redhat.com/archives/linux-audit/2018-June/msg00095.html

[1] https://www.niap-ccevs.org/MMO/PP/pp_ca_v2.1.pdf -- section 5.1,
table 4

Ondrej Mosnacek (2):
  timekeeping: Audit clock adjustments
  ntp: Audit NTP parameters adjustment

 include/linux/audit.h  | 29 +
 include/uapi/linux/audit.h |  2 ++
 kernel/auditsc.c   | 15 +++
 kernel/time/ntp.c  | 38 ++
 kernel/time/timekeeping.c  |  6 ++
 5 files changed, 82 insertions(+), 8 deletions(-)

-- 
2.20.1

[RFC PATCH ghak10 v6 2/2] ntp: Audit NTP parameters adjustment

2019-03-07 Thread Ondrej Mosnacek

rt/delete leap seconds and it also enables/
   disables synchronization of the hardware real-time
   clock (AUDITED)
time_maxerror, time_esterror -- change error estimates used to
inform userspace applications
(NOT AUDITED)
time_constant -- controls the speed of the clock adjustments that
 are made when time_offset is set (NOT AUDITED)
time_adjust -- can temporarily speed up or slow down the clock by up
   to 0.05% (AUDITED)
tick_usec -- a more extreme version of time_freq; can speed up or
 slow down the clock by up to 10% (AUDITED)

Signed-off-by: Ondrej Mosnacek 
---
 include/linux/audit.h  | 14 ++
 include/uapi/linux/audit.h |  1 +
 kernel/auditsc.c   |  7 +++
 kernel/time/ntp.c  | 38 ++
 4 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 43a60fbe74be..0f67964544cc 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -367,6 +367,7 @@ extern void __audit_mmap_fd(int fd, int flags);
 extern void __audit_log_kern_module(char *name);
 extern void __audit_fanotify(unsigned int response);
 extern void __audit_tk_injoffset(struct timespec64 offset);
+extern void __audit_ntp_adjust(const char *type, s64 oldval, s64 newval);
 
 static inline void audit_ipc_obj(struct kern_ipc_perm *ipcp)
 {
@@ -479,6 +480,16 @@ static inline void audit_tk_injoffset(struct timespec64 
offset)
__audit_tk_injoffset(offset);
 }
 
+static inline void audit_ntp_adjust(const char *type, s64 oldval, s64 newval)
+{
+   /* ignore no-op events */
+   if (newval == oldval)
+   return;
+
+   if (!audit_dummy_context())
+   __audit_ntp_adjust(type, oldval, newval);
+}
+
 extern int audit_n_rules;
 extern int audit_signals;
 #else /* CONFIG_AUDITSYSCALL */
@@ -595,6 +606,9 @@ static inline void audit_fanotify(unsigned int response)
 static inline void audit_tk_injoffset(struct timespec64 offset)
 { }
 
+static inline void audit_ntp_adjust(const char *type, s64 oldval, s64 newval)
+{ }
+
 static inline void audit_ptrace(struct task_struct *t)
 { }
 #define audit_n_rules 0
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 2167d55bc800..e9781f0385eb 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -115,6 +115,7 @@
 #define AUDIT_KERN_MODULE  1330/* Kernel Module events */
 #define AUDIT_FANOTIFY 1331/* Fanotify access decision */
 #define AUDIT_TIME_INJOFFSET   1332/* Timekeeping offset injected */
+#define AUDIT_TIME_ADJNTPVAL   1333/* NTP value adjustment */
 
 #define AUDIT_AVC  1400/* SE Linux avc denial or grant */
 #define AUDIT_SELINUX_ERR  1401/* Internal SE Linux Errors */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 781336d0f2de..946806174cd9 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2520,6 +2520,13 @@ void __audit_tk_injoffset(struct timespec64 offset)
  "sec=%lli nsec=%li", (long long)offset.tv_sec, 
offset.tv_nsec);
 }
 
+void __audit_ntp_adjust(const char *type, s64 oldval, s64 newval)
+{
+   audit_log(audit_context(), GFP_ATOMIC, AUDIT_TIME_ADJNTPVAL,
+ "op=%s old=%lli new=%lli", type,
+ (long long)oldval, (long long)newval);
+}
+
 static void audit_log_task(struct audit_buffer *ab)
 {
kuid_t auid, uid;
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 36a2bef00125..5f456a84151a 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ntp_internal.h"
 #include "timekeeping_internal.h"
@@ -293,6 +294,8 @@ static inline s64 ntp_update_offset_fll(s64 offset64, long 
secs)
 
 static void ntp_update_offset(long offset)
 {
+   s64 old_offset = time_offset;
+   s64 old_freq = time_freq;
s64 freq_adj;
s64 offset64;
long secs;
@@ -341,6 +344,9 @@ static void ntp_update_offset(long offset)
time_freq   = max(freq_adj, -MAXFREQ_SCALED);
 
time_offset = div_s64(offset64 << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ);
+
+   audit_ntp_adjust("offset", old_offset, time_offset);
+   audit_ntp_adjust("freq", old_freq, time_freq);
 }
 
 /**
@@ -658,21 +664,31 @@ static inline void process_adj_status(const struct timex 
*txc)
 
 static inline void process_adjtimex_modes(const struct timex *txc, s32 
*time_tai)
 {
-   if (txc->modes & ADJ_STATUS)
-   process_adj_status(txc);
+   if (txc->modes & (ADJ_STATUS | ADJ_NANO | ADJ_MICRO)) {
+   int old_status = time_status;
+
+   if (txc->modes & ADJ_STATUS)
+   process_adj_status(txc);
 
-   if (txc->modes

[RFC PATCH ghak10 v6 1/2] timekeeping: Audit clock adjustments

2019-03-07 Thread Ondrej Mosnacek

Emit an audit record whenever the system clock is changed (i.e. shifted
by a non-zero offset) by a syscall from userspace. The syscalls than can
(at the time of writing) trigger such record are:
  - settimeofday(2), stime(2), clock_settime(2) -- via
do_settimeofday64()
  - adjtimex(2), clock_adjtime(2) -- via do_adjtimex()

The new records have type AUDIT_TIME_INJOFFSET and contain the following
fields:
  - sec -- the 'seconds' part of the offset
  - nsec -- the 'nanoseconds' part of the offset

For reference, running the following commands:

auditctl -D
auditctl -a exit,always -F arch=b64 -S adjtimex
chronyd -q

triggers (among others) a syscall that produces audit records like this:

type=TIME_INJOFFSET msg=audit(1530616049.652:13): sec=-16 nsec=124887145
type=SYSCALL msg=audit(1530616049.652:13): arch=c03e syscall=159 
success=yes exit=5 a0=7fff57e78270 a1=1 a2=fff0 a3=137b828205ca12 
items=0 ppid=626 pid=629 auid=0 uid=385 gid=382 euid=385 suid=385 fsuid=385 
egid=382 sgid=382 fsgid=382 tty=(none) ses=1 comm="chronyd" 
exe="/usr/sbin/chronyd" subj=system_u:system_r:kernel_t:s0 key=(null)
type=PROCTITLE msg=audit(1530616049.652:13): proctitle=6368726F6E7964002D71 cd 
/home/omosnace/Dokumenty/Kernel/worktrees/audit/src/kernel/time
s

The above records have been produced by the following syscall from
chronyd (as per strace output):

adjtimex({modes=ADJ_SETOFFSET|ADJ_NANO, offset=0, freq=750433, 
maxerror=1600, esterror=1600, status=STA_UNSYNC|STA_NANO, constant=2, 
precision=1, tolerance=32768000, time={tv_sec=1530616033, tv_usec=778717675}, 
tick=1, ppsfreq=0, jitter=0, shift=0, stabil=0, jitcnt=0, calcnt=0, 
errcnt=0, stbcnt=0, tai=0}) = 5 (TIME_ERROR)

(The struct timex fields above are from *after* the syscall was
executed, so they contain the current (new) values as set from the
kernel, except of the 'modes' field, which contains the original value
sent by the caller.)

Signed-off-by: Ondrej Mosnacek 
---
 include/linux/audit.h  | 15 +++
 include/uapi/linux/audit.h |  1 +
 kernel/auditsc.c   |  8 
 kernel/time/timekeeping.c  |  6 ++
 4 files changed, 30 insertions(+)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 1e69d9fe16da..43a60fbe74be 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -27,6 +27,7 @@
 #include 
 #include   /* LOOKUP_* */
 #include 
+#include 
 
 #define AUDIT_INO_UNSET ((unsigned long)-1)
 #define AUDIT_DEV_UNSET ((dev_t)-1)
@@ -365,6 +366,7 @@ extern void __audit_log_capset(const struct cred *new, 
const struct cred *old);
 extern void __audit_mmap_fd(int fd, int flags);
 extern void __audit_log_kern_module(char *name);
 extern void __audit_fanotify(unsigned int response);
+extern void __audit_tk_injoffset(struct timespec64 offset);
 
 static inline void audit_ipc_obj(struct kern_ipc_perm *ipcp)
 {
@@ -467,6 +469,16 @@ static inline void audit_fanotify(unsigned int response)
__audit_fanotify(response);
 }
 
+static inline void audit_tk_injoffset(struct timespec64 offset)
+{
+   /* ignore no-op events */
+   if (offset.tv_sec == 0 && offset.tv_nsec == 0)
+   return;
+
+   if (!audit_dummy_context())
+   __audit_tk_injoffset(offset);
+}
+
 extern int audit_n_rules;
 extern int audit_signals;
 #else /* CONFIG_AUDITSYSCALL */
@@ -580,6 +592,9 @@ static inline void audit_log_kern_module(char *name)
 static inline void audit_fanotify(unsigned int response)
 { }
 
+static inline void audit_tk_injoffset(struct timespec64 offset)
+{ }
+
 static inline void audit_ptrace(struct task_struct *t)
 { }
 #define audit_n_rules 0
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 36a7e3f18e69..2167d55bc800 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -114,6 +114,7 @@
 #define AUDIT_REPLACE  1329/* Replace auditd if this packet 
unanswerd */
 #define AUDIT_KERN_MODULE  1330/* Kernel Module events */
 #define AUDIT_FANOTIFY 1331/* Fanotify access decision */
+#define AUDIT_TIME_INJOFFSET   1332/* Timekeeping offset injected */
 
 #define AUDIT_AVC  1400/* SE Linux avc denial or grant */
 #define AUDIT_SELINUX_ERR  1401/* Internal SE Linux Errors */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index d1eab1d4a930..781336d0f2de 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2512,6 +2512,14 @@ void __audit_fanotify(unsigned int response)
AUDIT_FANOTIFY, "resp=%u", response);
 }
 
+/* We need to allocate with GFP_ATOMIC here, since these two functions will be
+ * called while holding the timekeeping lock: */
+void __audit_tk_injoffset(struct timespec64 offset)
+{
+   audit_log(audit_context(), GFP_ATOMIC, AUDIT_TIME_INJOFFSET,
+ "sec=%lli nsec=%li", (long long)offset.tv_sec, 
offset.tv_nsec);
+}
+
 s

Re: [RFC/RFT PATCH 02/15] crypto: morus - fix handling chunked inputs

2019-01-31 Thread Ondrej Mosnacek

Hi Eric,

On Wed, Jan 23, 2019 at 11:52 PM Eric Biggers  wrote:
> From: Eric Biggers 
>
> The generic MORUS implementations all fail the improved AEAD tests
> because they produce the wrong result with some data layouts.  Fix them.
>
> Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD implementations")
> Cc:  # v4.18+
> Cc: Ondrej Mosnacek 
> Signed-off-by: Eric Biggers 
> ---
>  crypto/morus1280.c | 13 +++--
>  crypto/morus640.c  | 13 +++--
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/crypto/morus1280.c b/crypto/morus1280.c
> index 3889c188f266..b83576b4eb55 100644
> --- a/crypto/morus1280.c
> +++ b/crypto/morus1280.c
> @@ -366,18 +366,19 @@ static void crypto_morus1280_process_crypt(struct 
> morus1280_state *state,
>const struct morus1280_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *dst;
> -   const u8 *src;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, walk.nbytes);
> +   if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }

Hm... I assume the problem was that skcipher_walk may give you nbytes
that is not rounded to the algorithm's walksize (aka walk.stride) even
in the middle of the stream, right? I must have misread the code in
skcipher.c and assumed that it automatically makes every step of a
round size, with the exception of the very last one, which may include
a final partial block. Thinking about it now it makes sense to me that
this isn't the case, since for stream ciphers it would mean some
unnecessary memcpy'ing in the skcipher_walk code...

Maybe you could explain the problem in one or two sentences in the
commit message(s), since the contract of skcipher_walk doesn't seem to
be documented anywhere and so it is not obvious why the change is
necessary.

Thank you for all your work on this - the improved testmgr tests were
long needed!

>  }
>
> diff --git a/crypto/morus640.c b/crypto/morus640.c
> index da06ec2f6a80..b6a477444f6d 100644
> --- a/crypto/morus640.c
> +++ b/crypto/morus640.c
> @@ -365,18 +365,19 @@ static void crypto_morus640_process_crypt(struct 
> morus640_state *state,
>   const struct morus640_ops *ops)
>  {
> struct skcipher_walk walk;
> -   u8 *dst;
> -   const u8 *src;
>
> ops->skcipher_walk_init(&walk, req, false);
>
> while (walk.nbytes) {
> -   src = walk.src.virt.addr;
> -   dst = walk.dst.virt.addr;
> +   unsigned int nbytes = walk.nbytes;
>
> -   ops->crypt_chunk(state, dst, src, walk.nbytes);
> +   if (nbytes < walk.total)
> +   nbytes = round_down(nbytes, walk.stride);
>
> -   skcipher_walk_done(&walk, 0);
> +   ops->crypt_chunk(state, walk.dst.virt.addr, 
> walk.src.virt.addr,
> +nbytes);
> +
> +   skcipher_walk_done(&walk, walk.nbytes - nbytes);
> }
>  }
>
> --
> 2.20.1.321.g9e740568ce-goog
>

--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

[PATCH 1/3] ntp: Remove redundant arguments

2018-07-13 Thread Ondrej Mosnacek

The 'ts' argument of process_adj_status() and process_adjtimex_modes()
is unused and can be safely removed.

Signed-off-by: Ondrej Mosnacek 
---
 kernel/time/ntp.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index a09ded765f6c..25031ffb5d25 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -642,7 +642,7 @@ void ntp_notify_cmos_timer(void)
 /*
  * Propagate a new txc->status value into the NTP state:
  */
-static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
+static inline void process_adj_status(struct timex *txc)
 {
if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
time_state = TIME_OK;
@@ -665,12 +665,10 @@ static inline void process_adj_status(struct timex *txc, 
struct timespec64 *ts)
 }
 
 
-static inline void process_adjtimex_modes(struct timex *txc,
-   struct timespec64 *ts,
-   s32 *time_tai)
+static inline void process_adjtimex_modes(struct timex *txc, s32 *time_tai)
 {
if (txc->modes & ADJ_STATUS)
-   process_adj_status(txc, ts);
+   process_adj_status(txc);
 
if (txc->modes & ADJ_NANO)
time_status |= STA_NANO;
@@ -735,7 +733,7 @@ int __do_adjtimex(struct timex *txc, struct timespec64 *ts, 
s32 *time_tai)
 
/* If there are input parameters, then process them: */
if (txc->modes)
-   process_adjtimex_modes(txc, ts, time_tai);
+   process_adjtimex_modes(txc, time_tai);
 
txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
  NTP_SCALE_SHIFT);
-- 
2.17.1

[PATCH 2/3] ntp: Use kstrtos64 for s64 variable

2018-07-13 Thread Ondrej Mosnacek

...instead of kstrtol with a dirty cast.

Signed-off-by: Ondrej Mosnacek 
---
 kernel/time/ntp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 25031ffb5d25..6c764addef3e 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -1020,12 +1020,11 @@ void __hardpps(const struct timespec64 *phase_ts, const 
struct timespec64 *raw_t
 
 static int __init ntp_tick_adj_setup(char *str)
 {
-   int rc = kstrtol(str, 0, (long *)&ntp_tick_adj);
-
+   int rc = kstrtos64(str, 0, &ntp_tick_adj);
if (rc)
return rc;
-   ntp_tick_adj <<= NTP_SCALE_SHIFT;
 
+   ntp_tick_adj <<= NTP_SCALE_SHIFT;
return 1;
 }
 
-- 
2.17.1

[PATCH 3/3] timekeeping/ntp: Constify some function arguments

2018-07-13 Thread Ondrej Mosnacek

Add 'const' to some function arguments and variables to make it easier
to read the code.

Signed-off-by: Ondrej Mosnacek 
---
 include/linux/timekeeping.h|  2 +-
 kernel/time/ntp.c  |  6 +++---
 kernel/time/ntp_internal.h |  2 +-
 kernel/time/timekeeping.c  | 29 +++--
 kernel/time/timekeeping_debug.c|  2 +-
 kernel/time/timekeeping_internal.h |  2 +-
 6 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 86bc2026efce..edace6b656e9 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -177,7 +177,7 @@ static inline time64_t ktime_get_clocktai_seconds(void)
 extern bool timekeeping_rtc_skipsuspend(void);
 extern bool timekeeping_rtc_skipresume(void);
 
-extern void timekeeping_inject_sleeptime64(struct timespec64 *delta);
+extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
 
 /*
  * struct system_time_snapshot - simultaneous raw/real time capture with
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 6c764addef3e..f2c7bbf65db8 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -642,7 +642,7 @@ void ntp_notify_cmos_timer(void)
 /*
  * Propagate a new txc->status value into the NTP state:
  */
-static inline void process_adj_status(struct timex *txc)
+static inline void process_adj_status(const struct timex *txc)
 {
if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
time_state = TIME_OK;
@@ -665,7 +665,7 @@ static inline void process_adj_status(struct timex *txc)
 }
 
 
-static inline void process_adjtimex_modes(struct timex *txc, s32 *time_tai)
+static inline void process_adjtimex_modes(const struct timex *txc, s32 
*time_tai)
 {
if (txc->modes & ADJ_STATUS)
process_adj_status(txc);
@@ -716,7 +716,7 @@ static inline void process_adjtimex_modes(struct timex 
*txc, s32 *time_tai)
  * adjtimex mainly allows reading (and writing, if superuser) of
  * kernel time-keeping variables. used by xntpd.
  */
-int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai)
+int __do_adjtimex(struct timex *txc, const struct timespec64 *ts, s32 
*time_tai)
 {
int result;
 
diff --git a/kernel/time/ntp_internal.h b/kernel/time/ntp_internal.h
index 909bd1f1bfb1..21292c7c56a1 100644
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -8,6 +8,6 @@ extern void ntp_clear(void);
 extern u64 ntp_tick_length(void);
 extern ktime_t ntp_get_next_leap(void);
 extern int second_overflow(time64_t secs);
-extern int __do_adjtimex(struct timex *, struct timespec64 *, s32 *);
+extern int __do_adjtimex(struct timex *, const struct timespec64 *, s32 *);
 extern void __hardpps(const struct timespec64 *, const struct timespec64 *);
 #endif /* _LINUX_NTP_INTERNAL_H */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4786df904c22..6253bb21c3bf 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -97,7 +97,7 @@ static inline void tk_normalize_xtime(struct timekeeper *tk)
}
 }
 
-static inline struct timespec64 tk_xtime(struct timekeeper *tk)
+static inline struct timespec64 tk_xtime(const struct timekeeper *tk)
 {
struct timespec64 ts;
 
@@ -154,7 +154,7 @@ static inline void tk_update_sleep_time(struct timekeeper 
*tk, ktime_t delta)
  * a read of the fast-timekeeper tkrs (which is protected by its own locking
  * and update logic).
  */
-static inline u64 tk_clock_read(struct tk_read_base *tkr)
+static inline u64 tk_clock_read(const struct tk_read_base *tkr)
 {
struct clocksource *clock = READ_ONCE(tkr->clock);
 
@@ -203,7 +203,7 @@ static void timekeeping_check_update(struct timekeeper *tk, 
u64 offset)
}
 }
 
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 {
struct timekeeper *tk = &tk_core.timekeeper;
u64 now, last, mask, max, delta;
@@ -247,7 +247,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base 
*tkr)
 static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset)
 {
 }
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 {
u64 cycle_now, delta;
 
@@ -344,7 +344,7 @@ u32 (*arch_gettimeoffset)(void) = 
default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
-static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
+static inline u64 timekeeping_delta_to_ns(const struct tk_read_base *tkr, u64 
delta)
 {
u64 nsec;
 
@@ -355,7 +355,7 @@ static inline u64 timekeeping_delta_to_ns(struct 
tk_read_base *tkr, u64 delta)
return nsec + arch_gettimeoffset();
 }
 
-static inline u64 timekeeping_get_ns(struct tk_read_base *tk

Re: [PATCH ghak10 v5 1/2] audit: Add functions to log time adjustments

2018-09-13 Thread Ondrej Mosnacek

On Mon, Aug 27, 2018 at 6:38 PM Steve Grubb  wrote:
> On Monday, August 27, 2018 5:13:17 AM EDT Ondrej Mosnacek wrote:
> > On Mon, Aug 27, 2018 at 9:50 AM Miroslav Lichvar 
> wrote:
> > > On Fri, Aug 24, 2018 at 02:00:00PM +0200, Ondrej Mosnacek wrote:
> > > > This patch adds two auxiliary record types that will be used to
> > > > annotate
> > > > the adjtimex SYSCALL records with the NTP/timekeeping values that have
> > > > been changed.
> > >
> > > It seems the "adjust" function intentionally logs also calls/modes
> > > that don't actually change anything. Can you please explain it a bit
> > > in the message?
> > >
> > > NTP/PTP daemons typically don't read the adjtimex values in a normal
> > > operation and overwrite them on each update, even if they don't
> > > change. If the audit function checked that oldval != newval, the
> > > number of messages would be reduced and it might be easier to follow.
> >
> > We actually want to log any attempt to change a value, as even an
> > intention to set/change something could be a hint that the process is
> > trying to do something bad (see discussion at [1]).
>
> One of the problems is that these applications can flood the logs very
> quickly. An attempt to change is not needed unless it fails for permissions
> reasons. So, limiting to actual changes is probably a good thing.

Well, Richard seemed to "violently" agree with the opposite, so now I
don't know which way to go... Paul, you are the official tie-breaker
here, which do you prefer?

>
> -Steve
>
> > There are valid
> > arguments both for and against this choice, but we have to pick one in
> > the end... Anyway, I should explain the reasoning in the commit
> > message better, right now it just states the fact without explanation
> > (in the second patch), thank you for pointing my attention to it.
> >
> > [1] https://www.redhat.com/archives/linux-audit/2018-July/msg00061.html
> >
> > --
> > Ondrej Mosnacek 
> > Associate Software Engineer, Security Technologies
> > Red Hat, Inc.
>
>
>
>

--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH ghak10 v5 1/2] audit: Add functions to log time adjustments

2018-09-17 Thread Ondrej Mosnacek

On Fri, Sep 14, 2018 at 5:19 AM Paul Moore  wrote:
> On Fri, Aug 24, 2018 at 8:00 AM Ondrej Mosnacek  wrote:
> > This patch adds two auxiliary record types that will be used to annotate
> > the adjtimex SYSCALL records with the NTP/timekeeping values that have
> > been changed.
> >
> > Next, it adds two functions to the audit interface:
> >  - audit_tk_injoffset(), which will be called whenever a timekeeping
> >offset is injected by a syscall from userspace,
> >  - audit_ntp_adjust(), which will be called whenever an NTP internal
> >variable is changed by a syscall from userspace.
> >
> > Quick reference for the fields of the new records:
> > AUDIT_TIME_INJOFFSET
> > sec - the 'seconds' part of the offset
> > nsec - the 'nanoseconds' part of the offset
> > AUDIT_TIME_ADJNTPVAL
> > op - which value was adjusted:
> > offset - corresponding to the time_offset variable
> > freq   - corresponding to the time_freq variable
> > status - corresponding to the time_status variable
> > adjust - corresponding to the time_adjust variable
> > tick   - corresponding to the tick_usec variable
> > tai- corresponding to the timekeeping's TAI offset
>
> I understand that reusing "op" is tempting, but the above aren't
> really operations, they are state variables which are being changed.

I remember Steve (or was it Richard?) convincing me at one of the
meetings that "op" is the right filed name to use, despite it not
being a name for an operation... But I don't really care, I'm okay
with changing it to e.g. "var" as Richard suggests later in this
thread.

> Using the CONFIG_CHANGE record as a basis, I wonder if we are better
> off with something like the following:
>
>  type=TIME_CHANGE = old=
>
> ... you might need to preface the variable names with something like
> "ntp_" or "offset_".  You'll notice I'm also suggesting we use a
> single record type here; is there any reason why two records types are
> required?

There are actually two reasons:
1. The injected offset is a timespec64, so it consists of two integer
values (and it would be weird to produce two records for it, since IMO
it is conceptually still a single variable).
2. In all other cases the variable is reset to the (possibly
transformed) input value, while in this case the input value is added
directly to the system time. This can be viewed as a kind of variable
too, but it would be weird to report old and new value for it, since
its value flows with time.

Plus, when I look at:
type=TIME_INJOFFSET [...]: sec=-16 nsec=124887145

I can immediately see that the time was shifted back by 16-something
seconds, while when I look at something like:

type=TIME_CHANGE [...]: var=time_sec new=1537185685 old=1537185701
type=TIME_CHANGE [...]: var=time_nsec new=664373417 old=789260562

I can just see some big numbers that I need to do math with before I
get an idea of what is the magnitude (or sign) of the change.

>
> > old - the old value
> > new - the new value
> >
> > Signed-off-by: Ondrej Mosnacek 
> > ---
> >  include/linux/audit.h  | 21 +
> >  include/uapi/linux/audit.h |  2 ++
> >  kernel/auditsc.c   | 15 +++
> >  3 files changed, 38 insertions(+)
>
> A reminder that we need tests for these new records and a RFE page on the 
> wiki:
>
> * https://github.com/linux-audit/audit-testsuite

I was going to start working on this once the format issues are
settled. (Although I probably should have kept the RFC in the subject
until then...)

> * https://github.com/linux-audit/audit-kernel/wiki

I admit I forgot about this duty, but again I would like to wait for
the discussions to settle before writing that up.

>
> --
> paul moore
> www.paul-moore.com--
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: [RFC PATCH v4] IV Generation algorithms for dm-crypt

2017-03-01 Thread Ondrej Mosnacek

2017-03-01 13:42 GMT+01:00 Gilad Ben-Yossef :
> It really is an observation about overhead of context switches between
> dm-crypt and
> whatever/wherever you handle crypto - be it an off CPU hardware engine
> or a bunch
> of parallel kernel threads running on other cores. You really want to
> burst as much as
> possible.

[...]

>> For XTS you need just simple linear IV. No problem with that, implementation
>> in crypto API and hw is trivial.
>>
>> But for compatible IV (that provides compatibility with loopAES and very old 
>> TrueCrypt),
>> these should be never ever implemented again anywhere.
>
>>
>> Specifically "tcw" is broken, insecure and provided here just to help people 
>> to migrate
>> from old Truecrypt containers. Even Truecrypt followers removed it from the 
>> codebase.
>> (It is basically combination of IV and slight modification of CBC mode. All
>> recent version switched to XTS and plain IV.)
>>
>> So building abstraction over something known to be broken and that is now 
>> intentionally
>> isolated inside dmcrypt is, in my opinion, really not a good idea.
>>
>
> I don't think anyone is interested in these modes. How do you support
> XTS and essiv in
> a generic way without supporting this broken modes is not something
> I'm clear on though.

Wouldn't adopting a bulk request API (something like what I tried to
do here [1]) that allows users to supply multiple messages, each with
their own IV, fulfill this purpose? That way, we wouldn't need to
introduce any new modes into Crypto API and the drivers/accelerators
would only need to provide bulk implementations of common modes
(xts(aes), cbc(aes), ...) to provide better performance for dm-crypt
(and possibly other users, too).

I'm not sure how exactly these crypto accelerators work, but wouldn't
it help if the drivers simply get more messages (in our case sectors)
in a single call? I wonder, would (efficiently) supporting such a
scheme require changes in the HW itself or could it be achieved just
by modifying driver code (let's say specifically for your CryptoCell
accelerator)?

Thanks,
Ondrej

[1] https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg23007.html

>
> Thanks,
> Gilad
>
>
>
> --
> Gilad Ben-Yossef
> Chief Coffee Drinker
>
> "If you take a class in large-scale robotics, can you end up in a
> situation where the homework eats your dog?"
>  -- Jean-Baptiste Queru

Re: [BUG] Race between policy reload sidtab conversion and live conversion

2021-03-01 Thread Ondrej Mosnacek

On Sat, Feb 27, 2021 at 3:35 AM Hillf Danton  wrote:
> On Fri, 26 Feb 2021 12:19:35 +0100  Ondrej Mosnacek wrote:
> > On Fri, Feb 26, 2021 at 5:08 AM Hillf Danton  wrote:
> > > On Thu, 25 Feb 2021 20:06:45 -0500 Paul Moore wrote:
> > > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek  
> > > > wrote:
> > > > > After the switch to RCU, we now have:
> > > > > 1. Start live conversion of new entries.
> > > > > 2. Convert existing entries.
> > > > > 3. RCU-assign the new policy pointer to selinux_state.
> > > > > [!!! Now actually both old and new sidtab may be referenced by
> > > > > readers, since there is no synchronization barrier previously provided
> > > > > by the write lock.]
> > > > > 4. Wait for synchronize_rcu() to return.
> > > > > 5. Now only the new sidtab is visible to readers, so the old one can
> > > > > be destroyed.
> > > > >
> > > > > So the race can happen between 3. and 5., if one thread already sees
> > > > > the new sidtab and adds a new entry there, and a second thread still
> > > > > has the reference to the old sidtab and also tires to add a new entry;
> > > > > live-converting to the new sidtab, which it doesn't expect to change
> > > > > by itself. Unfortunately I failed to realize this when reviewing the
> > > > > patch :/
> > > >
> > > > It is possible I'm not fully understanding the problem and/or missing
> > > > an important detail - it is rather tricky code, and RCU can be very
> > > > hard to reason at times - but I think we may be able to solve this
> > > > with some lock fixes inside sidtab_context_to_sid().  Let me try to
> > > > explain to see if we are on the same page here ...
> > > >
> > > > The problem is when we have two (or more) threads trying to
> > > > add/convert the same context into a sid; the task with new_sidtab is
> > > > looking to add a new sidtab entry, while the task with old_sidtab is
> > > > looking to convert an entry in old_sidtab into a new entry in
> > > > new_sidtab.  Boom.
> > > >
> > > > Looking at the code in sidtab_context_to_sid(), when we have two
> > > > sidtabs that are currently active (old_sidtab->convert pointer is
> > > > valid) and a task with old_sidtab attempts to add a new entry to both
> > > > sidtabs it first adds it to the old sidtab then it also adds it to the
> > > > new sidtab.  I believe the problem is that in this case while the task
> > > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which
> > > > allows it to race with tasks that already see only new_sidtab.  I
> > > > think adding code to sidtab_context_to_sid() which grabs the
> > > > new_sidtab->lock when adding entries to the new_sidtab *should* solve
> > > > the problem.
> > > >
> > > > Did I miss something important? ;)
> > >
> > > If the convert pointer can be derefered without lock, we can opt to
> > > convert context after building sidtab with the risk of AB BA deadlock
> > > cut. Below is the minimum change I can think of along your direction.
> >
> > We could fix this a bit more easily by just having a shared spinlock
> > for both (well, *all*) sidtabs. Yes, we'd need to have it all the way
>
> Looking forward to reading how to add another BKL,

This is not even remotely comparable to the Big Kernel Lock... 99.9%
of the time, there will be just one sidtab, otherwise there will be
just two for a short time during the policy reload, which need to be
updated in sync anyway.

>
> > up in selinux_state and pass it through to sidtab_init(), but IMHO
> > that's less bad than trying to get it right with two locks.
>
> instead of a nearly pure code churn in order to walk around deadlock.

I wish it were that simple, but it won't work this way. Consider a
scenario like this:
1. Thread A acquires the old policy pointer, encounters a new context,
adds it to the old sidtab as X, but doesn't start adding it to the new
one yet.
2. Thread B acquires the new policy pointer, encounters a different
new context and adds it to the new sidtab as X.
[Now you end up with two different contexts assigned to the same SID,
depending on the policy]
3. Now thread A continues to add the context to the new sidtab, but
since it reuses the count variable from before, it overwrites the
entry for X.
(And even if you fixed that, re-searched the new sidtab, and added the
new context to the end, you'd end up with the context having different
SIDs under the old vs. new policy, which is again a messy situation.)

You see, moving the sidtab code around can get you in trouble and
that's why I'm advocating for using a common lock - it's much easier
to prove that such change doesn't create new bugs. And as I said
above, merging the locks won't really have performance consequences,
so I still believe it to be the better choice here.

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH v2 3/3] security: Add LSMs dependencies to CONFIG_LSM

2021-02-15 Thread Ondrej Mosnacek

On Mon, Feb 15, 2021 at 7:17 PM Mickaël Salaün  wrote:
> From: Mickaël Salaün 
>
> Thanks to the previous commit, this gives the opportunity to users, when
> running make oldconfig, to update the list of enabled LSMs at boot time
> if an LSM has just been enabled or disabled in the build.  Moreover,
> this list only makes sense if at least one LSM is enabled.
>
> Cc: Casey Schaufler 
> Cc: James Morris 
> Cc: Masahiro Yamada 
> Cc: Serge E. Hallyn 
> Signed-off-by: Mickaël Salaün 
> Link: https://lore.kernel.org/r/20210215181511.2840674-4-...@digikod.net
> ---
>
> Changes since v1:
> * Add CONFIG_SECURITY as a dependency of CONFIG_LSM.  This prevent an
>   error when building without any LSMs.
> ---
>  security/Kconfig | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..addcc1c04701 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -277,6 +277,10 @@ endchoice
>
>  config LSM
> string "Ordered list of enabled LSMs"
> +   depends on SECURITY || SECURITY_LOCKDOWN_LSM || SECURITY_YAMA || \
> +   SECURITY_LOADPIN || SECURITY_SAFESETID || INTEGRITY || \
> +   SECURITY_SELINUX || SECURITY_SMACK || SECURITY_TOMOYO || \
> +   SECURITY_APPARMOR || BPF_LSM

This looks really awkward, since all of these already depend on
SECURITY (if not, it's a bug)... I guarantee you that after some time
someone will come, see that the weird boolean expression is equivalent
to just SECURITY, and simplify it.

I assume the new mechanism wouldn't work as intended if there is just
SECURITY? If not, then maybe you should rather specify this value
dependency via some new  field rather than abusing "depends on" (say,
"value depends on"?). The fact that a seemingly innocent change to the
config definition breaks your mechanism suggests that the design is
flawed.

I do think this would be a useful feature, but IMHO shouldn't be
implemented like this.

> default 
> "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" 
> if DEFAULT_SECURITY_SMACK
> default 
> "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" 
> if DEFAULT_SECURITY_APPARMOR
> default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if 
> DEFAULT_SECURITY_TOMOYO
> --
> 2.30.0
>

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH ghak46 V1] audit: normalize MAC_STATUS record

2018-04-16 Thread Ondrej Mosnacek

2018-04-10 1:34 GMT+02:00 Richard Guy Briggs :
> There were two formats of the audit MAC_STATUS record, one of which was more
> standard than the other.  One listed enforcing status changes and the
> other listed enabled status changes with a non-standard label.  In
> addition, the record was missing information about which LSM was
> responsible and the operation's completion status.  While this record is
> only issued on success, the parser expects the res= field to be present.
>
> old enforcing/permissive:
> type=MAC_STATUS msg=audit(1523312831.378:24514): enforcing=0 old_enforcing=1 
> auid=0 ses=1
> old enable/disable:
> type=MAC_STATUS msg=audit(1523312831.378:24514): selinux=0 auid=0 ses=1
>
> List both sets of status and old values and add the lsm= field and the
> res= field.
>
> Here is the new format:
> type=MAC_STATUS msg=audit(1523293828.657:891): enforcing=0 old_enforcing=1 
> auid=0 ses=1 enabled=1 old-enabled=1 lsm=selinux res=1
>
> This record already accompanied a SYSCALL record.
>
> See: https://github.com/linux-audit/audit-kernel/issues/46
> Signed-off-by: Richard Guy Briggs 
> ---
>  security/selinux/selinuxfs.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
> index 00eed84..00b21b2 100644
> --- a/security/selinux/selinuxfs.c
> +++ b/security/selinux/selinuxfs.c
> @@ -145,10 +145,11 @@ static ssize_t sel_write_enforce(struct file *file, 
> const char __user *buf,
> if (length)
> goto out;
> audit_log(current->audit_context, GFP_KERNEL, 
> AUDIT_MAC_STATUS,
> -   "enforcing=%d old_enforcing=%d auid=%u ses=%u",
> +   "enforcing=%d old_enforcing=%d auid=%u ses=%u"
> +   " enabled=%d old-enabled=%d lsm=selinux res=1",

This is just a tiny nit but why does "old_enforcing" use an underscore
and "old-enabled" a dash? Shouldn't the style be consistent across
fields?

Just my two cents...

> new_value, selinux_enforcing,
> from_kuid(&init_user_ns, audit_get_loginuid(current)),
> -   audit_get_sessionid(current));
> +   audit_get_sessionid(current), selinux_enabled, 
> selinux_enabled);
> selinux_enforcing = new_value;
> if (selinux_enforcing)
> avc_ss_reset(0);
> @@ -272,9 +273,11 @@ static ssize_t sel_write_disable(struct file *file, 
> const char __user *buf,
> if (length)
> goto out;
> audit_log(current->audit_context, GFP_KERNEL, 
> AUDIT_MAC_STATUS,
> -   "selinux=0 auid=%u ses=%u",
> +   "enforcing=%d old_enforcing=%d auid=%u ses=%u"
> +   " enabled=%d old-enabled=%d lsm=selinux res=1",
> +   selinux_enforcing, selinux_enforcing,

^ also here

> from_kuid(&init_user_ns, audit_get_loginuid(current)),
> -       audit_get_sessionid(current));
> +   audit_get_sessionid(current), 0, 1);
> }
>
> length = count;
> --
> 1.8.3.1
>
> --
> Linux-audit mailing list
> linux-au...@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit

-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH ghak46 V1] audit: normalize MAC_STATUS record

2018-04-16 Thread Ondrej Mosnacek

2018-04-16 16:11 GMT+02:00 Richard Guy Briggs :
> On 2018-04-16 09:26, Ondrej Mosnacek wrote:
>> 2018-04-10 1:34 GMT+02:00 Richard Guy Briggs :
>> > There were two formats of the audit MAC_STATUS record, one of which was 
>> > more
>> > standard than the other.  One listed enforcing status changes and the
>> > other listed enabled status changes with a non-standard label.  In
>> > addition, the record was missing information about which LSM was
>> > responsible and the operation's completion status.  While this record is
>> > only issued on success, the parser expects the res= field to be present.
>> >
>> > old enforcing/permissive:
>> > type=MAC_STATUS msg=audit(1523312831.378:24514): enforcing=0 
>> > old_enforcing=1 auid=0 ses=1
>> > old enable/disable:
>> > type=MAC_STATUS msg=audit(1523312831.378:24514): selinux=0 auid=0 ses=1
>> >
>> > List both sets of status and old values and add the lsm= field and the
>> > res= field.
>> >
>> > Here is the new format:
>> > type=MAC_STATUS msg=audit(1523293828.657:891): enforcing=0 old_enforcing=1 
>> > auid=0 ses=1 enabled=1 old-enabled=1 lsm=selinux res=1
>> >
>> > This record already accompanied a SYSCALL record.
>> >
>> > See: https://github.com/linux-audit/audit-kernel/issues/46
>> > Signed-off-by: Richard Guy Briggs 
>> > ---
>> >  security/selinux/selinuxfs.c | 11 +++
>> >  1 file changed, 7 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
>> > index 00eed84..00b21b2 100644
>> > --- a/security/selinux/selinuxfs.c
>> > +++ b/security/selinux/selinuxfs.c
>> > @@ -145,10 +145,11 @@ static ssize_t sel_write_enforce(struct file *file, 
>> > const char __user *buf,
>> > if (length)
>> > goto out;
>> > audit_log(current->audit_context, GFP_KERNEL, 
>> > AUDIT_MAC_STATUS,
>> > -   "enforcing=%d old_enforcing=%d auid=%u ses=%u",
>> > +   "enforcing=%d old_enforcing=%d auid=%u ses=%u"
>> > +   " enabled=%d old-enabled=%d lsm=selinux res=1",
>>
>> This is just a tiny nit but why does "old_enforcing" use an underscore
>> and "old-enabled" a dash? Shouldn't the style be consistent across
>> fields?
>
> Yes, but my understanding is a preference for underscore, and not to
> change existing field names.

Ah, I just noticed that the field is already used elsewhere in the
code, so it makes sense to keep it the same. I thought at first that
it is just a typo.

>
> Steve?
>
>> Just my two cents...
>
> These details are worth noticing, thank you.
>
>> > new_value, selinux_enforcing,
>> > from_kuid(&init_user_ns, 
>> > audit_get_loginuid(current)),
>> > -   audit_get_sessionid(current));
>> > +   audit_get_sessionid(current), selinux_enabled, 
>> > selinux_enabled);
>> > selinux_enforcing = new_value;
>> > if (selinux_enforcing)
>> > avc_ss_reset(0);
>> > @@ -272,9 +273,11 @@ static ssize_t sel_write_disable(struct file *file, 
>> > const char __user *buf,
>> > if (length)
>> > goto out;
>> > audit_log(current->audit_context, GFP_KERNEL, 
>> > AUDIT_MAC_STATUS,
>> > -   "selinux=0 auid=%u ses=%u",
>> > +   "enforcing=%d old_enforcing=%d auid=%u ses=%u"
>> > +   " enabled=%d old-enabled=%d lsm=selinux res=1",
>> > +   selinux_enforcing, selinux_enforcing,
>>
>> ^ also here
>>
>> > from_kuid(&init_user_ns, 
>> > audit_get_loginuid(current)),
>> > -   audit_get_sessionid(current));
>> > +   audit_get_sessionid(current), 0, 1);
>> > }
>> >
>> > length = count;
>>
>> Ondrej Mosnacek 
>
> - RGB
>
> --
> Richard Guy Briggs 
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635

-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v2 1/1] selinux: fix another double free

2020-06-12 Thread Ondrej Mosnacek

On Thu, Jun 11, 2020 at 10:48 PM  wrote:
[...]
> diff --git a/security/selinux/ss/conditional.c 
> b/security/selinux/ss/conditional.c
> index da94a1b4bfda..d0d6668709f0 100644
> --- a/security/selinux/ss/conditional.c
> +++ b/security/selinux/ss/conditional.c
> @@ -392,26 +392,21 @@ static int cond_read_node(struct policydb *p, struct 
> cond_node *node, void *fp)
>
> rc = next_entry(buf, fp, sizeof(u32) * 2);
> if (rc)
> -   goto err;
> +   return rc;
>
> expr->expr_type = le32_to_cpu(buf[0]);
> expr->bool = le32_to_cpu(buf[1]);
>
> if (!expr_node_isvalid(p, expr)) {
> rc = -EINVAL;
> -   goto err;
> +   return rc;
> }

Sorry for more nitpicking... This can be further simplified to just
"return -EINVAL;" and the braces can be removed.

> }
>
> rc = cond_read_av_list(p, fp, &node->true_list, NULL);
> if (rc)
> -   goto err;
> +   return rc;
> rc = cond_read_av_list(p, fp, &node->false_list, &node->true_list);
> -   if (rc)
> -   goto err;
> -   return 0;
> -err:
> -   cond_node_destroy(node);
> return rc;

Also here you can skip the rc assignment:

return cond_read_av_list(p, fp, &node->false_list, &node->true_list);

>  }
>
> --
> 2.18.1

--
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH v2 1/1] selinux: fix another double free

2020-06-12 Thread Ondrej Mosnacek

On Fri, Jun 12, 2020 at 1:27 AM Paul Moore  wrote:
> On Thu, Jun 11, 2020 at 6:41 PM Tom Rix  wrote:
> > On 6/11/20 3:30 PM, Paul Moore wrote:
> > > On Thu, Jun 11, 2020 at 4:48 PM  wrote:
> > >> From: Tom Rix 
> > >>
> > >> Clang static analysis reports this double free error
> > >>
> > >> security/selinux/ss/conditional.c:139:2: warning: Attempt to free 
> > >> released memory [unix.Malloc]
> > >> kfree(node->expr.nodes);
> > >> ^~~
> > >>
> > >> When cond_read_node fails, it calls cond_node_destroy which frees the
> > >> node but does not poison the entry in the node list.  So when it
> > >> returns to its caller cond_read_list, cond_read_list deletes the
> > >> partial list.  The latest entry in the list will be deleted twice.
> > >>
> > >> So instead of freeing the node in cond_read_node, let list freeing in
> > >> code_read_list handle the freeing the problem node along with all of the
> > >> earlier nodes.
> > >>
> > >> Because cond_read_node no longer does any error handling, the goto's
> > >> the error case are redundant.  Instead just return the error code.
> > >>
> > >> Fixes a problem was introduced by commit
> > >>
> > >>   selinux: convert cond_list to array
> > >>
> > >> Signed-off-by: Tom Rix 
> > >> ---
> > >>  security/selinux/ss/conditional.c | 11 +++
> > >>  1 file changed, 3 insertions(+), 8 deletions(-)
> > > Hi Tom,
> > >
> > > Thanks for the patch!  A few more notes, in no particular order:
> > >
> > > * There is no need to send a cover letter for just a single patch.
> > > Typically cover letters are reserved for large patchsets that require
> > > some additional explanation and/or instructions beyond the individual
> > > commit descriptions.
> >
> > I was doing this to carry the repo name and tag info.
> >
> > So how do folks know which repo and commit the change applies to ?
>
> We read your mind ;)
>
> Generally it's pretty obvious, and in the rare occasion when it isn't,
> we ask.  Most of the time you can deduce the destination repo by the
> files changed and the mailing lists on the To/CC line.  From there it
> is then just a matter of -next vs -stable and that is something that
> is usually sorted out based on the context of the patch, and if
> needed, a discussion on-list.

Yes, it is normally not necessary, but I wouldn't discourage people
from providing the info if they want to / are used to do that. It can
be really useful in some situations, especially in case of
cross-subsystem changes that are sent to many mailing lists. But of
course this information belongs either to the cover letter or in case
of single patches to the "informational" section between "---" and
"diff --git [...]".

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH] selinux: fix another double free

2020-06-11 Thread Ondrej Mosnacek

Hi Tom,

On Thu, Jun 11, 2020 at 5:58 PM  wrote:
> From: Tom Rix 
>
> Clang static analysis reports this double free error
>
> security/selinux/ss/conditional.c:139:2: warning: Attempt to free released 
> memory [unix.Malloc]
> kfree(node->expr.nodes);
> ^~~
>
> When cond_read_node fails, it calls cond_node_destroy which frees the
> node but does not poison the entry in the node list.  So when it
> returns to its caller cond_read_list, cond_read_list deletes the
> partial list.  The latest entry in the list will be deleted twice.
>
> So instead of freeing the node in cond_read_node, let list freeing in
> code_read_list handle the freeing the problem node along with all of the the

There is an extra "the" before "freeing".

> earlier nodes.
>
> Signed-off-by: Tom Rix 
> ---
>  security/selinux/ss/conditional.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/security/selinux/ss/conditional.c 
> b/security/selinux/ss/conditional.c
> index da94a1b4bfda..ffb31af22f4f 100644
> --- a/security/selinux/ss/conditional.c
> +++ b/security/selinux/ss/conditional.c
> @@ -411,7 +411,6 @@ static int cond_read_node(struct policydb *p, struct 
> cond_node *node, void *fp)
> goto err;
> return 0;
>  err:
> -   cond_node_destroy(node);
> return rc;

Since there is now just "return rc" in the error path, can you please
replace all the gotos with plain return statements? And please also
add a Fixes: tag pointing to the commit that introduced the bug (see
Stephen's reply).

Thanks,

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH] SELinux: Measure state and hash of policy using IMA

2020-08-24 Thread Ondrej Mosnacek

On Mon, Aug 24, 2020 at 9:30 PM Stephen Smalley
 wrote:
> On Mon, Aug 24, 2020 at 2:13 PM Lakshmi Ramasubramanian
>  wrote:
> >
> > On 8/24/20 7:00 AM, Stephen Smalley wrote:
> >
> > >> +int security_read_policy_kernel(struct selinux_state *state,
> > >> +   void **data, size_t *len)
> > >> +{
> > >> +   int rc;
> > >> +
> > >> +   rc = security_read_policy_len(state, len);
> > >> +   if (rc)
> > >> +   return rc;
> > >> +
> > >> +   *data = vmalloc(*len);
> > >> +   if (!*data)
> > >> +   return -ENOMEM;
> > >>
> > >> +   return security_read_selinux_policy(state, data, len);
> > >>   }
> > >
> > > See the discussion here:
> > > https://lore.kernel.org/selinux/20200824113015.1375857-1-omosn...@redhat.com/T/#t
> > >
> > > In order for this to be safe, you need to ensure that all callers of
> > > security_read_policy_kernel() have taken fsi->mutex in selinuxfs and
> > > any use of security_read_policy_len() occurs while holding the mutex.
> > > Otherwise, the length can change between security_read_policy_len()
> > > and security_read_selinux_policy() if policy is reloaded.
> > >
> >
> > "struct selinux_fs_info" is available when calling
> > security_read_policy_kernel() - currently in measure.c.
> > Only "struct selinux_state" is.
> >
> > Is Ondrej's re-try approach I need to use to workaround policy reload issue?
>
> No, I think perhaps we should move the mutex to selinux_state instead
> of selinux_fs_info.  selinux_fs_info has a pointer to selinux_state so
> it can then use it indirectly.  Note that your patches are going to
> conflict with other ongoing work in the selinux next branch that is
> refactoring policy load and converting the policy rwlock to RCU.

Yeah, and I'm experimenting with a patch on top of Stephen's RCU work
that would allow you to do this in a straightforward way without even
messing with the fsi->mutex. My patch may or may not be eventually
committed, but either way I'd recommend holding off on this for a
while until the dust settles around the RCU conversion.

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: general protection fault in security_inode_getattr

2020-08-24 Thread Ondrej Mosnacek

On Mon, Aug 24, 2020 at 9:37 PM syzbot
 wrote:
> syzbot has found a reproducer for the following issue on:

Looping in fsdevel and OverlayFS maintainers, as this seems to be
FS/OverlayFS related...

See also original report against 5.8-rc7:
https://lore.kernel.org/linux-security-module/8caae305ab9a5...@google.com/T/

>
> HEAD commit:d012a719 Linux 5.9-rc2
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14aa130e90
> kernel config:  https://syzkaller.appspot.com/x/.config?x=891ca5711a9f1650
> dashboard link: https://syzkaller.appspot.com/bug?extid=f07cc9be8d1d226947ed
> compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
> c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=104a650e90
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+f07cc9be8d1d22694...@syzkaller.appspotmail.com
>
> general protection fault, probably for non-canonical address 
> 0xdc0d:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0068-0x006f]
> CPU: 1 PID: 32288 Comm: syz-executor.3 Not tainted 5.9.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:d_backing_inode include/linux/dcache.h:549 [inline]
> RIP: 0010:security_inode_getattr+0x42/0x140 security/security.c:1276
> Code: 1b fe 49 8d 5e 08 48 89 d8 48 c1 e8 03 42 80 3c 38 00 74 08 48 89 df e8 
> bc b4 5b fe 48 8b 1b 48 83 c3 68 48 89 d8 48 c1 e8 03 <42> 80 3c 38 00 74 08 
> 48 89 df e8 9f b4 5b fe 48 8b 1b 48 83 c3 0c
> RSP: 0018:c9000a017750 EFLAGS: 00010202
> RAX: 000d RBX: 0068 RCX: 888093ec6180
> RDX:  RSI: c9000a017860 RDI: c9000a017850
> RBP: c9000a017850 R08: dc00 R09: c9000a017850
> R10: f52001402f0c R11:  R12: c9000a017860
> R13: 8401 R14: c9000a017850 R15: dc00
> FS:  7f292d4ef700() GS:8880ae90() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b30920074 CR3: 937fd000 CR4: 001506e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  vfs_getattr+0x21/0x60 fs/stat.c:121
>  ovl_copy_up_one fs/overlayfs/copy_up.c:850 [inline]
>  ovl_copy_up_flags+0x2ef/0x2a00 fs/overlayfs/copy_up.c:931
>  ovl_maybe_copy_up+0x154/0x180 fs/overlayfs/copy_up.c:963
>  ovl_open+0xa2/0x200 fs/overlayfs/file.c:147
>  do_dentry_open+0x7c8/0x1010 fs/open.c:817
>  do_open fs/namei.c:3251 [inline]
>  path_openat+0x2794/0x3840 fs/namei.c:3368
>  do_filp_open+0x191/0x3a0 fs/namei.c:3395
>  file_open_name+0x321/0x430 fs/open.c:1113
>  acct_on kernel/acct.c:207 [inline]
>  __do_sys_acct kernel/acct.c:286 [inline]
>  __se_sys_acct+0x122/0x6f0 kernel/acct.c:273
>  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45d579
> Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7f292d4eec78 EFLAGS: 0246 ORIG_RAX: 00a3
> RAX: ffda RBX: 0700 RCX: 0045d579
> RDX:  RSI:  RDI: 2040
> RBP: 0118cf70 R08:  R09: 
> R10:  R11: 0246 R12: 0118cf4c
> R13: 7ffc8e04bc4f R14: 7f292d4ef9c0 R15: 0118cf4c
> Modules linked in:
> ---[ end trace 7e4f1041b188e411 ]---
> RIP: 0010:d_backing_inode include/linux/dcache.h:549 [inline]
> RIP: 0010:security_inode_getattr+0x42/0x140 security/security.c:1276
> Code: 1b fe 49 8d 5e 08 48 89 d8 48 c1 e8 03 42 80 3c 38 00 74 08 48 89 df e8 
> bc b4 5b fe 48 8b 1b 48 83 c3 68 48 89 d8 48 c1 e8 03 <42> 80 3c 38 00 74 08 
> 48 89 df e8 9f b4 5b fe 48 8b 1b 48 83 c3 0c
> RSP: 0018:c9000a017750 EFLAGS: 00010202
> RAX: 000d RBX: 0068 RCX: 888093ec6180
> RDX:  RSI: c9000a017860 RDI: c9000a017850
> RBP: c9000a017850 R08: dc00 R09: c9000a017850
> R10: f52001402f0c R11:  R12: c9000a017860
> R13: 8401 R14: c9000a017850 R15: dc00
> FS:  7f292d4ef700() GS:8880ae90() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fef820e7000 CR3: 937fd000 CR4: 001506e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
>

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [RFC PATCH ghak10 v6 0/2] audit: Log changes that can affect the system clock

2019-03-11 Thread Ondrej Mosnacek

On Fri, Mar 8, 2019 at 9:26 PM Richard Guy Briggs  wrote:
> On 2019-03-07 13:32, Ondrej Mosnacek wrote:
> > This patchset implements auditing of (syscall-triggered) changes that
> > can modify or indirectly affect the system clock. Some of these
> > changes can already be detected by simply logging relevant syscalls,
> > but this has some disadvantages:
> >   a) It is usually not possible to find out from the syscall records
> >  the amount by which the time was shifted.
> >   b) Syscalls like adjtimex(2) or clock_adjtime(2) can be used also
> >  for read-only operations, which might flood the audit log with
> >  false positives. (Note that these patches don't solve this
> >  problem yet due to the limitations of current record filtering
> >  capabilities.)
> >
> > The main motivation is to provide better reliability of timestamps
> > on the system as mandated by the FPT_STM.1 security functional
> > requirement from Common Criteria. This requirement apparently demands
> > that it is possible to reconstruct from audit trail the old and new
> > values of the time when it is adjusted (see [1]).
> >
> > The current version of the patchset logs the following changes:
> >   - direct setting of system time to a given value
> >   - direct injection of timekeeping offset
> >   - adjustment of timekeeping's TAI offset
> >   - NTP value adjustments:
> > - time_offset
> > - time_freq
> > - time_status
> > - time_adjust
> > - tick_usec
> >
> > Changes to the following NTP values are not logged, as they are not
> > important for security:
> >   - time_maxerror
> >   - time_esterror
> >   - time_constant
> >
> > Audit kernel GitHub issue: 
> > https://github.com/linux-audit/audit-kernel/issues/10
> > Audit kernel RFE page: 
> > https://github.com/linux-audit/audit-kernel/wiki/RFE-More-detailed-auditing-of-changes-to-system-clock
> >
> > Testing: Passed audit-testuite; functional tests TBD
>
> Reviewed-by: Richard Guy Briggs 
>
> How do you plan to test this in the audit-testsuite?

My plan is to add a new subtest which will use a short C program that
calls the relevant syscalls (they are listed in patch 1/2) directly
and will check if they produce the expected records. I outlined some
specific things to be tested in the RFE page.

>
> > Changes in v6:
> >   - Reorganized the patches to group changes by record type, not
> > kernel subsytem, as suggested in earlier discussions
> >   - Added checks to ignore no-change events (new value == old value)
> >   - Added TIME_INJOFFSET logging also to do_settimeofday64() to cover
> > syscalls such as settimeofday(2), stime(2), clock_settime(2)
> >   - Created an RFE page on audit-kernel GitHub
> > TODO:
> >   - tests for audit-testsuite
> >
> > v5: https://www.redhat.com/archives/linux-audit/2018-August/msg00039.html
> > Changes in v5:
> >   - Dropped logging of some less important changes and update commit 
> > messages
> >   - No longer mark the patchset as RFC
> >
> > v4: https://www.redhat.com/archives/linux-audit/2018-August/msg00023.html
> > Changes in v4:
> >   - Squashed first two patches into one
> >   - Renamed ADJNTPVAL's "type" field to "op" to align with audit record
> > conventions
> >   - Minor commit message editing
> >   - Cc timekeeping/NTP people for feedback
> >
> > v3: https://www.redhat.com/archives/linux-audit/2018-July/msg1.html
> > Changes in v3:
> >   - Switched to separate records for each variable
> >   - Both old and new value is now reported for each change
> >   - Injecting offset is reported via a separate record (since this
> > offset consists of two values and is added directly to the clock,
> > i.e. it doesn't make sense to log old and new value)
> >   - Added example records produced by chronyd -q (see the commit message
> > of the last patch)
> >
> > v2: https://www.redhat.com/archives/linux-audit/2018-June/msg00114.html
> > Changes in v2:
> >   - The audit_adjtime() function has been modified to only log those
> > fields that contain values that are actually used, resulting in more
> > compact records.
> >   - The audit_adjtime() call has been moved to do_adjtimex() in
> > timekeeping.c
> >   - Added an additional patch (for review) that simplifies the detection
> > if the syscall is read-only.
> >
> > v1: https://www.redhat.com/archives/linux-audit/2018-June/msg00095.html
> >

kernel/time/ntp.c: Possible off-by-one error in TAI range check?

2019-04-08 Thread Ondrej Mosnacek

Hello,

while writing tests for clock adjustment auditing [1] [2], I stumbled
upon a strange behavior of adjtimex(2) when setting the TAI offset...

Commit 153b5d054ac2 ("ntp: support for TAI") added a possibility to
change the TAI offset from userspace via adjtimex(2). The code checks
if the input value (txc->constant) is greater than 0 and if it is not,
then it doesn't modify the value. Ignoring the fact that this check
should probably be in timekeeping_validate_timex() and cause -EINVAL
to be returned when false, I find it strange that the check doesn't
allow to set the value to 0, which seems to be the default value...

Was this behavior intended or should the code actually check for
txc->constant >= 0 instead of txc->constant > 0?

Thanks,

[1] https://github.com/linux-audit/audit-kernel/issues/10
[2] 
https://github.com/linux-audit/audit-kernel/wiki/RFE-More-detailed-auditing-of-changes-to-system-clock

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: kernel/time/ntp.c: Possible off-by-one error in TAI range check?

2019-04-15 Thread Ondrej Mosnacek

On Mon, Apr 8, 2019 at 10:47 AM Ondrej Mosnacek  wrote:
> Hello,
>
> while writing tests for clock adjustment auditing [1] [2], I stumbled
> upon a strange behavior of adjtimex(2) when setting the TAI offset...
>
> Commit 153b5d054ac2 ("ntp: support for TAI") added a possibility to
> change the TAI offset from userspace via adjtimex(2). The code checks
> if the input value (txc->constant) is greater than 0 and if it is not,
> then it doesn't modify the value. Ignoring the fact that this check
> should probably be in timekeeping_validate_timex() and cause -EINVAL
> to be returned when false, I find it strange that the check doesn't
> allow to set the value to 0, which seems to be the default value...
>
> Was this behavior intended or should the code actually check for
> txc->constant >= 0 instead of txc->constant > 0?

Ping?

>
> Thanks,
>
> [1] https://github.com/linux-audit/audit-kernel/issues/10
> [2] 
> https://github.com/linux-audit/audit-kernel/wiki/RFE-More-detailed-auditing-of-changes-to-system-clock
>
> --
> Ondrej Mosnacek 
> Software Engineer, Security Technologies
> Red Hat, Inc.

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [RFC PATCH ghak10 v6 2/2] ntp: Audit NTP parameters adjustment

2019-04-01 Thread Ondrej Mosnacek

On Thu, Mar 28, 2019 at 1:02 AM Thomas Gleixner  wrote:
> On Thu, 7 Mar 2019, Ondrej Mosnacek wrote:
>
> > Emit an audit record every time selected NTP parameters are modified
> > from userspace (via adjtimex(2) or clock_adjtime(2)).
> >
> > Such events will now generate records of type AUDIT_TIME_ADJNTPVAL
> > containing the following fields:
> >   - op -- which value was adjusted:
> > - offset -- corresponding to the time_offset variable
> > - freq   -- corresponding to the time_freq variable
> > - status -- corresponding to the time_status variable
> > - adjust -- corresponding to the time_adjust variable
> > - tick   -- corresponding to the tick_usec variable
> > - tai-- corresponding to the timekeeping's TAI offset
> >   - old -- the old value
> >   - new -- the new value
> >
> > For reference, running the following commands:
> >
> > auditctl -D
> > auditctl -a exit,always -F arch=b64 -S adjtimex
> > chronyd -q
> >
> > produces audit records like this:
> >
> > type=SYSCALL msg=audit(1530616044.507:5): arch=c03e syscall=159 
> > success=yes exit=5 a0=7fff57e78c00 a1=0 a2=4 a3=7f754ae28c0a items=0 
> > ppid=626 pid=629 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 
> > fsgid=0 tty=(none) ses=1 comm="chronyd" exe="/usr/sbin/chronyd" 
> > subj=system_u:system_r:kernel_t:s0 key=(null)
>
> 
>
> Is it really necessary to put this into the changelog?

Yeah, sorry, I went a bit overboard with the record examples... I'll
try to provide simpler and less verbose examples in the next version.

>
> >
> > +void __audit_ntp_adjust(const char *type, s64 oldval, s64 newval)
> > +{
> > + audit_log(audit_context(), GFP_ATOMIC, AUDIT_TIME_ADJNTPVAL,
>
> No.
>
> > +   "op=%s old=%lli new=%lli", type,
> > +   (long long)oldval, (long long)newval);
> > +}
> > +
> >  static void audit_log_task(struct audit_buffer *ab)
> >  {
> >   kuid_t auid, uid;
> > diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> > index 36a2bef00125..5f456a84151a 100644
> > --- a/kernel/time/ntp.c
> > +++ b/kernel/time/ntp.c
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "ntp_internal.h"
> >  #include "timekeeping_internal.h"
> > @@ -293,6 +294,8 @@ static inline s64 ntp_update_offset_fll(s64 offset64, 
> > long secs)
> >
> >  static void ntp_update_offset(long offset)
> >  {
> > + s64 old_offset = time_offset;
> > + s64 old_freq = time_freq;
> >   s64 freq_adj;
> >   s64 offset64;
> >   long secs;
> > @@ -341,6 +344,9 @@ static void ntp_update_offset(long offset)
> >   time_freq   = max(freq_adj, -MAXFREQ_SCALED);
> >
> >   time_offset = div_s64(offset64 << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ);
> > +
> > + audit_ntp_adjust("offset", old_offset, time_offset);
> > + audit_ntp_adjust("freq", old_freq, time_freq);
> >  }
> >
> >  /**
> > @@ -658,21 +664,31 @@ static inline void process_adj_status(const struct 
> > timex *txc)
> >
> >  static inline void process_adjtimex_modes(const struct timex *txc, s32 
> > *time_tai)
> >  {
> > - if (txc->modes & ADJ_STATUS)
> > - process_adj_status(txc);
> > + if (txc->modes & (ADJ_STATUS | ADJ_NANO | ADJ_MICRO)) {
> > + int old_status = time_status;
> > +
> > + if (txc->modes & ADJ_STATUS)
> > + process_adj_status(txc);
> > - if (txc->modes & ADJ_NANO)
> > - time_status |= STA_NANO;
> > + if (txc->modes & ADJ_NANO)
> > + time_status |= STA_NANO;
> >
> > - if (txc->modes & ADJ_MICRO)
> > - time_status &= ~STA_NANO;
> > + if (txc->modes & ADJ_MICRO)
> > + time_status &= ~STA_NANO;
> > +
> > + audit_ntp_adjust("status", old_status, time_status);
> > + }
> >
> >   if (txc->modes & ADJ_FREQUENCY) {
> > + s64 old_freq = time_freq;
> > +
> >   time_freq = txc->freq * PPM_SCALE;
> >   time_freq = min(time_freq, MAXFREQ_SCALED);
> >   time_freq = max(time_freq, -MAXFREQ_SCALED);
> >   /* update pps_freq */
> &g

Re: [RFC PATCH ghak10 v6 1/2] timekeeping: Audit clock adjustments

2019-04-01 Thread Ondrej Mosnacek

On Thu, Mar 28, 2019 at 12:27 AM John Stultz  wrote:
> On Thu, Mar 7, 2019 at 4:33 AM Ondrej Mosnacek  wrote:
> >
> > Emit an audit record whenever the system clock is changed (i.e. shifted
> > by a non-zero offset) by a syscall from userspace. The syscalls than can
> > (at the time of writing) trigger such record are:
> >   - settimeofday(2), stime(2), clock_settime(2) -- via
> > do_settimeofday64()
> >   - adjtimex(2), clock_adjtime(2) -- via do_adjtimex()
> >
> > The new records have type AUDIT_TIME_INJOFFSET and contain the following
> > fields:
> >   - sec -- the 'seconds' part of the offset
> >   - nsec -- the 'nanoseconds' part of the offset
> >
> > For reference, running the following commands:
> >
> > auditctl -D
> > auditctl -a exit,always -F arch=b64 -S adjtimex
> > chronyd -q
> >
> > triggers (among others) a syscall that produces audit records like this:
> >
> > type=TIME_INJOFFSET msg=audit(1530616049.652:13): sec=-16 nsec=124887145
> > type=SYSCALL msg=audit(1530616049.652:13): arch=c03e syscall=159 
> > success=yes exit=5 a0=7fff57e78270 a1=1 a2=fff0 
> > a3=137b828205ca12 items=0 ppid=626 pid=629 auid=0 uid=385 gid=382 euid=385 
> > suid=385 fsuid=385 egid=382 sgid=382 fsgid=382 tty=(none) ses=1 
> > comm="chronyd" exe="/usr/sbin/chronyd" subj=system_u:system_r:kernel_t:s0 
> > key=(null)
> > type=PROCTITLE msg=audit(1530616049.652:13): proctitle=6368726F6E7964002D71 
> > cd /home/omosnace/Dokumenty/Kernel/worktrees/audit/src/kernel/time
> > s
>
> Hrm.  Ugly log spam aside (I get it sort of goes with the territory
> :), I could imagine someone looking at this and wanting to also know
> when the injection was applied. Obviously the whole point of the
> offset injection is we don't really care about the when, we only want
> a fixed offset to be made atomically.  That said, if someone did try
> to add such a timestamp on the log, there's the problem of trying to
> calculate the time while holding the timekeeping locks.   So, are you
> certain the next request won't be to try to to also calculate a
> timestamp and push it into the audit_log() call as well?

Well, __audit_syscall_entry() already logs the timestamp of the
syscall entry (this is what ends up also in the TIME_INJOFFSET
syscall-associated record as the timestamp, actually), so even though
it is not precisely the moment when the change happened, it should be
good enough in most cases. I'm not sure if it is worth it to add
another slightly duplicit but exact timestamp... Steve Grubb is our
local expert on certifications, so unless he tells me this is likely
required, I don't feel like adding that extra information there right
now...

>
> Also, we have to be careful with anything we call from the timekeeping
> code, its really easy for some corner case to trip something that then
> tries to read the time and we deadlock, particularly with rare cases
> like time adjustments.  I'm not familiar with the audit subsystem, but
> from a maintenance point of view, can we make sure there's enough
> documentation so that audit_log() and everything it calls will never
> in the future call back into the timekeeping code?   I suspect this is
> already covered, so apologies for the boilerplate concern, but I want
> to be sure.

Yes, this turns out to be a bigger issue than I'd thought, see my
reply to Thomas in the thread for 2/2.



>
>
> > The above records have been produced by the following syscall from
> > chronyd (as per strace output):
> >
> > adjtimex({modes=ADJ_SETOFFSET|ADJ_NANO, offset=0, freq=750433, 
> > maxerror=1600, esterror=1600, status=STA_UNSYNC|STA_NANO, 
> > constant=2, precision=1, tolerance=32768000, time={tv_sec=1530616033, 
> > tv_usec=778717675}, tick=1, ppsfreq=0, jitter=0, shift=0, stabil=0, 
> > jitcnt=0, calcnt=0, errcnt=0, stbcnt=0, tai=0}) = 5 (TIME_ERROR)
> >
> > (The struct timex fields above are from *after* the syscall was
> > executed, so they contain the current (new) values as set from the
> > kernel, except of the 'modes' field, which contains the original value
> > sent by the caller.)
> >
> > Signed-off-by: Ondrej Mosnacek 
> > ---
> >  include/linux/audit.h  | 15 +++
> >  include/uapi/linux/audit.h |  1 +
> >  kernel/auditsc.c   |  8 
> >  kernel/time/timekeeping.c  |  6 ++
> >  4 files changed, 30 insertions(+)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 1e69d9fe16da..43a60fbe74be 100644
> > --- a/include/l

Re: [PATCH] hooks: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-30 Thread Ondrej Mosnacek

On Thu, May 30, 2019 at 5:53 AM Gen Zhang  wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked.
>
> Signed-off-by: Gen Zhang 

Since it looks like you are going to respin this patch, please add:

Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")

to the commit message so that there is a record of which commit
introduced the issue (then it can be picked up automatically for
backport into the relevant stable kernels).

Thanks for spotting the issue and sending the patch(es)!

> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..5a9e959 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2635,6 +2635,8 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> *q++ = c;
> }
> arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> +   if (!arg)
> +   return 0;
> }
> rc = selinux_add_opt(token, arg, mnt_opts);
> if (unlikely(rc)) {

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH] hooks: fix a missing-check bug in selinux_add_mnt_opt()

2019-05-30 Thread Ondrej Mosnacek

On Thu, May 30, 2019 at 10:06 AM Gen Zhang  wrote:
> In selinux_add_mnt_opt(), 'val' is allcoted by kmemdup_nul(). It returns
> NULL when fails. So 'val' should be checked.
>
> Signed-off-by: Gen Zhang 

Please add a Fixes tag here, too:

Fixes: 757cbe597fe8 ("LSM: new method: ->sb_add_mnt_opt()")

> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..4797c63 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1052,8 +1052,11 @@ static int selinux_add_mnt_opt(const char *option, 
> const char *val, int len,
> if (token == Opt_error)
> return -EINVAL;
>
> -   if (token != Opt_seclabel)
> -   val = kmemdup_nul(val, len, GFP_KERNEL);
> +   if (token != Opt_seclabel) {
> +   val = kmemdup_nul(val, len, GFP_KERNEL);
> +   if (!val)
> +   return -ENOMEM;

There is one extra tab character in the above three lines ^^^

> +   }
> rc = selinux_add_opt(token, val, mnt_opts);
> if (unlikely(rc)) {
> kfree(val);

Thanks,

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH v2] hooks: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-30 Thread Ondrej Mosnacek

On Thu, May 30, 2019 at 10:51 AM Gen Zhang  wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked.
>
> Signed-off-by: Gen Zhang 
> Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
> ---
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702c..5a9e959 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2635,6 +2635,8 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
> **mnt_opts)
> *q++ = c;
> }
> arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> +   if (!arg)
> +   return -ENOMEM;
> }
> rc = selinux_add_opt(token, arg, mnt_opts);
> if (unlikely(rc)) {

Looking at the callers of security_sb_eat_lsm_opts() (which is the
function that eventually calls the selinux_sb_eat_lsm_opts() hook),
-ENOMEM should be appropriate here.

Reviewed-by: Ondrej Mosnacek 

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH] selinux: convert struct sidtab count to refcount_t

2019-07-22 Thread Ondrej Mosnacek

On Mon, Jul 22, 2019 at 1:35 PM NitinGote  wrote:
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.
>
> Signed-off-by: NitinGote 

Nack.

The 'count' variable is not used as a reference counter here. It
tracks the number of entries in sidtab, which is a very specific
lookup table that can only grow (the count never decreases). I only
made it atomic because the variable is read outside of the sidtab's
spin lock and thus the reads and writes to it need to be guaranteed to
be atomic. The counter is only updated under the spin lock, so
insertions do not race with each other.

Your patch, however, lead me to realize that I forgot to guard against
overflow above SIDTAB_MAX when a new entry is being inserted. It is
extremely unlikely to happen in practice, but should be fixed anyway.
I'll send a patch shortly.

> ---
>  security/selinux/ss/sidtab.c | 16 
>  security/selinux/ss/sidtab.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
> index e63a90ff2728..20fe235c6c71 100644
> --- a/security/selinux/ss/sidtab.c
> +++ b/security/selinux/ss/sidtab.c
> @@ -29,7 +29,7 @@ int sidtab_init(struct sidtab *s)
> for (i = 0; i < SECINITSID_NUM; i++)
> s->isids[i].set = 0;
>
> -   atomic_set(&s->count, 0);
> +   refcount_set(&s->count, 0);
>
> s->convert = NULL;
>
> @@ -130,7 +130,7 @@ static struct context *sidtab_do_lookup(struct sidtab *s, 
> u32 index, int alloc)
>
>  static struct context *sidtab_lookup(struct sidtab *s, u32 index)
>  {
> -   u32 count = (u32)atomic_read(&s->count);
> +   u32 count = refcount_read(&s->count);
>
> if (index >= count)
> return NULL;
> @@ -245,7 +245,7 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct 
> context *context,
>  u32 *index)
>  {
> unsigned long flags;
> -   u32 count = (u32)atomic_read(&s->count);
> +   u32 count = (u32)refcount_read(&s->count);
> u32 count_locked, level, pos;
> struct sidtab_convert_params *convert;
> struct context *dst, *dst_convert;
> @@ -272,7 +272,7 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct 
> context *context,
> spin_lock_irqsave(&s->lock, flags);
>
> convert = s->convert;
> -   count_locked = (u32)atomic_read(&s->count);
> +   count_locked = (u32)refcount_read(&s->count);
> level = sidtab_level_from_count(count_locked);
>
> /* if count has changed before we acquired the lock, then catch up */
> @@ -315,7 +315,7 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct 
> context *context,
> }
>
> /* at this point we know the insert won't fail */
> -   atomic_set(&convert->target->count, count + 1);
> +   refcount_set(&convert->target->count, count + 1);
> }
>
> if (context->len)
> @@ -328,7 +328,7 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct 
> context *context,
> /* write entries before writing new count */
> smp_wmb();
>
> -   atomic_set(&s->count, count + 1);
> +   refcount_set(&s->count, count + 1);
>
> rc = 0;
>  out_unlock:
> @@ -418,7 +418,7 @@ int sidtab_convert(struct sidtab *s, struct 
> sidtab_convert_params *params)
> return -EBUSY;
> }
>
> -   count = (u32)atomic_read(&s->count);
> +   count = (u32)refcount_read(&s->count);
> level = sidtab_level_from_count(count);
>
> /* allocate last leaf in the new sidtab (to avoid race with
> @@ -431,7 +431,7 @@ int sidtab_convert(struct sidtab *s, struct 
> sidtab_convert_params *params)
> }
>
> /* set count in case no new entries are added during conversion */
> -   atomic_set(¶ms->target->count, count);
> +   refcount_set(¶ms->target->count, count);
>
> /* enable live convert of new entries */
> s->convert = params;
> diff --git a/security/selinux/ss/sidtab.h b/security/selinux/ss/sidtab.h
> index bbd5c0d1f3bd..68dd96a5beba 100644
> --- a/security/selinux/ss/sidtab.h
> +++ b/security/selinux/ss/sidtab.h
> @@ -70,7 +70,7 @@ struct sidtab_convert_params {
>
>  struct sidtab {
> union sidtab_entry_inner roots[SIDTAB_MAX_LEVEL + 1];
> -   atomic_t count;
> +   refcount_t count;
> struct sidtab_convert_params *convert;
> spinlock_t lock;
>
> --
> 2.17.1
>

Thanks,

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH] selinux: convert struct sidtab count to refcount_t

2019-07-24 Thread Ondrej Mosnacek

On Tue, Jul 23, 2019 at 4:54 PM Jann Horn  wrote:
> On Mon, Jul 22, 2019 at 3:44 PM Ondrej Mosnacek  wrote:
> > On Mon, Jul 22, 2019 at 1:35 PM NitinGote  wrote:
> > > refcount_t type and corresponding API should be
> > > used instead of atomic_t when the variable is used as
> > > a reference counter. This allows to avoid accidental
> > > refcounter overflows that might lead to use-after-free
> > > situations.
> > >
> > > Signed-off-by: NitinGote 
> >
> > Nack.
> >
> > The 'count' variable is not used as a reference counter here. It
> > tracks the number of entries in sidtab, which is a very specific
> > lookup table that can only grow (the count never decreases). I only
> > made it atomic because the variable is read outside of the sidtab's
> > spin lock and thus the reads and writes to it need to be guaranteed to
> > be atomic. The counter is only updated under the spin lock, so
> > insertions do not race with each other.
>
> Probably shouldn't even be atomic_t... quoting Documentation/atomic_t.txt:
>
> | SEMANTICS
> | -
> |
> | Non-RMW ops:
> |
> | The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
> | implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> | smp_store_release() respectively. Therefore, if you find yourself only using
> | the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
> | and are doing it wrong.
>
> So I think what you actually want here is a plain "int count", and then:
>  - for unlocked reads, either READ_ONCE()+smp_rmb() or smp_load_acquire()
>  - for writes, either smp_wmb()+WRITE_ONCE() or smp_store_release()
>
> smp_load_acquire() and smp_store_release() are probably the nicest
> here, since they are semantically clearer than smp_rmb()/smp_wmb().

Oh yes, I had a hunch that there would be a better way to do it... I
should have taken the time to read the documentation carefully :)

I am on PTO today, but I will be happy to send a patch to convert the
atomic_t usage to the smp_load_acquire()/smp_store_release() helpers
tomorrow. It will also allow us to just use u32 directly and to get
rid of the ugly casts and the INT_MAX limit.

Thanks a lot for the hint, Jann!

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH -next] selinux: Make selinux_kernfs_init_security static

2019-03-22 Thread Ondrej Mosnacek

On Fri, Mar 22, 2019 at 3:04 PM Yue Haibing  wrote:
> From: YueHaibing 
>
> Fix sparse warning:
>
> security/selinux/hooks.c:3389:5: warning:
>  symbol 'selinux_kernfs_init_security' was not declared. Should it be static?
>
> Signed-off-by: YueHaibing 

Yup, another trivial mistake on my part...

Acked-by: Ondrej Mosnacek 

Thanks for catching that!

> ---
>  security/selinux/hooks.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index ab4b049..b6e6152 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -3386,8 +3386,8 @@ static int selinux_inode_copy_up_xattr(const char *name)
>
>  /* kernfs node operations */
>
> -int selinux_kernfs_init_security(struct kernfs_node *kn_dir,
> -struct kernfs_node *kn)
> +static int selinux_kernfs_init_security(struct kernfs_node *kn_dir,
> +   struct kernfs_node *kn)
>  {
> const struct task_security_struct *tsec = current_security();
> u32 parent_sid, newsid, clen;
> --
> 2.7.0

-- 
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [PATCH] selinux: fixed parse warning Using plain integer as NULL pointer

2019-03-24 Thread Ondrej Mosnacek

On Sun, Mar 24, 2019 at 11:10 AM Hariprasad Kelam
 wrote:
> Changed  0 --> NULL to avoid sparse warning
>
> Sparse warning below:
>
> sudo make C=2 CF=-D__CHECK_ENDIAN__ M=security
>
>  CHECK   security/selinux/ss/services.c
> security/selinux/ss/services.c:1323:32: warning: Using plain integer as
> NULL pointer
>
> Signed-off-by: Hariprasad Kelam 
> ---
>  security/selinux/ss/services.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> index ec62918..30cea59 100644
> --- a/security/selinux/ss/services.c
> +++ b/security/selinux/ss/services.c
> @@ -1320,7 +1320,7 @@ static int security_sid_to_context_core(struct 
> selinux_state *state,
> }
> if (only_invalid && !context->len) {
> scontext = NULL;
> -   scontext_len = 0;
> +   scontext_len = NULL;
> rc = 0;

The warning is correct, but this patch is the wrong fix. I intended to
set the values to the objects pointed to by the pointers, not the
pointers themselves. As is, the assignments have no effect, because
they only modify the local variables, which are not used further in
the function. Fortunately, *scontext and *scontext_len are already set
to NULL/0 at the beginning of the function, so the code is technically
correct even without them.

The correct fix is actually to remove the assignments entirely. I'll
send a patch that does that tomorrow.

> } else {
> rc = context_struct_to_string(policydb, context, scontext,
> --
> 2.7.4
>

Anyway, thank you for catching the mistake! One more reminder for me
to finally install sparse and add C=1 to my kernel build command
line...

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [kernfs] e19dfdc83b: BUG:KASAN:global-out-of-bounds_in_s

2019-03-25 Thread Ondrej Mosnacek

yout+0x446/0x600 with crng_init=1
> > [   32.637364] random: get_random_u64 called from 
> > load_elf_binary+0x1281/0x2f30 with crng_init=1
> >  Starting Login Service...
> >  Starting LSB: Start and stop bmc-watchdog...
> >  Starting LSB: Execute the kexec -e command to reboot system...
> >
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-5.1.0-rc1-00010-ge19dfdc .config
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 prepare
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 modules_prepare
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 SHELL=/bin/bash
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 bzImage
> >
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> > bin/lkp qemu -k  -m modules.cgz job-script # job-script is 
> > attached in this email
> >
> >
> >
> >
> > Thanks,
> > Rong Chen
> >
>
>
> --
> paul moore
> www.paul-moore.com

--
Ondrej Mosnacek 
Software Engineer, Security Technologies
Red Hat, Inc.

Re: [RFC PATCH] sched: only issue an audit on privileged operation

2020-09-08 Thread Ondrej Mosnacek

On Tue, Sep 8, 2020 at 12:26 PM  wrote:
> On Fri, Sep 04, 2020 at 06:00:31PM +0200, Christian GÃ¶ttsche wrote:
> > sched_setattr(2) does via kernel/sched/core.c:__sched_setscheduler()
> > issue a CAP_SYS_NICE audit event unconditionally, even when the requested
> > operation does not require that capability / is un-privileged.
> >
> > Perform privilged/unprivileged catigorization first and perform a
> > capable test only if needed.
> >
> > Signed-off-by: Christian GÃ¶ttsche 
> > ---
> >  kernel/sched/core.c | 65 -
> >  1 file changed, 47 insertions(+), 18 deletions(-)
>
> So who sodding cares about audit, and why is that a reason to make a
> trainwreck of code?

The commit message should be more specific. I believe Christian is
talking about the case where SELinux or other LSM denies the
capability, in which case the denial is usually logged to the audit
log. Obviously, we don't want to get a denial logged when the
capability wasn't actually required for the operation to be allowed.

Unfortunately, the capability interface doesn't provide a way to first
get the decision value and only trigger the auditing when it was
actually used in the decision, so in complex scenarios like this the
caller needs to jump through some hoops to avoid such false-positive
denial records.

--
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH] SELinux: Measure state and hash of policy using IMA

2020-09-08 Thread Ondrej Mosnacek

On Tue, Sep 8, 2020 at 2:37 PM Stephen Smalley
 wrote:
> On Mon, Sep 7, 2020 at 5:39 PM Lakshmi Ramasubramanian
>  wrote:

> > diff --git a/security/selinux/measure.c b/security/selinux/measure.c
> > new file mode 100644
> > index ..caf9107937d9
> > --- /dev/null
> > +++ b/security/selinux/measure.c
> 
> > +static int read_selinux_state(char **state_str, int *state_str_len,
> > + struct selinux_state *state)
> > +{
> > +   char *buf, *str_fmt = "%s=%d;";
> > +   int i, buf_len, curr;
> 
> > +   for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
> > +   buf_len += snprintf(NULL, 0, str_fmt,
> > +   selinux_policycap_names[i],
> > +   state->policycap[i]);
> > +   }
>
> This will need to be converted to use
> security_policycap_supported(state, i) rather than state->policycap[i]
> since the latter is going to be removed by Ondrej's patches I think.

Based on my testing so far, even with just moving the array under
struct selinux_policy, the RCU accessing still brings a significant
overhead (relative to the whole syscalls it is probably negligible,
but relative to the rest of the simpler hooks it is about 30%), so I
don't think it is necessary to adapt other patches to it yet. It will
be my responsibility to adapt to the newly added code when/if I rebase
and respin my patch.

>
> > +   for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
> > +   curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
> > +selinux_policycap_names[i],
> > +state->policycap[i]);
>
> Ditto.
>

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

Re: [PATCH v3] selinux: fix another double free

2020-06-16 Thread Ondrej Mosnacek

On Mon, Jun 15, 2020 at 10:46 PM  wrote:
> From: Tom Rix 
>
> Clang static analysis reports this double free error
>
> security/selinux/ss/conditional.c:139:2: warning: Attempt to free released 
> memory [unix.Malloc]
> kfree(node->expr.nodes);
> ^~~
>
> When cond_read_node fails, it calls cond_node_destroy which frees the
> node but does not poison the entry in the node list.  So when it
> returns to its caller cond_read_list, cond_read_list deletes the
> partial list.  The latest entry in the list will be deleted twice.
>
> So instead of freeing the node in cond_read_node, let list freeing in
> code_read_list handle the freeing the problem node along with all of the
> earlier nodes.
>
> Because cond_read_node no longer does any error handling, the goto's
> the error case are redundant.  Instead just return the error code.
>
> Fixes: 60abd3181db2 ("selinux: convert cond_list to array")
>
> Signed-off-by: Tom Rix 

Reviewed-by: Ondrej Mosnacek 

Thanks!

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

[tip: perf/core] perf/core: Fix unconditional security_locked_down() call

2021-03-17 Thread tip-bot2 for Ondrej Mosnacek

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 08ef1af4de5fe7de9c6d69f1e22e51b66e385d9b
Gitweb:
https://git.kernel.org/tip/08ef1af4de5fe7de9c6d69f1e22e51b66e385d9b
Author:Ondrej Mosnacek 
AuthorDate:Wed, 24 Feb 2021 22:56:28 +01:00
Committer: Peter Zijlstra 
CommitterDate: Tue, 16 Mar 2021 21:44:43 +01:00

perf/core: Fix unconditional security_locked_down() call

Currently, the lockdown state is queried unconditionally, even though
its result is used only if the PERF_SAMPLE_REGS_INTR bit is set in
attr.sample_type. While that doesn't matter in case of the Lockdown LSM,
it causes trouble with the SELinux's lockdown hook implementation.

SELinux implements the locked_down hook with a check whether the current
task's type has the corresponding "lockdown" class permission
("integrity" or "confidentiality") allowed in the policy. This means
that calling the hook when the access control decision would be ignored
generates a bogus permission check and audit record.

Fix this by checking sample_type first and only calling the hook when
its result would be honored.

Fixes: b0c8fdc7fdb7 ("lockdown: Lock down perf when in confidentiality mode")
Signed-off-by: Ondrej Mosnacek 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Paul Moore 
Link: https://lkml.kernel.org/r/20210224215628.192519-1-omosn...@redhat.com
---
 kernel/events/core.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6182cb1..f079431 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11833,12 +11833,12 @@ SYSCALL_DEFINE5(perf_event_open,
return err;
}
 
-   err = security_locked_down(LOCKDOWN_PERF);
-   if (err && (attr.sample_type & PERF_SAMPLE_REGS_INTR))
-   /* REGS_INTR can leak data, lockdown must prevent this */
-   return err;
-
-   err = 0;
+   /* REGS_INTR can leak data, lockdown must prevent this */
+   if (attr.sample_type & PERF_SAMPLE_REGS_INTR) {
+   err = security_locked_down(LOCKDOWN_PERF);
+   if (err)
+   return err;
+   }
 
/*
 * In cgroup mode, the pid argument is used to pass the fd

72 matches

Mail list logo