Re: [PATCH v2 5/7] selinux: Add support for unprivileged mounts from user namespaces
On 10/13/2015 01:04 PM, Seth Forshee wrote:
Security labels from unprivileged mounts in user namespaces must be
ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead. For the
mountpoint label, default to converting the current task context into
a form suitable for file objects, but also allow the policy writer to
specify a different label through policy transition rules.

Pieced together from code snippets provided by Stephen Smalley.

Signed-off-by: Seth Forshee <seth.fors...@canonical.com>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

---
 security/selinux/hooks.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index de05207eb665..09be1dc21e58 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -756,6 +756,28 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			rc = security_transition_sid(current_sid(), current_sid(),
+						     SECCLASS_FILE, NULL,
+						     &sbsec->mntpoint_sid);
+			if (rc)
+				goto out;
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -824,6 +846,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
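For readers skimming the hunk, the decision it adds can be modelled outside the kernel. The following is only a userspace sketch, not the kernel code: struct mount_request, enum label_behavior and apply_userns_mount_policy() are hypothetical names invented here, and the SID transition step is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's labeling behaviors. */
enum label_behavior { USE_XATTR, USE_MNTPOINT, USE_OTHER };

struct mount_request {
	bool from_init_user_ns;  /* models sb->s_user_ns == &init_user_ns */
	bool has_context_opts;   /* any context=, fscontext=, rootcontext=, defcontext= */
	enum label_behavior behavior;
};

/*
 * Returns 0 if the mount may proceed, or -EACCES if a user namespace
 * mount supplied context options. For a user namespace mount that
 * would have used xattr labeling, the behavior is switched to
 * mountpoint labeling so that on-disk labels are ignored.
 */
int apply_userns_mount_policy(struct mount_request *req)
{
	assert(req != NULL);

	if (req->from_init_user_ns)
		return 0;  /* not a user namespace mount: nothing is overridden */

	if (req->has_context_opts)
		return -EACCES;  /* no contexts allowed on the command line */

	if (req->behavior == USE_XATTR)
		req->behavior = USE_MNTPOINT;

	return 0;
}
```

In the patch itself the mountpoint label is then derived with security_transition_sid(), which is what lets policy writers override the default; that part is deliberately not modelled here.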
Re: [PATCH] security: selinux: Use a kmem_cache for allocation struct file_security_struct
On 10/05/2015 01:45 AM, Sangwoo wrote:
> The size of struct file_security_struct is 16 bytes in my setup.
> However, the real allocation size for each file_security_struct
> is 64 bytes, because the kmalloc minimum size is 64 bytes in my
> setup (ARCH_DMA_MINALIGN is 64).
>
> This allocation happens on every file allocation (alloc_file()),
> so the total slack memory size (allocated size - request size)
> grows with every file that is allocated.
>
> E.g) Min Kmalloc Size : 64 bytes, Unit : bytes
>       Allocated Size | Request Size | Slack Size | Allocation Count
>     ---------------------------------------------------------------
>          770048      |    192512    |   577536   |      12032
>
> As a result, this change reduces memory usage by 42 bytes per
> file_security_struct.
>
> Signed-off-by: Sangwoo <sangwoo2.p...@lge.com>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

> ---
>  security/selinux/hooks.c |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3f8d567..c20e082 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -126,6 +126,7 @@ int selinux_enabled = 1;
>  #endif
>
>  static struct kmem_cache *sel_inode_cache;
> +static struct kmem_cache *file_security_cache;
>
>  /**
>   * selinux_secmark_enabled - Check to see if SECMARK is currently enabled
> @@ -287,7 +288,7 @@ static int file_alloc_security(struct file *file)
>  	struct file_security_struct *fsec;
>  	u32 sid = current_sid();
>
> -	fsec = kzalloc(sizeof(struct file_security_struct), GFP_KERNEL);
> +	fsec = kmem_cache_zalloc(file_security_cache, GFP_KERNEL);
>  	if (!fsec)
>  		return -ENOMEM;
>
> @@ -302,7 +303,7 @@ static void file_free_security(struct file *file)
>  {
>  	struct file_security_struct *fsec = file->f_security;
>  	file->f_security = NULL;
> -	kfree(fsec);
> +	kmem_cache_free(file_security_cache, fsec);
>  }
>
>  static int superblock_alloc_security(struct super_block *sb)
> @@ -6086,6 +6087,9 @@ static __init int selinux_init(void)
>  	sel_inode_cache = kmem_cache_create("selinux_inode_security",
>  					    sizeof(struct inode_security_struct),
>  					    0, SLAB_PANIC, NULL);
> +	file_security_cache = kmem_cache_create("selinux_file_security",
> +					    sizeof(struct file_security_struct),
> +					    0, SLAB_PANIC, NULL);
>  	avc_init();
>
>  	security_add_hooks(selinux_hooks, ARRAY_SIZE(selinux_hooks));
>
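The figures in the quoted table can be re-derived with a few lines of arithmetic. This is a minimal sketch; slack_bytes() is a helper name made up for this note, and the inputs (64-byte minimum allocation, 16-byte request, 12032 allocations) are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Slack is the memory handed out by the allocator minus the memory
 * that was actually requested, summed over all allocations.
 */
size_t slack_bytes(size_t alloc_size, size_t request_size, size_t count)
{
	assert(alloc_size >= request_size);
	return (alloc_size - request_size) * count;
}
```

With those inputs, 12032 * 64 = 770048 bytes are allocated, 12032 * 16 = 192512 bytes are requested, and slack_bytes(64, 16, 12032) gives 577536, matching the table. That is 48 bytes of slack per object under kmalloc; how much of it a dedicated kmem_cache recovers depends on the slab's own alignment and bookkeeping, which this sketch does not model.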
Re: [PATCH v2 1/2] security: Add hook to invalidate inode security labels
On 10/05/2015 05:56 PM, Andreas Gruenbacher wrote:
> On Mon, Oct 5, 2015 at 5:08 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>> Not fond of these magic initialized values.
>
> That should be a solvable problem.
>
>> Is it always safe to call inode_doinit() from all callers of
>> inode_has_perm()?
>
> As long as inode_has_perm is only used in contexts in which a file
> permission check / acl check would be possible, I don't see why not.
>
>> What about the cases where isec->sid is used without going through
>> inode_has_perm()?
>
> inode_has_perm seems to be called frequently and invalid labels seem
> to be reloaded quickly, so this change may make SELinux work well enough
> to be useful on top of gfs2 or similar. More checks would of course be
> better. The ideal case would be to always reload invalid labels, but
> that currently won't be possible because we don't have dentries
> everywhere.
>
> I can't tell if this is good enough to provide a useful level of
> protection. In any case, without a patch like this, on gfs2 and
> similar file systems, SELinux currently doesn't work at all.
>
> How can we make progress with this problem?

I think we'd need to wrap all uses of inode->i_security with a helper
that applies this test. FWIW, many/most of them seem to have a dentry
available, including all callers of inode_has_perm itself, so you could
just use inode_doinit_with_dentry() for all of those cases. Maybe just
inline inode_has_perm() and get rid of it. Need to deal appropriately
with situations like selinux_inode_permission with MAY_NOT_BLOCK.
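To make the "wrap all uses with a helper" suggestion concrete, here is a userspace model of such a helper. It is a sketch under stated assumptions, not SELinux code: struct label_state, reload_fn, demo_reload() and get_sid_revalidated() are all hypothetical names, and skipping the reload when the caller may not block is just one possible way to handle the MAY_NOT_BLOCK case mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-inode security blob. */
struct label_state {
	unsigned int sid;
	bool valid;
};

/* Hypothetical reload callback: returns true and fills *sid on success. */
typedef bool (*reload_fn)(void *dentry, unsigned int *sid);

/* Demo reload that always succeeds and reports SID 42. */
bool demo_reload(void *dentry, unsigned int *sid)
{
	(void)dentry;
	*sid = 42;
	return true;
}

/*
 * Single access point for the label. An invalid label is reloaded only
 * when a dentry is available and the caller is allowed to block;
 * otherwise the old (possibly stale) label is returned unchanged.
 */
unsigned int get_sid_revalidated(struct label_state *ls, void *dentry,
				 bool may_block, reload_fn reload)
{
	assert(ls != NULL && reload != NULL);

	if (!ls->valid && dentry != NULL && may_block) {
		unsigned int fresh;

		if (reload(dentry, &fresh)) {
			ls->sid = fresh;
			ls->valid = true;
		}
	}
	return ls->sid;
}
```

The point of funnelling every access through one function is that no caller can read the label without the validity test being applied, which is the property the reply asks for.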
Re: [PATCH v2] x86/mm: warn on W+x mappings
On 10/06/2015 03:32 AM, Ingo Molnar wrote:
>
> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
>
>> On 10/03/2015 07:27 AM, Ingo Molnar wrote:
>>>
>>> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
>>>
>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>> index 30564e2..f8b1573 100644
>>>> --- a/arch/x86/mm/init_64.c
>>>> +++ b/arch/x86/mm/init_64.c
>>>> @@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
>>>>  	free_init_pages("unused kernel",
>>>>  			(unsigned long) __va(__pa_symbol(rodata_end)),
>>>>  			(unsigned long) __va(__pa_symbol(_sdata)));
>>>> +
>>>> +	debug_checkwx();
>>>
>>> Any reason to not do this on NX capable 32-bit kernels as well?
>>
>> Done in v3. However, I do see lots of W+X mappings there.
>
> Ha! That's a debug check plan gone very well! :)
>
>> [1.012796] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x65d/0x840()
>> [1.012803] x86/mm: Found insecure W+X mapping at address f4a0/0xf4a0
>
> What does this range correspond to on your kernel?

From dmesg:
[0.00] virtual kernel memory layout:
    fixmap  : 0xffa96000 - 0xf000   (5540 kB)
    pkmap   : 0xff80 - 0xffa0   (2048 kB)
    vmalloc : 0xf7ffe000 - 0xff7fe000   ( 120 MB)
    lowmem  : 0xc000 - 0xf77fe000   ( 887 MB)
      .init : 0xc0dde000 - 0xc0e9d000   ( 764 kB)
      .data : 0xc0aa2ba0 - 0xc0ddca00   (3303 kB)
      .text : 0xc040 - 0xc0aa2ba0   (6794 kB)

/sys/kernel/debug/kernel_page_tables seems to have many such mappings,
even before the reported one under Kernel Mapping, plus one in the
vmalloc() area:

---[ Kernel Mapping ]---
0xc000-0xc009b000       620K  RW  GLB NX pte
0xc009b000-0xc009c000     4K  ro  GLB NX pte
0xc009c000-0xc009d000     4K  ro  GLB x  pte
0xc009d000-0xc020      1420K  RW  GLB NX pte
0xc020-0xc040             2M  RW  PSE GLB NX pmd
0xc040-0xc0a0             6M  ro  PSE GLB x  pmd
0xc0a0-0xc0aa3000       652K  ro  GLB x  pte
0xc0aa3000-0xc0d2a000  2588K  ro  GLB NX pte
0xc0d2a000-0xc100      2904K  RW  GLB NX pte
0xc100-0xe700           608M  RW  PSE GLB NX pmd
0xe700-0xe7027000       156K  RW  GLB x  pte
0xe7027000-0xe7028000     4K  ro  GLB x  pte
0xe7028000-0xe709b000   460K  RW  GLB x  pte
0xe709b000-0xe709c000     4K  ro  GLB x  pte
0xe709c000-0xe70b8000   112K  RW  GLB x  pte
0xe70b8000-0xe70b9000     4K  ro  GLB x  pte
0xe70b9000-0xe7108000   316K  RW  GLB x  pte
0xe7108000-0xe710a000     8K  ro  GLB x  pte
0xe710a000-0xe7127000   116K  RW  GLB x  pte
0xe7127000-0xe712a000    12K  ro  GLB x  pte
0xf2c5c000-0xf2c5d000     4K  ro  GLB x  pte
0xf2c5d000-0xf2e0      1676K  RW  GLB x  pte
0xf2e0-0xf4a0            28M  RW  PSE GLB NX pmd
0xf4a0-0xf4b28000      1184K  RW  GLB x  pte
0xf4b28000-0xf4c0       864K  RW  GLB NX pte
0xf4c0-0xf520             6M  RW  PSE GLB x  pmd
0xf520-0xf525d000       372K  RW  GLB x  pte
0xf525d000-0xf525e000     4K  ro  GLB x  pte
0xf525e000-0xf525f000     4K  RW  GLB x  pte
0xf525f000-0xf526         4K  ro  GLB x  pte
0xf526-0xf526a000        40K  RW  GLB x  pte
0xf640-0xf658c000      1584K  RW  GLB NX pte
0xf658c000-0xf660       464K  RW  GLB x  pte
0xf660-0xf760            16M  RW  PSE GLB NX pmd
0xf760-0xf77fe000      2040K  RW  GLB NX pte
0xf77fe000-0xf780         8K  pte
0xf780-0xf7e0             6M  pmd
0xf7e0-0xf7ffe000      2040K  pte
---[ vmalloc() Area ]---
0xf7ffe000-0xf7fff000     4K  RW  GLB NX pte
0xf7fff000-0xf800         4K  pte
0xf800-0xf8002000         8K  RW  GLB NX pte
...
0xf86f3000-0xf880      1076K  pte
0xf880-0xf8a0             2M  RW  PWT PSE GLB x  pmd
0xf8a0-0xf8b0
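Rows like the ones in this dump can be screened mechanically for the W+X combination. The sketch below assumes the row layout shown above, where "RW" marks a writable range and a standalone "x" (as opposed to "NX") marks an executable one; has_token() and is_wx_line() are helper names invented for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if tok appears in line as a whole whitespace-separated token. */
static bool has_token(const char *line, const char *tok)
{
	size_t n = strlen(tok);
	const char *p = line;

	while (*p) {
		/* Skip separators before the next token. */
		while (*p == ' ' || *p == '\t')
			p++;

		const char *start = p;

		/* Advance to the end of the token. */
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;

		if ((size_t)(p - start) == n && strncmp(start, tok, n) == 0)
			return true;

		if (*p == '\n')
			p++;
	}
	return false;
}

/* A row describes a W+X range if it is both writable and executable. */
bool is_wx_line(const char *line)
{
	assert(line != NULL);
	return has_token(line, "RW") && has_token(line, "x");
}
```

Matching whole tokens matters here: every address starts with "0x", so a plain substring search for "x" would flag every row.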
[tip:x86/mm] x86/mm: Warn on W^X mappings
Commit-ID:  e1a58320a38dfa72be48a0f1a3a92273663ba6db
Gitweb:     http://git.kernel.org/tip/e1a58320a38dfa72be48a0f1a3a92273663ba6db
Author:     Stephen Smalley <s...@tycho.nsa.gov>
AuthorDate: Mon, 5 Oct 2015 12:55:20 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 6 Oct 2015 11:11:48 +0200

x86/mm: Warn on W^X mappings

Warn on any residual W+X mappings after setting NX
if DEBUG_WX is enabled. Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface. Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success it prints this to the kernel log:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure it prints a warning and a count of the failed pages:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
  x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
  [...]
  Call Trace:
   [] dump_stack+0x44/0x55
   [] warn_slowpath_common+0x82/0xc0
   [] warn_slowpath_fmt+0x5c/0x80
   [] ? note_page+0x5c9/0x7b0
   [] note_page+0x610/0x7b0
   [] ptdump_walk_pgd_level_core+0x259/0x3c0
   [] ptdump_walk_pgd_level_checkwx+0x17/0x20
   [] mark_rodata_ro+0xf5/0x100
   [] ? rest_init+0x80/0x80
   [] kernel_init+0x1d/0xe0
   [] ret_from_fork+0x3f/0x70
   [] ? rest_init+0x80/0x80
  ---[ end trace a1f23a1e42a2ac76 ]---
  x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
Acked-by: Kees Cook <keesc...@chromium.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Arjan van de Ven <ar...@linux.intel.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-...@tycho.nsa.gov
[ Improved the Kconfig help text and made the new option default-y
  if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings,
  so we really want people to have this on by default. ]
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/Kconfig.debug         | 36 +++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile           |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_32.c          |  2 ++
 arch/x86/mm/init_64.c          |  2 ++
 6 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..3e0baf7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
 	  This is useful for kernel debugging when your machine crashes very
 	  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+	def_bool n
+
 config X86_PTDUMP
 	bool "Export kernel pagetable layout to userspace via debugfs"
 	depends on DEBUG_KERNEL
 	select DEBUG_FS
+	select X86_PTDUMP_CORE
 	---help---
 	  Say Y here if you want to show the kernel pagetable layout in a
 	  debugfs file. This information is only useful for kernel developers
@@ -79,7 +83,8 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
 	bool "Dump the EFI pagetable"
-	depends on EFI && X86_PTDUMP
+	depends on EFI
+	select X86_PTDUMP_CORE
 	---help---
 	  Enable this if you want to dump the EFI page table before
 	  enabling virtual mode. This can be used to debug miscellaneous
@@ -105,6 +110,35 @@ config DEBUG_RODATA_TEST
 	  feature as well as for the change_page_attr() infrastructure.
 	  If in doubt, say "N"
 
+config DEBUG_WX
+	bool "Warn on W+X mappings at boot"
+	depends on DEBUG_RODATA
+	default y
+	select X86_PTDUMP_CORE
+	---help---
+	  Generate a warning if any W+X mappings are found at boot.
+
+	  This is useful for discovering cases where the kernel is leaving
+	  W+X mappings after applying NX, as such mappings are a security risk.
+
+	  Look for a message in dmesg output like this:
+
+	    x86/mm: Checked W+X mappings: passed, no W+X pages found.
+
+	  or like this, if the check failed:
+
+	    x86/mm: Checked W+X mappings: FAILED, W+X pages found.
+
+	  Note that even if the check fails, your kernel is possibly
+	  still fine, as W+X mappings are not a security hole in
+	  themselves, what they do is that they make the exploitation
+	  of other unfixed kernel bugs easier.
+
+	  There is no runtime or memory usage effect of this option
+	  once the kernel has booted up - it's a one time check.
+
+	  If in doubt, say "Y".
+
 config DEBUG_SET_MODULE_RONX
 	bool "Set loadable kernel module data as
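The "count of the failed pages" in the log message can be illustrated with a small userspace model. This is a sketch only: the DEMO_PAGE_* bits are illustrative values rather than the real x86 _PAGE_* definitions, and struct demo_range and count_wx_pages() are hypothetical names, not the code in dump_pagetables.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; the real x86 page table bits differ. */
#define DEMO_PAGE_RW   (1u << 0)
#define DEMO_PAGE_NX   (1u << 1)
#define DEMO_PAGE_SIZE 4096u

struct demo_range {
	uint64_t start;     /* inclusive */
	uint64_t end;       /* exclusive */
	unsigned int prot;  /* DEMO_PAGE_* bits */
};

/* Count 4K pages that are writable and not marked no-execute. */
uint64_t count_wx_pages(const struct demo_range *r, size_t n)
{
	uint64_t pages = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int writable = (r[i].prot & DEMO_PAGE_RW) != 0;
		int executable = (r[i].prot & DEMO_PAGE_NX) == 0;

		if (writable && executable)
			pages += (r[i].end - r[i].start) / DEMO_PAGE_SIZE;
	}
	return pages;
}
```

A range counts only when both conditions hold: read-only executable text and writable non-executable data are both fine, which is why the check is meaningful only after NX has been applied.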
Re: [PATCH v2] x86/mm: warn on W+x mappings
On 10/03/2015 07:27 AM, Ingo Molnar wrote:
>
> * Stephen Smalley wrote:
>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 30564e2..f8b1573 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
>>  	free_init_pages("unused kernel",
>>  			(unsigned long) __va(__pa_symbol(rodata_end)),
>>  			(unsigned long) __va(__pa_symbol(_sdata)));
>> +
>> +	debug_checkwx();
>
> Any reason to not do this on NX capable 32-bit kernels as well?

Done in v3. However, I do see lots of W+X mappings there.

[1.012796] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x65d/0x840()
[1.012803] x86/mm: Found insecure W+X mapping at address f4a0/0xf4a0
[1.012805] Modules linked in:
[1.012833] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc4+ #2
[1.012837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[1.012844]  c0d32967 173b7da7 f7105e7c c0713490 f7105ebc f7105eac c045d077
[1.012848]  c0c47ef8 f7105edc 0001 c0c4de42 00e1 c04551fd c04551fd f7105f3c
[1.012851]  0002 f7105ec8 c045d0ee 0009 f7105ebc c0c47ef8 f7105edc
[1.012855] Call Trace:
[1.012868]  [] dump_stack+0x41/0x61
[1.012871]  [] warn_slowpath_common+0x87/0xc0
[1.012873]  [] ? note_page+0x65d/0x840
[1.012875]  [] ? note_page+0x65d/0x840
[1.012877]  [] warn_slowpath_fmt+0x3e/0x60
[1.012878]  [] note_page+0x65d/0x840
[1.012880]  [] ptdump_walk_pgd_level_core+0x1d6/0x2d0
[1.012883]  [] ptdump_walk_pgd_level_checkwx+0x16/0x20
[1.012886]  [] mark_rodata_ro+0x135/0x160
[1.012898]  [] kernel_init+0x1f/0xe0
[1.012906]  [] ? schedule_tail+0x11/0x50
[1.012909]  [] ret_from_kernel_thread+0x21/0x30
[1.012910]  [] ? rest_init+0x70/0x70
[1.012912] ---[ end trace 40a4f3d5e8fb70ac ]---
[1.012954] x86/mm: Checked W+X mappings: FAILED, 6556 W+X pages found.
[PATCH v3] x86/mm: warn on W+x mappings
Warn on any residual W+x mappings after setting NX
if DEBUG_WX is enabled. Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface. Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success:
x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW 4.3.0-rc3+ #19
 e96b193f 88042c5dbd48 81380a5f
 88042c5dbd90 88042c5dbd80 8109d3f2
 81e1 0003 88042c5dbe90 88042c5dbe90
Call Trace:
 [] dump_stack+0x44/0x55
 [] warn_slowpath_common+0x82/0xc0
 [] warn_slowpath_fmt+0x5c/0x80
 [] ? note_page+0x5c9/0x7b0
 [] note_page+0x610/0x7b0
 [] ptdump_walk_pgd_level_core+0x259/0x3c0
 [] ptdump_walk_pgd_level_checkwx+0x17/0x20
 [] mark_rodata_ro+0xf5/0x100
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x1d/0xe0
 [] ret_from_fork+0x3f/0x70
 [] ? rest_init+0x80/0x80
---[ end trace a1f23a1e42a2ac76 ]---
x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley
---
v3 enables the checks on 32-bit if NX is supported, and also makes
DEBUG_WX depend on DEBUG_RODATA since both the NX marking and the
checking occurs from mark_rodata_ro().

 arch/x86/Kconfig.debug         | 20 +++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile           |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_32.c          |  2 ++
 arch/x86/mm/init_64.c          |  2 ++
 6 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..d09fde7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
 	  This is useful for kernel debugging when your machine crashes very
 	  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+	def_bool n
+
 config X86_PTDUMP
 	bool "Export kernel pagetable layout to userspace via debugfs"
 	depends on DEBUG_KERNEL
 	select DEBUG_FS
+	select X86_PTDUMP_CORE
 	---help---
 	  Say Y here if you want to show the kernel pagetable layout in a
 	  debugfs file. This information is only useful for kernel developers
@@ -79,7 +83,8 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
 	bool "Dump the EFI pagetable"
-	depends on EFI && X86_PTDUMP
+	depends on EFI
+	select X86_PTDUMP_CORE
 	---help---
 	  Enable this if you want to dump the EFI page table before
 	  enabling virtual mode. This can be used to debug miscellaneous
@@ -105,6 +110,19 @@ config DEBUG_RODATA_TEST
 	  feature as well as for the change_page_attr() infrastructure.
 	  If in doubt, say "N"
 
+config DEBUG_WX
+	bool "Warn on W+X mappings at boot"
+	depends on DEBUG_RODATA
+	select X86_PTDUMP_CORE
+	---help---
+	  Generate a warning if any W+X mappings are found at boot.
+	  This is useful for discovering cases where the kernel is leaving
+	  W+X mappings after applying NX, as such mappings are a security risk.
+	  Look for a message in dmesg output like this:
+	    x86/mm: Checked W+X mappings: passed, no W+X pages found.
+	  or like this:
+	    x86/mm: Checked W+X mappings: FAILED, W+X pages found.
+
 config DEBUG_SET_MODULE_RONX
 	bool "Set loadable kernel module data as NX and text as RO"
 	depends on MODULES
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..f2b6bed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -19,6 +19,13 @@
 #include
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_checkwx(void);
+
+#ifdef CONFIG_DEBUG_WX
+#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#else
+#define debug_checkwx() do { } while (0)
+#endif
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a482d10..65c47fd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_SMP) += tlb.o
 obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
+obj-$(CONFIG_X86_PTDUMP_CORE) += dump_pagetables.o
 obj-$(CONFIG_HIGHMEM) += h
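The debug_checkwx() definition in the pgtable.h hunk follows a common kernel idiom: a config-gated macro that expands either to the real call or to an empty statement. Here is a standalone sketch of that idiom; DEMO_DEBUG_WX, demo_checkwx(), walk_and_check_wx() and run_boot_sequence() are placeholder names for this note, with DEMO_DEBUG_WX standing in for the Kconfig symbol.

```c
#include <assert.h>

/* Stand-in for the Kconfig option; set to 0 to compile the check out. */
#ifndef DEMO_DEBUG_WX
#define DEMO_DEBUG_WX 1
#endif

int wx_checks_run;  /* counts how often the check ran, for demonstration */

/* Hypothetical stand-in for the real page table walk. */
void walk_and_check_wx(void)
{
	wx_checks_run++;
}

#if DEMO_DEBUG_WX
#define demo_checkwx() walk_and_check_wx()
#else
#define demo_checkwx() do { } while (0)
#endif

/* The call site is identical whether or not the check is compiled in. */
int run_boot_sequence(void)
{
	demo_checkwx();
	return wx_checks_run;
}
```

The empty variant is written as do { } while (0) rather than as nothing at all so that "demo_checkwx();" stays a single, well-formed statement wherever it is used, including as the body of an if or else without braces.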
Re: [PATCH v2 1/2] security: Add hook to invalidate inode security labels
On 10/04/2015 03:19 PM, Andreas Gruenbacher wrote: Add a hook to invalidate an inode's security label when the cached information becomes invalid. Implement the new hook in selinux: set a flag when a security label becomes invalid. When hitting a security label which has been marked as invalid in inode_has_perm, try reloading the label. If an inode does not have any dentries attached, we cannot reload its security label because we cannot use the getxattr inode operation. In that case, continue using the old, invalid label until a dentry becomes available. Signed-off-by: Andreas Gruenbacher Cc: Paul Moore Cc: Stephen Smalley Cc: Eric Paris Cc: seli...@tycho.nsa.gov --- include/linux/lsm_hooks.h | 6 ++ include/linux/security.h | 5 + security/security.c | 8 security/selinux/hooks.c | 23 +-- security/selinux/include/objsec.h | 3 ++- 5 files changed, 42 insertions(+), 3 deletions(-) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index ec3a6ba..945ae1d 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1261,6 +1261,10 @@ *audit_rule_init. *@rule contains the allocated rule * + * @inode_invalidate_secctx: + * Notify the security module that it must revalidate the security context + * of an inode. + * * @inode_notifysecctx: *Notify the security module of what the security context of an inode *should be. 
Initializes the incore security context managed by the @@ -1516,6 +1520,7 @@ union security_list_options { int (*secctx_to_secid)(const char *secdata, u32 seclen, u32 *secid); void (*release_secctx)(char *secdata, u32 seclen); + void (*inode_invalidate_secctx)(struct inode *inode); int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen); int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen); int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen); @@ -1757,6 +1762,7 @@ struct security_hook_heads { struct list_head secid_to_secctx; struct list_head secctx_to_secid; struct list_head release_secctx; + struct list_head inode_invalidate_secctx; struct list_head inode_notifysecctx; struct list_head inode_setsecctx; struct list_head inode_getsecctx; diff --git a/include/linux/security.h b/include/linux/security.h index 2f4c1f7..9692571 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -353,6 +353,7 @@ int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen); int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid); void security_release_secctx(char *secdata, u32 seclen); +void security_inode_invalidate_secctx(struct inode *inode); int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen); int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen); int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen); @@ -1093,6 +1094,10 @@ static inline void security_release_secctx(char *secdata, u32 seclen) { } +static inline void security_inode_invalidate_secctx(struct inode *inode) +{ +} + static inline int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen) { return -EOPNOTSUPP; diff --git a/security/security.c b/security/security.c index 46f405c..e4371cd 100644 --- a/security/security.c +++ b/security/security.c @@ -1161,6 +1161,12 @@ void security_release_secctx(char *secdata, u32 seclen) } 
EXPORT_SYMBOL(security_release_secctx); +void security_inode_invalidate_secctx(struct inode *inode) +{ + call_void_hook(inode_invalidate_secctx, inode); +} +EXPORT_SYMBOL(security_inode_invalidate_secctx); + int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen) { return call_int_hook(inode_notifysecctx, 0, inode, ctx, ctxlen); @@ -1763,6 +1769,8 @@ struct security_hook_heads security_hook_heads = { LIST_HEAD_INIT(security_hook_heads.secctx_to_secid), .release_secctx = LIST_HEAD_INIT(security_hook_heads.release_secctx), + .inode_invalidate_secctx = + LIST_HEAD_INIT(security_hook_heads.inode_invalidate_secctx), .inode_notifysecctx = LIST_HEAD_INIT(security_hook_heads.inode_notifysecctx), .inode_setsecctx = diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index e4369d8..c5e4ca8 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -1293,11 +1293,11 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent unsigned len = 0; int rc = 0; - if (isec->initialized) + if (isec->initialized == 1) goto out; mutex_lock(>lock); - if (isec->initialized) + if (isec->initialized == 1)
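The behaviour described in the commit message (mark a cached label invalid, reload it on the next permission check if a dentry is available, otherwise keep using the stale label) can be modelled in user-space C. This is only a sketch: the state names, the struct, and both function names are hypothetical, and the flag values the patch actually stores in isec->initialized are not visible in the truncated diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical label states; the real patch's values are not shown. */
enum label_state { LABEL_UNINIT, LABEL_VALID, LABEL_INVALID };

/* Hypothetical stand-in for an inode plus its cached security label. */
struct inode_model {
	enum label_state state;
	int sid;		/* cached label */
	bool has_dentry;	/* a dentry is needed to call getxattr */
	int disk_sid;		/* label as currently stored on disk */
};

/* Models the new hook: only flag the label, do not reload it here. */
void invalidate_label(struct inode_model *inode)
{
	assert(inode != NULL);
	if (inode->state == LABEL_VALID)
		inode->state = LABEL_INVALID;
}

/* Models the permission check: reload an invalid label if possible,
 * otherwise fall back to the old, stale label. */
int label_for_check(struct inode_model *inode)
{
	assert(inode != NULL);
	if (inode->state == LABEL_INVALID && inode->has_dentry) {
		inode->sid = inode->disk_sid;
		inode->state = LABEL_VALID;
	}
	return inode->sid;
}
```

In this model an inode without a dentry keeps returning its old label after invalidate_label(), and switches to the on-disk label on the first check after a dentry becomes available.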
Re: [PATCH v2] x86/mm: warn on W+x mappings
On 10/03/2015 07:27 AM, Ingo Molnar wrote:
>
> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 30564e2..f8b1573 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
>>  	free_init_pages("unused kernel",
>>  			(unsigned long) __va(__pa_symbol(rodata_end)),
>>  			(unsigned long) __va(__pa_symbol(_sdata)));
>> +
>> +	debug_checkwx();
>
> Any reason to not do this on NX capable 32-bit kernels as well?

Done in v3. However, I do see lots of W+X mappings there.

[1.012796] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x65d/0x840()
[1.012803] x86/mm: Found insecure W+X mapping at address f4a0/0xf4a0
[1.012805] Modules linked in:
[1.012833] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc4+ #2
[1.012837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[1.012844] c0d32967 173b7da7 f7105e7c c0713490 f7105ebc f7105eac c045d077
[1.012848] c0c47ef8 f7105edc 0001 c0c4de42 00e1 c04551fd c04551fd f7105f3c
[1.012851] 0002 f7105ec8 c045d0ee 0009 f7105ebc c0c47ef8 f7105edc
[1.012855] Call Trace:
[1.012868] [] dump_stack+0x41/0x61
[1.012871] [] warn_slowpath_common+0x87/0xc0
[1.012873] [] ? note_page+0x65d/0x840
[1.012875] [] ? note_page+0x65d/0x840
[1.012877] [] warn_slowpath_fmt+0x3e/0x60
[1.012878] [] note_page+0x65d/0x840
[1.012880] [] ptdump_walk_pgd_level_core+0x1d6/0x2d0
[1.012883] [] ptdump_walk_pgd_level_checkwx+0x16/0x20
[1.012886] [] mark_rodata_ro+0x135/0x160
[1.012898] [] kernel_init+0x1f/0xe0
[1.012906] [] ? schedule_tail+0x11/0x50
[1.012909] [] ret_from_kernel_thread+0x21/0x30
[1.012910] [] ? rest_init+0x70/0x70
[1.012912] ---[ end trace 40a4f3d5e8fb70ac ]---
[1.012954] x86/mm: Checked W+X mappings: FAILED, 6556 W+X pages found.
[PATCH v3] x86/mm: warn on W+x mappings
Warn on any residual W+x mappings after setting NX if DEBUG_WX is enabled.
Introduce a separate X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs interface, so that
DEBUG_WX can be enabled without exposing the debugfs interface. Switch
EFI_PGT_DUMP to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success:
x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW 4.3.0-rc3+ #19
 e96b193f 88042c5dbd48 81380a5f 88042c5dbd90 88042c5dbd80 8109d3f2 81e1 0003 88042c5dbe90 88042c5dbe90
Call Trace:
 [] dump_stack+0x44/0x55
 [] warn_slowpath_common+0x82/0xc0
 [] warn_slowpath_fmt+0x5c/0x80
 [] ? note_page+0x5c9/0x7b0
 [] note_page+0x610/0x7b0
 [] ptdump_walk_pgd_level_core+0x259/0x3c0
 [] ptdump_walk_pgd_level_checkwx+0x17/0x20
 [] mark_rodata_ro+0xf5/0x100
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x1d/0xe0
 [] ret_from_fork+0x3f/0x70
 [] ? rest_init+0x80/0x80
---[ end trace a1f23a1e42a2ac76 ]---
x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
v3 enables the checks on 32-bit if NX is supported, and also makes DEBUG_WX
depend on DEBUG_RODATA since both the NX marking and the checking occurs
from mark_rodata_ro().

 arch/x86/Kconfig.debug         | 20 +++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile           |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_32.c          |  2 ++
 arch/x86/mm/init_64.c          |  2 ++
 6 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..d09fde7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
 	  This is useful for kernel debugging when your machine crashes very
 	  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+	def_bool n
+
 config X86_PTDUMP
 	bool "Export kernel pagetable layout to userspace via debugfs"
 	depends on DEBUG_KERNEL
 	select DEBUG_FS
+	select X86_PTDUMP_CORE
 	---help---
 	  Say Y here if you want to show the kernel pagetable layout in a
 	  debugfs file. This information is only useful for kernel developers
@@ -79,7 +83,8 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
 	bool "Dump the EFI pagetable"
-	depends on EFI && X86_PTDUMP
+	depends on EFI
+	select X86_PTDUMP_CORE
 	---help---
 	  Enable this if you want to dump the EFI page table before
 	  enabling virtual mode. This can be used to debug miscellaneous
@@ -105,6 +110,19 @@ config DEBUG_RODATA_TEST
 	  feature as well as for the change_page_attr() infrastructure.
 	  If in doubt, say "N"
 
+config DEBUG_WX
+	bool "Warn on W+X mappings at boot"
+	depends on DEBUG_RODATA
+	select X86_PTDUMP_CORE
+	---help---
+	  Generate a warning if any W+X mappings are found at boot.
+	  This is useful for discovering cases where the kernel is leaving
+	  W+X mappings after applying NX, as such mappings are a security risk.
+	  Look for a message in dmesg output like this:
+	    x86/mm: Checked W+X mappings: passed, no W+X pages found.
+	  or like this:
+	    x86/mm: Checked W+X mappings: FAILED, W+X pages found.
+
 config DEBUG_SET_MODULE_RONX
 	bool "Set loadable kernel module data as NX and text as RO"
 	depends on MODULES
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..f2b6bed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -19,6 +19,13 @@
 #include
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_checkwx(void);
+
+#ifdef CONFIG_DEBUG_WX
+#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#else
+#define debug_checkwx() do { } while (0)
+#endif
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a482d10..65c47fd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_SMP) += tlb.o
 obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
+obj-$(CONFIG_X86_PTDUMP_CORE) += dump_pagetables.o
 
 obj-$(CON
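The debug_checkwx() pattern from the pgtable.h hunk (a real call when the config option is set, an empty do { } while (0) statement otherwise) can be tried out in user-space C. This is a sketch under assumptions: the counter, the stand-in body of ptdump_walk_pgd_level_checkwx(), and mark_ro_model() are hypothetical and exist only so the macro has something to expand to outside the kernel.

```c
#include <assert.h>

/* Hypothetical counter standing in for the real page-table walk. */
int wx_checks_run;

/* User-space stand-in; the kernel function walks the page tables. */
void ptdump_walk_pgd_level_checkwx(void)
{
	wx_checks_run++;
}

/* Same shape as the patch: build with -DCONFIG_DEBUG_WX to enable. */
#ifdef CONFIG_DEBUG_WX
#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
#else
#define debug_checkwx() do { } while (0)
#endif

/* Hypothetical caller. In both configurations debug_checkwx() acts as
 * one statement that needs its trailing semicolon, so it can sit in an
 * unbraced if/else without changing how the else binds. */
int mark_ro_model(int nx_supported)
{
	assert(nx_supported == 0 || nx_supported == 1);
	if (nx_supported)
		debug_checkwx();
	else
		return -1;
	return 0;
}
```

Callers such as mark_rodata_ro() therefore need no #ifdef of their own; the option is decided once, in the header.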
[PATCH v2] x86/mm: warn on W+x mappings
Warn on any residual W+x mappings after setting NX if DEBUG_WX is enabled. Introduce a separate X86_PTDUMP_CORE config that enables the code for dumping the page tables without enabling the debugfs interface, so that DEBUG_WX can be enabled without exposing the debugfs interface. Switch EFI_PGT_DUMP to using X86_PTDUMP_CORE so that it also does not require enabling the debugfs interface. On success: x86/mm: Checked W+X mappings: passed, no W+X pages found. On failure: [ cut here ] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0() x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8 Modules linked in: CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW 4.3.0-rc3+ #19 e96b193f 88042c5dbd48 81380a5f 88042c5dbd90 88042c5dbd80 8109d3f2 81e1 0003 88042c5dbe90 88042c5dbe90 Call Trace: [] dump_stack+0x44/0x55 [] warn_slowpath_common+0x82/0xc0 [] warn_slowpath_fmt+0x5c/0x80 [] ? note_page+0x5c9/0x7b0 [] note_page+0x610/0x7b0 [] ptdump_walk_pgd_level_core+0x259/0x3c0 [] ptdump_walk_pgd_level_checkwx+0x17/0x20 [] mark_rodata_ro+0xf5/0x100 [] ? rest_init+0x80/0x80 [] kernel_init+0x1d/0xe0 [] ret_from_fork+0x3f/0x70 [] ? rest_init+0x80/0x80 ---[ end trace a1f23a1e42a2ac76 ]--- x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found. Signed-off-by: Stephen Smalley --- v2 addresses Kees' concern about being able to enable this check without enabling the debugfs interface, and reworks the output to present failure and success in the manner suggested by Ingo. 
arch/x86/Kconfig.debug | 19 ++- arch/x86/include/asm/pgtable.h | 7 +++ arch/x86/mm/Makefile | 2 +- arch/x86/mm/dump_pagetables.c | 42 +- arch/x86/mm/init_64.c | 2 ++ 5 files changed, 69 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d8c0d32..c6fe16b 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI This is useful for kernel debugging when your machine crashes very early before the console code is initialized. +config X86_PTDUMP_CORE + def_bool n + config X86_PTDUMP bool "Export kernel pagetable layout to userspace via debugfs" depends on DEBUG_KERNEL select DEBUG_FS + select X86_PTDUMP_CORE ---help--- Say Y here if you want to show the kernel pagetable layout in a debugfs file. This information is only useful for kernel developers @@ -79,13 +83,26 @@ config X86_PTDUMP config EFI_PGT_DUMP bool "Dump the EFI pagetable" - depends on EFI && X86_PTDUMP + depends on EFI + select X86_PTDUMP_CORE ---help--- Enable this if you want to dump the EFI page table before enabling virtual mode. This can be used to debug miscellaneous issues with the mapping of the EFI runtime regions into that table. +config DEBUG_WX + bool "Warn on W+X mappings at boot" + select X86_PTDUMP_CORE + ---help--- + Generate a warning if any W+X mappings are found at boot. + This is useful for discovering cases where the kernel is leaving + W+X mappings after applying NX, as such mappings are a security risk. + Look for a message in dmesg output like this: + x86/mm: Checked W+X mappings: passed, no W+X pages found. + or like this: + x86/mm: Checked W+X mappings: FAILED, W+X pages found. 
+ config DEBUG_RODATA bool "Write protect kernel read-only data structures" default y diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 867da5b..f2b6bed 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -19,6 +19,13 @@ #include void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd); +void ptdump_walk_pgd_level_checkwx(void); + +#ifdef CONFIG_DEBUG_WX +#define debug_checkwx() ptdump_walk_pgd_level_checkwx() +#else +#define debug_checkwx() do { } while (0) +#endif /* * ZERO_PAGE is a global shared page that is always zero: used diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index a482d10..65c47fd 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -14,7 +14,7 @@ obj-$(CONFIG_SMP) += tlb.o obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o -obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o +obj-$(CONFIG_X86_PTDUMP_CORE) += dump_pagetables.o obj-$(CONFIG_HIGHMEM) += highmem_32.o diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index f0cedf3..19c64af 100644 --- a/arch/x86/mm/dump_
[tip:x86/urgent] x86/mm: Set NX on gap between __ex_table and rodata
Commit-ID:  ab76f7b4ab2397ffdd2f1eb07c55697d19991d10
Gitweb:     http://git.kernel.org/tip/ab76f7b4ab2397ffdd2f1eb07c55697d19991d10
Author:     Stephen Smalley <s...@tycho.nsa.gov>
AuthorDate: Thu, 1 Oct 2015 09:04:22 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 2 Oct 2015 09:21:06 +0200

x86/mm: Set NX on gap between __ex_table and rodata

Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables. Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.

Before:
---[ High Kernel Mapping ]---
0x8000-0x8100           16M                          pmd
0x8100-0x8160            6M     ro   PSE GLB x       pmd
0x8160-0x81754000     1360K     ro       GLB x       pte
0x81754000-0x8180      688K     RW       GLB x       pte
0x8180-0x81a0            2M     ro   PSE GLB NX      pmd
0x81a0-0x81b3b000     1260K     ro       GLB NX      pte
0x81b3b000-0x8200     4884K     RW       GLB NX      pte
0x8200-0x8220            2M     RW   PSE GLB NX      pmd
0x8220-0xa000          478M                          pmd

After:
---[ High Kernel Mapping ]---
0x8000-0x8100           16M                          pmd
0x8100-0x8160            6M     ro   PSE GLB x       pmd
0x8160-0x81754000     1360K     ro       GLB x       pte
0x81754000-0x8180      688K     RW       GLB NX      pte
0x8180-0x81a0            2M     ro   PSE GLB NX      pmd
0x81a0-0x81b3b000     1260K     ro       GLB NX      pte
0x81b3b000-0x8200     4884K     RW       GLB NX      pte
0x8200-0x8220            2M     RW   PSE GLB NX      pmd
0x8220-0xa000          478M                          pmd

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
Acked-by: Kees Cook <keesc...@chromium.org>
Cc: <sta...@vger.kernel.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-...@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/mm/init_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2..df48430 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 	 * has been zapped already via cleanup_highmem().
 	 */
 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
 	rodata_test();
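The effect of starting the NX range at text_end instead of rodata_start comes down to page arithmetic, which can be checked in user-space C. This is a sketch only: nx_page_count() and extra_nx_pages() are hypothetical helpers, not kernel functions, and any addresses passed to them are illustrative offsets chosen to reproduce the 688K gap shown in the page-table dump, not the real kernel addresses.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages, as on x86 */

/* Number of whole pages in [start, end); mirrors the
 * (all_end - start) >> PAGE_SHIFT argument given to set_memory_nx(). */
uint64_t nx_page_count(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return (end - start) >> PAGE_SHIFT;
}

/* Extra pages marked NX when the range starts at text_end rather
 * than rodata_start: exactly the pages in the gap between the two. */
uint64_t extra_nx_pages(uint64_t text_end, uint64_t rodata_start,
			uint64_t all_end)
{
	return nx_page_count(text_end, all_end) -
	       nx_page_count(rodata_start, all_end);
}
```

With a gap of 0xAC000 bytes (688K) between text_end and rodata_start, extra_nx_pages() returns 172, which is the 688K region that changes from x to NX between the "Before" and "After" dumps.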
[RFC][PATCH] x86/mm: warn on W+x mappings
Warn on any residual W+x mappings if X86_PTDUMP is enabled.

Sample dmesg output:
Checking for W+x mappings
0xffffffff81755000-0xffffffff81800000  684K  RW      GLB x   pte
Found W+x mappings. Please fix.

Signed-off-by: Stephen Smalley
---
Not sure if this is the best place to put this check. It must occur after free_init_pages() or it won't catch the W+x case for the gap between __ex_table and rodata.

 arch/x86/include/asm/pgtable.h |  6 ++
 arch/x86/mm/dump_pagetables.c  | 31 ++-
 arch/x86/mm/init_64.c          |  2 ++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..8e771c1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -20,6 +20,12 @@
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
 
+#ifdef CONFIG_X86_PTDUMP
+void ptdump_walk_pgd_level_checkwx(void);
+#else
+#define ptdump_walk_pgd_level_checkwx() do { } while (0)
+#endif
+
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index f0cedf3..986903b 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -32,6 +32,8 @@ struct pg_state {
 	const struct addr_marker *marker;
 	unsigned long lines;
 	bool to_dmesg;
+	bool check_wx;
+	bool found_wx;
 };
 
 struct addr_marker {
@@ -214,6 +216,13 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 	const char *unit = units;
 	unsigned long delta;
 	int width = sizeof(unsigned long) * 2;
+	pgprotval_t pr = pgprot_val(st->current_prot);
+	bool savedmesg = st->to_dmesg;
+
+	if (st->check_wx && (pr & _PAGE_RW) && !(pr & _PAGE_NX)) {
+		st->to_dmesg = true;
+		st->found_wx = true;
+	}
 
 	/*
 	 * Now print the actual finished series
@@ -261,6 +270,7 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		st->start_address = st->current_address;
 		st->current_prot = new_prot;
 		st->level = level;
+		st->to_dmesg = savedmesg;
 	}
 }
 
@@ -344,7 +354,8 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
 #define pgd_none(a)  pud_none(__pud(pgd_val(a)))
 #endif
 
-void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
+static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
+				       bool checkwx)
 {
 #ifdef CONFIG_X86_64
 	pgd_t *start = (pgd_t *) &init_level4_pgt;
@@ -359,6 +370,12 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
 		st.to_dmesg = true;
 	}
 
+	st.check_wx = checkwx;
+	if (checkwx) {
+		pr_info("Checking for W+x mappings\n");
+		st.found_wx = false;
+	}
+
 	for (i = 0; i < PTRS_PER_PGD; i++) {
 		st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
 		if (!pgd_none(*start)) {
@@ -378,6 +395,18 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
 	/* Flush out the last page */
 	st.current_address = normalize_addr(PTRS_PER_PGD*PGD_LEVEL_MULT);
 	note_page(m, &st, __pgprot(0), 0);
+	if (checkwx && st.found_wx)
+		pr_warn("Found W+x mappings. Please fix.\n");
+}
+
+void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
+{
+	ptdump_walk_pgd_level_core(m, pgd, false);
+}
+
+void ptdump_walk_pgd_level_checkwx(void)
+{
+	ptdump_walk_pgd_level_core(NULL, NULL, true);
 }
 
 static int ptdump_show(struct seq_file *m, void *v)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df48430..7e704da 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
 	free_init_pages("unused kernel",
 			(unsigned long) __va(__pa_symbol(rodata_end)),
 			(unsigned long) __va(__pa_symbol(_sdata)));
+
+	ptdump_walk_pgd_level_checkwx();
 }
 
 #endif
-- 
2.1.0
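For readers following the test added to note_page(), here is a minimal userspace sketch of the same predicate. It assumes the usual x86-64 page-table layout (RW is bit 1, NX is bit 63); the SKETCH_ constants and is_wx() are stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the x86-64 page-table flag bits (hypothetical names). */
#define SKETCH_PAGE_RW (UINT64_C(1) << 1)   /* mapping is writable */
#define SKETCH_PAGE_NX (UINT64_C(1) << 63)  /* mapping is not executable */

/* A mapping is W+x when it is writable and the NX bit is clear. */
static bool is_wx(uint64_t prot)
{
	return (prot & SKETCH_PAGE_RW) && !(prot & SKETCH_PAGE_NX);
}
```

A read-only executable mapping (kernel text) and a writable NX mapping (data) both pass; only the combination of the two is flagged.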
[PATCH] x86/mm: Set NX on gap between __ex_table and rodata
Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start.

Before:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000   16M                  pmd
0xffffffff81000000-0xffffffff81600000    6M  ro  PSE GLB x   pmd
0xffffffff81600000-0xffffffff81754000 1360K  ro      GLB x   pte
0xffffffff81754000-0xffffffff81800000  688K  RW      GLB x   pte
0xffffffff81800000-0xffffffff81a00000    2M  ro  PSE GLB NX  pmd
0xffffffff81a00000-0xffffffff81b3b000 1260K  ro      GLB NX  pte
0xffffffff81b3b000-0xffffffff82000000 4884K  RW      GLB NX  pte
0xffffffff82000000-0xffffffff82200000    2M  RW  PSE GLB NX  pmd
0xffffffff82200000-0xffffffffa0000000  478M                  pmd

After:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000   16M                  pmd
0xffffffff81000000-0xffffffff81600000    6M  ro  PSE GLB x   pmd
0xffffffff81600000-0xffffffff81754000 1360K  ro      GLB x   pte
0xffffffff81754000-0xffffffff81800000  688K  RW      GLB NX  pte
0xffffffff81800000-0xffffffff81a00000    2M  ro  PSE GLB NX  pmd
0xffffffff81a00000-0xffffffff81b3b000 1260K  ro      GLB NX  pte
0xffffffff81b3b000-0xffffffff82000000 4884K  RW      GLB NX  pte
0xffffffff82000000-0xffffffff82200000    2M  RW  PSE GLB NX  pmd
0xffffffff82200000-0xffffffffa0000000  478M                  pmd

Signed-off-by: Stephen Smalley
---
 arch/x86/mm/init_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2..df48430 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 	 * has been zapped already via cleanup_highmem().
 	 */
 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
 	rodata_test();
 
-- 
2.1.0
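The second argument to set_memory_nx() in the hunk above is a page count, not a byte length. A small userspace sketch of that arithmetic follows, assuming 4 KiB pages and a 2 MiB PMD; the SKETCH_ constants and both helper names are made up for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                   /* 4 KiB pages (assumed) */
#define SKETCH_PMD_SIZE   (UINT64_C(1) << 21)  /* 2 MiB PMD (assumed) */

/* Round x up to the next multiple of align, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Number of pages in [start, end rounded up to a PMD boundary). */
static uint64_t nx_page_count(uint64_t start, uint64_t end)
{
	uint64_t all_end = round_up_pow2(end, SKETCH_PMD_SIZE);

	return (all_end - start) >> SKETCH_PAGE_SHIFT;
}
```

Fed the bounds of the RW/x row in the dump (0xffffffff81754000 to 0xffffffff81800000), the sketch yields 172 pages, which matches the 688K the dump reports for that row.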
Re: [PATCH 0/5] Security: Provide unioned file support
On 09/29/2015 05:03 PM, Stephen Smalley wrote:
On 09/28/2015 04:00 PM, David Howells wrote:

The attached patches provide security support for unioned files where the security involves an object-label-based LSM (such as SELinux) rather than a path-based LSM.

[Note that a number of the bits that were in the original patch set are now upstream and I've rebased on Casey's changes to the security hook system]

The patches can be broken down into two sets:

 (1) A patch to add LSM hooks to handle copy up of a file, including label determination/setting and xattr filtration and a patch to have overlayfs call the hooks during the copy-up procedure.

 (2) My SELinux implementations of these hooks. I do three things:

     (a) Don't copy up SELinux xattrs from the lower file to the upper file. It is assumed that the upper file will be created with the attrs we want or the selinux_inode_copy_up() hook will set it appropriately. The reason there are two separate hooks here is that selinux_inode_copy_up_xattr() might not ever be called if there aren't actually any xattrs on the lower inode.

     (b) I try to derive a label to be used by file operations by, in order of preference: using the label on the union inode if there is one (the normal overlayfs case); using the override label set on the superblock, if provided; or trying to derive a new label by sid transition operation.

     (c) Using the label obtained in (b) in file_has_perm() rather than using the label on the lower inode.

Now the steps I have outlined in (b) and (c) seem to be at odds with what Dan Walsh and Stephen Smalley want - but I'm not sure I follow what that is, let alone how to do it:

Wanted to bring back the original proposal. Stephen suggested that we could change back to the MLS way of handling labels.

In MCS we base the MCS label of content created by a process on the containing directory. Which means that if a process running as s0:c1,c2 creates content in a directory labeled s0, it will get created as s0.

In MLS if a process running as s0:c1,c2 creates content in a directory it labels it s0:c1,c2. No matter what the label of the directory is. (Well actually if the directory is not ranged the process will not be able to create the content.)

We changed the default for MCS in Rawhide for about a week, when I realized this was a huge problem for containers sharing content. Currently if you want two containers to share the same volume mount, we label the content as svirt_sandbox_file_t:s0. If one process running as s0:c1,c2 creates content it gets created as s0; if the second container process is running as s0:c3,c4, it can still read/write the content. If we changed the default the object would get created as s0:c1,c2 and a process running as s0:c3,c4 would be blocked.

So I had it reverted back to the standard MCS Mode.

If we could get the default to be MLS mode on COW file systems and MCS on Volumes, we would get the best of both worlds.

How are you testing this? I tried as follows:

# Make sure we have a policy that is using xattrs to label overlay inodes.
$ seinfo --fs_use | grep overlay
   fs_use_xattr overlay system_u:object_r:fs_t:s0

# Define some types for the different directories involved.
$ cat overlay.te
policy_module(overlay, 1.0)
type lower_t;
files_type(lower_t)
type upper_t;
files_type(upper_t)
type work_t;
files_type(work_t)
type merged_t;
files_type(merged_t)
$ make -f /usr/share/selinux/devel/Makefile overlay.pp
$ sudo semodule -i overlay.pp

# Create and label the different directories involved.
$ mkdir lower upper work merged
$ chcon -t lower_t lower
$ chcon -t upper_t upper
$ chcon -t work_t work
$ chcon -t merged_t merged

# Populate lower
$ echo "lower" > lower/a
$ mkdir lower/b

# Mount overlay
$ sudo mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

# Look at the merged dir and labels.
$ ls -Z merged
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:lower_t:s0 b

# Write/create some files/directories.
$ echo "foo" >> merged/a
$ mkdir merged/b/c
$ mkdir merged/c

# Look again.
$ ls -ZR merged
merged:
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:lower_t:s0 b

merged/b:
unconfined_u:object_r:work_t:s0 c

merged/b/c:
$ ls -ZR upper
upper:
unconfined_u:object_r:work_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:work_t:s0 b

upper/b:
unconfined_u:object_r:work_t:s0 c

upper/b/c:

Note that the copied-up file (a) and directory (b) are labeled lower_t in the overlay, but work_t in the upper dir, and neither of those is really what we want IIUC.

And that's just the labeling question. Then there is the question of what permission checks were applied during those c
Re: [PATCH 1/2] selinux: ioctl_has_perm should be static
On 09/27/2015 11:10 AM, Geliang Tang wrote:

Fixes the following sparse warning:

 security/selinux/hooks.c:3242:5: warning: symbol 'ioctl_has_perm' was not declared. Should it be static?

Signed-off-by: Geliang Tang

Acked-by: Stephen Smalley

---
 security/selinux/hooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 84d21f9..5265c74 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3239,7 +3239,7 @@ static void selinux_file_free_security(struct file *file)
  * Check whether a task has the ioctl permission and cmd
  * operation to an inode.
  */
-int ioctl_has_perm(const struct cred *cred, struct file *file,
+static int ioctl_has_perm(const struct cred *cred, struct file *file,
 		u32 requested, u16 cmd)
 {
 	struct common_audit_data ad;
Re: [PATCH 5/5] selinux: use sprintf return value
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

sprintf returns the number of characters printed (excluding '\0'), so we can use that and avoid duplicating the length computation.

Signed-off-by: Rasmus Villemoes

Acked-by: Stephen Smalley

---
 security/selinux/ss/services.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index aa2bdcb20848..ebb5eb3c318c 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1218,13 +1218,10 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
 	/*
 	 * Copy the user name, role name and type name into the context.
 	 */
-	sprintf(scontextp, "%s:%s:%s",
+	scontextp += sprintf(scontextp, "%s:%s:%s",
 		sym_name(&policydb, SYM_USERS, context->user - 1),
 		sym_name(&policydb, SYM_ROLES, context->role - 1),
 		sym_name(&policydb, SYM_TYPES, context->type - 1));
-	scontextp += strlen(sym_name(&policydb, SYM_USERS, context->user - 1)) +
-		     1 + strlen(sym_name(&policydb, SYM_ROLES, context->role - 1)) +
-		     1 + strlen(sym_name(&policydb, SYM_TYPES, context->type - 1));
 
 	mls_sid_to_context(context, &scontextp);
 
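The idiom in this patch works the same way in userspace. Below is a minimal sketch, with a hypothetical helper name, that advances the write pointer by sprintf()'s return value instead of re-measuring each component with strlen(). It assumes the caller has sized the buffer correctly, as the kernel code does before this point.

```c
#include <stdio.h>

/*
 * Write "user:role:type" into buf and return a pointer to the
 * terminating NUL, so that further text can be appended there.
 * The caller must supply a large enough buffer.
 */
static char *append_context(char *buf, const char *user,
			    const char *role, const char *type)
{
	/* sprintf returns the number of characters written, excluding '\0'. */
	buf += sprintf(buf, "%s:%s:%s", user, role, type);
	return buf;
}
```

The returned pointer plays the role of scontextp after the patched statement: it sits on the NUL, ready for the MLS part of the context to be appended.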
Re: [PATCH 4/5] selinux: use kstrdup() in security_get_bools()
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

This is much simpler.

Signed-off-by: Rasmus Villemoes

Acked-by: Stephen Smalley

---
 security/selinux/ss/services.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 994c824a34c6..aa2bdcb20848 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2609,18 +2609,12 @@ int security_get_bools(int *len, char ***names, int **values)
 		goto err;
 
 	for (i = 0; i < *len; i++) {
-		size_t name_len;
-
 		(*values)[i] = policydb.bool_val_to_struct[i]->state;
-		name_len = strlen(sym_name(&policydb, SYM_BOOLS, i)) + 1;
 
 		rc = -ENOMEM;
-		(*names)[i] = kmalloc(sizeof(char) * name_len, GFP_ATOMIC);
+		(*names)[i] = kstrdup(sym_name(&policydb, SYM_BOOLS, i), GFP_ATOMIC);
 		if (!(*names)[i])
 			goto err;
-
-		strncpy((*names)[i], sym_name(&policydb, SYM_BOOLS, i), name_len);
-		(*names)[i][name_len - 1] = 0;
 	}
 	rc = 0;
 out:
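To see why the replacement is "much simpler", it may help to spell out what a string-duplicating helper does in one place. This is a userspace stand-in using malloc(); dup_string() is a made-up name and is not the kernel's kstrdup() implementation.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate strlen(s) + 1 bytes and copy the string, including its
 * terminating NUL. Returns NULL on allocation failure; the caller
 * frees the result.
 */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```

The length computation, the allocation, the copy and the termination that the old loop body did by hand all collapse into the one call, leaving only the NULL check at the call site.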
Re: [PATCH 3/5] selinux: use kmemdup in security_sid_to_context_core()
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

Signed-off-by: Rasmus Villemoes

Acked-by: Stephen Smalley

---
 security/selinux/ss/services.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index c550df0e0ff1..994c824a34c6 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1259,12 +1259,12 @@ static int security_sid_to_context_core(u32 sid, char **scontext,
 			*scontext_len = strlen(initial_sid_to_string[sid]) + 1;
 			if (!scontext)
 				goto out;
-			scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
+			scontextp = kmemdup(initial_sid_to_string[sid],
+					    *scontext_len, GFP_ATOMIC);
 			if (!scontextp) {
 				rc = -ENOMEM;
 				goto out;
 			}
-			strcpy(scontextp, initial_sid_to_string[sid]);
 			*scontext = scontextp;
 			goto out;
 		}
Re: [PATCH 2/5] selinux: remove pointless cast in selinux_inode_setsecurity()
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

security_context_to_sid() expects a const char* argument, so there's no point in casting away the const qualifier of value.

Signed-off-by: Rasmus Villemoes

Acked-by: Stephen Smalley

---
 security/selinux/hooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index fd50cd5ac2ec..5edb57df86f8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3162,7 +3162,7 @@ static int selinux_inode_setsecurity(struct inode *inode, const char *name,
 	if (!value || !size)
 		return -EACCES;
 
-	rc = security_context_to_sid((void *)value, size, &newsid, GFP_KERNEL);
+	rc = security_context_to_sid(value, size, &newsid, GFP_KERNEL);
 	if (rc)
 		return rc;
 
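The point about the cast can be shown with a small standalone sketch. It assumes, as I understand the hook's signature, that value arrives as a const void *; both function names here are hypothetical. In C a const void * converts implicitly to const char *, so the callee's const-qualified parameter accepts it as is.

```c
#include <stddef.h>

/* Count ':' separators in the first len bytes of a context string. */
static size_t count_separators(const char *s, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		if (s[i] == ':')
			n++;
	return n;
}

/*
 * value is const void *, as an xattr value would be. It converts
 * implicitly to const char *, so no cast is needed and the const
 * qualifier is preserved.
 */
static size_t separators_in_value(const void *value, size_t size)
{
	return count_separators(value, size);
}
```

Casting to plain void * as the old code did compiles too, but it silently drops the qualifier for no benefit.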
Re: [PATCH 1/5] selinux: introduce security_context_str_to_sid
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

There seems to be a little confusion as to whether the scontext_len parameter of security_context_to_sid() includes the nul-byte or not. Reading security_context_to_sid_core(), it seems that the expectation is that it does not (both the string copying and the test for scontext_len being zero hint at that).

Introduce the helper security_context_str_to_sid() to do the strlen() call and fix all callers.

Signed-off-by: Rasmus Villemoes

Acked-by: Stephen Smalley

---
 security/selinux/hooks.c            | 12
 security/selinux/include/security.h |  2 ++
 security/selinux/selinuxfs.c        | 26 +-
 security/selinux/ss/services.c      |  5 +
 4 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d86e588..fd50cd5ac2ec 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -674,10 +674,9 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 
 		if (flags[i] == SBLABEL_MNT)
 			continue;
-		rc = security_context_to_sid(mount_options[i],
-					     strlen(mount_options[i]), &sid, GFP_KERNEL);
+		rc = security_context_str_to_sid(mount_options[i], &sid, GFP_KERNEL);
 		if (rc) {
-			printk(KERN_WARNING "SELinux: security_context_to_sid"
+			printk(KERN_WARNING "SELinux: security_context_str_to_sid"
 			       "(%s) failed for (dev %s, type %s) errno=%d\n",
 			       mount_options[i], sb->s_id, name, rc);
 			goto out;
@@ -2617,15 +2616,12 @@ static int selinux_sb_remount(struct super_block *sb, void *data)
 
 	for (i = 0; i < opts.num_mnt_opts; i++) {
 		u32 sid;
-		size_t len;
 
 		if (flags[i] == SBLABEL_MNT)
 			continue;
-		len = strlen(mount_options[i]);
-		rc = security_context_to_sid(mount_options[i], len, &sid,
-					     GFP_KERNEL);
+		rc = security_context_str_to_sid(mount_options[i], &sid, GFP_KERNEL);
 		if (rc) {
-			printk(KERN_WARNING "SELinux: security_context_to_sid"
+			printk(KERN_WARNING "SELinux: security_context_str_to_sid"
 			       "(%s) failed for (dev %s, type %s) errno=%d\n",
 			       mount_options[i], sb->s_id, sb->s_type->name, rc);
 			goto out_free_opts;
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 6a681d26bf20..223e9fd15d66 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -166,6 +166,8 @@ int security_sid_to_context_force(u32 sid, char **scontext, u32 *scontext_len);
 int security_context_to_sid(const char *scontext, u32 scontext_len,
 			    u32 *out_sid, gfp_t gfp);
 
+int security_context_str_to_sid(const char *scontext, u32 *out_sid, gfp_t gfp);
+
 int security_context_to_sid_default(const char *scontext, u32 scontext_len,
 				    u32 *out_sid, u32 def_sid, gfp_t gfp_flags);
 
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 5bed7716f8ab..c02da25d7b63 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -731,13 +731,11 @@ static ssize_t sel_write_access(struct file *file, char *buf, size_t size)
 	if (sscanf(buf, "%s %s %hu", scon, tcon, &tclass) != 3)
 		goto out;
 
-	length = security_context_to_sid(scon, strlen(scon) + 1, &ssid,
-					 GFP_KERNEL);
+	length = security_context_str_to_sid(scon, &ssid, GFP_KERNEL);
 	if (length)
 		goto out;
 
-	length = security_context_to_sid(tcon, strlen(tcon) + 1, &tsid,
-					 GFP_KERNEL);
+	length = security_context_str_to_sid(tcon, &tsid, GFP_KERNEL);
 	if (length)
 		goto out;
 
@@ -819,13 +817,11 @@ static ssize_t sel_write_create(struct file *file, char *buf, size_t size)
 		objname = namebuf;
 	}
 
-	length = security_context_to_sid(scon, strlen(scon) + 1, &ssid,
-					 GFP_KERNEL);
+	length = security_context_str_to_sid(scon, &ssid, GFP_KERNEL);
 	if (length)
 		goto out;
 
-	length = security_context_to_sid(tcon, strlen(tcon) + 1, &tsid,
-					 GFP_KERNEL);
+	length = security_context_str_to_sid(tcon, &tsid, GFP_KERNEL);
 	if (length)
 		goto out;
 
@@ -882,13 +878,11 @@ static ssize_t sel_write_relabel(struct file *file, char *buf, size_t size)
 	if (s
Re: [PATCH 0/5] selinux: minor cleanup suggestions
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:
> A few random things I stumbled on.
>
> While I'm pretty sure of the change in 1/5, I'm also confused, because
> the doc for the reverse security_sid_to_context states that
> @scontext_len is set to "the length of the string", which one would
> normally interpret as being what strlen() would give (i.e., without
> the \0). However, security_sid_to_context_core clearly includes the \0
> in the return value, and I think callers rely on that.

It is historical; originally security_context_to_sid() required
@scontext to be NUL-terminated and @scontext_len to include the NUL byte
in the length, and security_sid_to_context() returned a NUL-terminated
@scontext and included the NUL byte in the returned length. However,
when we switched SELinux to using xattrs rather than its own persistent
label mapping, security_context_to_sid() was changed to accept contexts
that did not already include the NUL because setfattr did not consider
the NUL to be part of the attribute value for strings. So presently it
accepts either form, although we prefer them to be NUL-terminated and
canonicalize them to that form before returning to userspace.

> Rasmus Villemoes (5):
>   selinux: introduce security_context_str_to_sid
>   selinux: remove pointless cast in selinux_inode_setsecurity()
>   selinux: use kmemdup in security_sid_to_context_core()
>   selinux: use kstrdup() in security_get_bools()
>   selinux: use sprintf return value
>
>  security/selinux/hooks.c            | 14
>  security/selinux/include/security.h |  2 ++
>  security/selinux/selinuxfs.c        | 26
>  security/selinux/ss/services.c      | 22
>  4 files changed, 25 insertions(+), 39 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
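The "accepts either form" behaviour described above can be illustrated with a small userspace sketch. This is not the kernel code: context_len_without_nul() is a hypothetical helper name, and it only shows the idea of tolerating a length that may or may not count the trailing NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel function): given a buffer and a
 * caller-supplied length that may or may not count a trailing NUL,
 * return the length of the context string without the NUL.
 */
static size_t context_len_without_nul(const char *scontext, size_t scontext_len)
{
	assert(scontext != NULL);

	/* Drop exactly one trailing NUL if the caller counted it. */
	if (scontext_len > 0 && scontext[scontext_len - 1] == '\0')
		return scontext_len - 1;
	return scontext_len;
}
```

A caller passing strlen(ctx) (the xattr style) and a caller passing strlen(ctx) + 1 (the historical style) both end up with the same string length under this sketch.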
Re: [PATCH 0/5] Security: Provide unioned file support
On 09/28/2015 04:00 PM, David Howells wrote:
> The attached patches provide security support for unioned files where the
> security involves an object-label-based LSM (such as SELinux) rather than a
> path-based LSM.
>
> [Note that a number of the bits that were in the original patch set are now
> upstream and I've rebased on Casey's changes to the security hook system]
>
> The patches can be broken down into two sets:
>
>  (1) A patch to add LSM hooks to handle copy up of a file, including label
>      determination/setting and xattr filtration and a patch to have
>      overlayfs call the hooks during the copy-up procedure.
>
>  (2) My SELinux implementations of these hooks. I do three things:
>
>      (a) Don't copy up SELinux xattrs from the lower file to the upper
>          file. It is assumed that the upper file will be created with the
>          attrs we want or the selinux_inode_copy_up() hook will set it
>          appropriately.
>
>          The reason there are two separate hooks here is that
>          selinux_inode_copy_up_xattr() might not ever be called if there
>          aren't actually any xattrs on the lower inode.
>
>      (b) I try to derive a label to be used by file operations by, in order
>          of preference: using the label on the union inode if there is one
>          (the normal overlayfs case); using the override label set on the
>          superblock, if provided; or trying to derive a new label by sid
>          transition operation.
>
>      (c) Using the label obtained in (b) in file_has_perm() rather than
>          using the label on the lower inode.
>
> Now the steps I have outlined in (b) and (c) seem to be at odds with what
> Dan Walsh and Stephen Smalley want - but I'm not sure I follow what that
> is, let alone how to do it:
>
>     Wanted to bring back the original proposal. Stephen suggested that we
>     could change back to the MLS way of handling labels.
>
>     In MCS we base the MCS label of content created by a process on the
>     containing directory. Which means that if a process running as
>     s0:c1,c2 creates content in a directory labeled s0, it will get
>     created as s0.
>
>     In MLS if a process running as s0:c1,c2 creates content in a directory
>     it labels it s0:c1,c2. No matter what the label of the directory is.
>     (Well actually if the directory is not ranged the process will not be
>     able to create the content.)
>
>     We changed the default for MCS in Rawhide for about a week, when I
>     realized this was a huge problem for containers sharing content.
>     Currently if you want two containers to share the same volume mount,
>     we label the content as svirt_sandbox_file_t:s0. If one process
>     running as s0:c1,c2 creates content it gets created as s0, if the
>     second container process is running as s0:c3,c4, it can still
>     read/write the content. If we changed the default the object would
>     get created as s0:c1,c2 and process running as s0:c3,c4 would be
>     blocked.
>
>     So I had it reverted back to the standard MCS Mode. If we could get
>     the default to be MLS mode on COW file systems and MCS on Volumes, we
>     would get the best of both worlds.

How are you testing this? I tried as follows:

# Make sure we have a policy that is using xattrs to label overlay inodes.
$ seinfo --fs_use | grep overlay
   fs_use_xattr overlay system_u:object_r:fs_t:s0

# Define some types for the different directories involved.
$ cat overlay.te
policy_module(overlay, 1.0)
type lower_t;
files_type(lower_t)
type upper_t;
files_type(upper_t)
type work_t;
files_type(work_t)
type merged_t;
files_type(merged_t)
$ make -f /usr/share/selinux/devel/Makefile overlay.pp
$ sudo semodule -i overlay.pp

# Create and label the different directories involved.
$ mkdir lower upper work merged
$ chcon -t lower_t lower
$ chcon -t upper_t upper
$ chcon -t work_t work
$ chcon -t merged_t merged

# Populate lower
$ echo "lower" > lower/a
$ mkdir lower/b

# Mount overlay
$ sudo mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

# Look at the merged dir and labels.
$ ls -Z merged
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:lower_t:s0 b

# Write/create some files/directories.
$ echo "foo" >> merged/a
$ mkdir merged/b/c
$ mkdir merged/c

# Look again.
$ ls -ZR merged
merged:
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:lower_t:s0 b

merged/b:
unconfined_u:object_r:work_t:s0 c

merged/b/c:

$ ls -ZR upper
upper:
unconfined_u:object_r:work_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:work_t:s0 b

upper/b:
unconfined_u:object_r:work_t:s0 c

upper/b/c:

Note that the copied-up file (a) and directory (b) are labeled lower_t
in the overlay, but work_t in the upper dir, and neither of those is
really what we want IIUC.

And that's just the labeling question. Then there is the question of
what permission che
Re: [PATCH 4/5] selinux: use kstrdup() in security_get_bools()
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:
This is much simpler.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

---
 security/selinux/ss/services.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 994c824a34c6..aa2bdcb20848 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2609,18 +2609,12 @@ int security_get_bools(int *len, char ***names, int **values)
 		goto err;
 
 	for (i = 0; i < *len; i++) {
-		size_t name_len;
-
 		(*values)[i] = policydb.bool_val_to_struct[i]->state;
-		name_len = strlen(sym_name(&policydb, SYM_BOOLS, i)) + 1;
 
 		rc = -ENOMEM;
-		(*names)[i] = kmalloc(sizeof(char) * name_len, GFP_ATOMIC);
+		(*names)[i] = kstrdup(sym_name(&policydb, SYM_BOOLS, i), GFP_ATOMIC);
 		if (!(*names)[i])
 			goto err;
-
-		strncpy((*names)[i], sym_name(&policydb, SYM_BOOLS, i), name_len);
-		(*names)[i][name_len - 1] = 0;
 	}
 	rc = 0;
 out:
Re: [PATCH 3/5] selinux: use kmemdup in security_sid_to_context_core()
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:
Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

---
 security/selinux/ss/services.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index c550df0e0ff1..994c824a34c6 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1259,12 +1259,12 @@ static int security_sid_to_context_core(u32 sid, char **scontext,
 			*scontext_len = strlen(initial_sid_to_string[sid]) + 1;
 			if (!scontext)
 				goto out;
-			scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
+			scontextp = kmemdup(initial_sid_to_string[sid],
+					    *scontext_len, GFP_ATOMIC);
 			if (!scontextp) {
 				rc = -ENOMEM;
 				goto out;
 			}
-			strcpy(scontextp, initial_sid_to_string[sid]);
 			*scontext = scontextp;
 			goto out;
 		}
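For readers unfamiliar with kmemdup(), the change folds an allocate-then-copy pair into one call. A minimal userspace sketch of the same pattern follows; memdup_sketch() is a hypothetical stand-in built on malloc() and memcpy(), not the kernel implementation. Because the length passed in is strlen() + 1, the copy carries the terminating NUL with it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's kmemdup(): allocate len bytes
 * and copy them from src. Returns NULL on allocation failure.
 */
static void *memdup_sketch(const void *src, size_t len)
{
	void *dst;

	assert(src != NULL);
	dst = malloc(len);
	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```

Since the length is already known at the call site, copying a fixed number of bytes avoids the second scan of the source string that strcpy() would perform.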
Re: [PATCH 5/5] selinux: use sprintf return value
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:
sprintf returns the number of characters printed (excluding '\0'), so
we can use that and avoid duplicating the length computation.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

---
 security/selinux/ss/services.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index aa2bdcb20848..ebb5eb3c318c 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1218,13 +1218,10 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
 	/*
 	 * Copy the user name, role name and type name into the context.
 	 */
-	sprintf(scontextp, "%s:%s:%s",
+	scontextp += sprintf(scontextp, "%s:%s:%s",
 		sym_name(&policydb, SYM_USERS, context->user - 1),
 		sym_name(&policydb, SYM_ROLES, context->role - 1),
 		sym_name(&policydb, SYM_TYPES, context->type - 1));
-	scontextp += strlen(sym_name(&policydb, SYM_USERS, context->user - 1)) +
-		     1 + strlen(sym_name(&policydb, SYM_ROLES, context->role - 1)) +
-		     1 + strlen(sym_name(&policydb, SYM_TYPES, context->type - 1));
 
 	mls_sid_to_context(context, &scontextp);
 
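The idiom the patch relies on can be tried in isolation. The sketch below is a userspace illustration only; append_context() and its parameters are hypothetical names, and the size check is an addition for the example rather than something the kernel function does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "user:role:type" into buf and return a pointer to the
 * terminating NUL, so the caller can keep appending from there.
 */
static char *append_context(char *buf, size_t size, const char *user,
			    const char *role, const char *type)
{
	/* Two ':' separators plus the NUL account for the 3 extra bytes. */
	assert(strlen(user) + strlen(role) + strlen(type) + 3 <= size);

	/* sprintf() returns the number of characters written, excluding the NUL. */
	return buf + sprintf(buf, "%s:%s:%s", user, role, type);
}
```

The returned pointer plays the role of the advanced scontextp: whatever is written next (the MLS part, in the patch) lands directly after the type name.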
Re: [PATCH 2/5] selinux: remove pointless cast in selinux_inode_setsecurity()
On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:
security_context_to_sid() expects a const char* argument, so there's
no point in casting away the const qualifier of value.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

---
 security/selinux/hooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index fd50cd5ac2ec..5edb57df86f8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3162,7 +3162,7 @@ static int selinux_inode_setsecurity(struct inode *inode, const char *name,
 	if (!value || !size)
 		return -EACCES;
 
-	rc = security_context_to_sid((void *)value, size, &newsid, GFP_KERNEL);
+	rc = security_context_to_sid(value, size, &newsid, GFP_KERNEL);
 	if (rc)
 		return rc;
 
Re: [PATCH 1/2] selinux: ioctl_has_perm should be static
On 09/27/2015 11:10 AM, Geliang Tang wrote:
Fixes the following sparse warning:

 security/selinux/hooks.c:3242:5: warning: symbol 'ioctl_has_perm' was
 not declared. Should it be static?

Signed-off-by: Geliang Tang <geliangt...@163.com>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

---
 security/selinux/hooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 84d21f9..5265c74 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3239,7 +3239,7 @@ static void selinux_file_free_security(struct file *file)
  * Check whether a task has the ioctl permission and cmd
  * operation to an inode.
  */
-int ioctl_has_perm(const struct cred *cred, struct file *file,
+static int ioctl_has_perm(const struct cred *cred, struct file *file,
 		u32 requested, u16 cmd)
 {
 	struct common_audit_data ad;
Re: rwx mapping between ex_table and rodata
On 09/24/2015 06:25 PM, Kees Cook wrote:
> On Thu, Sep 24, 2015 at 1:26 PM, Stephen Smalley wrote:
>> Hi,
>>
>> With the attached config and 4.3-rc2 on x86_64, I see the following in
>> /sys/kernel/debug/kernel_page_tables:
>> ...
>> ---[ High Kernel Mapping ]---
>> 0x8000-0x8100            16M                    pmd
>> 0x8100-0x8160             6M  ro  PSE  GLB x    pmd
>> 0x8160-0x81775000      1492K  ro       GLB x    pte
>> 0x81775000-0x8180       556K  RW       GLB x    pte
>> ^
>> 0x8180-0x81a0             2M  ro  PSE  GLB NX   pmd
>> 0x81a0-0x81b43000      1292K  ro       GLB NX   pte
>> 0x81b43000-0x8200      4852K  RW       GLB NX   pte
>> 0x8200-0x8220             2M  RW  PSE  GLB NX   pmd
>> 0x8220-0xa000           478M                    pmd
>> ...
>>
>> This region seems to be between the end of ex_table and the start of rodata,
>> $ objdump -x vmlinux | sort
>> ...
>> 817728b0 g     __ex_table  __start___ex_table
>> 817728b0 ld    __ex_table  __ex_table
>> 81774998 g     __ex_table  __stop___ex_table
>> 8180     g     .rodata     __start_rodata
>> 8180     ld    .rodata     .rodata
>> ...
>>
>> $ readelf -a vmlinux
>> ...
>> Section Headers:
>>   [Nr] Name        Type      Address   Offset
>>        Size        EntSize   Flags  Link  Info  Align
>> ...
>>   [ 3] __ex_table  PROGBITS  817728b0  009728b0
>>        20e8                  A      0     0     8
>>   [ 4] .rodata     PROGBITS  8180      00a0
>>        002eefd2              A      0     0     64
>> ...
>>
>> I see a similar rwx mapping with the stock Fedora kernels (e.g. 4.1.6), so
>> it isn't new to 4.3.
>
> To me it looks like another alignment/padding issue like got fixed
> before. The space between __ex_table and rodata is (seems?) unused, so
> the default page table permissions end up being W+X. Can we fix the
> default to be NX instead? It'll make these bugs stay gone.

Not sure where that would get fixed (or the ramifications), but is there
a reason we can't just do the following to fix this particular case?

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2..df48430 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 	 * has been zapped already via cleanup_highmem().
 	 */
 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
 	rodata_test();
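The second argument to set_memory_nx() in the patch is a page count obtained by shifting a byte range right by PAGE_SHIFT. The sketch below reproduces that arithmetic in userspace. It assumes 4 KiB pages and assumes the truncated addresses in the dump expand to 0xffffffff81775000 and 0xffffffff81800000, which is consistent with the 556K size shown for the RW+x region; pages_between() and PAGE_SHIFT_SKETCH are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumed: 4 KiB pages, as on x86_64 */

/* Number of whole pages in the byte range [start, end). */
static uint64_t pages_between(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return (end - start) >> PAGE_SHIFT_SKETCH;
}
```

Under those assumptions the gap between the end of __ex_table's page and the start of .rodata is 0x8b000 bytes, i.e. 139 pages of 4 KiB, or 556 KiB, matching the size column of the suspicious mapping.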
Re: [PATCH 1/7] fs: Add user namesapace member to struct super_block
On 08/06/2015 11:44 AM, Seth Forshee wrote:
> On Thu, Aug 06, 2015 at 10:51:16AM -0400, Stephen Smalley wrote:
>> On 08/06/2015 10:20 AM, Seth Forshee wrote:
>>> On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote:
>>>> Seth Forshee writes:
>>>>
>>>>> On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote:
>>>>>> Seth Forshee writes:
>>>>>>
>>>>>>> Initially this will be used to eliminate the implicit MNT_NODEV
>>>>>>> flag for mounts from user namespaces. In the future it will also
>>>>>>> be used for translating ids and checking capabilities for
>>>>>>> filesystems mounted from user namespaces.
>>>>>>>
>>>>>>> s_user_ns is initialized in alloc_super() and is generally set to
>>>>>>> current_user_ns(). To avoid security and corruption issues, two
>>>>>>> additional mount checks are also added:
>>>>>>>
>>>>>>>  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>>>>>>>    in current_user_ns().
>>>>>>>
>>>>>>>  - sget() will fail with EBUSY when the filesystem it's looking
>>>>>>>    for is already mounted from another user namespace.
>>>>>>>
>>>>>>> proc needs some special handling here. The user namespace of
>>>>>>> current isn't appropriate when forking as a result of clone (2)
>>>>>>> with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable
>>>>>>> from within the new user namespace. Instead, the user namespace
>>>>>>> which owns the new pid namespace should be used. sget_userns() is
>>>>>>> added to allow passing of a user namespace other than that of
>>>>>>> current, and this is used by proc_mount(). sget() becomes a
>>>>>>> wrapper around sget_userns() which passes current_user_ns().
>>>>>>
>>>>>> From bits of the previous conversation.
>>>>>>
>>>>>> We need sget_userns(..., &init_user_ns) for sysfs. The sysfs
>>>>>> xattrs can travel from one mount of sysfs to another via the sysfs
>>>>>> backing store.
>>>>>>
>>>>>> For tmpfs and any other filesystems we support mounting without
>>>>>> privilege that support xattrs. We need to identify them and
>>>>>> see if userspace is taking advantage of the ability to set
>>>>>> xattrs and file caps (unlikely). If they are we need to call
>>>>>> sget_userns(..., &init_user_ns) on those filesystems as well.
>>>>>>
>>>>>> Possibly/Probably we should just do that for all of the interesting
>>>>>> filesystems to start with and then change back to an ordinary old sget
>>>>>> after we have done the testing and confirmed we will not be introducing
>>>>>> userspace regressions.
>>>>>
>>>>> I was reviewing everything in preparation for sending v2 patches, and I
>>>>> realized that doing this has an undesirable side effect. In patch 2 the
>>>>> implicit nodev is removed for unprivileged mounts, and instead s_user_ns
>>>>> is used to block opening devices in these mounts. When we set s_user_ns
>>>>> to &init_user_ns, it becomes possible to open device nodes from
>>>>> unprivileged mounts of these filesystems.
>>>>>
>>>>> This doesn't pose a real problem today. The only filesystems it will
>>>>> affect is sysfs, tmpfs, and ramfs (no others need s_user_ns =
>>>>> &init_user_ns for user namespace mounts), and all of these aren't
>>>>> problems. sysfs is okay because kernfs doesn't (currently?) allow device
>>>>> nodes, and a user would require CAP_MKNOD to create any device nodes in
>>>>> a tmpfs or ramfs mount.
>>>>>
>>>>> But for sysfs in particular it does mean that we will need to make sure
>>>>> that there's no way that device nodes could start appearing in an
>>>>> unprivileged mount.
>>>>
>>>> Good point about nodev.
>>>>
>>>> For tmpfs and ramfs and security labels the smack policy of allowing but
>>>> filtering security labels mean smack once it has those bits will not
>>>> care which user namespace ramfs and tmpfs live in.
Re: [PATCH 1/7] fs: Add user namesapace member to struct super_block
On 08/06/2015 10:20 AM, Seth Forshee wrote:
> On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote:
>> Seth Forshee writes:
>>
>>> On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote:
>>>> Seth Forshee writes:
>>>>
>>>>> Initially this will be used to eliminate the implicit MNT_NODEV
>>>>> flag for mounts from user namespaces. In the future it will also
>>>>> be used for translating ids and checking capabilities for
>>>>> filesystems mounted from user namespaces.
>>>>>
>>>>> s_user_ns is initialized in alloc_super() and is generally set to
>>>>> current_user_ns(). To avoid security and corruption issues, two
>>>>> additional mount checks are also added:
>>>>>
>>>>>  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>>>>>    in current_user_ns().
>>>>>
>>>>>  - sget() will fail with EBUSY when the filesystem it's looking
>>>>>    for is already mounted from another user namespace.
>>>>>
>>>>> proc needs some special handling here. The user namespace of
>>>>> current isn't appropriate when forking as a result of clone (2)
>>>>> with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable
>>>>> from within the new user namespace. Instead, the user namespace
>>>>> which owns the new pid namespace should be used. sget_userns() is
>>>>> added to allow passing of a user namespace other than that of
>>>>> current, and this is used by proc_mount(). sget() becomes a
>>>>> wrapper around sget_userns() which passes current_user_ns().
>>>>
>>>> From bits of the previous conversation.
>>>>
>>>> We need sget_userns(..., &init_user_ns) for sysfs. The sysfs
>>>> xattrs can travel from one mount of sysfs to another via the sysfs
>>>> backing store.
>>>>
>>>> For tmpfs and any other filesystems we support mounting without
>>>> privilege that support xattrs. We need to identify them and
>>>> see if userspace is taking advantage of the ability to set
>>>> xattrs and file caps (unlikely). If they are we need to call
>>>> sget_userns(..., &init_user_ns) on those filesystems as well.
>>>>
>>>> Possibly/Probably we should just do that for all of the interesting
>>>> filesystems to start with and then change back to an ordinary old sget
>>>> after we have done the testing and confirmed we will not be introducing
>>>> userspace regressions.
>>>
>>> I was reviewing everything in preparation for sending v2 patches, and I
>>> realized that doing this has an undesirable side effect. In patch 2 the
>>> implicit nodev is removed for unprivileged mounts, and instead s_user_ns
>>> is used to block opening devices in these mounts. When we set s_user_ns
>>> to &init_user_ns, it becomes possible to open device nodes from
>>> unprivileged mounts of these filesystems.
>>>
>>> This doesn't pose a real problem today. The only filesystems it will
>>> affect is sysfs, tmpfs, and ramfs (no others need s_user_ns =
>>> &init_user_ns for user namespace mounts), and all of these aren't
>>> problems. sysfs is okay because kernfs doesn't (currently?) allow device
>>> nodes, and a user would require CAP_MKNOD to create any device nodes in
>>> a tmpfs or ramfs mount.
>>>
>>> But for sysfs in particular it does mean that we will need to make sure
>>> that there's no way that device nodes could start appearing in an
>>> unprivileged mount.
>>
>> Good point about nodev.
>>
>> For tmpfs and ramfs and security labels the smack policy of allowing but
>> filtering security labels mean smack once it has those bits will not
>> care which user namespace ramfs and tmpfs live in. The labels should
>> pretty much stay the same in any case.
>
> Smack does care which namespace ramfs and tmpfs are in. With the patch
> I've got right now, if s_user_ns != &init_user_ns and the label of an
> inode does not match that of the root inode then
> security_inode_permission() will return EACCES.
>
> So if something with CAP_MAC_ADMIN is changing security labels in such a
> mount, suddenly those inodes might become inaccessible. And while it may
> be unlikely that anyone is doing this it's impossible for me to prove
> that's the case.
>
>> If the same class of handling will also apply to selinux and those are
>> the only two security modules that apply labels then we can leave tmpfs
>> and ramfs with the security labels of whomever mounted them.
>
> For SELinux I now have a patch which applies mountpoint labeling to
> mounts for which s_user_ns != &init_user_ns. I'm less sure than with
> Smack how this behavior will differ from what happens today, but my
> understanding is that this means that the label of the mountpoint is
> used for all objects from that superblock. Afaik it does not have the
> Smack behavior of denying access to filesystem objects which have a
> different label in the backing store.
>
>> For sysfs things get a little more interesting. Assuming tmpfs and
>> ramfs don't need s_user_ns == &init_user_ns, sysfs may be fine operating
>> with possibly invalid security labels set on a different mount of
>> selinux. (I am wondering now how all of
Re: [PATCH 1/7] fs: Add user namespace member to struct super_block
On 08/06/2015 10:20 AM, Seth Forshee wrote:
> On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.fors...@canonical.com> writes:
>>
>>> On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote:
>>>> Seth Forshee <seth.fors...@canonical.com> writes:
>>>>
>>>>> Initially this will be used to eliminate the implicit MNT_NODEV flag
>>>>> for mounts from user namespaces. In the future it will also be used
>>>>> for translating ids and checking capabilities for filesystems mounted
>>>>> from user namespaces.
>>>>>
>>>>> s_user_ns is initialized in alloc_super() and is generally set to
>>>>> current_user_ns(). To avoid security and corruption issues, two
>>>>> additional mount checks are also added:
>>>>>
>>>>> - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>>>>>   in current_user_ns().
>>>>>
>>>>> - sget() will fail with EBUSY when the filesystem it's looking
>>>>>   for is already mounted from another user namespace.
>>>>>
>>>>> proc needs some special handling here. The user namespace of
>>>>> current isn't appropriate when forking as a result of clone (2)
>>>>> with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable
>>>>> from within the new user namespace. Instead, the user namespace
>>>>> which owns the new pid namespace should be used. sget_userns() is
>>>>> added to allow passing of a user namespace other than that of
>>>>> current, and this is used by proc_mount(). sget() becomes a wrapper
>>>>> around sget_userns() which passes current_user_ns().
>>>>
>>>> From bits of the previous conversation.
>>>>
>>>> We need sget_userns(..., &init_user_ns) for sysfs. The sysfs
>>>> xattrs can travel from one mount of sysfs to another via the sysfs
>>>> backing store.
>>>>
>>>> For tmpfs and any other filesystems we support mounting without
>>>> privilige that support xattrs. We need to identify them and
>>>> see if userspace is taking advantage of the ability to set
>>>> xattrs and file caps (unlikely). If they are we need to call
>>>> sget_userns(..., &init_user_ns) on those filesystems as well.
>>>>
>>>> Possibly/Probably we should just do that for all of the interesting
>>>> filesystems to start with and then change back to an ordinary old sget
>>>> after we have done the testing and confirmed we will not be introducing
>>>> userspace regressions.
>>>
>>> I was reviewing everything in preparation for sending v2 patches, and I
>>> realized that doing this has an undesirable side effect. In patch 2 the
>>> implicit nodev is removed for unprivileged mounts, and instead s_user_ns
>>> is used to block opening devices in these mounts. When we set s_user_ns
>>> to &init_user_ns, it becomes possible to open device nodes from
>>> unprivileged mounts of these filesystems.
>>>
>>> This doesn't pose a real problem today. The only filesystems it will
>>> affect is sysfs, tmpfs, and ramfs (no others need s_user_ns =
>>> &init_user_ns for user namespace mounts), and all of these aren't
>>> problems. sysfs is okay because kernfs doesn't (currently?) allow device
>>> nodes, and a user would require CAP_MKNOD to create any device nodes in
>>> a tmpfs or ramfs mount.
>>>
>>> But for sysfs in particular it does mean that we will need to make sure
>>> that there's no way that device nodes could start appearing in an
>>> unprivileged mount.
>>
>> Good point about nodev.
>>
>> For tmpfs and ramfs and security labels the smack policy of allowing but
>> filtering security labels mean smack once it has those bits will not
>> care which user namespace ramfs and tmpfs live in. The labels should
>> pretty much stay the same in any case.
>
> Smack does care which namespace ramfs and tmpfs are in. With the patch
> I've got right now, if s_user_ns != &init_user_ns and the label of an
> inode does not match that of the root inode then
> security_inode_permission() will return EACCES.
>
> So if something with CAP_MAC_ADMIN is changing security labels in such a
> mount, suddenly those inodes might become inaccessible. And while it may
> be unlikely that anyone is doing this it's impossible for me to prove
> that's the case.
>
>> If the same class of handling will also apply to selinux and those are
>> the only two security modules that apply labels than we can leave tmpfs
>> and ramfs with the security labels of whomever mounted them.
>
> For SELinux I now have a patch which applies mountpoint labeling to
> mounts for which s_user_ns != &init_user_ns. I'm less sure then with
> Smack how this behavior will differ from what happens today, but my
> understanding is that this means that the label of the mountpoint is
> used for all objects from that superblock. Afaik it does not have the
> Smack behavior of denying access to filesystem objects which have a
> different label in the backing store.
>
>> For sysfs things get a little more interesting. Assuming tmpfs and
>> ramfs don't need s_user_ns == &init_user_ns, sysfs may be fine operating
>> with possibly invalid securitly labels set on a different mount of
>> selinux. (I am wondering now how all of these labels work in the
>> context of nfs).
>
> If someone was using Smack to label sysfs then a mount with s_user_ns
> != &init_user_ns is going to leave inaccessible anything without the
> same label as the process which
Re: [PATCH 1/7] fs: Add user namespace member to struct super_block
On 08/06/2015 11:44 AM, Seth Forshee wrote: On Thu, Aug 06, 2015 at 10:51:16AM -0400, Stephen Smalley wrote: On 08/06/2015 10:20 AM, Seth Forshee wrote: On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote: Seth Forshee seth.fors...@canonical.com writes: On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote: Seth Forshee seth.fors...@canonical.com writes: Initially this will be used to eliminate the implicit MNT_NODEV flag for mounts from user namespaces. In the future it will also be used for translating ids and checking capabilities for filesystems mounted from user namespaces. s_user_ns is initialized in alloc_super() and is generally set to current_user_ns(). To avoid security and corruption issues, two additional mount checks are also added: - do_new_mount() gains a check that the user has CAP_SYS_ADMIN in current_user_ns(). - sget() will fail with EBUSY when the filesystem it's looking for is already mounted from another user namespace. proc needs some special handling here. The user namespace of current isn't appropriate when forking as a result of clone (2) with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable from within the new user namespace. Instead, the user namespace which owns the new pid namespace should be used. sget_userns() is added to allow passing of a user namespace other than that of current, and this is used by proc_mount(). sget() becomes a wrapper around sget_userns() which passes current_user_ns(). From bits of the previous conversation. We need sget_userns(..., init_user_ns) for sysfs. The sysfs xattrs can travel from one mount of sysfs to another via the sysfs backing store. For tmpfs and any other filesystems we support mounting without privilige that support xattrs. We need to identify them and see if userspace is taking advantage of the ability to set xattrs and file caps (unlikely). If they are we need to call sget_userns(..., init_user_ns) on those filesystems as well. 
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/24/2015 11:11 AM, Seth Forshee wrote:
> On Thu, Jul 23, 2015 at 11:23:31AM -0500, Seth Forshee wrote:
>> On Thu, Jul 23, 2015 at 11:36:03AM -0400, Stephen Smalley wrote:
>>> On 07/23/2015 10:39 AM, Seth Forshee wrote:
>>>> On Thu, Jul 23, 2015 at 09:57:20AM -0400, Stephen Smalley wrote:
>>>>> On 07/22/2015 04:40 PM, Stephen Smalley wrote:
>>>>>> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>>>>>>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>>>>>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>>>>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>>>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>>>>>>> Unprivileged users should not be able to supply security labels
>>>>>>>>>>> in filesystems, nor should they be able to supply security
>>>>>>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>>>>>>> options.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>>>>>>>
>>>>>>>>>> I think this is obsoleted by the subsequent discussion, but just
>>>>>>>>>> for the record: this patch would cause the files in the userns
>>>>>>>>>> mount to be left with the "unlabeled" label, and therefore under
>>>>>>>>>> typical policies, completely inaccessible to any process in a
>>>>>>>>>> confined domain.
>>>>>>>>>
>>>>>>>>> The right way to handle this for SELinux would be to automatically use
>>>>>>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>>>>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>>>>>>> from some related object (e.g. the block device file context, as in
>>>>>>>>> your patches for Smack). That will cause SELinux to use that value
>>>>>>>>> instead of any xattr value from the filesystem and will cause attempts
>>>>>>>>> by userspace to set the security.selinux xattr to fail on that
>>>>>>>>> filesystem. That is how SELinux normally deals with untrusted
>>>>>>>>> filesystems, except that it is normally specified as a mount option by
>>>>>>>>> a trusted mounting process, whereas in your case you need to
>>>>>>>>> automatically set it.
>>>>>>>>
>>>>>>>> Excellent, thank you for the advice. I'll start on this when I've
>>>>>>>> finished with Smack.
>>>>>>>
>>>>>>> Not tested, but something like this should work. Note that it should
>>>>>>> come after the call to security_fs_use() so we know whether SELinux
>>>>>>> would even try to use xattrs supplied by the filesystem in the first
>>>>>>> place.
>>>>>>>
>>>>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>>>>> index 564079c..84da3a2 100644
>>>>>>> --- a/security/selinux/hooks.c
>>>>>>> +++ b/security/selinux/hooks.c
>>>>>>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>>>>>>  			goto out;
>>>>>>>  		}
>>>>>>>  	}
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If this is a user namespace mount, no contexts are allowed
>>>>>>> +	 * on the command line and security labels must be ignored.
>>>>>>> +	 */
>>>>>>> +	if (sb->s_user_ns != &init_user_ns) {
>>>>>>> +		if (context_sid || fscontext_sid || rootcontext_sid ||
>>>>>>> +		    defcontext_sid) {
>>>>>>> +			rc = -EACCES;
>>>>>>> +			goto out;
>>>>>>> +		}
>>>>>>> +		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>>>>>>> +			struct block_device *bdev = sb->s_bdev;
>>>>>>> +			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>>>>>>> +			if (bdev) {
>>>>>>> +				struct inode_security_struct *isec = bdev->bd_inode;
>>>>>>
>>>>>> That should be bdev->bd_inode->i_security.
>>>>>
>>>>> Sorry, this won't work. bd_inode is not the inode of the block device
>>>>> file that was passed to mount, and it isn't labeled in any way. It will
>>>>> just be unlabeled.
>>>>>
>>>>> So I guess the only real option here as a fallback is
>>>>> sbsec->mntpoint_sid = current_sid(). Which isn't great either, as the
>>>>> only case where we currently assign task labels to files is for their
>>>>> /proc/pid inodes, and no current policy will therefore allow create
>>>>> permission to such files.
>>>>
>>>> Darn, you're right, that isn't the inode we want. There really doesn't
>>>> seem to be any way to get back to the one we want from the LSM, short of
>>>> adding a new hook.
>>>
>>> Maybe list_first_entry(&sb->s_bdev->bd_inodes, struct inode, i_devices)?
>>> Feels like a layering violation though...
>>
>> Yeah, and even though that probably works out to be the inode we want in
>> most cases I don't think we can be absolutely certain that it is. Maybe
>> there's some way we could walk the list and be sure we've found the
>> right inode, but I'm not seeing it.
>
> I guess we could do something like this (note that most of the changes
> here are just to give a version of blkdev_get_by_path which takes a
> struct path * so that the filename lookup doesn't have to be done
> twice). Basically add a new hook that informs the security module of the
> inode for the backing device file passed to mount and call that from
> mount_bdev. The security module could grab a reference to the inode and
> stash it away.
>
> Something else to note is that, as I have it here, the hook would end up
> getting called for every mount of a given block device, not just the
> first. So it's possible the security module could see the hook
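For readers following the SELinux side of this thread, the decision made by the quoted hunk can be modelled in plain user-space C. This is only a sketch under stated assumptions: the model_* types and model_userns_mount_fixup() are hypothetical names invented for illustration, not kernel interfaces, and fallback_sid stands in for whichever label ends up being chosen for the mount (the thread floats current_sid() as one fallback).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for SECURITY_FS_USE_XATTR and friends. */
enum model_fs_use {
	MODEL_FS_USE_XATTR,
	MODEL_FS_USE_MNTPOINT,
	MODEL_FS_USE_OTHER,
};

/* Contexts parsed from the mount options; 0 means "not supplied". */
struct model_mnt_opts {
	uint32_t context_sid;
	uint32_t fscontext_sid;
	uint32_t rootcontext_sid;
	uint32_t defcontext_sid;
};

/* The few superblock fields the decision depends on. */
struct model_sb {
	bool in_init_user_ns;	/* models sb->s_user_ns == &init_user_ns */
	enum model_fs_use behavior;
	uint32_t mntpoint_sid;
};

/*
 * Returns 0 on success, or -EACCES if a user namespace mount supplied any
 * context option. fallback_sid is an assumption of this sketch: it is the
 * label the caller picked for the mount, not a kernel parameter.
 */
int model_userns_mount_fixup(struct model_sb *sb,
			     const struct model_mnt_opts *opts,
			     uint32_t fallback_sid)
{
	assert(sb && opts);

	if (sb->in_init_user_ns)
		return 0;	/* ordinary mount: nothing to override */

	if (opts->context_sid || opts->fscontext_sid ||
	    opts->rootcontext_sid || opts->defcontext_sid)
		return -EACCES;

	/* Only override filesystems that would otherwise trust xattrs. */
	if (sb->behavior == MODEL_FS_USE_XATTR) {
		sb->behavior = MODEL_FS_USE_MNTPOINT;
		sb->mntpoint_sid = fallback_sid;
	}
	return 0;
}
```

Calling it on a modelled user namespace superblock with no context options switches xattr labeling to mountpoint labeling; supplying any context option is refused, and a superblock in the initial namespace is left untouched.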
Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.
On 07/27/2015 03:32 PM, Hugh Dickins wrote: On Fri, 24 Jul 2015, Stephen Smalley wrote: The shm implementation internally uses shmem or hugetlbfs inodes for shm segments. As these inodes are never directly exposed to userspace and only accessed through the shm operations which are already hooked by security modules, mark the inodes with the S_PRIVATE flag so that inode security initialization and permission checking is skipped. This was motivated by the following lockdep warning: Jul 22 14:36:40 fc23 kernel: == Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking dependency detected ] Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW Jul 22 14:36:40 fc23 kernel: --- Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock: Jul 22 14:36:40 fc23 kernel: (ids-rwsem){+.}, at: [81385354] shm_close+0x34/0x130 Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock: Jul 22 14:36:40 fc23 kernel: (mm-mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180 Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock. 
Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in reverse order) is: Jul 22 14:36:40 fc23 kernel: #012- #3 (mm-mmap_sem){++}: Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [81217baa] __might_fault+0x7a/0xa0 Jul 22 14:36:40 fc23 kernel: [81284a1e] filldir+0x9e/0x130 Jul 22 14:36:40 fc23 kernel: [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] Jul 22 14:36:40 fc23 kernel: [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs] Jul 22 14:36:40 fc23 kernel: [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs] Jul 22 14:36:40 fc23 kernel: [812847e7] iterate_dir+0x97/0x130 Jul 22 14:36:40 fc23 kernel: [81284d21] SyS_getdents+0x91/0x120 Jul 22 14:36:40 fc23 kernel: [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76 Jul 22 14:36:40 fc23 kernel: #012- #2 (xfs_dir_ilock_class){.+}: Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [81101e97] down_read_nested+0x57/0xa0 Jul 22 14:36:40 fc23 kernel: [a01b0e57] xfs_ilock+0x167/0x350 [xfs] Jul 22 14:36:40 fc23 kernel: [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] Jul 22 14:36:40 fc23 kernel: [a014799d] xfs_attr_get+0xbd/0x190 [xfs] Jul 22 14:36:40 fc23 kernel: [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs] Jul 22 14:36:40 fc23 kernel: [8129962f] generic_getxattr+0x4f/0x70 Jul 22 14:36:40 fc23 kernel: [8139ba52] inode_doinit_with_dentry+0x162/0x670 Jul 22 14:36:40 fc23 kernel: [8139cf69] sb_finish_set_opts+0xd9/0x230 Jul 22 14:36:40 fc23 kernel: [8139d66c] selinux_set_mnt_opts+0x35c/0x660 Jul 22 14:36:40 fc23 kernel: [8139ff97] superblock_doinit+0x77/0xf0 Jul 22 14:36:40 fc23 kernel: [813a0020] delayed_superblock_init+0x10/0x20 Jul 22 14:36:40 fc23 kernel: [81272d23] iterate_supers+0xb3/0x110 Jul 22 14:36:40 fc23 kernel: [813a4e5f] selinux_complete_init+0x2f/0x40 Jul 22 14:36:40 fc23 kernel: [813b47a3] security_load_policy+0x103/0x600 Jul 22 14:36:40 fc23 kernel: [813a6901] sel_write_load+0xc1/0x750 Jul 22 14:36:40 fc23 kernel: [8126e817] 
__vfs_write+0x37/0x100 Jul 22 14:36:40 fc23 kernel: [8126f229] vfs_write+0xa9/0x1a0 Jul 22 14:36:40 fc23 kernel: [8126ff48] SyS_write+0x58/0xd0 Jul 22 14:36:40 fc23 kernel: [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76 Jul 22 14:36:40 fc23 kernel: #012- #1 (isec-lock){+.+.+.}: Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [8186de8f] mutex_lock_nested+0x7f/0x3e0 Jul 22 14:36:40 fc23 kernel: [8139b9a9] inode_doinit_with_dentry+0xb9/0x670 Jul 22 14:36:40 fc23 kernel: [8139bf7c] selinux_d_instantiate+0x1c/0x20 Jul 22 14:36:40 fc23 kernel: [813955f6] security_d_instantiate+0x36/0x60 Jul 22 14:36:40 fc23 kernel: [81287c34] d_instantiate+0x54/0x70 Jul 22 14:36:40 fc23 kernel: [8120111c] __shmem_file_setup+0xdc/0x240 Jul 22 14:36:40 fc23 kernel: [81201290] shmem_file_setup+0x10/0x20 Jul 22 14:36:40 fc23 kernel: [813856e0] newseg+0x290/0x3a0 Jul 22 14:36:40 fc23 kernel: [8137e278] ipcget+0x208/0x2d0 Jul 22 14:36:40 fc23 kernel: [81386074] SyS_shmget+0x54/0x70 Jul 22 14:36:40 fc23 kernel: [81871d2e
Re: [RFC][PATCH] ipc: Use private shmem or hugetlbfs inodes for shm segments.
On 07/23/2015 08:11 PM, Dave Chinner wrote:
> On Thu, Jul 23, 2015 at 12:28:33PM -0400, Stephen Smalley wrote:
>> The shm implementation internally uses shmem or hugetlbfs inodes
>> for shm segments. As these inodes are never directly exposed to
>> userspace and only accessed through the shm operations which are
>> already hooked by security modules, mark the inodes with the
>> S_PRIVATE flag so that inode security initialization and permission
>> checking is skipped.
>>
>> This was motivated by the following lockdep warning:
>> ===
>> [ INFO: possible circular locking dependency detected ]
>> 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
>> ---
>> httpd/1597 is trying to acquire lock:
>> (&ids->rwsem){+.}, at: [] shm_close+0x34/0x130
>> (&mm->mmap_sem){++}, at: [] SyS_shmdt+0x4b/0x180
>> [] lock_acquire+0xc7/0x270
>> [] __might_fault+0x7a/0xa0
>> [] filldir+0x9e/0x130
>> [] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
>> [] xfs_readdir+0x1b4/0x330 [xfs]
>> [] xfs_file_readdir+0x2b/0x30 [xfs]
>> [] iterate_dir+0x97/0x130
>> [] SyS_getdents+0x91/0x120
>> [] entry_SYSCALL_64_fastpath+0x12/0x76
>> [] lock_acquire+0xc7/0x270
>> [] down_read_nested+0x57/0xa0
>> [] xfs_ilock+0x167/0x350 [xfs]
>> [] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
>> [] xfs_attr_get+0xbd/0x190 [xfs]
>> [] xfs_xattr_get+0x3d/0x70 [xfs]
>> [] generic_getxattr+0x4f/0x70
>> [] inode_doinit_with_dentry+0x162/0x670
>> [] sb_finish_set_opts+0xd9/0x230
>> [] selinux_set_mnt_opts+0x35c/0x660
>> [] superblock_doinit+0x77/0xf0
>> [] delayed_superblock_init+0x10/0x20
>> [] iterate_supers+0xb3/0x110
>> [] selinux_complete_init+0x2f/0x40
>> [] security_load_policy+0x103/0x600
>> [] sel_write_load+0xc1/0x750
>> [] __vfs_write+0x37/0x100
>> [] vfs_write+0xa9/0x1a0
>> [] SyS_write+0x58/0xd0
>> [] entry_SYSCALL_64_fastpath+0x12/0x76
>> [] lock_acquire+0xc7/0x270
>> [] mutex_lock_nested+0x7f/0x3e0
>> [] inode_doinit_with_dentry+0xb9/0x670
>> [] selinux_d_instantiate+0x1c/0x20
>> [] security_d_instantiate+0x36/0x60
>> [] d_instantiate+0x54/0x70
>> [] __shmem_file_setup+0xdc/0x240
>> [] shmem_file_setup+0x10/0x20
>> [] newseg+0x290/0x3a0
>> [] ipcget+0x208/0x2d0
>> [] SyS_shmget+0x54/0x70
>> [] entry_SYSCALL_64_fastpath+0x12/0x76
>> [] __lock_acquire+0x1a78/0x1d00
>> [] lock_acquire+0xc7/0x270
>> [] down_write+0x5a/0xc0
>> [] shm_close+0x34/0x130
>> [] remove_vma+0x45/0x80
>> [] do_munmap+0x2b0/0x460
>> [] SyS_shmdt+0xb5/0x180
>> [] entry_SYSCALL_64_fastpath+0x12/0x76
>
> That's a completely screwed up stack trace. There are *4* syscall
> entry points with 4 separate, unrelated syscall chains on that
> stack trace, all starting at the same address. How is this a valid
> stack trace and not a lockdep bug of some kind?

Sorry, I mangled it when I tried to reformat it from Morten Stevens'
original report. Fixed in v2.
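As an aside for readers untangling the report: the four call chains are separate recorded orderings, and the warning fires because together they form a loop (ids->rwsem, then isec->lock, then xfs_dir_ilock_class, then mm->mmap_sem, with shmdt now wanting ids->rwsem while holding mmap_sem). The sketch below shows that idea as a reachability check over a small graph. It is an illustrative model only, not lockdep's implementation; lock_graph and its functions are hypothetical names, and the enum just numbers the lock classes named in the v2 report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the report; the numbering is arbitrary. */
enum lock_id { IDS_RWSEM, ISEC_LOCK, XFS_DIR_ILOCK, MMAP_SEM, NR_LOCKS };

/* edge[a][b] is true once some path has taken b while holding a. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record that "taken" was acquired while "held" was already held. */
void lock_graph_record(struct lock_graph *g, enum lock_id held,
		       enum lock_id taken)
{
	assert(held < NR_LOCKS && taken < NR_LOCKS);
	g->edge[held][taken] = true;
}

/* Depth-first search: can "to" be reached from "from" along edges? */
static bool reachable(const struct lock_graph *g, int from, int to,
		      bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Taking "wanted" while holding "held" closes a cycle if "held" is
 * already reachable from "wanted" through recorded orderings.
 */
bool lock_graph_would_cycle(const struct lock_graph *g, enum lock_id held,
			    enum lock_id wanted)
{
	bool seen[NR_LOCKS] = { false };

	return reachable(g, wanted, held, seen);
}
```

Recording the three orderings from the existing dependency chain and then asking about mmap_sem followed by ids->rwsem reports a cycle, while the opposite order does not.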
[PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.
CPU1
Jul 22 14:36:40 fc23 kernel:
Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel: lock(&xfs_dir_ilock_class);
Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel: lock(&ids->rwsem);
Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK ***
Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597:
Jul 22 14:36:40 fc23 kernel: #0: (&mm->mmap_sem){++}, at: [] SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: #012stack backtrace:
Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G W 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
Jul 22 14:36:40 fc23 kernel: 6cb6fe9d 88019ff07c58 81868175
Jul 22 14:36:40 fc23 kernel: 82aea390 88019ff07ca8 81105903
Jul 22 14:36:40 fc23 kernel: 88019ff07c78 88019ff07d08 0001 8800b75108f0
Jul 22 14:36:40 fc23 kernel: Call Trace:
Jul 22 14:36:40 fc23 kernel: [] dump_stack+0x4c/0x65
Jul 22 14:36:40 fc23 kernel: [] print_circular_bug+0x1e3/0x250
Jul 22 14:36:40 fc23 kernel: [] __lock_acquire+0x1a78/0x1d00
Jul 22 14:36:40 fc23 kernel: [] ? unlink_file_vma+0x33/0x60
Jul 22 14:36:40 fc23 kernel: [] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [] down_write+0x5a/0xc0
Jul 22 14:36:40 fc23 kernel: [] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [] remove_vma+0x45/0x80
Jul 22 14:36:40 fc23 kernel: [] do_munmap+0x2b0/0x460
Jul 22 14:36:40 fc23 kernel: [] ? SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: [] SyS_shmdt+0xb5/0x180
Jul 22 14:36:40 fc23 kernel: [] entry_SYSCALL_64_fastpath+0x12/0x76

Reported-by: Morten Stevens
Signed-off-by: Stephen Smalley
---
This version only differs in the patch description, which restores
the original lockdep trace from Morten Stevens. It was unfortunately
mangled in the prior version.

 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c            | 2 +-
 mm/shmem.c           | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf74df..973c24c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
 	inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
 	if (!inode)
 		goto out_dentry;
+	if (creat_flags == HUGETLB_SHMFS_INODE)
+		inode->i_flags |= S_PRIVATE;
 
 	file = ERR_PTR(-ENOMEM);
 	if (hugetlb_reserve_pages(inode, 0,
diff --git a/ipc/shm.c b/ipc/shm.c
index 06e5cf2..4aef24d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 		if ((shmflg & SHM_NORESERVE) &&
 				sysctl_overcommit_memory != OVERCOMMIT_NEVER)
 			acctflag = VM_NORESERVE;
-		file = shmem_file_setup(name, size, acctflag);
+		file = shmem_kernel_file_setup(name, size, acctflag);
 	}
 	error = PTR_ERR(file);
 	if (IS_ERR(file))
diff --git a/mm/shmem.c b/mm/shmem.c
index 4caf8ed..dbe0c1e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3363,8 +3363,8 @@ put_path:
  * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
  *	kernel internal. There will be NO LSM permission checks against the
  *	underlying inode. So users of this interface must do LSM checks at a
- *	higher layer. The one user is the big_key implementation. LSM checks
- *	are provided at the key level rather than the inode level.
+ *	higher layer. The users are the big_key and shm implementations. LSM
+ *	checks are provided at the key or shm level rather than the inode.
  * @name: name for dentry (to be seen in /proc/<pid>/maps
  * @size: size to be set for the file
  * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size
-- 
2.1.0
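As a rough illustration of why marking the shm inode S_PRIVATE avoids the per-inode security work, here is a small user-space model. It is only a sketch, not kernel code: struct fake_inode, lsm_inode_permission() and model_inode_permission() are hypothetical names, and the only idea taken from the patch is that a private inode short-circuits the security module before any label lookup happens.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define S_PRIVATE 512	/* illustrative flag bit; only the bit test matters here */

/* Hypothetical stand-in for an inode: flags plus "has a usable label". */
struct fake_inode {
	unsigned int i_flags;
	int labeled;
};

/* Counts how often the (modelled) security module was consulted. */
static int lsm_calls;

/* Modelled LSM check: an unlabeled inode is denied. */
static int lsm_inode_permission(const struct fake_inode *inode)
{
	lsm_calls++;
	return inode->labeled ? 0 : -EACCES;
}

/* Modelled wrapper: private inodes never reach the security module. */
static int model_inode_permission(const struct fake_inode *inode)
{
	assert(inode != NULL);
	if (inode->i_flags & S_PRIVATE)
		return 0;
	return lsm_inode_permission(inode);
}
```

In this model an inode created with S_PRIVATE set returns success without incrementing lsm_calls, which corresponds to the patch's intent that access control for shm segments happens at the shm level rather than on the backing inode.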
Re: [RFC][PATCH] ipc: Use private shmem or hugetlbfs inodes for shm segments.
On 07/23/2015 08:11 PM, Dave Chinner wrote:
> On Thu, Jul 23, 2015 at 12:28:33PM -0400, Stephen Smalley wrote:
>> The shm implementation internally uses shmem or hugetlbfs inodes
>> for shm segments. As these inodes are never directly exposed to
>> userspace and only accessed through the shm operations which are
>> already hooked by security modules, mark the inodes with the
>> S_PRIVATE flag so that inode security initialization and permission
>> checking is skipped.
>>
>> This was motivated by the following lockdep warning:
>> ===
>> [ INFO: possible circular locking dependency detected ]
>> 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
>> ---
>> httpd/1597 is trying to acquire lock:
>> (&ids->rwsem){+.}, at: [81385354] shm_close+0x34/0x130
>> (&mm->mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180
>> [81109a07] lock_acquire+0xc7/0x270
>> [81217baa] __might_fault+0x7a/0xa0
>> [81284a1e] filldir+0x9e/0x130
>> [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
>> [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs]
>> [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs]
>> [812847e7] iterate_dir+0x97/0x130
>> [81284d21] SyS_getdents+0x91/0x120
>> [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>> [81109a07] lock_acquire+0xc7/0x270
>> [81101e97] down_read_nested+0x57/0xa0
>> [a01b0e57] xfs_ilock+0x167/0x350 [xfs]
>> [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
>> [a014799d] xfs_attr_get+0xbd/0x190 [xfs]
>> [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs]
>> [8129962f] generic_getxattr+0x4f/0x70
>> [8139ba52] inode_doinit_with_dentry+0x162/0x670
>> [8139cf69] sb_finish_set_opts+0xd9/0x230
>> [8139d66c] selinux_set_mnt_opts+0x35c/0x660
>> [8139ff97] superblock_doinit+0x77/0xf0
>> [813a0020] delayed_superblock_init+0x10/0x20
>> [81272d23] iterate_supers+0xb3/0x110
>> [813a4e5f] selinux_complete_init+0x2f/0x40
>> [813b47a3] security_load_policy+0x103/0x600
>> [813a6901] sel_write_load+0xc1/0x750
>> [8126e817] __vfs_write+0x37/0x100
>> [8126f229] vfs_write+0xa9/0x1a0
>> [8126ff48] SyS_write+0x58/0xd0
>> [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>> [81109a07] lock_acquire+0xc7/0x270
>> [8186de8f] mutex_lock_nested+0x7f/0x3e0
>> [8139b9a9] inode_doinit_with_dentry+0xb9/0x670
>> [8139bf7c] selinux_d_instantiate+0x1c/0x20
>> [813955f6] security_d_instantiate+0x36/0x60
>> [81287c34] d_instantiate+0x54/0x70
>> [8120111c] __shmem_file_setup+0xdc/0x240
>> [81201290] shmem_file_setup+0x10/0x20
>> [813856e0] newseg+0x290/0x3a0
>> [8137e278] ipcget+0x208/0x2d0
>> [81386074] SyS_shmget+0x54/0x70
>> [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>> [81108df8] __lock_acquire+0x1a78/0x1d00
>> [81109a07] lock_acquire+0xc7/0x270
>> [8186efba] down_write+0x5a/0xc0
>> [81385354] shm_close+0x34/0x130
>> [812203a5] remove_vma+0x45/0x80
>> [81222a30] do_munmap+0x2b0/0x460
>> [81386c25] SyS_shmdt+0xb5/0x180
>> [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>
> That's a completely screwed up stack trace. There are *4* syscall entry
> points with 4 separate, unrelated syscall chains on that stack trace,
> all starting at the same address. How is this a valid stack trace and
> not a lockdep bug of some kind?

Sorry, I mangled it when I tried to reformat it from Morten Stevens'
original report. Fixed in v2.
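For readers puzzling over what the four call chains in the report represent, the following user-space sketch models the kind of check involved: each chain records one "lock A was held while lock B was taken" edge, and a new edge is refused if it would close a cycle. This is a simplified illustration, not lockdep's actual implementation. The enum names are hypothetical, and ISEC_LOCK is an assumed label for the mutex taken in inode_doinit_with_dentry(), which the quoted chain summary does not name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the locks in the report. */
enum lock_class { IDS_RWSEM, ISEC_LOCK, XFS_DIR_ILOCK, MMAP_SEM, NR_CLASSES };

/* dep[a][b] is true when b has been acquired while a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "wanted taken while held". Returns false, recording nothing,
 * if 'held' is already reachable from 'wanted' (a circular dependency).
 */
static bool add_dependency(enum lock_class held, enum lock_class wanted)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && wanted < NR_CLASSES);
	if (reachable(wanted, held, seen))
		return false;
	dep[held][wanted] = true;
	return true;
}
```

Feeding in the three existing edges (IDS_RWSEM to ISEC_LOCK from shmget, ISEC_LOCK to XFS_DIR_ILOCK from the policy load, XFS_DIR_ILOCK to MMAP_SEM from getdents) succeeds, while the shmdt edge MMAP_SEM to IDS_RWSEM is refused. With the shm inode marked S_PRIVATE, the first edge would never be recorded in this model, so the cycle cannot form.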
[RFC][PATCH] ipc: Use private shmem or hugetlbfs inodes for shm segments.
The shm implementation internally uses shmem or hugetlbfs inodes
for shm segments. As these inodes are never directly exposed to
userspace and only accessed through the shm operations which are
already hooked by security modules, mark the inodes with the
S_PRIVATE flag so that inode security initialization and permission
checking is skipped.

This was motivated by the following lockdep warning:
===
[ INFO: possible circular locking dependency detected ]
4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
---
httpd/1597 is trying to acquire lock:
(&ids->rwsem){+.}, at: [] shm_close+0x34/0x130
(&mm->mmap_sem){++}, at: [] SyS_shmdt+0x4b/0x180
[] lock_acquire+0xc7/0x270
[] __might_fault+0x7a/0xa0
[] filldir+0x9e/0x130
[] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
[] xfs_readdir+0x1b4/0x330 [xfs]
[] xfs_file_readdir+0x2b/0x30 [xfs]
[] iterate_dir+0x97/0x130
[] SyS_getdents+0x91/0x120
[] entry_SYSCALL_64_fastpath+0x12/0x76
[] lock_acquire+0xc7/0x270
[] down_read_nested+0x57/0xa0
[] xfs_ilock+0x167/0x350 [xfs]
[] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
[] xfs_attr_get+0xbd/0x190 [xfs]
[] xfs_xattr_get+0x3d/0x70 [xfs]
[] generic_getxattr+0x4f/0x70
[] inode_doinit_with_dentry+0x162/0x670
[] sb_finish_set_opts+0xd9/0x230
[] selinux_set_mnt_opts+0x35c/0x660
[] superblock_doinit+0x77/0xf0
[] delayed_superblock_init+0x10/0x20
[] iterate_supers+0xb3/0x110
[] selinux_complete_init+0x2f/0x40
[] security_load_policy+0x103/0x600
[] sel_write_load+0xc1/0x750
[] __vfs_write+0x37/0x100
[] vfs_write+0xa9/0x1a0
[] SyS_write+0x58/0xd0
[] entry_SYSCALL_64_fastpath+0x12/0x76
[] lock_acquire+0xc7/0x270
[] mutex_lock_nested+0x7f/0x3e0
[] inode_doinit_with_dentry+0xb9/0x670
[] selinux_d_instantiate+0x1c/0x20
[] security_d_instantiate+0x36/0x60
[] d_instantiate+0x54/0x70
[] __shmem_file_setup+0xdc/0x240
[] shmem_file_setup+0x10/0x20
[] newseg+0x290/0x3a0
[] ipcget+0x208/0x2d0
[] SyS_shmget+0x54/0x70
[] entry_SYSCALL_64_fastpath+0x12/0x76
[] __lock_acquire+0x1a78/0x1d00
[] lock_acquire+0xc7/0x270
[] down_write+0x5a/0xc0
[] shm_close+0x34/0x130
[] remove_vma+0x45/0x80
[] do_munmap+0x2b0/0x460
[] SyS_shmdt+0xb5/0x180
[] entry_SYSCALL_64_fastpath+0x12/0x76
Chain exists of:#012 &ids->rwsem --> &xfs_dir_ilock_class --> &mm->mmap_sem
Possible unsafe locking scenario:
       CPU0                    CPU1
  lock(&mm->mmap_sem);
                               lock(&xfs_dir_ilock_class);
                               lock(&mm->mmap_sem);
  lock(&ids->rwsem);
1 lock held by httpd/1597:
CPU: 7 PID: 1597 Comm: httpd Tainted: G W 4.2.0-0.rc3.git0.1.fc24.x86_64+
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Pla
6cb6fe9d 88019ff07c58 81868175
82aea390 88019ff07ca8 81105903
88019ff07c78 88019ff07d08 0001 8800b75108f0
Call Trace:
[] dump_stack+0x4c/0x65
[] print_circular_bug+0x1e3/0x250
[] __lock_acquire+0x1a78/0x1d00
[] ? unlink_file_vma+0x33/0x60
[] lock_acquire+0xc7/0x270
[] ? shm_close+0x34/0x130
[] down_write+0x5a/0xc0
[] ? shm_close+0x34/0x130
[] shm_close+0x34/0x130
[] remove_vma+0x45/0x80
[] do_munmap+0x2b0/0x460
[] ? SyS_shmdt+0x4b/0x180
[] SyS_shmdt+0xb5/0x180
[] entry_SYSCALL_64_fastpath+0x12/0x76

Reported-by: Morten Stevens
Signed-off-by: Stephen Smalley
---
 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c            | 2 +-
 mm/shmem.c           | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf74df..973c24c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
 	inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
 	if (!inode)
 		goto out_dentry;
+	if (creat_flags == HUGETLB_SHMFS_INODE)
+		inode->i_flags |= S_PRIVATE;
 
 	file = ERR_PTR(-ENOMEM);
 	if (hugetlb_reserve_pages(inode, 0,
diff --git a/ipc/shm.c b/ipc/shm.c
index 06e5cf2..4aef24d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 		if ((shmflg & SHM_NORESERVE) &&
 				sysctl_overcommit_memory != OVERCOMMIT_NEVER)
 			acctflag = VM_NORESERVE;
-		file = shmem_file_setup(name, size, acctflag);
+		file = shmem_kernel_file_setup(name, size, acctflag);
 	}
 	error = PTR_ERR(file);
 	if (IS_ERR(file))
diff --git a/mm/shmem.c
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/23/2015 10:39 AM, Seth Forshee wrote:
> On Thu, Jul 23, 2015 at 09:57:20AM -0400, Stephen Smalley wrote:
>> On 07/22/2015 04:40 PM, Stephen Smalley wrote:
>>> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>>>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>>>> Unprivileged users should not be able to supply security labels
>>>>>>>> in filesystems, nor should they be able to supply security
>>>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>>>> options.
>>>>>>>>
>>>>>>>> Signed-off-by: Seth Forshee
>>>>>>>
>>>>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>>>>> record: this patch would cause the files in the userns mount to be left
>>>>>>> with the "unlabeled" label, and therefore under typical policies,
>>>>>>> completely inaccessible to any process in a confined domain.
>>>>>>
>>>>>> The right way to handle this for SELinux would be to automatically use
>>>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>>>> from some related object (e.g. the block device file context, as in your
>>>>>> patches for Smack). That will cause SELinux to use that value instead
>>>>>> of any xattr value from the filesystem and will cause attempts by
>>>>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>>>> That is how SELinux normally deals with untrusted filesystems, except
>>>>>> that it is normally specified as a mount option by a trusted mounting
>>>>>> process, whereas in your case you need to automatically set it.
>>>>>
>>>>> Excellent, thank you for the advice. I'll start on this when I've
>>>>> finished with Smack.
>>>>
>>>> Not tested, but something like this should work. Note that it should
>>>> come after the call to security_fs_use() so we know whether SELinux
>>>> would even try to use xattrs supplied by the filesystem in the first place.
>>>>
>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>> index 564079c..84da3a2 100644
>>>> --- a/security/selinux/hooks.c
>>>> +++ b/security/selinux/hooks.c
>>>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>>>  			goto out;
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * If this is a user namespace mount, no contexts are allowed
>>>> +	 * on the command line and security labels must be ignored.
>>>> +	 */
>>>> +	if (sb->s_user_ns != &init_user_ns) {
>>>> +		if (context_sid || fscontext_sid || rootcontext_sid ||
>>>> +		    defcontext_sid) {
>>>> +			rc = -EACCES;
>>>> +			goto out;
>>>> +		}
>>>> +		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>>>> +			struct block_device *bdev = sb->s_bdev;
>>>> +			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>>>> +			if (bdev) {
>>>> +				struct inode_security_struct *isec = bdev->bd_inode;
>>>
>>> That should be bdev->bd_inode->i_security.
>>
>> Sorry, this won't work. bd_inode is not the inode of the block device
>> file that was passed to mount, and it isn't labeled in any way. It will
>> just be unlabeled.
>>
>> So I guess the only real option here as a fallback is
>> sbsec->mntpoint_sid = current_sid(). Which isn't great either, as the
>> only case where we currently assign task labels to files is for their
>> /proc/pid inodes, and no current policy will therefore allow create
>> permission to such files.
>
> Darn, you're right, that isn't the inode we want. There really doesn't
> seem to be any way to get back to the one we want from the LSM, short of
> adding a new hook.

Maybe list_first_entry(&sb->s_bdev->bd_inodes, struct inode, i_devices)?
Feels like a layering violation though...
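The decision logic being discussed can be summarized in a small user-space model. This is only a sketch under stated assumptions, not the kernel code: struct sb_model, struct mount_opts and model_set_mnt_opts() are hypothetical names, SIDs are plain integers, and the fallback of labeling with the mounting task's SID is the one floated in the thread (sbsec->mntpoint_sid = current_sid()) rather than a settled design.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the labeling behaviors named in the thread. */
enum label_behavior { USE_XATTR, USE_MNTPOINT, USE_OTHER };

/* Context mount options after conversion to SIDs; 0 means "not given". */
struct mount_opts {
	unsigned int context_sid;
	unsigned int fscontext_sid;
	unsigned int rootcontext_sid;
	unsigned int defcontext_sid;
};

/* Hypothetical stand-in for the superblock security state. */
struct sb_model {
	bool userns_mount;		/* models sb->s_user_ns != &init_user_ns */
	enum label_behavior behavior;
	unsigned int mntpoint_sid;
};

/*
 * Model of the proposed check: a user namespace mount may not supply
 * any context option, and xattr labeling is replaced by mountpoint
 * labeling so that on-disk labels are ignored.
 */
static int model_set_mnt_opts(struct sb_model *sb,
			      const struct mount_opts *opts,
			      unsigned int task_sid)
{
	assert(sb != NULL && opts != NULL);
	if (!sb->userns_mount)
		return 0;	/* ordinary mounts keep their usual handling */
	if (opts->context_sid || opts->fscontext_sid ||
	    opts->rootcontext_sid || opts->defcontext_sid)
		return -EACCES;
	if (sb->behavior == USE_XATTR) {
		sb->behavior = USE_MNTPOINT;
		sb->mntpoint_sid = task_sid;	/* assumed fallback label */
	}
	return 0;
}
```

The model makes the ordering visible: the context options are rejected before the behavior is touched, so a failed mount leaves the superblock state unchanged.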
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/22/2015 04:40 PM, Stephen Smalley wrote:
> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>> Unprivileged users should not be able to supply security labels
>>>>>> in filesystems, nor should they be able to supply security
>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>> options.
>>>>>>
>>>>>> Signed-off-by: Seth Forshee
>>>>>
>>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>>> record: this patch would cause the files in the userns mount to be left
>>>>> with the "unlabeled" label, and therefore under typical policies,
>>>>> completely inaccessible to any process in a confined domain.
>>>>
>>>> The right way to handle this for SELinux would be to automatically use
>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>> from some related object (e.g. the block device file context, as in your
>>>> patches for Smack). That will cause SELinux to use that value instead
>>>> of any xattr value from the filesystem and will cause attempts by
>>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>> That is how SELinux normally deals with untrusted filesystems, except
>>>> that it is normally specified as a mount option by a trusted mounting
>>>> process, whereas in your case you need to automatically set it.
>>>
>>> Excellent, thank you for the advice. I'll start on this when I've
>>> finished with Smack.
>>
>> Not tested, but something like this should work. Note that it should
>> come after the call to security_fs_use() so we know whether SELinux
>> would even try to use xattrs supplied by the filesystem in the first place.
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 564079c..84da3a2 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>  			goto out;
>>  		}
>>  	}
>> +
>> +	/*
>> +	 * If this is a user namespace mount, no contexts are allowed
>> +	 * on the command line and security labels must be ignored.
>> +	 */
>> +	if (sb->s_user_ns != &init_user_ns) {
>> +		if (context_sid || fscontext_sid || rootcontext_sid ||
>> +		    defcontext_sid) {
>> +			rc = -EACCES;
>> +			goto out;
>> +		}
>> +		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>> +			struct block_device *bdev = sb->s_bdev;
>> +			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>> +			if (bdev) {
>> +				struct inode_security_struct *isec = bdev->bd_inode;
>
> That should be bdev->bd_inode->i_security.

Sorry, this won't work. bd_inode is not the inode of the block device
file that was passed to mount, and it isn't labeled in any way. It will
just be unlabeled.

So I guess the only real option here as a fallback is
sbsec->mntpoint_sid = current_sid(). Which isn't great either, as the
only case where we currently assign task labels to files is for their
/proc/pid inodes, and no current policy will therefore allow create
permission to such files.

>
>> +				sbsec->mntpoint_sid = isec->sid;
>> +			} else {
>> +				sbsec->mntpoint_sid = current_sid();
>> +			}
>> +		}
>> +		goto out_set_opts;
>> +	}
>> +
>>  	/* sets the context of the superblock for the fs being mounted. */
>>  	if (fscontext_sid) {
>>  		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
>> @@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/22/2015 04:25 PM, Stephen Smalley wrote:
> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>> Unprivileged users should not be able to supply security labels
>>>>> in filesystems, nor should they be able to supply security
>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>> and return EPERM if any contexts are supplied in the mount
>>>>> options.
>>>>>
>>>>> Signed-off-by: Seth Forshee
>>>>
>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>> record: this patch would cause the files in the userns mount to be left
>>>> with the "unlabeled" label, and therefore under typical policies,
>>>> completely inaccessible to any process in a confined domain.
>>>
>>> The right way to handle this for SELinux would be to automatically use
>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>> from some related object (e.g. the block device file context, as in your
>>> patches for Smack). That will cause SELinux to use that value instead
>>> of any xattr value from the filesystem and will cause attempts by
>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>> That is how SELinux normally deals with untrusted filesystems, except
>>> that it is normally specified as a mount option by a trusted mounting
>>> process, whereas in your case you need to automatically set it.
>>
>> Excellent, thank you for the advice. I'll start on this when I've
>> finished with Smack.
>
> Not tested, but something like this should work. Note that it should
> come after the call to security_fs_use() so we know whether SELinux
> would even try to use xattrs supplied by the filesystem in the first place.
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 564079c..84da3a2 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>  				goto out;
>  		}
>  	}
> +
> +	/*
> +	 * If this is a user namespace mount, no contexts are allowed
> +	 * on the command line and security labels must be ignored.
> +	 */
> +	if (sb->s_user_ns != &init_user_ns) {
> +		if (context_sid || fscontext_sid || rootcontext_sid ||
> +		    defcontext_sid) {
> +			rc = -EACCES;
> +			goto out;
> +		}
> +		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
> +			struct block_device *bdev = sb->s_bdev;
> +			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
> +			if (bdev) {
> +				struct inode_security_struct *isec = bdev->bd_inode;

That should be bdev->bd_inode->i_security.

> +				sbsec->mntpoint_sid = isec->sid;
> +			} else {
> +				sbsec->mntpoint_sid = current_sid();
> +			}
> +		}
> +		goto out_set_opts;
> +	}
> +
>  	/* sets the context of the superblock for the fs being mounted. */
>  	if (fscontext_sid) {
>  		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
> @@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>  		sbsec->def_sid = defcontext_sid;
>  	}
>
> +out_set_opts:
>  	rc = sb_finish_set_opts(sb);
>  out:
>  	mutex_unlock(&sbsec->lock);
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/22/2015 12:14 PM, Seth Forshee wrote:
> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>> Unprivileged users should not be able to supply security labels
>>>> in filesystems, nor should they be able to supply security
>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>> and return EPERM if any contexts are supplied in the mount
>>>> options.
>>>>
>>>> Signed-off-by: Seth Forshee
>>>
>>> I think this is obsoleted by the subsequent discussion, but just for the
>>> record: this patch would cause the files in the userns mount to be left
>>> with the "unlabeled" label, and therefore under typical policies,
>>> completely inaccessible to any process in a confined domain.
>>
>> The right way to handle this for SELinux would be to automatically use
>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>> from some related object (e.g. the block device file context, as in your
>> patches for Smack). That will cause SELinux to use that value instead
>> of any xattr value from the filesystem and will cause attempts by
>> userspace to set the security.selinux xattr to fail on that filesystem.
>> That is how SELinux normally deals with untrusted filesystems, except
>> that it is normally specified as a mount option by a trusted mounting
>> process, whereas in your case you need to automatically set it.
>
> Excellent, thank you for the advice. I'll start on this when I've
> finished with Smack.

Not tested, but something like this should work. Note that it should
come after the call to security_fs_use() so we know whether SELinux
would even try to use xattrs supplied by the filesystem in the first place.

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 564079c..84da3a2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 				goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			struct block_device *bdev = sb->s_bdev;
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			if (bdev) {
+				struct inode_security_struct *isec = bdev->bd_inode;
+				sbsec->mntpoint_sid = isec->sid;
+			} else {
+				sbsec->mntpoint_sid = current_sid();
+			}
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
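The control flow of the proposed hunk can be sketched in plain userspace C. This is a model under stated assumptions, not the kernel implementation: `struct model_sb`, `model_set_mnt_opts()` and the `MODEL_*` names are hypothetical, SIDs are reduced to plain integers, and the non-userns path is not modeled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SECURITY_FS_USE_* labeling behaviors. */
enum model_behavior { MODEL_USE_XATTR, MODEL_USE_MNTPOINT, MODEL_USE_OTHER };

struct model_sb {
	bool userns_mount;             /* models sb->s_user_ns != &init_user_ns */
	enum model_behavior behavior;  /* models sbsec->behavior */
	unsigned int mntpoint_sid;     /* models sbsec->mntpoint_sid */
	bool has_bdev;                 /* models sb->s_bdev != NULL */
	unsigned int bdev_sid;         /* label of the backing block device node */
};

/*
 * Models only the user-namespace branch: context mount options are
 * rejected, and xattr labeling is downgraded to mountpoint labeling.
 */
static int model_set_mnt_opts(struct model_sb *sb, bool context_opts_given,
			      unsigned int current_sid)
{
	assert(sb != NULL);
	if (!sb->userns_mount)
		return 0; /* regular path, not modeled here */
	if (context_opts_given)
		return -EACCES;
	if (sb->behavior == MODEL_USE_XATTR) {
		sb->behavior = MODEL_USE_MNTPOINT;
		sb->mntpoint_sid = sb->has_bdev ? sb->bdev_sid : current_sid;
	}
	return 0;
}
```

Only superblocks that would otherwise have trusted on-disk xattrs are switched; any other labeling behavior is left alone, which is why the check has to run after the labeling behavior has been determined.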
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/16/2015 09:23 AM, Stephen Smalley wrote:
> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>> Unprivileged users should not be able to supply security labels
>> in filesystems, nor should they be able to supply security
>> contexts in unprivileged mounts. For any mount where s_user_ns is
>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>> and return EPERM if any contexts are supplied in the mount
>> options.
>>
>> Signed-off-by: Seth Forshee
>
> I think this is obsoleted by the subsequent discussion, but just for the
> record: this patch would cause the files in the userns mount to be left
> with the "unlabeled" label, and therefore under typical policies,
> completely inaccessible to any process in a confined domain.

The right way to handle this for SELinux would be to automatically use
mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
specifying a context= mount option), with the sbsec->mntpoint_sid set
from some related object (e.g. the block device file context, as in your
patches for Smack). That will cause SELinux to use that value instead
of any xattr value from the filesystem and will cause attempts by
userspace to set the security.selinux xattr to fail on that filesystem.
That is how SELinux normally deals with untrusted filesystems, except
that it is normally specified as a mount option by a trusted mounting
process, whereas in your case you need to automatically set it.

>
>> ---
>>  security/selinux/hooks.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 459e71ddbc9d..eeb71e45ab82 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -732,6 +732,19 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>  	    !strcmp(sb->s_type->name, "pstore"))
>>  		sbsec->flags |= SE_SBGENFS;
>>
>> +	/*
>> +	 * If this is a user namespace mount, no contexts are allowed
>> +	 * on the command line and security labels mus be ignored.
>> +	 */
>> +	if (sb->s_user_ns != &init_user_ns) {
>> +		if (context_sid || fscontext_sid || rootcontext_sid ||
>> +		    defcontext_sid)
>> +			return -EPERM;
>> +		sbsec->behavior = SECURITY_FS_USE_NONE;
>> +		goto out_set_opts;
>> +	}
>> +
>> +
>>  	if (!sbsec->behavior) {
>>  		/*
>>  		 * Determine the labeling behavior to use for this
>> @@ -813,6 +826,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>  		sbsec->def_sid = defcontext_sid;
>>  	}
>>
>> +out_set_opts:
>>  	rc = sb_finish_set_opts(sb);
>>  out:
>>  	mutex_unlock(&sbsec->lock);
>>
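To make the difference between the two behaviors concrete, the following userspace sketch models which label a file ends up with, and what happens when userspace tries to write the security.selinux xattr. It follows the description in the message above rather than the kernel source: all `model_*` and `MODEL_*` names are hypothetical, SIDs are plain integers, the policy check that would normally gate relabeling is left out, and the -EOPNOTSUPP error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for three of the SECURITY_FS_USE_* behaviors. */
enum model_behavior {
	MODEL_FS_USE_XATTR,
	MODEL_FS_USE_MNTPOINT,
	MODEL_FS_USE_NONE
};

/* Arbitrary value standing in for the "unlabeled" SID. */
#define MODEL_SID_UNLABELED 0u

struct model_sb {
	enum model_behavior behavior;
	unsigned int mntpoint_sid;
};

/* Which SID a file on this superblock is treated as having. */
static unsigned int model_inode_sid(const struct model_sb *sb,
				    unsigned int xattr_sid)
{
	assert(sb != NULL);
	switch (sb->behavior) {
	case MODEL_FS_USE_XATTR:
		return xattr_sid;          /* on-disk label is trusted */
	case MODEL_FS_USE_MNTPOINT:
		return sb->mntpoint_sid;   /* on-disk label is ignored */
	default:
		return MODEL_SID_UNLABELED;
	}
}

/* Userspace attempt to write the label xattr on this superblock. */
static int model_set_selinux_xattr(const struct model_sb *sb,
				   unsigned int *xattr_sid,
				   unsigned int new_sid)
{
	assert(sb != NULL && xattr_sid != NULL);
	if (sb->behavior != MODEL_FS_USE_XATTR)
		return -EOPNOTSUPP;        /* assumed error code */
	*xattr_sid = new_sid;
	return 0;
}
```

With MODEL_FS_USE_NONE every file resolves to the unlabeled SID, which is the inaccessibility problem described above; with MODEL_FS_USE_MNTPOINT every file resolves to one SID chosen at mount time and the on-disk value never matters.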
Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS
On 07/22/2015 08:46 AM, Morten Stevens wrote:
> 2015-06-17 13:45 GMT+02:00 Morten Stevens <mstev...@fedoraproject.org>:
>> 2015-06-15 8:09 GMT+02:00 Daniel Wagner <w...@monom.org>:
>>> On 06/14/2015 06:48 PM, Hugh Dickins wrote:
>>>> It appears that, at some point last year, XFS made directory handling
>>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>>> but that has been so for many years.
>>>>
>>>> Since those few lockdep traces that I've seen all implicated selinux,
>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>>
>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers
>>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>>
>>>> Reported-and-tested-by: Prarit Bhargava <pra...@redhat.com>
>>>> Reported-by: Daniel Wagner <w...@monom.org>
>>>
>>> Reported-and-tested-by: Daniel Wagner <w...@monom.org>
>>>
>>> Sorry for the long delay. It took me a while to figure out my original
>>> setup. I could verify that this patch made the lockdep message go away
>>> on 4.0-rc6 and also on 4.1-rc8.
>>
>> Yes, it's also fixed for me after applying this patch to 4.1-rc8.
>
> Here is another deadlock with the latest 4.2.0-rc3:
>
> Jul 22 14:36:40 fc23 kernel: ==
> Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking dependency detected ]
> Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
> Jul 22 14:36:40 fc23 kernel: ---
> Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
> Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+.}, at: [81385354] shm_close+0x34/0x130
> Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
> Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180
> Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
> Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in reverse order) is:
> Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++}:
> Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel: [81217baa] __might_fault+0x7a/0xa0
> Jul 22 14:36:40 fc23 kernel: [81284a1e] filldir+0x9e/0x130
> Jul 22 14:36:40 fc23 kernel: [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
> Jul 22 14:36:40 fc23 kernel: [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs]
> Jul 22 14:36:40 fc23 kernel: [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs]
> Jul 22 14:36:40 fc23 kernel: [812847e7] iterate_dir+0x97/0x130
> Jul 22 14:36:40 fc23 kernel: [81284d21] SyS_getdents+0x91/0x120
> Jul 22 14:36:40 fc23 kernel: [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
> Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){.+}:
> Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel: [81101e97] down_read_nested+0x57/0xa0
> Jul 22 14:36:40 fc23 kernel: [a01b0e57] xfs_ilock+0x167/0x350 [xfs]
> Jul 22 14:36:40 fc23 kernel: [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
> Jul 22 14:36:40 fc23 kernel: [a014799d] xfs_attr_get+0xbd/0x190 [xfs]
> Jul 22 14:36:40 fc23 kernel: [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs]
> Jul 22 14:36:40 fc23 kernel: [8129962f] generic_getxattr+0x4f/0x70
> Jul 22 14:36:40 fc23 kernel: [8139ba52] inode_doinit_with_dentry+0x162/0x670
> Jul 22 14:36:40 fc23 kernel: [8139cf69] sb_finish_set_opts+0xd9/0x230
> Jul 22 14:36:40 fc23 kernel: [8139d66c] selinux_set_mnt_opts+0x35c/0x660
> Jul 22 14:36:40 fc23 kernel: [8139ff97] superblock_doinit+0x77/0xf0
> Jul 22 14:36:40 fc23 kernel: [813a0020] delayed_superblock_init+0x10/0x20
> Jul 22 14:36:40 fc23 kernel: [81272d23] iterate_supers+0xb3/0x110
> Jul 22 14:36:40 fc23 kernel: [813a4e5f] selinux_complete_init+0x2f/0x40
> Jul 22 14:36:40 fc23 kernel: [813b47a3] security_load_policy+0x103/0x600
> Jul 22 14:36:40 fc23 kernel: [813a6901] sel_write_load+0xc1/0x750
> Jul 22 14:36:40 fc23 kernel: [8126e817] __vfs_write+0x37/0x100
> Jul 22 14:36:40 fc23 kernel: [8126f229] vfs_write+0xa9/0x1a0
> Jul 22 14:36:40 fc23 kernel: [8126ff48] SyS_write+0x58/0xd0
> Jul 22 14:36:40 fc23 kernel: [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
> Jul 22 14:36:40 fc23 kernel: #012-> #1
Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts
On 07/15/2015 03:46 PM, Seth Forshee wrote:
> Unprivileged users should not be able to supply security labels
> in filesystems, nor should they be able to supply security
> contexts in unprivileged mounts. For any mount where s_user_ns is
> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
> and return EPERM if any contexts are supplied in the mount
> options.
>
> Signed-off-by: Seth Forshee

I think this is obsoleted by the subsequent discussion, but just for the
record: this patch would cause the files in the userns mount to be left
with the "unlabeled" label, and therefore under typical policies,
completely inaccessible to any process in a confined domain.

> ---
>  security/selinux/hooks.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 459e71ddbc9d..eeb71e45ab82 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -732,6 +732,19 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>  	    !strcmp(sb->s_type->name, "pstore"))
>  		sbsec->flags |= SE_SBGENFS;
>
> +	/*
> +	 * If this is a user namespace mount, no contexts are allowed
> +	 * on the command line and security labels mus be ignored.
> +	 */
> +	if (sb->s_user_ns != &init_user_ns) {
> +		if (context_sid || fscontext_sid || rootcontext_sid ||
> +		    defcontext_sid)
> +			return -EPERM;
> +		sbsec->behavior = SECURITY_FS_USE_NONE;
> +		goto out_set_opts;
> +	}
> +
> +
>  	if (!sbsec->behavior) {
>  		/*
>  		 * Determine the labeling behavior to use for this
> @@ -813,6 +826,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>  		sbsec->def_sid = defcontext_sid;
>  	}
>
> +out_set_opts:
>  	rc = sb_finish_set_opts(sb);
>  out:
>  	mutex_unlock(&sbsec->lock);
>
Re: [PATCH 0/7] Initial support for user namespace owned mounts
On 07/15/2015 09:05 PM, Andy Lutomirski wrote: > On Jul 15, 2015 3:34 PM, "Eric W. Biederman" wrote: >> >> Seth Forshee writes: >> >>> On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: Casey Schaufler writes: > On 7/15/2015 12:46 PM, Seth Forshee wrote: >> These are the first in a larger set of patches that I've been working on >> (with help from Eric Biederman) to support mounting ext4 and fuse >> filesystems from within user namespaces. I've pushed the full series to: >> >> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >> >> Taking the series as a whole, the strategy is to handle as much of the >> heavy lifting as possible in the vfs so the filesystems don't have to >> handle weird edge cases. If you look at the full series you'll find that >> the changes in ext4 to support user namespace mounts turn out to be >> fairly minimal (fuse is a bit more complicated though as it must deal >> with translating ids for a userspace process which is running in pid and >> user namespaces). >> >> The patches I'm sending today lay some of the groundwork in the vfs and >> related code. They fall into two broad groups: >> >> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >> pretty straightforward, and Eric has expressed interest in merging >> these patches soon. Note that patch 2 won't apply cleanly without >> Eric's noexec patches for proc and sys [1]. >> >> 2. Patches 2-7 tighten down security for mounts with s_user_ns != >> _user_ns. This includes updates to how file caps and suid are >> handled and LSM updates to ignore security labels on superblocks >> from non-init namespaces. >> >> The LSM changes in particular may not be optimal, as I don't have a >> lot of familiarity with this code, so I'd be especially appreciative >> of review of these changes and suggestions on how to improve them. 
>>>>>
>>>>> Lukasz Pawelczyk proposed
>>>>> LSM support in user namespaces ([RFC] lsm: namespace hooks)
>>>>> that make a whole lot more sense than just turning off
>>>>> the option of using labels on files. Gutting the ability
>>>>> to use MAC in a namespace is a step down the road of
>>>>> making MAC and namespaces incompatible.
>>>>
>>>> This is not "turning off the option to use labels on files".
>>>>
>>>> This is supporting mounting filesystems like ext4 by unprivileged users
>>>> and not trusting the labels they set in the same way as we trust labels
>>>> on filesystems mounted by privileged users.
>>>>
>>>> The first step needs to be not trusting those labels and treating such
>>>> filesystems as filesystems without label support. I hope that is what
>>>> Seth has implemented.
>>>>
>>>> In the long run we can do more interesting things with such filesystems
>>>> once the appropriate LSM policy is in place.
>>>
>>> Yes, this exactly. Right now it looks to me like the only safe thing to
>>> do with mounts from unprivileged users is to ignore the security labels,
>>> so that's what I'm trying to do with these changes. If there's some
>>> better thing to do, or some better way to do it, I'm more than happy to
>>> receive that feedback.
>>
>> Ugh.
>>
>> This made me realize that we have an interesting problem here. An
>> unprivileged mount of tmpfs probably needs to have
>> s_user_ns == &init_user_ns.
>>
>> Otherwise we will break security labels on tmpfs for no good reason.
>> ramfs and sysfs also seem to have similar concerns.
>>
>> Because they have no backing store we can trust those filesystems with
>> security labels. Plus for at least sysfs there is the security label
>> bleed through issue, that we need to make certain works.
>>
>> Perhaps these filesystems with trusted backing store need to call
>> "sget_userns(..., &init_user_ns)".
>>
>> If we don't get this right we will have significant regressions with
>> respect to security labels, and that is not ok.
>
> That's only a problem if there's anyone who sets security labels on
> such a mount.
> You need global caps to do that (I hope), which
> requires someone outside the userns to help, which means there's a
> good chance that literally no one does this.

Setting of security.selinux attributes is governed by SELinux permission
checks, not by capabilities. Also, files are always assigned a label at
creation time; a tmpfs inode will be labeled based on its creator
without any userspace entity ever calling setxattr() at all.
Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
> On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
>> Originates from:
>>
>> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
>> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> Signed-off-by: Karol Lewandowski
>> Signed-off-by: Paul Osmialowski
>> ---
>>  ipc/kdbus/bus.c        | 12 ++-
>>  ipc/kdbus/bus.h        |  3 +++
>>  ipc/kdbus/connection.c | 54 ++
>>  ipc/kdbus/connection.h |  4
>>  ipc/kdbus/domain.c     |  9 -
>>  ipc/kdbus/domain.h     |  2 ++
>>  ipc/kdbus/endpoint.c   | 11 ++
>>  ipc/kdbus/names.c      | 11 ++
>>  ipc/kdbus/queue.c      | 30 ++--
>>  9 files changed, 124 insertions(+), 12 deletions(-)
>>
>
>> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
>> index 9993753..b85cdc7 100644
>> --- a/ipc/kdbus/connection.c
>> +++ b/ipc/kdbus/connection.c
>> @@ -31,6 +31,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #include "bus.h"
>>  #include "connection.h"
>> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  	bool is_activator;
>>  	bool is_monitor;
>>  	struct kvec kvec;
>> +	u32 sid, len;
>> +	char *label;
>>  	int ret;
>>
>>  	struct {
>> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  		}
>>  	}
>>
>> +	security_task_getsecid(current, &sid);
>> +	security_secid_to_secctx(sid, &label, &len);
>> +	ret = security_kdbus_connect(conn, label, len);
>> +	if (ret) {
>> +		ret = -EPERM;
>> +		goto exit_unref;
>> +	}
>
> This seems convoluted and expensive. If you always want the label of
> the current task here, then why not just have security_kdbus_connect()
> internally extract the label of the current task?

Furthermore, why do we need a separate security field and copy of the
current label in the conn->security, when we already have
conn->cred->security available to us? I don't think we need new security
fields unless we are going to assign some kind of object labeling to
these structures separate from their cred, and offhand I don't see why
we would do that.
Re: kdbus: credential faking
On 07/10/2015 12:48 PM, David Herrmann wrote: > Hi > > On Fri, Jul 10, 2015 at 4:47 PM, Stephen Smalley wrote: >> On 07/10/2015 09:43 AM, David Herrmann wrote: >>> On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley wrote: >>>> On 07/09/2015 06:22 PM, David Herrmann wrote: >>>>> With dbus1, clients can ask the dbus-daemon for the seclabel of a peer >>>>> they talk to. They're free to use this information for any purpose. On >>>>> kdbus, we want to be compatible to dbus-daemon. Therefore, if a native >>>>> client queries kdbus for the seclabel of a peer behind a proxy, we >>>>> want that query to return the actual seclabel of the peer, not the >>>>> seclabel of the proxy. Same applies to PIDS and CREDS. >>>>> >>>>> This faked metadata is never used by the kernel for any security >>>>> decisions. It's sole purpose is to return them if a native kdbus >>>>> client queries another peer. Furthermore, this information is never >>>>> transmitted as send-time metadata (as it is, in no way, send-time >>>>> metadata), but only if you explicitly query the connection-time >>>>> metadata of a peer (KDBUS_CMD_CONN_INFO). >>>> >>>> I guess I don't understand the difference. Is there a separate facility >>>> for obtaining the send-time metadata that is not subject to credential >>>> faking? >>> >>> Each message carries metadata of the sender, that was collected at the >>> time of _SEND_. This metadata cannot be faked. >>> Additionally (for introspection and dbus1 compat), kdbus allows peers >>> to query metadata of other peers, that were collected at the time of >>> _CONNECT_. Privileged peers can provide faked _connection_ metadata, >>> which has the side-effect of suppressing send-time metadata. >>> It is up to the receiver to request connection-metadata if a message >>> did not carry send-time metadata. We do this, currently, only to >>> support legacy dbus1 clients which do not support send-time metadata. 
>> >> So the "privileged" peer (which just means the bus owner, which can be >> completely unprivileged from a typical DAC perspective) can both prevent >> the receiver from getting the (real, unfakeable) send-time metadata and >> supply arbitrary fake credentials for the connection metadata? And the > > (Limited to PIDS/CREDS/SECLABEL metadata, but) yes. > > Note that this is all under the assumption that you never connect to a > bus owned by someone else but you or root. Hence, a peer can only fake > metadata, if it can also ptrace you. If you don't enforce this assumption in kdbus, then you can't be sure that it won't be violated by future userspace. Also, the statement about ptrace doesn't hold when using SELinux or other security modules. >> legacy dbus1 clients (i.e. all current DBUS applications?) will always >> use this potentially faked metadata. Meanwhile, what about new dbus >> clients? What is the standard behavior for them when the send-time >> metadata is suppressed? Do they always fall back to the connection >> metadata? > > This is a decision user-space has to make. In sd-bus, if we trust the > bus (root owned, or our own), we always fall back to connection > metadata. So the only benefit of the credentials in the send-time metadata is they come for free rather than needing to be separately queried? And aside from credential faking (impersonation being a nicer name), when else would they differ from the connection metadata? If the program does a setuid or something after creating the connection? >>>>> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In >>>>> the kdbus security model, if you don't trust the bus-creator, you >>>>> should not connect to the bus. A bus-creator can bypass kdbus >>>>> policies, sniff on any transmission and modify bus behavior. It just >>>>> seems logical to bind faked-metadata to the same privilege. However, I >>>>> also have no strong feeling about that, if you place valid points. 
So >>>>> please elaborate. >>>>> But, please be aware that if we require privileges to fake metadata, >>>>> then you need to have such privileges to provide a dbus1 proxy for >>>>> your native bus on kdbus. In other words, users are able to create >>>>> session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1 >>>>> proxy. This will have the net-effect of us requiring to run the proxy >>>>> a
Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
> On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
>> Originates from:
>>
>> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
>> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> Signed-off-by: Karol Lewandowski
>> Signed-off-by: Paul Osmialowski
>> ---
>>  ipc/kdbus/bus.c        | 12 ++-
>>  ipc/kdbus/bus.h        |  3 +++
>>  ipc/kdbus/connection.c | 54 ++
>>  ipc/kdbus/connection.h |  4
>>  ipc/kdbus/domain.c     |  9 -
>>  ipc/kdbus/domain.h     |  2 ++
>>  ipc/kdbus/endpoint.c   | 11 ++
>>  ipc/kdbus/names.c      | 11 ++
>>  ipc/kdbus/queue.c      | 30 ++--
>>  9 files changed, 124 insertions(+), 12 deletions(-)
>>
>
>> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
>> index 9993753..b85cdc7 100644
>> --- a/ipc/kdbus/connection.c
>> +++ b/ipc/kdbus/connection.c
>> @@ -31,6 +31,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #include "bus.h"
>>  #include "connection.h"
>> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  	bool is_activator;
>>  	bool is_monitor;
>>  	struct kvec kvec;
>> +	u32 sid, len;
>> +	char *label;
>>  	int ret;
>>
>>  	struct {
>> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  		}
>>  	}
>>
>> +	security_task_getsecid(current, &sid);
>> +	security_secid_to_secctx(sid, &label, &len);
>> +	ret = security_kdbus_connect(conn, label, len);
>> +	if (ret) {
>> +		ret = -EPERM;
>> +		goto exit_unref;
>> +	}
>
> This seems convoluted and expensive. If you always want the label of
> the current task here, then why not just have security_kdbus_connect()
> internally extract the label of the current task?
>
>> @@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, struct kdbus_kmsg *kmsg)
>>  	if (ret < 0)
>>  		goto exit;
>>
>> +	ret = security_kdbus_talk(src, dst);
>> +	if (ret) {
>> +		ret = -EPERM;
>> +		goto exit;
>> +	}
>
> Where does kdbus apply its uid-based or other restrictions on
> connections? Why do we need to insert separate hooks into each of these
> functions? Is there no central chokepoint already for permission
> checking that we can hook?

For example, why wouldn't you insert a single hook into
kdbus_conn_policy_talk() where they perform their DAC checking? You
would need to restructure it slightly to ensure that the security hook
is only called if it passes the DAC (privileged || uid_eq) check so that
we do not trigger MAC denials when DAC wouldn't have allowed it anyway.
Also, kdbus_conn_policy_talk() takes a separate conn_creds argument -
that should be passed through to the hook as well.
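The ordering suggested above (DAC first, MAC hook only if DAC passes) can be sketched in userspace C. This is a model under assumptions, not kdbus code: `policy_talk`, `mac_hook_fn`, and `counting_hook` are hypothetical names, and the two booleans stand in for the (privileged || uid_eq) check mentioned in the reply.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical MAC hook type: returns 0 to allow, negative errno to deny. */
typedef int (*mac_hook_fn)(void *ctx);

/*
 * Single chokepoint: the DAC check runs first, and the MAC hook is
 * consulted only when DAC would already have allowed the operation,
 * so a DAC failure never produces a MAC denial.
 */
static int policy_talk(bool privileged, bool uid_eq,
		       mac_hook_fn mac_hook, void *ctx)
{
	assert(mac_hook != NULL);
	if (!privileged && !uid_eq)
		return -EPERM;    /* DAC denies; hook is not called */
	return mac_hook(ctx);     /* DAC passed; MAC has the final say */
}

/* Example hook that counts how often it was consulted and always allows. */
static int counting_hook(void *ctx)
{
	++*(int *)ctx;
	return 0;
}
```

Counting the hook invocations makes the property easy to check: a caller that fails the DAC test leaves the counter untouched.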
Re: kdbus: credential faking
On 07/10/2015 09:43 AM, David Herrmann wrote: > Hi > > On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley wrote: >> On 07/09/2015 06:22 PM, David Herrmann wrote: >>> To be clear, faking metadata has one use-case, and one use-case only: >>> dbus1 compatibility >>> >>> In dbus1, clients connect to a unix-socket placed in the file-system >>> hierarchy. To avoid breaking ABI for old clients, we support a >>> unix-kdbus proxy. This proxy is called systemd-bus-proxyd. It is >>> spawned once for each bus we proxy and simply remarshals messages from >>> the client to kdbus and vice versa. >> >> Is this truly necessary? Can't the distributions just update the client >> side libraries to use kdbus if enabled and be done with it? Doesn't >> this proxy undo many of the benefits of using kdbus in the first place? > > We need binary compatibility to dbus1. There're millions of > applications and language bindings with dbus1 compiled in, which we > cannot suddenly break. So, are you saying that there are many applications that statically link the dbus1 library implementation (thus the distributions can't just push an updated shared library that switches from using the socket to using kdbus), and that many of these applications are third party applications not packaged by the distributions (thus the distributions cannot just do a mass rebuild to update these applications too)? Otherwise, I would think that the use of a socket would just be an implementation detail and you would be free to change it without affecting dbus1 library ABI compatibility. >>> With dbus1, clients can ask the dbus-daemon for the seclabel of a peer >>> they talk to. They're free to use this information for any purpose. On >>> kdbus, we want to be compatible to dbus-daemon. Therefore, if a native >>> client queries kdbus for the seclabel of a peer behind a proxy, we >>> want that query to return the actual seclabel of the peer, not the >>> seclabel of the proxy. Same applies to PIDS and CREDS. 
>>> >>> This faked metadata is never used by the kernel for any security >>> decisions. It's sole purpose is to return them if a native kdbus >>> client queries another peer. Furthermore, this information is never >>> transmitted as send-time metadata (as it is, in no way, send-time >>> metadata), but only if you explicitly query the connection-time >>> metadata of a peer (KDBUS_CMD_CONN_INFO). >> >> I guess I don't understand the difference. Is there a separate facility >> for obtaining the send-time metadata that is not subject to credential >> faking? > > Each message carries metadata of the sender, that was collected at the > time of _SEND_. This metadata cannot be faked. > Additionally (for introspection and dbus1 compat), kdbus allows peers > to query metadata of other peers, that were collected at the time of > _CONNECT_. Privileged peers can provide faked _connection_ metadata, > which has the side-effect of suppressing send-time metadata. > It is up to the receiver to request connection-metadata if a message > did not carry send-time metadata. We do this, currently, only to > support legacy dbus1 clients which do not support send-time metadata. So the "privileged" peer (which just means the bus owner, which can be completely unprivileged from a typical DAC perspective) can both prevent the receiver from getting the (real, unfakeable) send-time metadata and supply arbitrary fake credentials for the connection metadata? And the legacy dbus1 clients (i.e. all current DBUS applications?) will always use this potentially faked metadata. Meanwhile, what about new dbus clients? What is the standard behavior for them when the send-time metadata is suppressed? Do they always fall back to the connection metadata? >>> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In >>> the kdbus security model, if you don't trust the bus-creator, you >>> should not connect to the bus. 
A bus-creator can bypass kdbus >>> policies, sniff on any transmission and modify bus behavior. It just >>> seems logical to bind faked-metadata to the same privilege. However, I >>> also have no strong feeling about that, if you place valid points. So >>> please elaborate. >>> But, please be aware that if we require privileges to fake metadata, >>> then you need to have such privileges to provide a dbus1 proxy for >>> your native bus on kdbus. In other words, users are able to create >>> session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1 >>> proxy. This will have the net-effect of us requiring to run the proxy >>> as root (which, I th
[PATCH] selinux: fix mprotect PROT_EXEC regression caused by mm change
commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup skip
security check and lockdep conflict with XFS") caused a regression for
SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings. However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a
mmap, the security hook is passed a NULL file and knows it is dealing
with an anonymous mapping and therefore applies an execmem check and no
file checks. On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check and
no execmem check. Since the aforementioned commit now marks the shmem
zero inode with the S_PRIVATE flag, the file checks are disabled and
we have no checking at all on mprotect PROT_EXEC. Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case. This makes the mmap and mprotect checking consistent
for shared anonymous mappings, as well as for /dev/zero and ashmem.

Signed-off-by: Stephen Smalley
---
 security/selinux/hooks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6231081..564079c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3283,7 +3283,8 @@ static int file_map_prot_check(struct file *file, unsigned long prot, int shared
 	int rc = 0;
 
 	if (default_noexec &&
-	    (prot & PROT_EXEC) && (!file || (!shared && (prot & PROT_WRITE)))) {
+	    (prot & PROT_EXEC) && (!file || IS_PRIVATE(file_inode(file)) ||
+				   (!shared && (prot & PROT_WRITE)))) {
 		/*
 		 * We are making executable an anonymous mapping or a
 		 * private file mapping that will also be writable.
-- 
2.1.0
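The condition changed by this patch can be exercised in isolation with a small userspace model in C. This is only a sketch: `needs_execmem_check` is a hypothetical name, and the booleans `has_file` and `file_is_private` stand in for `file != NULL` and `IS_PRIVATE(file_inode(file))`; nothing here calls into the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h> /* PROT_EXEC, PROT_WRITE */

/*
 * Model of the patched condition in file_map_prot_check(): returns true
 * when the execmem check applies instead of (or in addition to) the
 * file-based checks.
 */
static bool needs_execmem_check(bool default_noexec, unsigned long prot,
				bool has_file, bool file_is_private,
				bool shared)
{
	return default_noexec && (prot & PROT_EXEC) &&
	       (!has_file || file_is_private ||
		(!shared && (prot & PROT_WRITE)));
}

/*
 * The two paths described in the commit message for a shared anonymous
 * mapping: mmap passes no file, mprotect sees the S_PRIVATE shmem file.
 * With the added term both give the same answer.
 */
static void demo_consistency(void)
{
	bool at_mmap = needs_execmem_check(true, PROT_EXEC, false, false, true);
	bool at_mprotect = needs_execmem_check(true, PROT_EXEC, true, true, true);

	assert(at_mmap == at_mprotect);
}
```

An ordinary shared file mapping (a file that is present and not private) still falls through to the file-based execute check in this model, which matches the intent stated in the commit message.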
Re: kdbus: credential faking
On 07/10/2015 05:05 AM, David Herrmann wrote: > Hi > > On Fri, Jul 10, 2015 at 12:56 AM, Casey Schaufler > wrote: >> On 7/9/2015 3:22 PM, David Herrmann wrote: >>> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In >>> the kdbus security model, if you don't trust the bus-creator, you >>> should not connect to the bus. >> >> That's fine in a discretionary access control model, but >> not in a mandatory access control model. The decision on >> trust of the "other" guy is never up to the process, it's >> up to the mandatory access control policy. > > Exactly. So LSMs are free to use a hook to limit faking other user's > credentials. But why does that have to affect the default (which, in > the case of kdbus, is a dac model)? > >>> A bus-creator can bypass kdbus >>> policies, sniff on any transmission and modify bus behavior. It just >>> seems logical to bind faked-metadata to the same privilege. However, I >>> also have no strong feeling about that, if you place valid points. So >>> please elaborate. >> >> Smack has to require CAP_MAC_ADMIN to allow a process to fake >> Smack metadata. This is exactly what CAP_MAC_ADMIN is for. >> Changing Smack metadata is considered a hugely dangerous activity. > > I'm totally fine with dropping support to fake seclabels, if LSM > developers see no need for it. I, certainly, will not insist on it. > With that in mind, I'd prefer if we limit this discussion to faking > CREDS/PIDS. Well, based on your use case, we actually do need support for faking seclabels if we need support for faking credentials at all, because your proxy needs to be able to fake all of the credentials in order to be fully transparent and preserve compatibility. So I don't think they can be divorced from each other. 
Regardless, we will definitely want a hook for controlling this ability to fake credentials, and I think we would want to separately distinguish each of the cases that you currently lump under your single privileged boolean, as the ability to do one should not necessarily imply the ability to do them all.
Re: kdbus: credential faking
On 07/09/2015 06:22 PM, David Herrmann wrote: > Hi > > On Thu, Jul 9, 2015 at 8:26 PM, Stephen Smalley wrote: >> Hi, >> >> I have a concern with the support for faked credentials in kdbus, but >> don't know enough about the original motivation or intended use case to >> evaluate it concretely. I raised this issue during the "kdbus for >> 4.1-rc1" thread a while back but none of the kdbus maintainers >> responded, > > Sorry, some mails might have been gone unanswered in that huge thread. > Please feel free to ping us about anything we didn't comment on. See > below.. > >>and the one D-BUS maintainer who did respond said that there >> is no API in dbus-daemon for faking client credentials, so this is not >> something inherited from dbus-daemon or required for compatibility with it. >> >> First, I have doubts as to whether there should be any way to fake the >> seclabel, no matter how "privileged" the caller. Unless there is a >> clear use case for that functionality, I would prefer to see it dropped >> altogether. >> >> Second, IIUC, the ability to fake any portion of the credentials or pids >> is granted if the caller either has CAP_IPC_OWNER or owns the bus (uid >> match). Clearly that isn't sufficient basis for seclabel faking, and it >> seems questionable as to whether it should be sufficient for faking any >> of the other credentials or pids. Compare with e.g. >> net/core/scm.c:scm_check_creds() logic for faking credentials on a Unix >> domain socket, which requires CAP_SYS_ADMIN for faking pid, CAP_SETUID >> for faking any of the uid fields, and CAP_SETGID for faking any of the >> gid fields. >> >> Thanks for any light you can shed on the matter. > > To be clear, faking metadata has one use-case, and one use-case only: > dbus1 compatibility > > In dbus1, clients connect to a unix-socket placed in the file-system > hierarchy. To avoid breaking ABI for old clients, we support a > unix-kdbus proxy. This proxy is called systemd-bus-proxyd. 
It is > spawned once for each bus we proxy and simply remarshals messages from > the client to kdbus and vice versa. Is this truly necessary? Can't the distributions just update the client side libraries to use kdbus if enabled and be done with it? Doesn't this proxy undo many of the benefits of using kdbus in the first place? > With dbus1, clients can ask the dbus-daemon for the seclabel of a peer > they talk to. They're free to use this information for any purpose. On > kdbus, we want to be compatible to dbus-daemon. Therefore, if a native > client queries kdbus for the seclabel of a peer behind a proxy, we > want that query to return the actual seclabel of the peer, not the > seclabel of the proxy. Same applies to PIDS and CREDS. > > This faked metadata is never used by the kernel for any security > decisions. It's sole purpose is to return them if a native kdbus > client queries another peer. Furthermore, this information is never > transmitted as send-time metadata (as it is, in no way, send-time > metadata), but only if you explicitly query the connection-time > metadata of a peer (KDBUS_CMD_CONN_INFO). I guess I don't understand the difference. Is there a separate facility for obtaining the send-time metadata that is not subject to credential faking? > Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In > the kdbus security model, if you don't trust the bus-creator, you > should not connect to the bus. A bus-creator can bypass kdbus > policies, sniff on any transmission and modify bus behavior. It just > seems logical to bind faked-metadata to the same privilege. However, I > also have no strong feeling about that, if you place valid points. So > please elaborate. > But, please be aware that if we require privileges to fake metadata, > then you need to have such privileges to provide a dbus1 proxy for > your native bus on kdbus. 
> In other words, users are able to create
> session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1
> proxy. This will have the net-effect of us requiring to run the proxy
> as root (which, I think, is worse than allowing bus-owners to fake
> _connection_ metadata).

Applications have a reasonable expectation that credentials supplied by
the kernel for a peer are trustworthy. Allowing unprivileged users to
forge arbitrary credentials and pids seems fraught with peril. You say
that one should never connect to a bus if you do not trust its creator.
What mechanisms are provided to allow me to determine whether I trust
the bus creator before connecting? Are those mechanisms automatically
employed by default?
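For comparison, the per-field rule attributed to scm_check_creds() earlier in this message (CAP_SYS_ADMIN for a faked pid, CAP_SETUID for faked uid fields, CAP_SETGID for faked gid fields) can be written as a small userspace model in C. This is a sketch of the rule as described in the message, not a copy of net/core/scm.c; `struct caps`, `struct fake_request`, and `may_fake_creds` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the capabilities held by the caller. */
struct caps {
	bool sys_admin; /* models CAP_SYS_ADMIN */
	bool setuid;    /* models CAP_SETUID */
	bool setgid;    /* models CAP_SETGID */
};

/* Which supplied fields differ from the caller's real credentials. */
struct fake_request {
	bool pid_differs;
	bool uid_differs;
	bool gid_differs;
};

/*
 * Each faked field needs its own capability, so being allowed to fake
 * one kind of credential does not imply being allowed to fake the rest.
 */
static bool may_fake_creds(const struct caps *c, const struct fake_request *r)
{
	assert(c != NULL && r != NULL);
	if (r->pid_differs && !c->sys_admin)
		return false;
	if (r->uid_differs && !c->setuid)
		return false;
	if (r->gid_differs && !c->setgid)
		return false;
	return true;
}
```

Contrast this with a single privileged boolean (CAP_IPC_OWNER or bus ownership, as described above), where passing one test unlocks faking of every field at once.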
Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS
On 07/10/2015 03:48 AM, Hugh Dickins wrote: > On Thu, 9 Jul 2015, Stephen Smalley wrote: >> On 07/09/2015 04:23 AM, Hugh Dickins wrote: >>> On Wed, 8 Jul 2015, Stephen Smalley wrote: >>>> On 07/08/2015 09:13 AM, Stephen Smalley wrote: >>>>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins wrote: >>>>>> It appears that, at some point last year, XFS made directory handling >>>>>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>>>>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>>>>> but that has been so for many years. >>>>>> >>>>>> Since those few lockdep traces that I've seen all implicated selinux, >>>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>>>>> >>>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>>>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>>>>> which cloned inode in mmap(), but if so, I cannot locate them now. >>>>> >>>>> This causes a regression for SELinux (please, in the future, cc >>>>> selinux list and Paul Moore on SELinux-related changes). In >>> >>> Surprised and sorry about that, yes, I should have Cc'ed. >>> >>>>> particular, this change disables SELinux checking of mprotect >>>>> PROT_EXEC on shared anonymous mappings, so we lose the ability to >>>>> control executable mappings. That said, we are only getting that >>>>> check today as a side effect of our file execute check on the tmpfs >>>>> inode, whereas it would be better (and more consistent with the >>>>> mmap-time checks) to apply an execmem check in that case, in which >>>>> case we wouldn't care about the inode-based check. 
However, I am >>>>> unclear on how to correctly detect that situation from >>>>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a >>>>> non-NULL vma->vm_file so we treat it as a file execute check. In >>>>> contrast, if directly creating an anonymous shared mapping with >>>>> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with >>>>> a NULL file and therefore we end up applying an execmem check. >>> >>> If you're willing to go forward with the change, rather than just call >>> for an immediate revert of it, then I think the right way to detect >>> the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)), >>> wouldn't it? >> >> That seems misleading and might trigger execmem checks on non-shmem >> inodes. S_PRIVATE was originally introduced for fs-internal inodes that >> are never directly exposed to userspace, originally for reiserfs xattr >> inodes (reiserfs xattrs are internally implemented as their own files >> that are hidden from userspace) and later also applied to anon inodes. >> It would be better if we had an explicit way of testing that we are >> dealing with an anonymous shared mapping in selinux_file_mprotect() -> >> file_map_prot_check(). > > But how would any of those original S_PRIVATE inodes arrive at > selinux_file_mprotect()? Now we have added the anon shared mmap case > which can arrive there, but the S_PRIVATE check seems just the right > tool for the job of distinguishing those from the user-visible inodes. > > I don't see how adding some other flag for this case would be better > - though certainly I can see that adding an "anon shared shmem" > comment on its use in that check would be helpful. > > Or is there some further difficulty in this use of S_PRIVATE, beyond > the mprotect case that you've mentioned? Unless there is some further > difficulty, duplicating all the code relating to S_PRIVATE for a > differently named flag seems counter-productive to me. 
S_PRIVATE is supposed to disable all security processing on the inode, and often this is checked in the security framework (security/security.c) even before we reach the SELinux hook and causes an immediate return there.
In the case of mprotect, we do reach the SELinux code since the hook is on the vma, not merely the inode, so we could apply an execmem check in the SELinux code if IS_PRIVATE() instead of file execute. However, I was trying to figure out if the fact that S_PRIVATE also would disable any read/write checking by SELinux on the inode could potentially open up a bypass of security policy. That would only be an issue if the file returned by shmem_zero_setup() was ever linked to an open file descriptor that could be inherited across a fork+exec or passed across local socket IPC or binder IPC and thereby shared across different security contexts. Uses of shmem_zero_setup() include mmap MAP_ANONYMOUS|MAP_SHARED, drivers/staging/android/ashmem.c (from ashmem_mmap if VM_SHARED), and drivers/char/mem.c (from mmap_zero if VM_SHARED). That all seems fine AFAICS. (There is a bool shmem_mapping(mapping) that could be used to confirm that the inode you're looking at indeed belongs to shmem; but of course that would say yes on all the user-visible shmem inodes too, so it wouldn't be a useful test on its own, and I don't
Re: kdbus: credential faking
On 07/10/2015 05:05 AM, David Herrmann wrote: Hi On Fri, Jul 10, 2015 at 12:56 AM, Casey Schaufler ca...@schaufler-ca.com wrote: On 7/9/2015 3:22 PM, David Herrmann wrote: Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In the kdbus security model, if you don't trust the bus-creator, you should not connect to the bus. That's fine in a discretionary access control model, but not in a mandatory access control model. The decision on trust of the other guy is never up to the process, it's up to the mandatory access control policy. Exactly. So LSMs are free to use a hook to limit faking other user's credentials. But why does that have to affect the default (which, in the case of kdbus, is a dac model)? A bus-creator can bypass kdbus policies, sniff on any transmission and modify bus behavior. It just seems logical to bind faked-metadata to the same privilege. However, I also have no strong feeling about that, if you place valid points. So please elaborate. Smack has to require CAP_MAC_ADMIN to allow a process to fake Smack metadata. This is exactly what CAP_MAC_ADMIN is for. Changing Smack metadata is considered a hugely dangerous activity. I'm totally fine with dropping support to fake seclabels, if LSM developers see no need for it. I, certainly, will not insist on it. With that in mind, I'd prefer if we limit this discussion to faking CREDS/PIDS. Well, based on your use case, we actually do need support for faking seclabels if we need support for faking credentials at all, because your proxy needs to be able to fake all of the credentials in order to be fully transparent and preserve compatibility. So I don't think they can be divorced from each other. 
Regardless, we will definitely want a hook for controlling this ability to fake credentials, and I think we would want to separately distinguish each of the cases that you currently lump under your single privileged boolean, as the ability to do one should not necessarily imply the ability to do them all.
[PATCH] selinux: fix mprotect PROT_EXEC regression caused by mm change
commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup skip security check and lockdep conflict with XFS") caused a regression for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on shared anonymous mappings.

However, even before that regression, the checking on such mprotect PROT_EXEC calls was inconsistent with the checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a mmap, the security hook is passed a NULL file and knows it is dealing with an anonymous mapping and therefore applies an execmem check and no file checks. On a mprotect, the security hook is passed a vma with a non-NULL vm_file (as this was set from the internally-created shmem file during mmap) and therefore applies the file-based execute check and no execmem check. Since the aforementioned commit now marks the shmem zero inode with the S_PRIVATE flag, the file checks are disabled and we have no checking at all on mprotect PROT_EXEC.

Add a test to the mprotect hook logic for such private inodes, and apply an execmem check in that case. This makes the mmap and mprotect checking consistent for shared anonymous mappings, as well as for /dev/zero and ashmem.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
 security/selinux/hooks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6231081..564079c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3283,7 +3283,8 @@ static int file_map_prot_check(struct file *file, unsigned long prot, int shared
 	int rc = 0;
 
 	if (default_noexec &&
-	    (prot & PROT_EXEC) && (!file || (!shared && (prot & PROT_WRITE)))) {
+	    (prot & PROT_EXEC) && (!file || IS_PRIVATE(file_inode(file)) ||
+				   (!shared && (prot & PROT_WRITE)))) {
 		/*
 		 * We are making executable an anonymous mapping or a
 		 * private file mapping that will also be writable.
-- 
2.1.0
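For readers following the thread, the effect of the changed condition can be modelled in userspace. This is only a rough sketch, not kernel code: required_checks(), NEED_EXECMEM and NEED_FILE_EXECUTE are names invented here, private_file stands in for IS_PRIVATE(file_inode(file)), default_noexec is assumed to be set, and the read/write file permissions are left out.

```c
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical flags for this sketch; they are not kernel identifiers. */
#define NEED_EXECMEM      0x1u	/* process-level execmem check */
#define NEED_FILE_EXECUTE 0x2u	/* file-based execute check */

/*
 * Model of which execute-related checks apply after the patch.
 *   has_file     - the hook was handed a non-NULL file
 *   private_file - the file's inode is marked S_PRIVATE
 *   shared       - the mapping is shared
 */
static unsigned int required_checks(bool has_file, bool private_file,
				    bool shared, unsigned long prot)
{
	unsigned int checks = 0;

	if (!(prot & PROT_EXEC))
		return 0;

	/* Anonymous, kernel-private, or private-writable file mapping. */
	if (!has_file || private_file || (!shared && (prot & PROT_WRITE)))
		checks |= NEED_EXECMEM;

	/* File checks are skipped entirely for S_PRIVATE inodes. */
	if (has_file && !private_file)
		checks |= NEED_FILE_EXECUTE;

	return checks;
}
```

Under these assumptions, the mmap case (no file) and the mprotect case (file present, inode private) both yield NEED_EXECMEM, which is the consistency the patch description asks for.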
Re: kdbus: credential faking
On 07/10/2015 09:43 AM, David Herrmann wrote: Hi On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley s...@tycho.nsa.gov wrote: On 07/09/2015 06:22 PM, David Herrmann wrote: To be clear, faking metadata has one use-case, and one use-case only: dbus1 compatibility In dbus1, clients connect to a unix-socket placed in the file-system hierarchy. To avoid breaking ABI for old clients, we support a unix-kdbus proxy. This proxy is called systemd-bus-proxyd. It is spawned once for each bus we proxy and simply remarshals messages from the client to kdbus and vice versa. Is this truly necessary? Can't the distributions just update the client side libraries to use kdbus if enabled and be done with it? Doesn't this proxy undo many of the benefits of using kdbus in the first place? We need binary compatibility to dbus1. There're millions of applications and language bindings with dbus1 compiled in, which we cannot suddenly break. So, are you saying that there are many applications that statically link the dbus1 library implementation (thus the distributions can't just push an updated shared library that switches from using the socket to using kdbus), and that many of these applications are third party applications not packaged by the distributions (thus the distributions cannot just do a mass rebuild to update these applications too)? Otherwise, I would think that the use of a socket would just be an implementation detail and you would be free to change it without affecting dbus1 library ABI compatibility. With dbus1, clients can ask the dbus-daemon for the seclabel of a peer they talk to. They're free to use this information for any purpose. On kdbus, we want to be compatible to dbus-daemon. Therefore, if a native client queries kdbus for the seclabel of a peer behind a proxy, we want that query to return the actual seclabel of the peer, not the seclabel of the proxy. Same applies to PIDS and CREDS. This faked metadata is never used by the kernel for any security decisions. 
It's sole purpose is to return them if a native kdbus client queries another peer. Furthermore, this information is never transmitted as send-time metadata (as it is, in no way, send-time metadata), but only if you explicitly query the connection-time metadata of a peer (KDBUS_CMD_CONN_INFO). I guess I don't understand the difference. Is there a separate facility for obtaining the send-time metadata that is not subject to credential faking? Each message carries metadata of the sender, that was collected at the time of _SEND_. This metadata cannot be faked. Additionally (for introspection and dbus1 compat), kdbus allows peers to query metadata of other peers, that were collected at the time of _CONNECT_. Privileged peers can provide faked _connection_ metadata, which has the side-effect of suppressing send-time metadata. It is up to the receiver to request connection-metadata if a message did not carry send-time metadata. We do this, currently, only to support legacy dbus1 clients which do not support send-time metadata. So the privileged peer (which just means the bus owner, which can be completely unprivileged from a typical DAC perspective) can both prevent the receiver from getting the (real, unfakeable) send-time metadata and supply arbitrary fake credentials for the connection metadata? And the legacy dbus1 clients (i.e. all current DBUS applications?) will always use this potentially faked metadata. Meanwhile, what about new dbus clients? What is the standard behavior for them when the send-time metadata is suppressed? Do they always fall back to the connection metadata? Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In the kdbus security model, if you don't trust the bus-creator, you should not connect to the bus. A bus-creator can bypass kdbus policies, sniff on any transmission and modify bus behavior. It just seems logical to bind faked-metadata to the same privilege. 
However, I also have no strong feeling about that, if you place valid points. So please elaborate. But, please be aware that if we require privileges to fake metadata, then you need to have such privileges to provide a dbus1 proxy for your native bus on kdbus. In other words, users are able to create session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1 proxy. This will have the net-effect of us requiring to run the proxy as root (which, I think, is worse than allowing bus-owners to fake _connection_ metadata). Applications have a reasonable expectation that credentials supplied by the kernel for a peer are trustworthy. Allowing unprivileged users to forge arbitrary credentials and pids seems fraught with peril. You say that one should never connect to a bus if you do not trust its creator. What mechanisms are provided to allow me to determine whether I trust the bus creator before connecting? Are those mechanisms automatically employed by default? Regarding
Re: kdbus: credential faking
On 07/09/2015 06:22 PM, David Herrmann wrote: Hi On Thu, Jul 9, 2015 at 8:26 PM, Stephen Smalley s...@tycho.nsa.gov wrote: Hi, I have a concern with the support for faked credentials in kdbus, but don't know enough about the original motivation or intended use case to evaluate it concretely. I raised this issue during the kdbus for 4.1-rc1 thread a while back but none of the kdbus maintainers responded, Sorry, some mails might have been gone unanswered in that huge thread. Please feel free to ping us about anything we didn't comment on. See below.. and the one D-BUS maintainer who did respond said that there is no API in dbus-daemon for faking client credentials, so this is not something inherited from dbus-daemon or required for compatibility with it. First, I have doubts as to whether there should be any way to fake the seclabel, no matter how privileged the caller. Unless there is a clear use case for that functionality, I would prefer to see it dropped altogether. Second, IIUC, the ability to fake any portion of the credentials or pids is granted if the caller either has CAP_IPC_OWNER or owns the bus (uid match). Clearly that isn't sufficient basis for seclabel faking, and it seems questionable as to whether it should be sufficient for faking any of the other credentials or pids. Compare with e.g. net/core/scm.c:scm_check_creds() logic for faking credentials on a Unix domain socket, which requires CAP_SYS_ADMIN for faking pid, CAP_SETUID for faking any of the uid fields, and CAP_SETGID for faking any of the gid fields. Thanks for any light you can shed on the matter. To be clear, faking metadata has one use-case, and one use-case only: dbus1 compatibility In dbus1, clients connect to a unix-socket placed in the file-system hierarchy. To avoid breaking ABI for old clients, we support a unix-kdbus proxy. This proxy is called systemd-bus-proxyd. It is spawned once for each bus we proxy and simply remarshals messages from the client to kdbus and vice versa. 
Is this truly necessary? Can't the distributions just update the client side libraries to use kdbus if enabled and be done with it? Doesn't this proxy undo many of the benefits of using kdbus in the first place? With dbus1, clients can ask the dbus-daemon for the seclabel of a peer they talk to. They're free to use this information for any purpose. On kdbus, we want to be compatible to dbus-daemon. Therefore, if a native client queries kdbus for the seclabel of a peer behind a proxy, we want that query to return the actual seclabel of the peer, not the seclabel of the proxy. Same applies to PIDS and CREDS. This faked metadata is never used by the kernel for any security decisions. It's sole purpose is to return them if a native kdbus client queries another peer. Furthermore, this information is never transmitted as send-time metadata (as it is, in no way, send-time metadata), but only if you explicitly query the connection-time metadata of a peer (KDBUS_CMD_CONN_INFO). I guess I don't understand the difference. Is there a separate facility for obtaining the send-time metadata that is not subject to credential faking? Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In the kdbus security model, if you don't trust the bus-creator, you should not connect to the bus. A bus-creator can bypass kdbus policies, sniff on any transmission and modify bus behavior. It just seems logical to bind faked-metadata to the same privilege. However, I also have no strong feeling about that, if you place valid points. So please elaborate. But, please be aware that if we require privileges to fake metadata, then you need to have such privileges to provide a dbus1 proxy for your native bus on kdbus. In other words, users are able to create session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1 proxy. This will have the net-effect of us requiring to run the proxy as root (which, I think, is worse than allowing bus-owners to fake _connection_ metadata). 
Applications have a reasonable expectation that credentials supplied by the kernel for a peer are trustworthy. Allowing unprivileged users to forge arbitrary credentials and pids seems fraught with peril. You say that one should never connect to a bus if you do not trust its creator. What mechanisms are provided to allow me to determine whether I trust the bus creator before connecting? Are those mechanisms automatically employed by default?
Re: kdbus: credential faking
On 07/10/2015 12:48 PM, David Herrmann wrote: Hi On Fri, Jul 10, 2015 at 4:47 PM, Stephen Smalley s...@tycho.nsa.gov wrote: On 07/10/2015 09:43 AM, David Herrmann wrote: On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley s...@tycho.nsa.gov wrote: On 07/09/2015 06:22 PM, David Herrmann wrote: With dbus1, clients can ask the dbus-daemon for the seclabel of a peer they talk to. They're free to use this information for any purpose. On kdbus, we want to be compatible to dbus-daemon. Therefore, if a native client queries kdbus for the seclabel of a peer behind a proxy, we want that query to return the actual seclabel of the peer, not the seclabel of the proxy. Same applies to PIDS and CREDS. This faked metadata is never used by the kernel for any security decisions. It's sole purpose is to return them if a native kdbus client queries another peer. Furthermore, this information is never transmitted as send-time metadata (as it is, in no way, send-time metadata), but only if you explicitly query the connection-time metadata of a peer (KDBUS_CMD_CONN_INFO). I guess I don't understand the difference. Is there a separate facility for obtaining the send-time metadata that is not subject to credential faking? Each message carries metadata of the sender, that was collected at the time of _SEND_. This metadata cannot be faked. Additionally (for introspection and dbus1 compat), kdbus allows peers to query metadata of other peers, that were collected at the time of _CONNECT_. Privileged peers can provide faked _connection_ metadata, which has the side-effect of suppressing send-time metadata. It is up to the receiver to request connection-metadata if a message did not carry send-time metadata. We do this, currently, only to support legacy dbus1 clients which do not support send-time metadata. 
So the privileged peer (which just means the bus owner, which can be completely unprivileged from a typical DAC perspective) can both prevent the receiver from getting the (real, unfakeable) send-time metadata and supply arbitrary fake credentials for the connection metadata? And the (Limited to PIDS/CREDS/SECLABEL metadata, but) yes. Note that this is all under the assumption that you never connect to a bus owned by someone else but you or root. Hence, a peer can only fake metadata, if it can also ptrace you. If you don't enforce this assumption in kdbus, then you can't be sure that it won't be violated by future userspace. Also, the statement about ptrace doesn't hold when using SELinux or other security modules. legacy dbus1 clients (i.e. all current DBUS applications?) will always use this potentially faked metadata. Meanwhile, what about new dbus clients? What is the standard behavior for them when the send-time metadata is suppressed? Do they always fall back to the connection metadata? This is a decision user-space has to make. In sd-bus, if we trust the bus (root owned, or our own), we always fall back to connection metadata. So the only benefit of the credentials in the send-time metadata is they come for free rather than needing to be separately queried? And aside from credential faking (impersonation being a nicer name), when else would they differ from the connection metadata? If the program does a setuid or something after creating the connection? Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In the kdbus security model, if you don't trust the bus-creator, you should not connect to the bus. A bus-creator can bypass kdbus policies, sniff on any transmission and modify bus behavior. It just seems logical to bind faked-metadata to the same privilege. However, I also have no strong feeling about that, if you place valid points. So please elaborate. 
But, please be aware that if we require privileges to fake metadata, then you need to have such privileges to provide a dbus1 proxy for your native bus on kdbus. In other words, users are able to create session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1 proxy. This will have the net-effect of us requiring to run the proxy as root (which, I think, is worse than allowing bus-owners to fake _connection_ metadata). Applications have a reasonable expectation that credentials supplied by the kernel for a peer are trustworthy. Allowing unprivileged users to forge arbitrary credentials and pids seems fraught with peril. You say that one should never connect to a bus if you do not trust its creator. What mechanisms are provided to allow me to determine whether I trust the bus creator before connecting? Are those mechanisms automatically employed by default? Regarding the default security model (uid based), each bus is prefixed by the uid of the bus-owner. This is enforced by the kernel. Hence, a process cannot 'accidentally' connect to a bus of a user they don't trust. And how do they go about looking up / obtaining the destination bus name in the first
Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:

Originates from: https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
commit: aa0885489d19be92fa41c6f0a71df28763228a40

Signed-off-by: Karol Lewandowski <k.lewando...@samsung.com>
Signed-off-by: Paul Osmialowski <p.osmialo...@samsung.com>
---
 ipc/kdbus/bus.c        | 12 ++-
 ipc/kdbus/bus.h        |  3 +++
 ipc/kdbus/connection.c | 54 ++
 ipc/kdbus/connection.h |  4
 ipc/kdbus/domain.c     |  9 -
 ipc/kdbus/domain.h     |  2 ++
 ipc/kdbus/endpoint.c   | 11 ++
 ipc/kdbus/names.c      | 11 ++
 ipc/kdbus/queue.c      | 30 ++--
 9 files changed, 124 insertions(+), 12 deletions(-)

diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
index 9993753..b85cdc7 100644
--- a/ipc/kdbus/connection.c
+++ b/ipc/kdbus/connection.c
@@ -31,6 +31,7 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 #include <linux/uio.h>
+#include <linux/security.h>
 
 #include "bus.h"
 #include "connection.h"
@@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
 	bool is_activator;
 	bool is_monitor;
 	struct kvec kvec;
+	u32 sid, len;
+	char *label;
 	int ret;
 
 	struct {
@@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
 		}
 	}
 
+	security_task_getsecid(current, &sid);
+	security_secid_to_secctx(sid, &label, &len);
+	ret = security_kdbus_connect(conn, label, len);
+	if (ret) {
+		ret = -EPERM;
+		goto exit_unref;
+	}

This seems convoluted and expensive. If you always want the label of the current task here, then why not just have security_kdbus_connect() internally extract the label of the current task?

@@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, struct kdbus_kmsg *kmsg)
 	if (ret < 0)
 		goto exit;
 
+	ret = security_kdbus_talk(src, dst);
+	if (ret) {
+		ret = -EPERM;
+		goto exit;
+	}

Where does kdbus apply its uid-based or other restrictions on connections? Why do we need to insert separate hooks into each of these functions? Is there no central chokepoint already for permission checking that we can hook?

For example, why wouldn't you insert a single hook into kdbus_conn_policy_talk() where they perform their DAC checking? You would need to restructure it slightly to ensure that the security hook is only called if it passes the DAC (privileged || uid_eq) check so that we do not trigger MAC denials when DAC wouldn't have allowed it anyway. Also, kdbus_conn_policy_talk() takes a separate conn_creds argument - that should be passed through to the hook as well.
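The ordering suggested above (run the security hook only once the DAC check has passed) can be sketched in plain C. This is a userspace illustration under stated assumptions, not kdbus code: policy_talk(), mac_hook_fn, mac_allow_all(), mac_deny_all() and mac_calls are all hypothetical names, and the DAC test is reduced to "privileged or same uid".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LSM hook: 0 allows, negative errno denies. */
typedef int (*mac_hook_fn)(unsigned int src_uid, unsigned int dst_uid);

/* Counts hook invocations so the ordering can be observed. */
static int mac_calls;

static int mac_allow_all(unsigned int src_uid, unsigned int dst_uid)
{
	(void)src_uid;
	(void)dst_uid;
	mac_calls++;
	return 0;
}

static int mac_deny_all(unsigned int src_uid, unsigned int dst_uid)
{
	(void)src_uid;
	(void)dst_uid;
	mac_calls++;
	return -EACCES;
}

/*
 * Single chokepoint: DAC first, MAC second. The hook is never reached
 * when DAC already refuses, so no MAC denial is generated for a request
 * that would not have been allowed anyway.
 */
static int policy_talk(bool privileged, unsigned int src_uid,
		       unsigned int dst_uid, mac_hook_fn hook)
{
	if (!privileged && src_uid != dst_uid)
		return -EPERM;		/* DAC denial, hook not called */

	return hook ? hook(src_uid, dst_uid) : 0;
}
```

With a deny-everything hook, a caller that fails the uid comparison gets -EPERM and mac_calls stays at zero; only a caller that passes DAC sees the hook's -EACCES.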
Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:

Originates from: https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
commit: aa0885489d19be92fa41c6f0a71df28763228a40

Signed-off-by: Karol Lewandowski <k.lewando...@samsung.com>
Signed-off-by: Paul Osmialowski <p.osmialo...@samsung.com>
---
 ipc/kdbus/bus.c        | 12 ++-
 ipc/kdbus/bus.h        |  3 +++
 ipc/kdbus/connection.c | 54 ++
 ipc/kdbus/connection.h |  4
 ipc/kdbus/domain.c     |  9 -
 ipc/kdbus/domain.h     |  2 ++
 ipc/kdbus/endpoint.c   | 11 ++
 ipc/kdbus/names.c      | 11 ++
 ipc/kdbus/queue.c      | 30 ++--
 9 files changed, 124 insertions(+), 12 deletions(-)

diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
index 9993753..b85cdc7 100644
--- a/ipc/kdbus/connection.c
+++ b/ipc/kdbus/connection.c
@@ -31,6 +31,7 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 #include <linux/uio.h>
+#include <linux/security.h>
 
 #include "bus.h"
 #include "connection.h"
@@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
 	bool is_activator;
 	bool is_monitor;
 	struct kvec kvec;
+	u32 sid, len;
+	char *label;
 	int ret;
 
 	struct {
@@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
 		}
 	}
 
+	security_task_getsecid(current, &sid);
+	security_secid_to_secctx(sid, &label, &len);
+	ret = security_kdbus_connect(conn, label, len);
+	if (ret) {
+		ret = -EPERM;
+		goto exit_unref;
+	}

This seems convoluted and expensive. If you always want the label of the current task here, then why not just have security_kdbus_connect() internally extract the label of the current task?

Furthermore, why do we need a separate security field and copy of the current label in the conn->security, when we already have conn->cred->security available to us? I don't think we need new security fields unless we are going to assign some kind of object labeling to these structures separate from their cred, and offhand I don't see why we would do that.
kdbus: credential faking
Hi,

I have a concern with the support for faked credentials in kdbus, but
don't know enough about the original motivation or intended use case to
evaluate it concretely. I raised this issue during the "kdbus for
4.1-rc1" thread a while back but none of the kdbus maintainers
responded, and the one D-BUS maintainer who did respond said that there
is no API in dbus-daemon for faking client credentials, so this is not
something inherited from dbus-daemon or required for compatibility with
it.

First, I have doubts as to whether there should be any way to fake the
seclabel, no matter how "privileged" the caller. Unless there is a clear
use case for that functionality, I would prefer to see it dropped
altogether.

Second, IIUC, the ability to fake any portion of the credentials or pids
is granted if the caller either has CAP_IPC_OWNER or owns the bus (uid
match). Clearly that isn't sufficient basis for seclabel faking, and it
seems questionable as to whether it should be sufficient for faking any
of the other credentials or pids. Compare with e.g.
net/core/scm.c:scm_check_creds() logic for faking credentials on a Unix
domain socket, which requires CAP_SYS_ADMIN for faking pid, CAP_SETUID
for faking any of the uid fields, and CAP_SETGID for faking any of the
gid fields.

Thanks for any light you can shed on the matter.
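To make the comparison with scm_check_creds() concrete, here is a minimal userspace sketch of the per-field rule described above. It is a simplified model, not the kernel code: `struct caller`, `struct claimed_creds` and `check_claimed_creds()` are hypothetical names invented for illustration, the capabilities are modeled as plain booleans, and user namespaces are ignored entirely.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for the caller's real credentials. */
struct caller {
	pid_t pid;
	uid_t uid, euid, suid;
	gid_t gid, egid, sgid;
	bool cap_sys_admin;	/* models CAP_SYS_ADMIN */
	bool cap_setuid;	/* models CAP_SETUID */
	bool cap_setgid;	/* models CAP_SETGID */
};

/* Credentials the caller asks to have attached to a message (hypothetical). */
struct claimed_creds {
	pid_t pid;
	uid_t uid;
	gid_t gid;
};

/*
 * Each field is accepted only if it matches one of the caller's own
 * values or the caller holds the capability tied to that field.
 * Returns 0 if every field passes, -EPERM otherwise.
 */
static int check_claimed_creds(const struct caller *c,
			       const struct claimed_creds *cl)
{
	bool pid_ok = cl->pid == c->pid || c->cap_sys_admin;
	bool uid_ok = cl->uid == c->uid || cl->uid == c->euid ||
		      cl->uid == c->suid || c->cap_setuid;
	bool gid_ok = cl->gid == c->gid || cl->gid == c->egid ||
		      cl->gid == c->sgid || c->cap_setgid;

	return (pid_ok && uid_ok && gid_ok) ? 0 : -EPERM;
}
```

The point of the sketch is that no single blanket privilege unlocks every field: a caller with only the modeled CAP_SETUID can claim another uid but still cannot claim another pid or gid. The kdbus rule described above (CAP_IPC_OWNER or bus ownership) is a single gate by comparison.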
Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS
On 07/09/2015 04:23 AM, Hugh Dickins wrote:
> On Wed, 8 Jul 2015, Stephen Smalley wrote:
>> On 07/08/2015 09:13 AM, Stephen Smalley wrote:
>>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins wrote:
>>>> It appears that, at some point last year, XFS made directory handling
>>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>>> but that has been so for many years.
>>>>
>>>> Since those few lockdep traces that I've seen all implicated selinux,
>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>>
>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers
>>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>
>>> This causes a regression for SELinux (please, in the future, cc
>>> selinux list and Paul Moore on SELinux-related changes). In
>
> Surprised and sorry about that, yes, I should have Cc'ed.
>
>>> particular, this change disables SELinux checking of mprotect
>>> PROT_EXEC on shared anonymous mappings, so we lose the ability to
>>> control executable mappings. That said, we are only getting that
>>> check today as a side effect of our file execute check on the tmpfs
>>> inode, whereas it would be better (and more consistent with the
>>> mmap-time checks) to apply an execmem check in that case, in which
>>> case we wouldn't care about the inode-based check. However, I am
>>> unclear on how to correctly detect that situation from
>>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a
>>> non-NULL vma->vm_file so we treat it as a file execute check. In
>>> contrast, if directly creating an anonymous shared mapping with
>>> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with
>>> a NULL file and therefore we end up applying an execmem check.
>
> If you're willing to go forward with the change, rather than just call
> for an immediate revert of it, then I think the right way to detect
> the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)),
> wouldn't it?

That seems misleading and might trigger execmem checks on non-shmem
inodes. S_PRIVATE was originally introduced for fs-internal inodes that
are never directly exposed to userspace, originally for reiserfs xattr
inodes (reiserfs xattrs are internally implemented as their own files
that are hidden from userspace) and later also applied to anon inodes.
It would be better if we had an explicit way of testing that we are
dealing with an anonymous shared mapping in selinux_file_mprotect() ->
file_map_prot_check().
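The objection to IS_PRIVATE() can be illustrated with a small userspace model. This is only a sketch of the decision being discussed, not SELinux code: `struct model_inode`, `struct model_vma`, the `shared_anon` marker and both `pick_by_*` functions are hypothetical names, and the model only answers which kind of check would be selected, not whether access is granted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which kind of check a model of file_map_prot_check() would select. */
enum prot_check { CHECK_EXECMEM, CHECK_FILE_EXECUTE };

/* Hypothetical stand-in for the inode behind vma->vm_file. */
struct model_inode {
	bool s_private;		/* models the S_PRIVATE flag */
	bool shared_anon;	/* hypothetical explicit "anonymous shared mapping" marker */
};

/* Hypothetical stand-in for a vma; file_inode is NULL when there is no vm_file. */
struct model_vma {
	const struct model_inode *file_inode;
};

/* Selection keyed on S_PRIVATE: every private inode is treated as anonymous memory. */
static enum prot_check pick_by_is_private(const struct model_vma *vma)
{
	if (!vma->file_inode || vma->file_inode->s_private)
		return CHECK_EXECMEM;
	return CHECK_FILE_EXECUTE;
}

/* Selection keyed on an explicit marker: only real anonymous shared mappings qualify. */
static enum prot_check pick_by_explicit_marker(const struct model_vma *vma)
{
	if (!vma->file_inode || vma->file_inode->shared_anon)
		return CHECK_EXECMEM;
	return CHECK_FILE_EXECUTE;
}
```

In this model, an inode that is private for an unrelated reason (the reiserfs xattr or anon inode cases mentioned above) is routed to the execmem check by the S_PRIVATE test but not by the explicit marker. That is the over-matching the reply is concerned about.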
Re: Linux 4.2-rc1
On 07/08/2015 01:47 PM, Casey Schaufler wrote:
> On 7/8/2015 10:29 AM, Linus Torvalds wrote:
>> On Wed, Jul 8, 2015 at 10:17 AM, Linus Torvalds wrote:
>>> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>>>
>>>         if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
>>>                 return -ENOMEM;
>>>
>>> and that "fw_priv->buf" pointer is NULL.
>>>
>>> However, I don't see anything that looks like it should have changed
>>> any of this since 4.1.
>> Looking at the other uses of "fw_priv->buf", they all check that
>> pointer for NULL. I see code like
>>
>>         fw_buf = fw_priv->buf;
>>         if (!fw_buf)
>>                 goto out;
>>
>> etc.
>>
>> Also, it looks like you need to hold the "fw_lock" to even look at
>> that pointer, since the buffer can get reallocated etc.
>>
>> So that uevent code really looks buggy. It just doesn't look like a
>> *new* bug to me. That code looks old, going back to 2012 and commit
>> 1244691c73b2.
>
> There have been SELinux changes to kernfs for 4.2. William,
> you might want to have a look here.

What change are you referring to? I see no SELinux-related changes to
kernfs in 4.2-rc1.
Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS
On 07/08/2015 09:13 AM, Stephen Smalley wrote:
> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins wrote:
>> It appears that, at some point last year, XFS made directory handling
>> changes which bring it into lockdep conflict with shmem_zero_setup():
>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>> but that has been so for many years.
>>
>> Since those few lockdep traces that I've seen all implicated selinux,
>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>
>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers
>> which cloned inode in mmap(), but if so, I cannot locate them now.
>
> This causes a regression for SELinux (please, in the future, cc
> selinux list and Paul Moore on SELinux-related changes). In
> particular, this change disables SELinux checking of mprotect
> PROT_EXEC on shared anonymous mappings, so we lose the ability to
> control executable mappings. That said, we are only getting that
> check today as a side effect of our file execute check on the tmpfs
> inode, whereas it would be better (and more consistent with the
> mmap-time checks) to apply an execmem check in that case, in which
> case we wouldn't care about the inode-based check. However, I am
> unclear on how to correctly detect that situation from
> selinux_file_mprotect() -> file_map_prot_check(), because we do have a
> non-NULL vma->vm_file so we treat it as a file execute check. In
> contrast, if directly creating an anonymous shared mapping with
> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with
> a NULL file and therefore we end up applying an execmem check.

Also, can you provide the lockdep traces that motivated this change?

>
>>
>> Reported-and-tested-by: Prarit Bhargava
>> Reported-by: Daniel Wagner
>> Reported-by: Morten Stevens
>> Signed-off-by: Hugh Dickins
>> ---
>>
>>  mm/shmem.c |8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> --- 4.1-rc7/mm/shmem.c	2015-04-26 19:16:31.352191298 -0700
>> +++ linux/mm/shmem.c	2015-06-14 09:26:49.461120166 -0700
>> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
>>  	struct file *file;
>>  	loff_t size = vma->vm_end - vma->vm_start;
>>
>> -	file = shmem_file_setup("dev/zero", size, vma->vm_flags);
>> +	/*
>> +	 * Cloning a new file under mmap_sem leads to a lock ordering conflict
>> +	 * between XFS directory reading and selinux: since this file is only
>> +	 * accessible to the user through its mapping, use S_PRIVATE flag to
>> +	 * bypass file security, in the same way as shmem_kernel_file_setup().
>> +	 */
>> +	file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
>>  	if (IS_ERR(file))
>>  		return PTR_ERR(file);
Re: [RFC 4/8] lsm: smack: smack callbacks for kdbus security hooks
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
> This adds implementation of three smack callbacks sitting behind kdbus
> security hooks as proposed by Karol Lewandowski.
>
> Originates from:
>
> git://git.infradead.org/users/pcmoore/selinux (branch: working-kdbus)
> commit: fc3505d058c001fe72a6f66b833e0be5b2d118f3
>
> https://github.com/lmctl/linux.git (branch: kdbus-lsm-v4.for-systemd-v212)
> commit: 103c26fd27d1ec8c32d85dd3d85681f936ac66fb
>
> Signed-off-by: Karol Lewandowski
> Signed-off-by: Paul Osmialowski
> ---
>  security/smack/smack_lsm.c | 68 ++
>  1 file changed, 68 insertions(+)
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328..033b756 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -41,6 +41,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "smack.h"
>
>  #define TRANS_TRUE "TRUE"
> @@ -3336,6 +3337,69 @@ static int smack_setprocattr(struct task_struct *p, char *name,
>  }
>
>  /**
> + * smack_kdbus_connect - Set the security blob for a KDBus connection
> + * @conn: the connection
> + * @secctx: smack label
> + * @seclen: smack label length
> + *
> + * Returns 0
> + */
> +static int smack_kdbus_connect(struct kdbus_conn *conn,
> +			       const char *secctx, u32 seclen)
> +{
> +	struct smack_known *skp;
> +
> +	if (secctx && seclen > 0)
> +		skp = smk_import_entry(secctx, seclen);
> +	else
> +		skp = smk_of_current();
> +	conn->security = skp;
> +
> +	return 0;
> +}
> +
> +/**
> + * smack_kdbus_conn_free - Clear the security blob for a KDBus connection
> + * @conn: the connection
> + *
> + * Clears the blob pointer
> + */
> +static void smack_kdbus_conn_free(struct kdbus_conn *conn)
> +{
> +	conn->security = NULL;
> +}
> +
> +/**
> + * smack_kdbus_talk - Smack access on KDBus
> + * @src: source kdbus connection
> + * @dst: destination kdbus connection
> + *
> + * Return 0 if a subject with the smack of sock could access
> + * an object with the smack of other, otherwise an error code
> + */
> +static int smack_kdbus_talk(const struct kdbus_conn *src,
> +			    const struct kdbus_conn *dst)
> +{
> +	struct smk_audit_info ad;
> +	struct smack_known *sskp = src->security;
> +	struct smack_known *dskp = dst->security;
> +	int ret;
> +
> +	BUG_ON(sskp == NULL);
> +	BUG_ON(dskp == NULL);
> +
> +	if (smack_privileged(CAP_MAC_OVERRIDE))
> +		return 0;
> +
> +	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NONE);
> +
> +	ret = smk_access(sskp, dskp, MAY_WRITE, &ad);
> +	if (ret)
> +		return ret;
> +	return 0;
> +}
> +
> +/**
>   * smack_unix_stream_connect - Smack access on UDS
>   * @sock: one sock
>   * @other: the other sock
> @@ -4393,6 +4457,10 @@ struct security_hook_list smack_hooks[] = {
>  	LSM_HOOK_INIT(inode_notifysecctx, smack_inode_notifysecctx),
>  	LSM_HOOK_INIT(inode_setsecctx, smack_inode_setsecctx),
>  	LSM_HOOK_INIT(inode_getsecctx, smack_inode_getsecctx),
> +
> +	LSM_HOOK_INIT(kdbus_connect, smack_kdbus_connect),
> +	LSM_HOOK_INIT(kdbus_conn_free, smack_kdbus_conn_free),
> +	LSM_HOOK_INIT(kdbus_talk, smack_kdbus_talk),
>  };

If Smack only truly needs 3 hooks, then it begs the question of why
there are so many other hooks defined. Are the other hooks just to
support finer-grained distinctions, or is Smack's coverage incomplete?
Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
> Originates from:
>
> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>
> Signed-off-by: Karol Lewandowski
> Signed-off-by: Paul Osmialowski
> ---
>  ipc/kdbus/bus.c        | 12 ++-
>  ipc/kdbus/bus.h        | 3 +++
>  ipc/kdbus/connection.c | 54 ++
>  ipc/kdbus/connection.h | 4
>  ipc/kdbus/domain.c     | 9 -
>  ipc/kdbus/domain.h     | 2 ++
>  ipc/kdbus/endpoint.c   | 11 ++
>  ipc/kdbus/names.c      | 11 ++
>  ipc/kdbus/queue.c      | 30 ++--
>  9 files changed, 124 insertions(+), 12 deletions(-)
>
> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index 9993753..b85cdc7 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -31,6 +31,7 @@
>  #include <linux/slab.h>
>  #include <linux/syscalls.h>
>  #include <linux/uio.h>
> +#include <linux/security.h>
>
>  #include "bus.h"
>  #include "connection.h"
> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>  	bool is_activator;
>  	bool is_monitor;
>  	struct kvec kvec;
> +	u32 sid, len;
> +	char *label;
>  	int ret;
>
>  	struct {
> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>  		}
>  	}
>
> +	security_task_getsecid(current, &sid);
> +	security_secid_to_secctx(sid, &label, &len);
> +	ret = security_kdbus_connect(conn, label, len);
> +	if (ret) {
> +		ret = -EPERM;
> +		goto exit_unref;
> +	}

This seems convoluted and expensive. If you always want the label of
the current task here, then why not just have security_kdbus_connect()
internally extract the label of the current task?

> @@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, struct kdbus_kmsg *kmsg)
>  	if (ret < 0)
>  		goto exit;
>
> +	ret = security_kdbus_talk(src, dst);
> +	if (ret) {
> +		ret = -EPERM;
> +		goto exit;
> +	}

Where does kdbus apply its uid-based or other restrictions on
connections? Why do we need to insert separate hooks into each of these
functions? Is there no central chokepoint already for permission
checking that we can hook?

> diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
> index d1ffe90..1f91d39 100644
> --- a/ipc/kdbus/connection.h
> +++ b/ipc/kdbus/connection.h
> @@ -19,6 +19,7 @@
>  #include <linux/kref.h>
>  #include <linux/lockdep.h>
>  #include <linux/path.h>
> +#include <uapi/linux/kdbus.h>
>
>  #include "limits.h"
>  #include "metadata.h"
> @@ -73,6 +74,7 @@ struct kdbus_kmsg;
>   * @names_queue_list:	Well-known names this connection waits for
>   * @privileged:		Whether this connection is privileged on the bus
>   * @faked_meta:		Whether the metadata was faked on HELLO
> + * @security:		LSM security blob
>   */
>  struct kdbus_conn {
>  	struct kref kref;
> @@ -113,6 +115,8 @@ struct kdbus_conn {
>
>  	bool privileged:1;
>  	bool faked_meta:1;
> +
> +	void *security;
>  };

Unless I missed it, you may have missed the most important thing of all:
controlling kdbus's notion of "privileged". kdbus sets privileged to
true if the process has CAP_IPC_OWNER or the process euid matches the
uid of the bus creator, and then it allows those processes to do many
dangerous things, including monitoring all traffic, impersonating
credentials, pids, or seclabel, etc.

I don't believe we should ever permit impersonating seclabel
information.
Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS
On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins wrote:
> It appears that, at some point last year, XFS made directory handling
> changes which bring it into lockdep conflict with shmem_zero_setup():
> it is surprising that mmap() can clone an inode while holding mmap_sem,
> but that has been so for many years.
>
> Since those few lockdep traces that I've seen all implicated selinux,
> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
> v3.13's commit c7277090927a ("security: shmem: implement kernel private
> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>
> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers
> which cloned inode in mmap(), but if so, I cannot locate them now.

This causes a regression for SELinux (please, in the future, cc
selinux list and Paul Moore on SELinux-related changes). In
particular, this change disables SELinux checking of mprotect
PROT_EXEC on shared anonymous mappings, so we lose the ability to
control executable mappings. That said, we are only getting that
check today as a side effect of our file execute check on the tmpfs
inode, whereas it would be better (and more consistent with the
mmap-time checks) to apply an execmem check in that case, in which
case we wouldn't care about the inode-based check. However, I am
unclear on how to correctly detect that situation from
selinux_file_mprotect() -> file_map_prot_check(), because we do have a
non-NULL vma->vm_file so we treat it as a file execute check. In
contrast, if directly creating an anonymous shared mapping with
PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with
a NULL file and therefore we end up applying an execmem check.

>
> Reported-and-tested-by: Prarit Bhargava
> Reported-by: Daniel Wagner
> Reported-by: Morten Stevens
> Signed-off-by: Hugh Dickins
> ---
>
>  mm/shmem.c |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> --- 4.1-rc7/mm/shmem.c	2015-04-26 19:16:31.352191298 -0700
> +++ linux/mm/shmem.c	2015-06-14 09:26:49.461120166 -0700
> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
>  	struct file *file;
>  	loff_t size = vma->vm_end - vma->vm_start;
>
> -	file = shmem_file_setup("dev/zero", size, vma->vm_flags);
> +	/*
> +	 * Cloning a new file under mmap_sem leads to a lock ordering conflict
> +	 * between XFS directory reading and selinux: since this file is only
> +	 * accessible to the user through its mapping, use S_PRIVATE flag to
> +	 * bypass file security, in the same way as shmem_kernel_file_setup().
> +	 */
> +	file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
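For readers who want to exercise the path discussed above from userspace, the following is a minimal sketch of the sequence in question: create a shared anonymous mapping without PROT_EXEC, then request PROT_EXEC afterwards with mprotect(). The helper names `map_shared_anon()` and `add_exec()` are made up for this example. Whether the mprotect() call succeeds depends on the running kernel and the loaded policy, so the sketch deliberately makes no claim about its result.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Create a shared anonymous read/write mapping. Per the thread, the
 * kernel backs this kind of mapping with a shmem file, so the vma ends
 * up with a non-NULL vm_file. Returns MAP_FAILED on error.
 */
static void *map_shared_anon(size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}

/*
 * Request PROT_EXEC after the fact; this is the mprotect-time check the
 * thread is about. Returns 0 on success, -1 with errno set on failure.
 */
static int add_exec(void *addr, size_t len)
{
	return mprotect(addr, len, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

A caller would invoke `map_shared_anon()` with a page-sized length, then `add_exec()` on the returned address, and compare the outcome with passing PROT_EXEC directly to mmap(), which is the case the thread says is checked as execmem.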