Re: [PATCH v2 5/7] selinux: Add support for unprivileged mounts from user namespaces

2015-10-13 Thread Stephen Smalley

On 10/13/2015 01:04 PM, Seth Forshee wrote:

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.

Signed-off-by: Seth Forshee <seth.fors...@canonical.com>


Acked-by: Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/hooks.c | 23 +++
  1 file changed, 23 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index de05207eb665..09be1dc21e58 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -756,6 +756,28 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			rc = security_transition_sid(current_sid(), current_sid(),
+						     SECCLASS_FILE, NULL,
+						     &sbsec->mntpoint_sid);
+			if (rc)
+				goto out;
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -824,6 +846,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
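
For readers who want the branch logic of the hunk above in isolation, here is a minimal user-space sketch. It is not the kernel code: struct mount_request, enum label_behavior and userns_mount_policy() are hypothetical names made up for illustration, and the security_transition_sid() step that computes the mountpoint label is deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's labeling behaviors. */
enum label_behavior { USE_XATTR, USE_MNTPOINT, USE_OTHER };

/* Hypothetical summary of what the hook looks at. */
struct mount_request {
	bool from_init_userns;	/* models sb->s_user_ns == &init_user_ns */
	bool has_context_opts;	/* any context/fscontext/rootcontext/defcontext option */
	enum label_behavior behavior;
};

/*
 * Returns 0 on success, or -EACCES when a mount from a non-initial
 * user namespace passes any context option. On success the labeling
 * behavior may have been switched from xattr to mountpoint labeling.
 */
static int userns_mount_policy(struct mount_request *req)
{
	if (req->from_init_userns)
		return 0;		/* normal path, options are honored */

	if (req->has_context_opts)
		return -EACCES;		/* no contexts allowed on the command line */

	if (req->behavior == USE_XATTR)
		req->behavior = USE_MNTPOINT;	/* on-disk labels are ignored */

	return 0;
}
```

A caller would fill in a struct mount_request, call userns_mount_policy(), and then read back the behavior field, which changes only for xattr-labeled filesystems mounted from a non-initial user namespace.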



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] security: selinux: Use a kmem_cache for allocation struct file_security_struct

2015-10-07 Thread Stephen Smalley
On 10/05/2015 01:45 AM, Sangwoo wrote:
> The size of struct file_security_struct is 16 bytes on my setup.
> However, each file_security_struct allocation actually consumes
> 64 bytes there, because the minimum kmalloc size is 64 bytes
> (ARCH_DMA_MINALIGN is 64).
> 
> This allocation happens on every file allocation (alloc_file()),
> so the total slack memory (allocated size - requested size)
> grows with every allocation.
> 
> E.g.) Min kmalloc size: 64 bytes, unit: bytes
>   Allocated Size | Request Size | Slack Size | Allocation Count
>   ---------------+--------------+------------+-----------------
>       770048     |    192512    |   577536   |      12032
> 
> As a result, this change reduces memory usage by 42 bytes per
> file_security_struct.
> 
> Signed-off-by: Sangwoo <sangwoo2.p...@lge.com>

Acked-by: Stephen Smalley <s...@tycho.nsa.gov>

> ---
>  security/selinux/hooks.c |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3f8d567..c20e082 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -126,6 +126,7 @@ int selinux_enabled = 1;
>  #endif
>  
>  static struct kmem_cache *sel_inode_cache;
> +static struct kmem_cache *file_security_cache;
>  
>  /**
>   * selinux_secmark_enabled - Check to see if SECMARK is currently enabled
> @@ -287,7 +288,7 @@ static int file_alloc_security(struct file *file)
>   struct file_security_struct *fsec;
>   u32 sid = current_sid();
>  
> - fsec = kzalloc(sizeof(struct file_security_struct), GFP_KERNEL);
> + fsec = kmem_cache_zalloc(file_security_cache, GFP_KERNEL);
>   if (!fsec)
>   return -ENOMEM;
>  
> @@ -302,7 +303,7 @@ static void file_free_security(struct file *file)
>  {
>   struct file_security_struct *fsec = file->f_security;
>   file->f_security = NULL;
> - kfree(fsec);
> + kmem_cache_free(file_security_cache, fsec);
>  }
>  
>  static int superblock_alloc_security(struct super_block *sb)
> @@ -6086,6 +6087,9 @@ static __init int selinux_init(void)
>  	sel_inode_cache = kmem_cache_create("selinux_inode_security",
>  					    sizeof(struct inode_security_struct),
>  					    0, SLAB_PANIC, NULL);
> +	file_security_cache = kmem_cache_create("selinux_file_security",
> +					    sizeof(struct file_security_struct),
> +					    0, SLAB_PANIC, NULL);
>   avc_init();
>  
>   security_add_hooks(selinux_hooks, ARRAY_SIZE(selinux_hooks));
> 
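
The figures in the quoted table can be checked with a few lines of arithmetic. The sketch below is user-space C with hypothetical names (struct slack_report, kmalloc_footprint(), slack_for()); it only models the minimum object size and ignores the other kmalloc size classes.

```c
#include <stddef.h>

/* Hypothetical result record for the slack calculation. */
struct slack_report {
	size_t allocated;	/* bytes actually consumed by the allocator */
	size_t requested;	/* bytes the caller asked for */
	size_t slack;		/* allocated - requested */
};

/*
 * Clamp a request to the allocator's minimum object size.
 * Real kmalloc also rounds up to size classes; that is not modeled.
 */
static size_t kmalloc_footprint(size_t size, size_t min_size)
{
	return size < min_size ? min_size : size;
}

static struct slack_report slack_for(size_t obj_size, size_t min_size,
				     size_t count)
{
	struct slack_report r;

	r.allocated = kmalloc_footprint(obj_size, min_size) * count;
	r.requested = obj_size * count;
	r.slack = r.allocated - r.requested;
	return r;
}
```

Calling slack_for(16, 64, 12032) reproduces the row in the table: 770048 bytes allocated, 192512 requested and 577536 of slack, which works out to 48 bytes of slack per object under kmalloc.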


Re: [PATCH v2 1/2] security: Add hook to invalidate inode security labels

2015-10-06 Thread Stephen Smalley
On 10/05/2015 05:56 PM, Andreas Gruenbacher wrote:
> On Mon, Oct 5, 2015 at 5:08 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>> Not fond of these magic initialized values.
> 
> That should be a solvable problem.
> 
>> Is it always safe to call inode_doinit() from all callers of
>> inode_has_perm()?
> 
> As long as inode_has_perm is only used in contexts in which a file
> permission check / acl check would be possible, I don't see why not.
> 
>> What about the cases where isec->sid is used without going through
>> inode_has_perm()?
> 
> inode_has_perm seems to be called frequently and invalid labels seem
> to be reloaded quickly, so this change may make SELinux work well enough
> to be useful on top of gfs2 or similar. More checks would of course be
> better. The ideal case would be to always reload invalid labels, but
> that currently won't be possible because we don't have dentries
> everywhere.
> 
> I can't tell if this is good enough to provide a useful level of
> protection. In any case, without a patch like this, on gfs2 and
> similar file systems, SELinux currently doesn't work at all.
> 
> How can we make progress with this problem?

I think we'd need to wrap all uses of inode->i_security with a helper that
applies this test.  FWIW, many/most of them seem to have a dentry
available, including all callers of inode_has_perm itself, so you could
just use inode_doinit_with_dentry() for all of those cases.  Maybe just
inline inode_has_perm() and get rid of it.

Need to deal appropriately with situations like selinux_inode_permission with
MAY_NOT_BLOCK.
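
A rough user-space sketch of the helper idea, to make the suggestion concrete. Everything here is hypothetical: struct inode_sec, inode_sec_get_sid() and demo_reload() are invented for illustration, the reload callback merely stands in for something like inode_doinit_with_dentry(), and returning -ECHILD in the non-blocking case is an assumption about how a MAY_NOT_BLOCK caller would be asked to retry.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of per-inode security state. */
struct inode_sec {
	bool valid;		/* false once the filesystem invalidates the label */
	unsigned int sid;
};

/* Hypothetical reload callback; may sleep in a real implementation. */
typedef int (*reload_fn)(struct inode_sec *isec);

/* Demo reload that always succeeds and installs a fixed SID. */
static int demo_reload(struct inode_sec *isec)
{
	isec->sid = 42;
	return 0;
}

/*
 * Single access point for the label. Returns 0 with *sid set,
 * -ECHILD if a reload is needed but the caller may not block,
 * or the error code from the reload.
 */
static int inode_sec_get_sid(struct inode_sec *isec, reload_fn reload,
			     bool may_block, unsigned int *sid)
{
	int rc;

	if (!isec->valid) {
		if (!may_block)
			return -ECHILD;	/* caller retries in blocking mode */
		rc = reload(isec);
		if (rc)
			return rc;
		isec->valid = true;
	}
	*sid = isec->sid;
	return 0;
}
```

The point of the sketch is only that every reader of the label goes through one function, so the validity test cannot be skipped by accident.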






Re: [PATCH v2] x86/mm: warn on W+x mappings

2015-10-06 Thread Stephen Smalley
On 10/06/2015 03:32 AM, Ingo Molnar wrote:
> 
> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
> 
>> On 10/03/2015 07:27 AM, Ingo Molnar wrote:
>>>
>>> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
>>>
>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>> index 30564e2..f8b1573 100644
>>>> --- a/arch/x86/mm/init_64.c
>>>> +++ b/arch/x86/mm/init_64.c
>>>> @@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
>>>>free_init_pages("unused kernel",
>>>>(unsigned long) __va(__pa_symbol(rodata_end)),
>>>>(unsigned long) __va(__pa_symbol(_sdata)));
>>>> +
>>>> +  debug_checkwx();
>>>
>>> Any reason to not do this on NX capable 32-bit kernels as well?
>>
>> Done in v3.  However, I do see lots of W+X mappings there.
> 
> Ha! That's a debug check plan gone very well! :)
> 
>> [1.012796] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x65d/0x840()
>> [1.012803] x86/mm: Found insecure W+X mapping at address f4a00000/0xf4a00000
> 
> What does this range correspond to on your kernel?

From dmesg:
[    0.000000] virtual kernel memory layout:
    fixmap  : 0xffa96000 - 0xfffff000   (5540 kB)
    pkmap   : 0xff800000 - 0xffa00000   (2048 kB)
    vmalloc : 0xf7ffe000 - 0xff7fe000   ( 120 MB)
    lowmem  : 0xc0000000 - 0xf77fe000   ( 887 MB)
      .init : 0xc0dde000 - 0xc0e9d000   ( 764 kB)
      .data : 0xc0aa2ba0 - 0xc0ddca00   (3303 kB)
      .text : 0xc0400000 - 0xc0aa2ba0   (6794 kB)

/sys/kernel/debug/kernel_page_tables seems to have many such mappings,
even before the reported one under Kernel Mapping, plus one in the vmalloc()
area:

---[ Kernel Mapping ]---
0xc0000000-0xc009b000   620K RW GLB NX pte
0xc009b000-0xc009c000     4K ro GLB NX pte
0xc009c000-0xc009d000     4K ro GLB x  pte
0xc009d000-0xc0200000  1420K RW GLB NX pte
0xc0200000-0xc0400000     2M RW PSE GLB NX pmd
0xc0400000-0xc0a00000     6M ro PSE GLB x  pmd
0xc0a00000-0xc0aa3000   652K ro GLB x  pte
0xc0aa3000-0xc0d2a000  2588K ro GLB NX pte
0xc0d2a000-0xc1000000  2904K RW GLB NX pte
0xc1000000-0xe7000000   608M RW PSE GLB NX pmd
0xe7000000-0xe7027000   156K RW GLB x  pte
0xe7027000-0xe7028000     4K ro GLB x  pte
0xe7028000-0xe709b000   460K RW GLB x  pte
0xe709b000-0xe709c000     4K ro GLB x  pte
0xe709c000-0xe70b8000   112K RW GLB x  pte
0xe70b8000-0xe70b9000     4K ro GLB x  pte
0xe70b9000-0xe7108000   316K RW GLB x  pte
0xe7108000-0xe710a000     8K ro GLB x  pte
0xe710a000-0xe7127000   116K RW GLB x  pte
0xe7127000-0xe712a000    12K ro GLB x  pte

0xf2c5c000-0xf2c5d000     4K ro GLB x  pte
0xf2c5d000-0xf2e00000  1676K RW GLB x  pte
0xf2e00000-0xf4a00000    28M RW PSE GLB NX pmd
0xf4a00000-0xf4b28000  1184K RW GLB x  pte
0xf4b28000-0xf4c00000   864K RW GLB NX pte
0xf4c00000-0xf5200000     6M RW PSE GLB x  pmd
0xf5200000-0xf525d000   372K RW GLB x  pte
0xf525d000-0xf525e000     4K ro GLB x  pte
0xf525e000-0xf525f000     4K RW GLB x  pte
0xf525f000-0xf5260000     4K ro GLB x  pte
0xf5260000-0xf526a000    40K RW GLB x  pte
0xf6400000-0xf658c000  1584K RW GLB NX pte
0xf658c000-0xf6600000   464K RW GLB x  pte
0xf6600000-0xf7600000    16M RW PSE GLB NX pmd
0xf7600000-0xf77fe000  2040K RW GLB NX pte
0xf77fe000-0xf7800000     8K pte
0xf7800000-0xf7e00000     6M pmd
0xf7e00000-0xf7ffe000  2040K pte
---[ vmalloc() Area ]---
0xf7ffe000-0xf7fff000     4K RW GLB NX pte
0xf7fff000-0xf8000000     4K pte
0xf8000000-0xf8002000     8K RW GLB NX pte
...
0xf86f3000-0xf8800000  1076K pte
0xf8800000-0xf8a00000     2M RW PWT PSE GLB x  pmd
0xf8a00000-0xf8b00000
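
When reading a dump like the one above by eye, the W+X entries are the lines that carry both the "RW" and the lowercase "x" attribute. A small user-space sketch of that test follows. The function name line_is_wx() is made up for this note, and the in-kernel check works on page-table flag bits rather than on text.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if one line of page-table dump output describes a
 * mapping that is both writable ("RW") and executable ("x").
 * Tokens are compared whole, so "NX" never matches "x".
 */
static bool line_is_wx(const char *line)
{
	char buf[256];
	bool writable = false, executable = false;
	char *tok;

	/* Work on a bounded copy because strtok() modifies its input. */
	strncpy(buf, line, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
		if (strcmp(tok, "RW") == 0)
			writable = true;
		else if (strcmp(tok, "x") == 0)
			executable = true;
	}
	return writable && executable;
}
```

Feeding each line of the dump to line_is_wx() and counting the hits gives a quick way to list the suspicious ranges; lines with "ro ... x" or "RW ... NX" are not flagged.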

[tip:x86/mm] x86/mm: Warn on W^X mappings

2015-10-06 Thread tip-bot for Stephen Smalley
Commit-ID:  e1a58320a38dfa72be48a0f1a3a92273663ba6db
Gitweb: http://git.kernel.org/tip/e1a58320a38dfa72be48a0f1a3a92273663ba6db
Author: Stephen Smalley <s...@tycho.nsa.gov>
AuthorDate: Mon, 5 Oct 2015 12:55:20 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 6 Oct 2015 11:11:48 +0200

x86/mm: Warn on W^X mappings

Warn on any residual W+X mappings after setting NX
if DEBUG_WX is enabled.  Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface.  Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success it prints this to the kernel log:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure it prints a warning and a count of the failed pages:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
  x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
  [...]
  Call Trace:
   [] dump_stack+0x44/0x55
   [] warn_slowpath_common+0x82/0xc0
   [] warn_slowpath_fmt+0x5c/0x80
   [] ? note_page+0x5c9/0x7b0
   [] note_page+0x610/0x7b0
   [] ptdump_walk_pgd_level_core+0x259/0x3c0
   [] ptdump_walk_pgd_level_checkwx+0x17/0x20
   [] mark_rodata_ro+0xf5/0x100
   [] ? rest_init+0x80/0x80
   [] kernel_init+0x1d/0xe0
   [] ret_from_fork+0x3f/0x70
   [] ? rest_init+0x80/0x80
  ---[ end trace a1f23a1e42a2ac76 ]---
  x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
Acked-by: Kees Cook <keesc...@chromium.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Arjan van de Ven <ar...@linux.intel.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-...@tycho.nsa.gov
[ Improved the Kconfig help text and made the new option default-y
  if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings,
  so we really want people to have this on by default. ]
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/Kconfig.debug | 36 +++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile   |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_32.c  |  2 ++
 arch/x86/mm/init_64.c  |  2 ++
 6 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..3e0baf7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
  This is useful for kernel debugging when your machine crashes very
  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+   def_bool n
+
 config X86_PTDUMP
bool "Export kernel pagetable layout to userspace via debugfs"
depends on DEBUG_KERNEL
select DEBUG_FS
+   select X86_PTDUMP_CORE
---help---
  Say Y here if you want to show the kernel pagetable layout in a
  debugfs file. This information is only useful for kernel developers
@@ -79,7 +83,8 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
bool "Dump the EFI pagetable"
-   depends on EFI && X86_PTDUMP
+   depends on EFI
+   select X86_PTDUMP_CORE
---help---
  Enable this if you want to dump the EFI page table before
  enabling virtual mode. This can be used to debug miscellaneous
@@ -105,6 +110,35 @@ config DEBUG_RODATA_TEST
  feature as well as for the change_page_attr() infrastructure.
  If in doubt, say "N"
 
+config DEBUG_WX
+   bool "Warn on W+X mappings at boot"
+   depends on DEBUG_RODATA
+   default y
+   select X86_PTDUMP_CORE
+   ---help---
+ Generate a warning if any W+X mappings are found at boot.
+
+ This is useful for discovering cases where the kernel is leaving
+ W+X mappings after applying NX, as such mappings are a security risk.
+
+ Look for a message in dmesg output like this:
+
+   x86/mm: Checked W+X mappings: passed, no W+X pages found.
+
+ or like this, if the check failed:
+
+   x86/mm: Checked W+X mappings: FAILED, <N> W+X pages found.
+
+ Note that even if the check fails, your kernel is possibly
+ still fine, as W+X mappings are not a security hole in
+ themselves, what they do is that they make the exploitation
+ of other unfixed kernel bugs easier.
+
+ There is no runtime or memory usage effect of this option
+ once the kernel has booted up - it's a one time check.
+
+ If in doubt, say "Y".
+
 config DEBUG_SET_MODULE_RONX
bool "Set loadable kernel module data as 

Re: [PATCH v2] x86/mm: warn on W+x mappings

2015-10-05 Thread Stephen Smalley
On 10/03/2015 07:27 AM, Ingo Molnar wrote:
> 
> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
> 
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 30564e2..f8b1573 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
>>  free_init_pages("unused kernel",
>>  (unsigned long) __va(__pa_symbol(rodata_end)),
>>  (unsigned long) __va(__pa_symbol(_sdata)));
>> +
>> +debug_checkwx();
> 
> Any reason to not do this on NX capable 32-bit kernels as well?

Done in v3.  However, I do see lots of W+X mappings there.

[1.012796] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x65d/0x840()
[1.012803] x86/mm: Found insecure W+X mapping at address f4a00000/0xf4a00000
[1.012805] Modules linked in:
[1.012833] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc4+ #2
[1.012837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[1.012844]  c0d32967 173b7da7  f7105e7c c0713490 f7105ebc f7105eac c045d077
[1.012848]  c0c47ef8 f7105edc 0001 c0c4de42 00e1 c04551fd c04551fd f7105f3c
[1.012851]  0002  f7105ec8 c045d0ee 0009 f7105ebc c0c47ef8 f7105edc
[1.012855] Call Trace:
[1.012868]  [] dump_stack+0x41/0x61
[1.012871]  [] warn_slowpath_common+0x87/0xc0
[1.012873]  [] ? note_page+0x65d/0x840
[1.012875]  [] ? note_page+0x65d/0x840
[1.012877]  [] warn_slowpath_fmt+0x3e/0x60
[1.012878]  [] note_page+0x65d/0x840
[1.012880]  [] ptdump_walk_pgd_level_core+0x1d6/0x2d0
[1.012883]  [] ptdump_walk_pgd_level_checkwx+0x16/0x20
[1.012886]  [] mark_rodata_ro+0x135/0x160
[1.012898]  [] kernel_init+0x1f/0xe0
[1.012906]  [] ? schedule_tail+0x11/0x50
[1.012909]  [] ret_from_kernel_thread+0x21/0x30
[1.012910]  [] ? rest_init+0x70/0x70
[1.012912] ---[ end trace 40a4f3d5e8fb70ac ]---
[1.012954] x86/mm: Checked W+X mappings: FAILED, 6556 W+X pages found.


[PATCH v3] x86/mm: warn on W+x mappings

2015-10-05 Thread Stephen Smalley
Warn on any residual W+x mappings after setting NX
if DEBUG_WX is enabled.  Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface.  Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success:
x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW   4.3.0-rc3+ #19
  e96b193f 88042c5dbd48 81380a5f
 88042c5dbd90 88042c5dbd80 8109d3f2 81e1
 0003 88042c5dbe90 88042c5dbe90 
Call Trace:
 [] dump_stack+0x44/0x55
 [] warn_slowpath_common+0x82/0xc0
 [] warn_slowpath_fmt+0x5c/0x80
 [] ? note_page+0x5c9/0x7b0
 [] note_page+0x610/0x7b0
 [] ptdump_walk_pgd_level_core+0x259/0x3c0
 [] ptdump_walk_pgd_level_checkwx+0x17/0x20
 [] mark_rodata_ro+0xf5/0x100
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x1d/0xe0
 [] ret_from_fork+0x3f/0x70
 [] ? rest_init+0x80/0x80
---[ end trace a1f23a1e42a2ac76 ]---
x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
v3 enables the checks on 32-bit if NX is supported, and also
makes DEBUG_WX depend on DEBUG_RODATA since both the NX marking
and the checking occur from mark_rodata_ro().
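
For readers unfamiliar with the pattern, the compile-time switch behind debug_checkwx() can be illustrated in user space as follows. This is only a sketch: DEMO_DEBUG_WX stands in for CONFIG_DEBUG_WX, and all demo_* names are hypothetical rather than part of the patch.

```c
/* Hypothetical switch standing in for CONFIG_DEBUG_WX. */
#define DEMO_DEBUG_WX 1

/* Counts how many times the check actually ran. */
static int demo_checks_run;

static void demo_walk_and_check(void)
{
	demo_checks_run++;	/* a real walker would scan the page tables */
}

#if DEMO_DEBUG_WX
#define demo_checkwx() demo_walk_and_check()
#else
/* Expands to an empty statement, so call sites need no #ifdef. */
#define demo_checkwx() do { } while (0)
#endif

/* Models the tail of mark_rodata_ro(): protect first, then verify. */
static void demo_mark_rodata_ro(void)
{
	/* ... page protections would be applied here ... */
	demo_checkwx();
}
```

With DEMO_DEBUG_WX set to 0 the call in demo_mark_rodata_ro() compiles away, which is why the check has no runtime cost when the option is disabled.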

 arch/x86/Kconfig.debug | 20 +++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile   |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_32.c  |  2 ++
 arch/x86/mm/init_64.c  |  2 ++
 6 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..d09fde7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
  This is useful for kernel debugging when your machine crashes very
  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+   def_bool n
+
 config X86_PTDUMP
bool "Export kernel pagetable layout to userspace via debugfs"
depends on DEBUG_KERNEL
select DEBUG_FS
+   select X86_PTDUMP_CORE
---help---
  Say Y here if you want to show the kernel pagetable layout in a
  debugfs file. This information is only useful for kernel developers
@@ -79,7 +83,8 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
bool "Dump the EFI pagetable"
-   depends on EFI && X86_PTDUMP
+   depends on EFI
+   select X86_PTDUMP_CORE
---help---
  Enable this if you want to dump the EFI page table before
  enabling virtual mode. This can be used to debug miscellaneous
@@ -105,6 +110,19 @@ config DEBUG_RODATA_TEST
  feature as well as for the change_page_attr() infrastructure.
  If in doubt, say "N"
 
+config DEBUG_WX
+   bool "Warn on W+X mappings at boot"
+   depends on DEBUG_RODATA
+   select X86_PTDUMP_CORE
+   ---help---
+ Generate a warning if any W+X mappings are found at boot.
+ This is useful for discovering cases where the kernel is leaving
+ W+X mappings after applying NX, as such mappings are a security risk.
+ Look for a message in dmesg output like this:
+ x86/mm: Checked W+X mappings: passed, no W+X pages found.
+ or like this:
+ x86/mm: Checked W+X mappings: FAILED, <N> W+X pages found.
+
 config DEBUG_SET_MODULE_RONX
bool "Set loadable kernel module data as NX and text as RO"
depends on MODULES
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..f2b6bed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -19,6 +19,13 @@
 #include 
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_checkwx(void);
+
+#ifdef CONFIG_DEBUG_WX
+#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#else
+#define debug_checkwx() do { } while (0)
+#endif
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a482d10..65c47fd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_SMP) += tlb.o
 obj-$(CONFIG_X86_32)   += pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP)   += dump_pagetables.o
+obj-$(CONFIG_X86_PTDUMP_CORE)  += dump_pagetables.o
 
 obj-$(CONFIG_HIGHMEM)  += h

Re: [PATCH v2] x86/mm: warn on W+x mappings

2015-10-05 Thread Stephen Smalley
On 10/03/2015 07:27 AM, Ingo Molnar wrote:
> 
> * Stephen Smalley <s...@tycho.nsa.gov> wrote:
> 
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 30564e2..f8b1573 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
>>  free_init_pages("unused kernel",
>>  (unsigned long) __va(__pa_symbol(rodata_end)),
>>  (unsigned long) __va(__pa_symbol(_sdata)));
>> +
>> +debug_checkwx();
> 
> Any reason to not do this on NX capable 32-bit kernels as well?

Done in v3.  However, I do see lots of W+X mappings there.

[1.012796] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x65d/0x840()
[1.012803] x86/mm: Found insecure W+X mapping at address f4a0/0xf4a0
[1.012805] Modules linked in:
[1.012833] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc4+ #2
[1.012837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[1.012844]  c0d32967 173b7da7  f7105e7c c0713490 f7105ebc f7105eac c045d077
[1.012848]  c0c47ef8 f7105edc 0001 c0c4de42 00e1 c04551fd c04551fd f7105f3c
[1.012851]  0002  f7105ec8 c045d0ee 0009 f7105ebc c0c47ef8 f7105edc
[1.012855] Call Trace:
[1.012868]  [] dump_stack+0x41/0x61
[1.012871]  [] warn_slowpath_common+0x87/0xc0
[1.012873]  [] ? note_page+0x65d/0x840
[1.012875]  [] ? note_page+0x65d/0x840
[1.012877]  [] warn_slowpath_fmt+0x3e/0x60
[1.012878]  [] note_page+0x65d/0x840
[1.012880]  [] ptdump_walk_pgd_level_core+0x1d6/0x2d0
[1.012883]  [] ptdump_walk_pgd_level_checkwx+0x16/0x20
[1.012886]  [] mark_rodata_ro+0x135/0x160
[1.012898]  [] kernel_init+0x1f/0xe0
[1.012906]  [] ? schedule_tail+0x11/0x50
[1.012909]  [] ret_from_kernel_thread+0x21/0x30
[1.012910]  [] ? rest_init+0x70/0x70
[1.012912] ---[ end trace 40a4f3d5e8fb70ac ]---
[1.012954] x86/mm: Checked W+X mappings: FAILED, 6556 W+X pages found.



Re: [PATCH v2 1/2] security: Add hook to invalidate inode security labels

2015-10-05 Thread Stephen Smalley

On 10/04/2015 03:19 PM, Andreas Gruenbacher wrote:

Add a hook to invalidate an inode's security label when the cached
information becomes invalid.

Implement the new hook in selinux: set a flag when a security label becomes
invalid.  When hitting a security label which has been marked as invalid in
inode_has_perm, try reloading the label.

If an inode does not have any dentries attached, we cannot reload its
security label because we cannot use the getxattr inode operation.  In that
case, continue using the old, invalid label until a dentry becomes
available.

Signed-off-by: Andreas Gruenbacher <agrue...@redhat.com>
Cc: Paul Moore <p...@paul-moore.com>
Cc: Stephen Smalley <s...@tycho.nsa.gov>
Cc: Eric Paris <epa...@parisplace.org>
Cc: seli...@tycho.nsa.gov
---
  include/linux/lsm_hooks.h |  6 ++
  include/linux/security.h  |  5 +
  security/security.c   |  8 
  security/selinux/hooks.c  | 23 +--
  security/selinux/include/objsec.h |  3 ++-
  5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index ec3a6ba..945ae1d 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1261,6 +1261,10 @@
   *audit_rule_init.
   *@rule contains the allocated rule
   *
+ * @inode_invalidate_secctx:
+ * Notify the security module that it must revalidate the security context
+ * of an inode.
+ *
   * @inode_notifysecctx:
   *Notify the security module of what the security context of an inode
   *should be.  Initializes the incore security context managed by the
@@ -1516,6 +1520,7 @@ union security_list_options {
int (*secctx_to_secid)(const char *secdata, u32 seclen, u32 *secid);
void (*release_secctx)(char *secdata, u32 seclen);

+   void (*inode_invalidate_secctx)(struct inode *inode);
int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
@@ -1757,6 +1762,7 @@ struct security_hook_heads {
struct list_head secid_to_secctx;
struct list_head secctx_to_secid;
struct list_head release_secctx;
+   struct list_head inode_invalidate_secctx;
struct list_head inode_notifysecctx;
struct list_head inode_setsecctx;
struct list_head inode_getsecctx;
diff --git a/include/linux/security.h b/include/linux/security.h
index 2f4c1f7..9692571 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -353,6 +353,7 @@ int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
  int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
  void security_release_secctx(char *secdata, u32 seclen);

+void security_inode_invalidate_secctx(struct inode *inode);
  int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
  int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
  int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
@@ -1093,6 +1094,10 @@ static inline void security_release_secctx(char *secdata, u32 seclen)
  {
  }

+static inline void security_inode_invalidate_secctx(struct inode *inode)
+{
+}
+
  static inline int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
  {
return -EOPNOTSUPP;
diff --git a/security/security.c b/security/security.c
index 46f405c..e4371cd 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1161,6 +1161,12 @@ void security_release_secctx(char *secdata, u32 seclen)
  }
  EXPORT_SYMBOL(security_release_secctx);

+void security_inode_invalidate_secctx(struct inode *inode)
+{
+   call_void_hook(inode_invalidate_secctx, inode);
+}
+EXPORT_SYMBOL(security_inode_invalidate_secctx);
+
  int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
  {
return call_int_hook(inode_notifysecctx, 0, inode, ctx, ctxlen);
@@ -1763,6 +1769,8 @@ struct security_hook_heads security_hook_heads = {
LIST_HEAD_INIT(security_hook_heads.secctx_to_secid),
.release_secctx =
LIST_HEAD_INIT(security_hook_heads.release_secctx),
+   .inode_invalidate_secctx =
+   LIST_HEAD_INIT(security_hook_heads.inode_invalidate_secctx),
.inode_notifysecctx =
LIST_HEAD_INIT(security_hook_heads.inode_notifysecctx),
.inode_setsecctx =
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d8..c5e4ca8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1293,11 +1293,11 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent
unsigned len = 0;
int rc = 0;

-   if (isec->initialized)
+   if (isec->initialized == 1)
goto out;

mutex_lock(&isec->lock);
-   if (isec->initialized)
+   if (isec->initialized == 1)
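
To illustrate the behavior described in the changelog above (mark the
cached label as invalid, reload it when possible, otherwise keep using
the old one), here is a minimal userspace sketch in C.  It is not the
SELinux implementation: the names label_state, inode_label,
label_invalidate and label_get_sid are hypothetical, the value 2 for the
invalid state is an assumption (the hunk only shows that
"initialized == 1" means valid), and a non-NULL fresh_sid pointer stands
in for a dentry being available for the reload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; only "1 == valid" is taken from the patch hunk. */
enum label_state { LABEL_UNINIT = 0, LABEL_VALID = 1, LABEL_INVALID = 2 };

struct inode_label {
	enum label_state state;
	unsigned int sid;	/* cached security identifier */
};

/* Invalidate hook: keep the old SID around but mark it as stale. */
static void label_invalidate(struct inode_label *l)
{
	assert(l != NULL);
	if (l->state == LABEL_VALID)
		l->state = LABEL_INVALID;
}

/*
 * Permission-check path: if the label is stale and a fresh value can be
 * obtained (fresh_sid != NULL, standing in for "a dentry is attached"),
 * reload it.  Otherwise keep using the old, stale SID.
 */
static unsigned int label_get_sid(struct inode_label *l,
				  const unsigned int *fresh_sid)
{
	assert(l != NULL);
	if (l->state == LABEL_INVALID && fresh_sid != NULL) {
		l->sid = *fresh_sid;
		l->state = LABEL_VALID;
	}
	return l->sid;
}
```

In this sketch a stale label stays stale across calls with a NULL
fresh_sid, and the first call with a usable value reloads the SID and
marks the label valid again.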

[PATCH v3] x86/mm: warn on W+x mappings

2015-10-05 Thread Stephen Smalley
Warn on any residual W+x mappings after setting NX
if DEBUG_WX is enabled.  Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface.  Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success:
x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW   4.3.0-rc3+ #19
  e96b193f 88042c5dbd48 81380a5f
 88042c5dbd90 88042c5dbd80 8109d3f2 81e1
 0003 88042c5dbe90 88042c5dbe90 
Call Trace:
 [] dump_stack+0x44/0x55
 [] warn_slowpath_common+0x82/0xc0
 [] warn_slowpath_fmt+0x5c/0x80
 [] ? note_page+0x5c9/0x7b0
 [] note_page+0x610/0x7b0
 [] ptdump_walk_pgd_level_core+0x259/0x3c0
 [] ptdump_walk_pgd_level_checkwx+0x17/0x20
 [] mark_rodata_ro+0xf5/0x100
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x1d/0xe0
 [] ret_from_fork+0x3f/0x70
 [] ? rest_init+0x80/0x80
---[ end trace a1f23a1e42a2ac76 ]---
x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
v3 enables the checks on 32-bit if NX is supported, and also
makes DEBUG_WX depend on DEBUG_RODATA since both the NX marking
and the checking occur in mark_rodata_ro().

 arch/x86/Kconfig.debug | 20 +++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile   |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_32.c  |  2 ++
 arch/x86/mm/init_64.c  |  2 ++
 6 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..d09fde7 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
  This is useful for kernel debugging when your machine crashes very
  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+   def_bool n
+
 config X86_PTDUMP
bool "Export kernel pagetable layout to userspace via debugfs"
depends on DEBUG_KERNEL
select DEBUG_FS
+   select X86_PTDUMP_CORE
---help---
  Say Y here if you want to show the kernel pagetable layout in a
  debugfs file. This information is only useful for kernel developers
@@ -79,7 +83,8 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
bool "Dump the EFI pagetable"
-   depends on EFI && X86_PTDUMP
+   depends on EFI
+   select X86_PTDUMP_CORE
---help---
  Enable this if you want to dump the EFI page table before
  enabling virtual mode. This can be used to debug miscellaneous
@@ -105,6 +110,19 @@ config DEBUG_RODATA_TEST
  feature as well as for the change_page_attr() infrastructure.
  If in doubt, say "N"
 
+config DEBUG_WX
+   bool "Warn on W+X mappings at boot"
+   depends on DEBUG_RODATA
+   select X86_PTDUMP_CORE
+   ---help---
+ Generate a warning if any W+X mappings are found at boot.
+ This is useful for discovering cases where the kernel is leaving
+ W+X mappings after applying NX, as such mappings are a security risk.
+ Look for a message in dmesg output like this:
+ x86/mm: Checked W+X mappings: passed, no W+X pages found.
+ or like this:
+ x86/mm: Checked W+X mappings: FAILED, <N> W+X pages found.
+
 config DEBUG_SET_MODULE_RONX
bool "Set loadable kernel module data as NX and text as RO"
depends on MODULES
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..f2b6bed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -19,6 +19,13 @@
 #include 
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_checkwx(void);
+
+#ifdef CONFIG_DEBUG_WX
+#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#else
+#define debug_checkwx() do { } while (0)
+#endif
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a482d10..65c47fd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_SMP) += tlb.o
 obj-$(CONFIG_X86_32)   += pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP)   += dump_pagetables.o
+obj-$(CONFIG_X86_PTDUMP_CORE)  += dump_pagetables.o
 
 obj-$(CON
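
For readers following the thread, the condition that DEBUG_WX warns
about can be sketched in userspace C.  This is a minimal illustration
and not the code from this patch (the dump_pagetables.c hunk is cut off
above): the names is_wx, wx_pages_in_range, PTE_RW, PTE_NX and
PAGE_SIZE_4K are hypothetical, the bit positions follow the x86
page-table entry layout (bit 1 is writable, bit 63 is no-execute), and
4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit positions match the x86 PTE layout. */
#define PTE_RW		(UINT64_C(1) << 1)	/* writable */
#define PTE_NX		(UINT64_C(1) << 63)	/* no-execute */
#define PAGE_SIZE_4K	UINT64_C(4096)		/* assumes 4 KiB pages */

/* A mapping is W+X when it is writable and NX is not set. */
static bool is_wx(uint64_t prot)
{
	return (prot & PTE_RW) && !(prot & PTE_NX);
}

/* Pages to count for the range [start, end): all of them if W+X, else 0. */
static uint64_t wx_pages_in_range(uint64_t start, uint64_t end, uint64_t prot)
{
	assert(end >= start);
	return is_wx(prot) ? (end - start) / PAGE_SIZE_4K : 0;
}
```

Assuming the W+X range in the failure log above runs from 0x81755000 to
0x81800000 (low 32 bits only, as printed in the archive), the sketch
counts 171 pages, which agrees with the "171 W+X pages found" line.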

[PATCH v2] x86/mm: warn on W+x mappings

2015-10-02 Thread Stephen Smalley
Warn on any residual W+x mappings after setting NX
if DEBUG_WX is enabled.  Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface.  Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success:
x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW   4.3.0-rc3+ #19
  e96b193f 88042c5dbd48 81380a5f
 88042c5dbd90 88042c5dbd80 8109d3f2 81e1
 0003 88042c5dbe90 88042c5dbe90 
Call Trace:
 [] dump_stack+0x44/0x55
 [] warn_slowpath_common+0x82/0xc0
 [] warn_slowpath_fmt+0x5c/0x80
 [] ? note_page+0x5c9/0x7b0
 [] note_page+0x610/0x7b0
 [] ptdump_walk_pgd_level_core+0x259/0x3c0
 [] ptdump_walk_pgd_level_checkwx+0x17/0x20
 [] mark_rodata_ro+0xf5/0x100
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x1d/0xe0
 [] ret_from_fork+0x3f/0x70
 [] ? rest_init+0x80/0x80
---[ end trace a1f23a1e42a2ac76 ]---
x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
v2 addresses Kees' concern about being able to enable this check
without enabling the debugfs interface, and reworks the output to
present failure and success in the manner suggested by Ingo.

 arch/x86/Kconfig.debug | 19 ++-
 arch/x86/include/asm/pgtable.h |  7 +++
 arch/x86/mm/Makefile   |  2 +-
 arch/x86/mm/dump_pagetables.c  | 42 +-
 arch/x86/mm/init_64.c  |  2 ++
 5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d8c0d32..c6fe16b 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI
  This is useful for kernel debugging when your machine crashes very
  early before the console code is initialized.
 
+config X86_PTDUMP_CORE
+   def_bool n
+
 config X86_PTDUMP
bool "Export kernel pagetable layout to userspace via debugfs"
depends on DEBUG_KERNEL
select DEBUG_FS
+   select X86_PTDUMP_CORE
---help---
  Say Y here if you want to show the kernel pagetable layout in a
  debugfs file. This information is only useful for kernel developers
@@ -79,13 +83,26 @@ config X86_PTDUMP
 
 config EFI_PGT_DUMP
bool "Dump the EFI pagetable"
-   depends on EFI && X86_PTDUMP
+   depends on EFI
+   select X86_PTDUMP_CORE
---help---
  Enable this if you want to dump the EFI page table before
  enabling virtual mode. This can be used to debug miscellaneous
  issues with the mapping of the EFI runtime regions into that
  table.
 
+config DEBUG_WX
+   bool "Warn on W+X mappings at boot"
+   select X86_PTDUMP_CORE
+   ---help---
+ Generate a warning if any W+X mappings are found at boot.
+ This is useful for discovering cases where the kernel is leaving
+ W+X mappings after applying NX, as such mappings are a security risk.
+ Look for a message in dmesg output like this:
+ x86/mm: Checked W+X mappings: passed, no W+X pages found.
+ or like this:
+ x86/mm: Checked W+X mappings: FAILED, <N> W+X pages found.
+
 config DEBUG_RODATA
bool "Write protect kernel read-only data structures"
default y
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..f2b6bed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -19,6 +19,13 @@
 #include 
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_checkwx(void);
+
+#ifdef CONFIG_DEBUG_WX
+#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#else
+#define debug_checkwx() do { } while (0)
+#endif
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a482d10..65c47fd 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_SMP) += tlb.o
 obj-$(CONFIG_X86_32)   += pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP)   += dump_pagetables.o
+obj-$(CONFIG_X86_PTDUMP_CORE)  += dump_pagetables.o
 
 obj-$(CONFIG_HIGHMEM)  += highmem_32.o
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index f0cedf3..19c64af 100644

[tip:x86/urgent] x86/mm: Set NX on gap between __ex_table and rodata

2015-10-02 Thread tip-bot for Stephen Smalley
Commit-ID:  ab76f7b4ab2397ffdd2f1eb07c55697d19991d10
Gitweb: http://git.kernel.org/tip/ab76f7b4ab2397ffdd2f1eb07c55697d19991d10
Author: Stephen Smalley <s...@tycho.nsa.gov>
AuthorDate: Thu, 1 Oct 2015 09:04:22 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 2 Oct 2015 09:21:06 +0200

x86/mm: Set NX on gap between __ex_table and rodata

Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables.  Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.

  Before:
  ---[ High Kernel Mapping ]---
  0x8000-0x8100        16M                  pmd
  0x8100-0x8160         6M  ro PSE GLB x  pmd
  0x8160-0x81754000  1360K  ro     GLB x  pte
  0x81754000-0x8180   688K  RW     GLB x  pte
  0x8180-0x81a0         2M  ro PSE GLB NX pmd
  0x81a0-0x81b3b000  1260K  ro     GLB NX pte
  0x81b3b000-0x8200  4884K  RW     GLB NX pte
  0x8200-0x8220         2M  RW PSE GLB NX pmd
  0x8220-0xa000       478M                  pmd

  After:
  ---[ High Kernel Mapping ]---
  0x8000-0x8100        16M                  pmd
  0x8100-0x8160         6M  ro PSE GLB x  pmd
  0x8160-0x81754000  1360K  ro     GLB x  pte
  0x81754000-0x8180   688K  RW     GLB NX pte
  0x8180-0x81a0         2M  ro PSE GLB NX pmd
  0x81a0-0x81b3b000  1260K  ro     GLB NX pte
  0x81b3b000-0x8200  4884K  RW     GLB NX pte
  0x8200-0x8220         2M  RW PSE GLB NX pmd
  0x8220-0xa000       478M                  pmd

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
Acked-by: Kees Cook <keesc...@chromium.org>
Cc: <sta...@vger.kernel.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-...@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/mm/init_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2..df48430 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 	 * has been zapped already via cleanup_highmem().
 	 */
 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
 	rodata_test();
 


[RFC][PATCH] x86/mm: warn on W+x mappings

2015-10-01 Thread Stephen Smalley
Warn on any residual W+x mappings if X86_PTDUMP is enabled.

Sample dmesg output:
Checking for W+x mappings
0xffffffff81755000-0xffffffff81800000    684K  RW       GLB  x   pte
Found W+x mappings.  Please fix.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
Not sure if this is the best place to put this check.
It must occur after free_init_pages() or it won't catch the
W+x case for the gap between __ex_table and rodata.

 arch/x86/include/asm/pgtable.h |  6 ++++++
 arch/x86/mm/dump_pagetables.c  | 31 ++++++++++++++++++++++++++++++-
 arch/x86/mm/init_64.c          |  2 ++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..8e771c1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -20,6 +20,12 @@
 
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
 
+#ifdef CONFIG_X86_PTDUMP
+void ptdump_walk_pgd_level_checkwx(void);
+#else
+#define ptdump_walk_pgd_level_checkwx() do { } while (0)
+#endif
+
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index f0cedf3..986903b 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -32,6 +32,8 @@ struct pg_state {
 	const struct addr_marker *marker;
 	unsigned long lines;
 	bool to_dmesg;
+	bool check_wx;
+	bool found_wx;
 };
 
 struct addr_marker {
@@ -214,6 +216,13 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		const char *unit = units;
 		unsigned long delta;
 		int width = sizeof(unsigned long) * 2;
+		pgprotval_t pr = pgprot_val(st->current_prot);
+		bool savedmesg = st->to_dmesg;
+
+		if (st->check_wx && (pr & _PAGE_RW) && !(pr & _PAGE_NX)) {
+			st->to_dmesg = true;
+			st->found_wx = true;
+		}
 
 		/*
 		 * Now print the actual finished series
@@ -261,6 +270,7 @@ static void note_page(struct seq_file *m, struct pg_state *st,
 		st->start_address = st->current_address;
 		st->current_prot = new_prot;
 		st->level = level;
+		st->to_dmesg = savedmesg;
 	}
 }
 
@@ -344,7 +354,8 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
 #define pgd_none(a)  pud_none(__pud(pgd_val(a)))
 #endif
 
-void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
+static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
+				       bool checkwx)
 {
 #ifdef CONFIG_X86_64
 	pgd_t *start = (pgd_t *) &init_level4_pgt;
@@ -359,6 +370,12 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
 		st.to_dmesg = true;
 	}
 
+	st.check_wx = checkwx;
+	if (checkwx) {
+		pr_info("Checking for W+x mappings\n");
+		st.found_wx = false;
+	}
+
 	for (i = 0; i < PTRS_PER_PGD; i++) {
 		st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
 		if (!pgd_none(*start)) {
@@ -378,6 +395,18 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
 	/* Flush out the last page */
 	st.current_address = normalize_addr(PTRS_PER_PGD*PGD_LEVEL_MULT);
 	note_page(m, &st, __pgprot(0), 0);
+	if (checkwx && st.found_wx)
+		pr_warn("Found W+x mappings.  Please fix.\n");
+}
+
+void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
+{
+	ptdump_walk_pgd_level_core(m, pgd, false);
+}
+
+void ptdump_walk_pgd_level_checkwx(void)
+{
+	ptdump_walk_pgd_level_core(NULL, NULL, true);
 }
 
 static int ptdump_show(struct seq_file *m, void *v)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df48430..7e704da 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1150,6 +1150,8 @@ void mark_rodata_ro(void)
 	free_init_pages("unused kernel",
 			(unsigned long) __va(__pa_symbol(rodata_end)),
 			(unsigned long) __va(__pa_symbol(_sdata)));
+
+	ptdump_walk_pgd_level_checkwx();
 }
 
 #endif
-- 
2.1.0
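
For readers following along, the check this patch adds to note_page() comes down to a two-bit test on the protection value. Below is a minimal user-space sketch of that predicate, not the kernel code itself. PAGE_RW and PAGE_NX are local stand-ins assumed to mirror the x86-64 bit positions of _PAGE_RW and _PAGE_NX (bit 1 and bit 63), and is_wx() and count_wx() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for _PAGE_RW and _PAGE_NX (assumed x86-64 bit positions). */
#define PAGE_RW (UINT64_C(1) << 1)
#define PAGE_NX (UINT64_C(1) << 63)

/* A mapping is W+x when it is writable and the NX bit is clear. */
static bool is_wx(uint64_t prot)
{
	return (prot & PAGE_RW) && !(prot & PAGE_NX);
}

/* Count W+x entries in an array of protection values (hypothetical helper). */
static size_t count_wx(const uint64_t *prots, size_t n)
{
	size_t i, found = 0;

	for (i = 0; i < n; i++)
		if (is_wx(prots[i]))
			found++;
	return found;
}
```

A read-only executable range (ro, x) does not trip the test; only ranges that are both writable and executable do, which matches the single 684K line in the sample dmesg output.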



[PATCH] x86/mm: Set NX on gap between __ex_table and rodata

2015-10-01 Thread Stephen Smalley
Unused space between the end of __ex_table and the start of rodata
can be left W+x in the kernel page tables.  Extend the setting
of the NX bit to cover this gap by starting from text_end rather than
rodata_start.

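Before:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000     16M                    pmd
0xffffffff81000000-0xffffffff81600000      6M  ro  PSE  GLB  x   pmd
0xffffffff81600000-0xffffffff81754000   1360K  ro       GLB  x   pte
0xffffffff81754000-0xffffffff81800000    688K  RW       GLB  x   pte
0xffffffff81800000-0xffffffff81a00000      2M  ro  PSE  GLB  NX  pmd
0xffffffff81a00000-0xffffffff81b3b000   1260K  ro       GLB  NX  pte
0xffffffff81b3b000-0xffffffff82000000   4884K  RW       GLB  NX  pte
0xffffffff82000000-0xffffffff82200000      2M  RW  PSE  GLB  NX  pmd
0xffffffff82200000-0xffffffffa0000000    478M                    pmd

After:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000     16M                    pmd
0xffffffff81000000-0xffffffff81600000      6M  ro  PSE  GLB  x   pmd
0xffffffff81600000-0xffffffff81754000   1360K  ro       GLB  x   pte
0xffffffff81754000-0xffffffff81800000    688K  RW       GLB  NX  pte
0xffffffff81800000-0xffffffff81a00000      2M  ro  PSE  GLB  NX  pmd
0xffffffff81a00000-0xffffffff81b3b000   1260K  ro       GLB  NX  pte
0xffffffff81b3b000-0xffffffff82000000   4884K  RW       GLB  NX  pte
0xffffffff82000000-0xffffffff82200000      2M  RW  PSE  GLB  NX  pmd
0xffffffff82200000-0xffffffffa0000000    478M                    pmd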

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
 arch/x86/mm/init_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2..df48430 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 	 * has been zapped already via cleanup_highmem().
 	 */
 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
 	rodata_test();
 
-- 
2.1.0
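
To make the effect of the one-line change concrete, the page counts handed to set_memory_nx() before and after can be worked out in user space. The sketch below is illustrative only. The three addresses are assumptions chosen to be consistent with the sizes in the dumps above (the 688K gap and the end of the last NX range); they are not taken from the kernel's linker symbols, and nr_pages() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Number of 4K pages in [start, end), the count passed to set_memory_nx(). */
static uint64_t nr_pages(uint64_t start, uint64_t end)
{
	return (end - start) >> PAGE_SHIFT;
}

/* Assumed addresses, consistent with the region sizes in the dumps. */
static const uint64_t text_end     = UINT64_C(0xffffffff81754000);
static const uint64_t rodata_start = UINT64_C(0xffffffff81800000);
static const uint64_t all_end      = UINT64_C(0xffffffff82200000);
```

With these values, starting from rodata_start covers 2560 pages, while starting from text_end covers 2732 pages. The extra 172 pages are exactly the 688K range that changes from x to NX in the "After" dump.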



Re: [PATCH 0/5] Security: Provide unioned file support

2015-09-30 Thread Stephen Smalley

On 09/29/2015 05:03 PM, Stephen Smalley wrote:

On 09/28/2015 04:00 PM, David Howells wrote:


The attached patches provide security support for unioned files where the
security involves an object-label-based LSM (such as SELinux) rather than a
path-based LSM.

[Note that a number of the bits that were in the original patch set are now
upstream and I've rebased on Casey's changes to the security hook system]

The patches can be broken down into two sets:

  (1) A patch to add LSM hooks to handle copy up of a file, including label
      determination/setting and xattr filtration and a patch to have
      overlayfs call the hooks during the copy-up procedure.

  (2) My SELinux implementations of these hooks.  I do three things:

      (a) Don't copy up SELinux xattrs from the lower file to the upper
          file.  It is assumed that the upper file will be created with the
          attrs we want or the selinux_inode_copy_up() hook will set it
          appropriately.

          The reason there are two separate hooks here is that
          selinux_inode_copy_up_xattr() might not ever be called if there
          aren't actually any xattrs on the lower inode.

      (b) I try to derive a label to be used by file operations by, in order
          of preference: using the label on the union inode if there is one
          (the normal overlayfs case); using the override label set on the
          superblock, if provided; or trying to derive a new label by sid
          transition operation.

      (c) Using the label obtained in (b) in file_has_perm() rather than
          using the label on the lower inode.

Now the steps I have outlined in (b) and (c) seem to be at odds with what
Dan Walsh and Stephen Smalley want - but I'm not sure I follow what that
is, let alone how to do it:

Wanted to bring back the original proposal.  Stephen suggested that
we could change back to the MLS way of handling labels.

In MCS we base the MCS label of content created by a process on the
containing directory.  Which means that if a process running as
s0:c1,c2 creates content in a directory labeled s0, it will get
created as s0.

In MLS if a process running as s0:c1,c2 creates content in a
directory it labels it s0:c1,c2.  No matter what the label of the
directory is.  (Well actually if the directory is not ranged the
process will not be able to create the content.)

We changed the default for MCS in Rawhide for about a week, when I
realized this was a huge problem for containers sharing content.
Currently if you want two containers to share the same volume
mount, we label the content as svirt_sandbox_file_t:s0.  If one
process running as s0:c1,c2 creates content it gets created as s0;
if the second container process is running as s0:c3,c4, it can
still read/write the content.  If we changed the default the object
would get created as s0:c1,c2 and a process running as s0:c3,c4 would
be blocked.

So I had it reverted back to the standard MCS Mode.

If we could get the default to be MLS mode on COW file systems and
MCS on Volumes, we would get the best of both worlds.


How are you testing this?
I tried as follows:

# Make sure we have a policy that is using xattrs to label overlay inodes.
$ seinfo --fs_use | grep overlay
fs_use_xattr overlay system_u:object_r:fs_t:s0

# Define some types for the different directories involved.
$ cat overlay.te
policy_module(overlay, 1.0)

type lower_t;
files_type(lower_t)

type upper_t;
files_type(upper_t)

type work_t;
files_type(work_t)

type merged_t;
files_type(merged_t)

$ make -f /usr/share/selinux/devel/Makefile overlay.pp
$ sudo semodule -i overlay.pp

# Create and label the different directories involved.
$ mkdir lower upper work merged
$ chcon -t lower_t lower
$ chcon -t upper_t upper
$ chcon -t work_t work
$ chcon -t merged_t merged

# Populate lower
$ echo "lower" > lower/a
$ mkdir lower/b

# Mount overlay
$ sudo mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

# Look at the merged dir and labels.
$ ls -Z merged
unconfined_u:object_r:lower_t:s0 a
unconfined_u:object_r:lower_t:s0 b

# Write/create some files/directories.
$ echo "foo" >> merged/a
$ mkdir merged/b/c
$ mkdir merged/c

# Look again.
$ ls -ZR merged
merged:
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:lower_t:s0 b

merged/b:
unconfined_u:object_r:work_t:s0 c

merged/b/c:

$ ls -ZR upper
upper:
  unconfined_u:object_r:work_t:s0 a  unconfined_u:object_r:upper_t:s0 c
  unconfined_u:object_r:work_t:s0 b

upper/b:
unconfined_u:object_r:work_t:s0 c

upper/b/c:

Note that the copied-up file (a) and directory (b) are labeled lower_t
in the overlay, but work_t in the upper dir, and neither of those is
really what we want IIUC.

And that's just the labeling question.  Then there is the question of
what permission checks were applied during those c

Re: [PATCH 0/5] Security: Provide unioned file support

2015-09-29 Thread Stephen Smalley

On 09/28/2015 04:00 PM, David Howells wrote:


The attached patches provide security support for unioned files where the
security involves an object-label-based LSM (such as SELinux) rather than a
path-based LSM.

[Note that a number of the bits that were in the original patch set are now
upstream and I've rebased on Casey's changes to the security hook system]

The patches can be broken down into two sets:

  (1) A patch to add LSM hooks to handle copy up of a file, including label
  determination/setting and xattr filtration and a patch to have
  overlayfs call the hooks during the copy-up procedure.

  (2) My SELinux implementations of these hooks.  I do three things:

  (a) Don't copy up SELinux xattrs from the lower file to the upper
 file.  It is assumed that the upper file will be created with the
 attrs we want or the selinux_inode_copy_up() hook will set it
 appropriately.

 The reason there are two separate hooks here is that
 selinux_inode_copy_up_xattr() might not ever be called if there
 aren't actually any xattrs on the lower inode.

  (b) I try to derive a label to be used by file operations by, in order
 of preference: using the label on the union inode if there is one
 (the normal overlayfs case); using the override label set on the
 superblock, if provided; or trying to derive a new label by sid
 transition operation.

  (c) Using the label obtained in (b) in file_has_perm() rather than
 using the label on the lower inode.
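
The order of preference described in (b) can be sketched as a simple fallback chain. This is only an illustration of the ordering, not the code from the patch set: struct label_sources, its fields, SID_NONE and pick_file_sid() are all hypothetical names, and treating 0 as "no label" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* 0 stands for "no label set" in this sketch (an assumption). */
#define SID_NONE 0u

/* Hypothetical inputs; none of these names come from the patches. */
struct label_sources {
	uint32_t union_inode_sid;	/* label on the union inode, if any */
	uint32_t sb_override_sid;	/* override label set on the superblock */
	uint32_t transition_sid;	/* result of a SID transition computation */
};

/* Pick the first available label in the order described in (b). */
static uint32_t pick_file_sid(const struct label_sources *src)
{
	if (src->union_inode_sid != SID_NONE)
		return src->union_inode_sid;
	if (src->sb_override_sid != SID_NONE)
		return src->sb_override_sid;
	return src->transition_sid;
}
```

Step (c) then amounts to using the value chosen here for the permission check instead of the label on the lower inode.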

Now the steps I have outlined in (b) and (c) seem to be at odds with what
Dan Walsh and Stephen Smalley want - but I'm not sure I follow what that
is, let alone how to do it:

Wanted to bring back the original proposal.  Stephen suggested that
we could change back to the MLS way of handling labels.

In MCS we base the MCS label of content created by a process on the
containing directory.  Which means that if a process running as
s0:c1,c2 creates content in a directory labeled s0, it will get
created as s0.

In MLS if a process running as s0:c1,c2 creates content in a
directory it labels it s0:c1,c2.  No matter what the label of the
directory is.  (Well actually if the directory is not ranged the
process will not be able to create the content.)

We changed the default for MCS in Rawhide for about a week, when I
realized this was a huge problem for containers sharing content.
Currently if you want two containers to share the same volume
mount, we label the content as svirt_sandbox_file_t:s0.  If one
process running as s0:c1,c2 creates content it gets created as s0;
if the second container process is running as s0:c3,c4, it can
still read/write the content.  If we changed the default the object
would get created as s0:c1,c2 and a process running as s0:c3,c4 would
be blocked.

So I had it reverted back to the standard MCS Mode.

If we could get the default to be MLS mode on COW file systems and
MCS on Volumes, we would get the best of both worlds.
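
The difference between the two modes described above can be reduced to which level a new object inherits. The sketch below is a simplification under stated assumptions: enum label_mode and new_object_level() are hypothetical names, levels are handled as plain strings, and the MLS case where the directory is not ranged (and creation is refused) is deliberately left out.

```c
#include <assert.h>
#include <string.h>

enum label_mode { MODE_MCS, MODE_MLS };

/*
 * Return the level a newly created object would get:
 * MCS mode inherits the containing directory's level,
 * MLS mode uses the creating process's level.
 */
static const char *new_object_level(enum label_mode mode,
				    const char *process_level,
				    const char *directory_level)
{
	return mode == MODE_MCS ? directory_level : process_level;
}
```

With a process at s0:c1,c2 and a directory at s0, MCS mode yields s0 (so a second container at s0:c3,c4 can still share the content), while MLS mode yields s0:c1,c2 (so it cannot).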


How are you testing this?
I tried as follows:

# Make sure we have a policy that is using xattrs to label overlay inodes.
$ seinfo --fs_use | grep overlay
   fs_use_xattr overlay system_u:object_r:fs_t:s0

# Define some types for the different directories involved.
$ cat overlay.te
policy_module(overlay, 1.0)

type lower_t;
files_type(lower_t)

type upper_t;
files_type(upper_t)

type work_t;
files_type(work_t)

type merged_t;
files_type(merged_t)

$ make -f /usr/share/selinux/devel/Makefile overlay.pp
$ sudo semodule -i overlay.pp

# Create and label the different directories involved.
$ mkdir lower upper work merged
$ chcon -t lower_t lower
$ chcon -t upper_t upper
$ chcon -t work_t work
$ chcon -t merged_t merged

# Populate lower
$ echo "lower" > lower/a
$ mkdir lower/b

# Mount overlay
$ sudo mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

# Look at the merged dir and labels.
$ ls -Z merged
unconfined_u:object_r:lower_t:s0 a
unconfined_u:object_r:lower_t:s0 b

# Write/create some files/directories.
$ echo "foo" >> merged/a
$ mkdir merged/b/c
$ mkdir merged/c

# Look again.
$ ls -ZR merged
merged:
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:lower_t:s0 b

merged/b:
unconfined_u:object_r:work_t:s0 c

merged/b/c:

$ ls -ZR upper
upper:
 unconfined_u:object_r:work_t:s0 a  unconfined_u:object_r:upper_t:s0 c
 unconfined_u:object_r:work_t:s0 b

upper/b:
unconfined_u:object_r:work_t:s0 c

upper/b/c:

Note that the copied-up file (a) and directory (b) are labeled lower_t 
in the overlay, but work_t in the upper dir, and neither of those is 
really what we want IIUC.


And that's just the labeling question.  Then there is the question of 
what permission che

Re: [PATCH 1/2] selinux: ioctl_has_perm should be static

2015-09-29 Thread Stephen Smalley

On 09/27/2015 11:10 AM, Geliang Tang wrote:

Fixes the following sparse warning:

  security/selinux/hooks.c:3242:5: warning: symbol 'ioctl_has_perm' was
  not declared. Should it be static?

Signed-off-by: Geliang Tang 


Acked-by: Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/hooks.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 84d21f9..5265c74 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3239,7 +3239,7 @@ static void selinux_file_free_security(struct file *file)
  * Check whether a task has the ioctl permission and cmd
  * operation to an inode.
  */
-int ioctl_has_perm(const struct cred *cred, struct file *file,
+static int ioctl_has_perm(const struct cred *cred, struct file *file,
 		u32 requested, u16 cmd)
 {
 	struct common_audit_data ad;





Re: [PATCH 5/5] selinux: use sprintf return value

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

sprintf returns the number of characters printed (excluding '\0'), so
we can use that and avoid duplicating the length computation.

Signed-off-by: Rasmus Villemoes 


Acked-by: Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/ss/services.c | 5 +----
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index aa2bdcb20848..ebb5eb3c318c 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1218,13 +1218,10 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
 	/*
 	 * Copy the user name, role name and type name into the context.
 	 */
-	sprintf(scontextp, "%s:%s:%s",
+	scontextp += sprintf(scontextp, "%s:%s:%s",
 		sym_name(&policydb, SYM_USERS, context->user - 1),
 		sym_name(&policydb, SYM_ROLES, context->role - 1),
 		sym_name(&policydb, SYM_TYPES, context->type - 1));
-	scontextp += strlen(sym_name(&policydb, SYM_USERS, context->user - 1)) +
-		     1 + strlen(sym_name(&policydb, SYM_ROLES, context->role - 1)) +
-		     1 + strlen(sym_name(&policydb, SYM_TYPES, context->type - 1));
 
 	mls_sid_to_context(context, &scontextp);
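
The idiom the patch relies on is that sprintf() returns the number of characters it wrote, excluding the terminating NUL, so the write pointer can be advanced by the return value directly. A small user-space sketch follows; format_context() is a hypothetical function written for this illustration, and the caller is assumed to pass a buffer that is large enough.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "user:role:type" followed by a level suffix into buf and
 * return the total length written (excluding the terminating NUL).
 * The caller must supply a buffer that is large enough.
 */
static size_t format_context(char *buf, const char *user, const char *role,
			     const char *type, const char *level)
{
	char *p = buf;

	/* sprintf() returns the number of characters written, so advance by it. */
	p += sprintf(p, "%s:%s:%s", user, role, type);
	p += sprintf(p, ":%s", level);
	return (size_t)(p - buf);
}
```

This avoids the three strlen() calls that the removed lines used to recompute the same length.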






Re: [PATCH 4/5] selinux: use kstrdup() in security_get_bools()

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

This is much simpler.

Signed-off-by: Rasmus Villemoes 


Acked-by: Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/ss/services.c | 8 +-------
  1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 994c824a34c6..aa2bdcb20848 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2609,18 +2609,12 @@ int security_get_bools(int *len, char ***names, int **values)
 		goto err;
 
 	for (i = 0; i < *len; i++) {
-		size_t name_len;
-
 		(*values)[i] = policydb.bool_val_to_struct[i]->state;
-		name_len = strlen(sym_name(&policydb, SYM_BOOLS, i)) + 1;
 
 		rc = -ENOMEM;
-		(*names)[i] = kmalloc(sizeof(char) * name_len, GFP_ATOMIC);
+		(*names)[i] = kstrdup(sym_name(&policydb, SYM_BOOLS, i), GFP_ATOMIC);
 		if (!(*names)[i])
 			goto err;
-
-		strncpy((*names)[i], sym_name(&policydb, SYM_BOOLS, i), name_len);
-		(*names)[i][name_len - 1] = 0;
 	}
 	rc = 0;
  out:
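
Why this is "much simpler" is easiest to see side by side. The sketch below is a user-space analogue, not kernel code: dup_by_hand() mirrors the removed allocate, copy and terminate sequence, and dup_string() is a hypothetical stand-in for kstrdup() built on malloc() and memcpy().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Old pattern: compute the length, allocate, copy, terminate by hand. */
static char *dup_by_hand(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		return NULL;
	strncpy(copy, s, len);
	copy[len - 1] = '\0';
	return copy;
}

/* User-space stand-in for kstrdup(): one allocation plus one copy. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```

Both helpers produce the same result; the second simply hides the bookkeeping behind one call, which is what the patch does at the call site.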





Re: [PATCH 3/5] selinux: use kmemdup in security_sid_to_context_core()

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

Signed-off-by: Rasmus Villemoes 


Acked-by: Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/ss/services.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index c550df0e0ff1..994c824a34c6 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1259,12 +1259,12 @@ static int security_sid_to_context_core(u32 sid, char **scontext,
 			*scontext_len = strlen(initial_sid_to_string[sid]) + 1;
 			if (!scontext)
 				goto out;
-			scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
+			scontextp = kmemdup(initial_sid_to_string[sid],
+					    *scontext_len, GFP_ATOMIC);
 			if (!scontextp) {
 				rc = -ENOMEM;
 				goto out;
 			}
-			strcpy(scontextp, initial_sid_to_string[sid]);
 			*scontext = scontextp;
 			goto out;
 		}





Re: [PATCH 2/5] selinux: remove pointless cast in selinux_inode_setsecurity()

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

security_context_to_sid() expects a const char* argument, so there's
no point in casting away the const qualifier of value.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>


Acked-by:  Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/hooks.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index fd50cd5ac2ec..5edb57df86f8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3162,7 +3162,7 @@ static int selinux_inode_setsecurity(struct inode *inode, const char *name,
if (!value || !size)
return -EACCES;

-   rc = security_context_to_sid((void *)value, size, &newsid, GFP_KERNEL);
+   rc = security_context_to_sid(value, size, &newsid, GFP_KERNEL);
if (rc)
return rc;






Re: [PATCH 1/5] selinux: introduce security_context_str_to_sid

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

There seems to be a little confusion as to whether the scontext_len
parameter of security_context_to_sid() includes the nul-byte or
not. Reading security_context_to_sid_core(), it seems that the
expectation is that it does not (both the string copying and the test
for scontext_len being zero hint at that).

Introduce the helper security_context_str_to_sid() to do the strlen()
call and fix all callers.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>


Acked-by:  Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/hooks.c            | 12 ++++--------
  security/selinux/include/security.h |  2 ++
  security/selinux/selinuxfs.c        | 26 +++++++++-----------------
  security/selinux/ss/services.c      |  5 +++++
  4 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d86e588..fd50cd5ac2ec 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -674,10 +674,9 @@ static int selinux_set_mnt_opts(struct super_block *sb,

if (flags[i] == SBLABEL_MNT)
continue;
-   rc = security_context_to_sid(mount_options[i],
-   strlen(mount_options[i]), &sid, GFP_KERNEL);
+   rc = security_context_str_to_sid(mount_options[i], &sid, GFP_KERNEL);
if (rc) {
-   printk(KERN_WARNING "SELinux: security_context_to_sid"
+   printk(KERN_WARNING "SELinux: security_context_str_to_sid"
   "(%s) failed for (dev %s, type %s) errno=%d\n",
   mount_options[i], sb->s_id, name, rc);
goto out;
@@ -2617,15 +2616,12 @@ static int selinux_sb_remount(struct super_block *sb, void *data)

for (i = 0; i < opts.num_mnt_opts; i++) {
u32 sid;
-   size_t len;

if (flags[i] == SBLABEL_MNT)
continue;
-   len = strlen(mount_options[i]);
-   rc = security_context_to_sid(mount_options[i], len, &sid,
-   GFP_KERNEL);
+   rc = security_context_str_to_sid(mount_options[i], &sid, GFP_KERNEL);
if (rc) {
-   printk(KERN_WARNING "SELinux: security_context_to_sid"
+   printk(KERN_WARNING "SELinux: security_context_str_to_sid"
   "(%s) failed for (dev %s, type %s) errno=%d\n",
   mount_options[i], sb->s_id, sb->s_type->name, rc);
goto out_free_opts;
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 6a681d26bf20..223e9fd15d66 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -166,6 +166,8 @@ int security_sid_to_context_force(u32 sid, char **scontext, u32 *scontext_len);
  int security_context_to_sid(const char *scontext, u32 scontext_len,
u32 *out_sid, gfp_t gfp);

+int security_context_str_to_sid(const char *scontext, u32 *out_sid, gfp_t gfp);
+
  int security_context_to_sid_default(const char *scontext, u32 scontext_len,
u32 *out_sid, u32 def_sid, gfp_t gfp_flags);

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 5bed7716f8ab..c02da25d7b63 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -731,13 +731,11 @@ static ssize_t sel_write_access(struct file *file, char *buf, size_t size)
if (sscanf(buf, "%s %s %hu", scon, tcon, &tclass) != 3)
goto out;

-   length = security_context_to_sid(scon, strlen(scon) + 1, &ssid,
-   GFP_KERNEL);
+   length = security_context_str_to_sid(scon, &ssid, GFP_KERNEL);
if (length)
goto out;

-   length = security_context_to_sid(tcon, strlen(tcon) + 1, &tsid,
-   GFP_KERNEL);
+   length = security_context_str_to_sid(tcon, &tsid, GFP_KERNEL);
if (length)
goto out;

@@ -819,13 +817,11 @@ static ssize_t sel_write_create(struct file *file, char *buf, size_t size)
objname = namebuf;
}

-   length = security_context_to_sid(scon, strlen(scon) + 1, &ssid,
-   GFP_KERNEL);
+   length = security_context_str_to_sid(scon, &ssid, GFP_KERNEL);
if (length)
goto out;

-   length = security_context_to_sid(tcon, strlen(tcon) + 1, &tsid,
-   GFP_KERNEL);
+   length = security_context_str_to_sid(tcon, &tsid, GFP_KERNEL);
if (length)
goto out;

@@ -882,13 +878,11 @@ static ssize_t sel_write_relabel(struct file *file, char *buf, size_t size)
if (s
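
The idea behind the new helper in this patch can be sketched in user space. This is only an illustration under stated assumptions: `context_to_sid()` and `context_str_to_sid()` are hypothetical stand-ins (not the kernel functions), and the "SID" returned is simply the number of bytes the parser was given, so that the effect of the length argument is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-taking parser such as
 * security_context_to_sid(); the "SID" is just the byte count it saw. */
static int context_to_sid(const char *scontext, size_t scontext_len,
                          unsigned int *out_sid)
{
    assert(scontext != NULL && out_sid != NULL);
    if (scontext_len == 0)
        return -EINVAL;          /* empty contexts are rejected */
    *out_sid = (unsigned int)scontext_len;
    return 0;
}

/* The wrapper: callers holding a C string no longer pass a length,
 * so they cannot disagree about whether the NUL byte is counted. */
static int context_str_to_sid(const char *scontext, unsigned int *out_sid)
{
    return context_to_sid(scontext, strlen(scontext), out_sid);
}
```

With this shape, a caller that previously chose between `strlen(s)` and `strlen(s) + 1` just calls `context_str_to_sid(s, &sid)`.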

Re: [PATCH 0/5] selinux: minor cleanup suggestions

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

A few random things I stumbled on.

While I'm pretty sure of the change in 1/5, I'm also confused, because
the doc for the reverse security_sid_to_context states that
@scontext_len is set to "the length of the string", which one would
normally interpret as being what strlen() would give (i.e., without
the \0). However, security_sid_to_context_core clearly includes the \0
in the return value, and I think callers rely on that.


It is historical; originally security_context_to_sid() required 
@scontext to be NUL-terminated and @scontext_len to include the NUL byte 
in the length, and security_sid_to_context() returned a NUL-terminated 
@scontext and included the NUL byte in the returned length.  However, 
when we switched SELinux to using xattrs rather than its own persistent 
label mapping, security_context_to_sid() was changed to accept contexts 
that did not already include the NUL because setfattr did not consider 
the NUL to be part of the attribute value for strings.  So presently it 
accepts either form, although we prefer them to be NUL-terminated and 
canonicalize them to that form before returning to userspace.
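
One way to accept either form, as described above, is to copy the given number of bytes into a private buffer, terminate it, and then treat it as an ordinary C string. The following is a user-space sketch only; `canonical_context_len()` is a hypothetical name, and this is not presented as the kernel's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes into a private buffer and terminate it; returns the
 * length as strlen() sees it, or (size_t)-1 on allocation failure. */
static size_t canonical_context_len(const char *scontext, size_t len)
{
    char *copy;
    size_t n;

    assert(scontext != NULL);
    copy = malloc(len + 1);
    if (!copy)
        return (size_t)-1;
    memcpy(copy, scontext, len);
    copy[len] = '\0';            /* always terminated, whatever the caller passed */
    n = strlen(copy);            /* a counted trailing NUL simply ends the string */
    free(copy);
    return n;
}
```

Whether the caller passes `strlen(ctx)` or `strlen(ctx) + 1`, the parsed string comes out the same length, which is why both conventions can coexist.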




Rasmus Villemoes (5):
   selinux: introduce security_context_str_to_sid
   selinux: remove pointless cast in selinux_inode_setsecurity()
   selinux: use kmemdup in security_sid_to_context_core()
   selinux: use kstrdup() in security_get_bools()
   selinux: use sprintf return value

  security/selinux/hooks.c            | 14 +++++---------
  security/selinux/include/security.h |  2 ++
  security/selinux/selinuxfs.c        | 26 +++++++++-----------------
  security/selinux/ss/services.c      | 22 +++++++++-------------
  4 files changed, 25 insertions(+), 39 deletions(-)





Re: [PATCH 0/5] Security: Provide unioned file support

2015-09-29 Thread Stephen Smalley

On 09/28/2015 04:00 PM, David Howells wrote:


The attached patches provide security support for unioned files where the
security involves an object-label-based LSM (such as SELinux) rather than a
path-based LSM.

[Note that a number of the bits that were in the original patch set are now
upstream and I've rebased on Casey's changes to the security hook system]

The patches can be broken down into two sets:

  (1) A patch to add LSM hooks to handle copy up of a file, including label
  determination/setting and xattr filtration and a patch to have
  overlayfs call the hooks during the copy-up procedure.

  (2) My SELinux implementations of these hooks.  I do three things:

  (a) Don't copy up SELinux xattrs from the lower file to the upper
 file.  It is assumed that the upper file will be created with the
 attrs we want or the selinux_inode_copy_up() hook will set it
 appropriately.

 The reason there are two separate hooks here is that
 selinux_inode_copy_up_xattr() might not ever be called if there
 aren't actually any xattrs on the lower inode.

  (b) I try to derive a label to be used by file operations by, in order
 of preference: using the label on the union inode if there is one
 (the normal overlayfs case); using the override label set on the
 superblock, if provided; or trying to derive a new label by sid
 transition operation.

  (c) Using the label obtained in (b) in file_has_perm() rather than
 using the label on the lower inode.

Now the steps I have outlined in (b) and (c) seem to be at odds with what
Dan Walsh and Stephen Smalley want - but I'm not sure I follow what that
is, let alone how to do it:

Wanted to bring back the original proposal.  Stephen suggested that
we could change back to the MLS way of handling labels.

In MCS we base the MCS label of content created by a process on the
containing directory.  Which means that if a process running as
s0:c1,c2 creates content in a directory labeled s0, it will get
created as s0.

In MLS if a process running as s0:c1,c2 creates content in a
directory it labels it s0:c1,c2.  No matter what the label of the
directory is.  (Well actually if the directory is not ranged the
process will not be able to create the content.)

We changed the default for MCS in Rawhide for about a week, when I
realized this was a huge problem for containers sharing content.
Currently if you want two containers to share the same volume
mount, we label the content as svirt_sandbox_file_t:s0.  If one
process running as s0:c1,c2 creates content it gets created as s0;
if the second container process is running as s0:c3,c4, it can
still read/write the content.  If we changed the default the object
would get created as s0:c1,c2 and a process running as s0:c3,c4 would
be blocked.

So I had it reverted back to the standard MCS Mode.

If we could get the default to be MLS mode on COW file systems and
MCS on Volumes, we would get the best of both worlds.
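
The difference between the two modes described above can be reduced to a small decision function. This is a deliberately simplified sketch: `new_file_level()` and `enum label_mode` are hypothetical names, real policy evaluation involves much more than this, and the MLS check that the directory must be ranged is left out.

```c
#include <assert.h>
#include <string.h>

enum label_mode { MODE_MCS, MODE_MLS };

/* Returns the level a newly created file would get under the simplified
 * rule in the text (hypothetical helper, not a kernel or policy API):
 * MCS inherits from the containing directory, MLS from the process. */
static const char *new_file_level(enum label_mode mode,
                                  const char *process_level,
                                  const char *dir_level)
{
    assert(process_level != NULL && dir_level != NULL);
    return mode == MODE_MCS ? dir_level : process_level;
}
```

For a process at s0:c1,c2 writing into a directory labeled s0, the MCS rule yields s0 (shareable with a container at s0:c3,c4), while the MLS rule yields s0:c1,c2 (private to the creating container).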


How are you testing this?
I tried as follows:

# Make sure we have a policy that is using xattrs to label overlay inodes.
$ seinfo --fs_use | grep overlay
   fs_use_xattr overlay system_u:object_r:fs_t:s0

# Define some types for the different directories involved.
$ cat overlay.te
policy_module(overlay, 1.0)

type lower_t;
files_type(lower_t)

type upper_t;
files_type(upper_t)

type work_t;
files_type(work_t)

type merged_t;
files_type(merged_t)

$ make -f /usr/share/selinux/devel/Makefile overlay.pp
$ sudo semodule -i overlay.pp

# Create and label the different directories involved.
$ mkdir lower upper work merged
$ chcon -t lower_t lower
$ chcon -t upper_t upper
$ chcon -t work_t work
$ chcon -t merged_t merged

# Populate lower
$ echo "lower" > lower/a
$ mkdir lower/b

# Mount overlay
$ sudo mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

# Look at the merged dir and labels.
$ ls -Z merged
unconfined_u:object_r:lower_t:s0 a
unconfined_u:object_r:lower_t:s0 b

# Write/create some files/directories.
$ echo "foo" >> merged/a
$ mkdir merged/b/c
$ mkdir merged/c

# Look again.
$ ls -ZR merged
merged:
unconfined_u:object_r:lower_t:s0 a  unconfined_u:object_r:upper_t:s0 c
unconfined_u:object_r:lower_t:s0 b

merged/b:
unconfined_u:object_r:work_t:s0 c

merged/b/c:

$ ls -ZR upper
upper:
 unconfined_u:object_r:work_t:s0 a  unconfined_u:object_r:upper_t:s0 c
 unconfined_u:object_r:work_t:s0 b

upper/b:
unconfined_u:object_r:work_t:s0 c

upper/b/c:

Note that the copied-up file (a) and directory (b) are labeled lower_t 
in the overlay, but work_t in the upper dir, and neither of those is 
really what we want IIUC.


And that's just the labeling question.  Then there is the question of 
what permission che

Re: [PATCH 5/5] selinux: use sprintf return value

2015-09-29 Thread Stephen Smalley

On 09/25/2015 06:34 PM, Rasmus Villemoes wrote:

sprintf returns the number of characters printed (excluding '\0'), so
we can use that and avoid duplicating the length computation.

Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>


Acked-by:  Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/ss/services.c | 5 +----
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index aa2bdcb20848..ebb5eb3c318c 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1218,13 +1218,10 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
/*
 * Copy the user name, role name and type name into the context.
 */
-   sprintf(scontextp, "%s:%s:%s",
+   scontextp += sprintf(scontextp, "%s:%s:%s",
sym_name(&policydb, SYM_USERS, context->user - 1),
sym_name(&policydb, SYM_ROLES, context->role - 1),
sym_name(&policydb, SYM_TYPES, context->type - 1));
-   scontextp += strlen(sym_name(&policydb, SYM_USERS, context->user - 1)) +
-   1 + strlen(sym_name(&policydb, SYM_ROLES, context->role - 1)) +
-   1 + strlen(sym_name(&policydb, SYM_TYPES, context->type - 1));

mls_sid_to_context(context, &scontextp);
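
The pattern this patch relies on can be shown in a few lines of user-space C. The sketch below is not the kernel function: `format_context()` is a hypothetical name, and the caller is assumed to supply a buffer that is large enough.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "user:role:type" followed by ":level" into buf and returns the
 * total length. The caller is assumed to have sized buf correctly. */
static size_t format_context(char *buf, const char *user, const char *role,
                             const char *type, const char *level)
{
    char *p = buf;

    assert(buf && user && role && type && level);
    /* sprintf() returns the number of characters written, excluding the
     * NUL, so adding it to p leaves p pointing at the terminator. */
    p += sprintf(p, "%s:%s:%s", user, role, type);
    p += sprintf(p, ":%s", level);
    return (size_t)(p - buf);
}
```

Advancing the pointer by the return value avoids recomputing three `strlen()` calls plus the separators, which is the duplication the patch removes.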






Re: [PATCH 1/2] selinux: ioctl_has_perm should be static

2015-09-29 Thread Stephen Smalley

On 09/27/2015 11:10 AM, Geliang Tang wrote:

Fixes the following sparse warning:

  security/selinux/hooks.c:3242:5: warning: symbol 'ioctl_has_perm' was
  not declared. Should it be static?

Signed-off-by: Geliang Tang <geliangt...@163.com>


Acked-by:  Stephen Smalley <s...@tycho.nsa.gov>


---
  security/selinux/hooks.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 84d21f9..5265c74 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3239,7 +3239,7 @@ static void selinux_file_free_security(struct file *file)
   * Check whether a task has the ioctl permission and cmd
   * operation to an inode.
   */
-int ioctl_has_perm(const struct cred *cred, struct file *file,
+static int ioctl_has_perm(const struct cred *cred, struct file *file,
u32 requested, u16 cmd)
  {
struct common_audit_data ad;





Re: rwx mapping between ex_table and rodata

2015-09-28 Thread Stephen Smalley
On 09/24/2015 06:25 PM, Kees Cook wrote:
> On Thu, Sep 24, 2015 at 1:26 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>> Hi,
>>
>> With the attached config and 4.3-rc2 on x86_64, I see the following in
>> /sys/kernel/debug/kernel_page_tables:
>> ...
>> ---[ High Kernel Mapping ]---
>> 0xffffffff80000000-0xffffffff81000000    16M                    pmd
>> 0xffffffff81000000-0xffffffff81600000     6M  ro  PSE  GLB  x   pmd
>> 0xffffffff81600000-0xffffffff81775000  1492K  ro       GLB  x   pte
>> 0xffffffff81775000-0xffffffff81800000   556K  RW       GLB  x   pte
>>                                               ^^^^^^^^^^^^^^^
>> 0xffffffff81800000-0xffffffff81a00000     2M  ro  PSE  GLB  NX  pmd
>> 0xffffffff81a00000-0xffffffff81b43000  1292K  ro       GLB  NX  pte
>> 0xffffffff81b43000-0xffffffff82000000  4852K  RW       GLB  NX  pte
>> 0xffffffff82000000-0xffffffff82200000     2M  RW  PSE  GLB  NX  pmd
>> 0xffffffff82200000-0xffffffffa0000000   478M                    pmd
>> ...
>>
>> This region seems to be between the end of ex_table and the start of rodata,
>> $ objdump -x vmlinux | sort
>> ...
>> ffffffff817728b0 g       __ex_table  0000000000000000 __start___ex_table
>> ffffffff817728b0 l    d  __ex_table  0000000000000000 __ex_table
>> ffffffff81774998 g       __ex_table  0000000000000000 __stop___ex_table
>> ffffffff81800000 g       .rodata     0000000000000000 __start_rodata
>> ffffffff81800000 l    d  .rodata     0000000000000000 .rodata
>> ...
>>
>> $ readelf -a vmlinux
>> ...
>> Section Headers:
>>   [Nr] Name              Type             Address           Offset
>>        Size              EntSize          Flags  Link  Info  Align
>> ...
>>   [ 3] __ex_table        PROGBITS         ffffffff817728b0  009728b0
>>        00000000000020e8  0000000000000000   A       0     0     8
>>   [ 4] .rodata           PROGBITS         ffffffff81800000  00a00000
>>        00000000002eefd2  0000000000000000   A       0     0     64
>> ...
>> ...
>>
>> I see a similar rwx mapping with the stock Fedora kernels (e.g. 4.1.6), so 
>> it isn't new to 4.3.
> 
> To me it looks like another alignment/padding issue like got fixed
> before. The space between __ex_table and rodata is (seems?) unused, so
> the default page table permissions end up being W+X. Can we fix the
> default to be NX instead? It'll make these bugs stay gone.

Not sure where that would get fixed (or the ramifications), but is there
a reason we can't just do the following to fix this particular case?

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2..df48430 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 * has been zapped already via cleanup_highmem().
 */
all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-   set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+   set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
rodata_test();
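
The size of the gap that the proposed change would cover can be checked with a little arithmetic. The sketch below is a user-space illustration, not kernel code: the helper names are hypothetical, the addresses are written out in full assuming the usual x86_64 high kernel mapping, and text_end is assumed to be the page-aligned end of __ex_table.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                              /* 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Round addr up to the next page boundary. */
static uint64_t page_align_up(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Number of whole pages in [start, end), i.e. the count that a call
 * such as set_memory_nx(start, count) would be given. */
static uint64_t pages_between(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start) >> SKETCH_PAGE_SHIFT;
}
```

Taking __stop___ex_table at 0xffffffff81774998 and __start_rodata at 0xffffffff81800000, the aligned text end is 0xffffffff81775000 and the gap is 139 pages, which at 4 KiB each is the 556K RW+x range in the page table dump.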




Re: [PATCH 1/7] fs: Add user namesapace member to struct super_block

2015-08-06 Thread Stephen Smalley
On 08/06/2015 11:44 AM, Seth Forshee wrote:
> On Thu, Aug 06, 2015 at 10:51:16AM -0400, Stephen Smalley wrote:
>> On 08/06/2015 10:20 AM, Seth Forshee wrote:
>>> On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote:
>>>> Seth Forshee  writes:
>>>>
>>>>> On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote:
>>>>>> Seth Forshee  writes:
>>>>>>
>>>>>>> Initially this will be used to eliminate the implicit MNT_NODEV
>>>>>>> flag for mounts from user namespaces. In the future it will also
>>>>>>> be used for translating ids and checking capabilities for
>>>>>>> filesystems mounted from user namespaces.
>>>>>>>
>>>>>>> s_user_ns is initialized in alloc_super() and is generally set to
>>>>>>> current_user_ns(). To avoid security and corruption issues, two
>>>>>>> additional mount checks are also added:
>>>>>>>
>>>>>>>  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>>>>>>>in current_user_ns().
>>>>>>>
>>>>>>>  - sget() will fail with EBUSY when the filesystem it's looking
>>>>>>>for is already mounted from another user namespace.
>>>>>>>
>>>>>>> proc needs some special handling here. The user namespace of
>>>>>>> current isn't appropriate when forking as a result of clone (2)
>>>>>>> with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable
>>>>>>> from within the new user namespace. Instead, the user namespace
>>>>>>> which owns the new pid namespace should be used. sget_userns() is
>>>>>>> added to allow passing of a user namespace other than that of
>>>>>>> current, and this is used by proc_mount(). sget() becomes a
>>>>>>> wrapper around sget_userns() which passes current_user_ns().
>>>>>>
>>>>>> From bits of the previous conversation.
>>>>>>
>>>>>> We need sget_userns(..., &init_user_ns) for sysfs.  The sysfs
>>>>>> xattrs can travel from one mount of sysfs to another via the sysfs
>>>>>> backing store.
>>>>>>
>>>>>> For tmpfs and any other filesystems we support mounting without
>>>>>> privilege that support xattrs.  We need to identify them and
>>>>>> see if userspace is taking advantage of the ability to set
>>>>>> xattrs and file caps (unlikely).  If they are we need to call
>>>>>> sget_userns(..., &init_user_ns) on those filesystems as well.
>>>>>>
>>>>>> Possibly/Probably we should just do that for all of the interesting
>>>>>> filesystems to start with and then change back to an ordinary old sget
>>>>>> after we have done the testing and confirmed we will not be introducing
>>>>>> userspace regressions.
>>>>>
>>>>> I was reviewing everything in preparation for sending v2 patches, and I
>>>>> realized that doing this has an undesirable side effect. In patch 2 the
>>>>> implicit nodev is removed for unprivileged mounts, and instead s_user_ns
>>>>> is used to block opening devices in these mounts. When we set s_user_ns
>>>>> to &init_user_ns, it becomes possible to open device nodes from
>>>>> unprivileged mounts of these filesystems.
>>>>>
>>>>> This doesn't pose a real problem today. The only filesystems it will
>>>>> affect are sysfs, tmpfs, and ramfs (no others need s_user_ns =
>>>>> &init_user_ns for user namespace mounts), and none of these is a
>>>>> problem. sysfs is okay because kernfs doesn't (currently?) allow device
>>>>> nodes, and a user would require CAP_MKNOD to create any device nodes in
>>>>> a tmpfs or ramfs mount.
>>>>>
>>>>> But for sysfs in particular it does mean that we will need to make sure
>>>>> that there's no way that device nodes could start appearing in an
>>>>> unprivileged mount.
>>>>
>>>> Good point about nodev.  
>>>>
>>>> For tmpfs and ramfs and security labels the smack policy of allowing but
>>>> filtering security labels mean smack once it has those bits will not
>>>> care which user namespace ramfs and tmpfs live in. 
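
The two s_user_ns checks discussed above can be modelled in userspace roughly
as follows. This is only a sketch under stated assumptions: the toy_* types
and function names are hypothetical stand-ins, not the kernel's structures,
and the real checks live in the VFS mount and device-open paths rather than
in helpers like these.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct user_namespace and struct super_block. */
struct toy_user_ns { int id; };
struct toy_super_block { const struct toy_user_ns *s_user_ns; };

/* Plays the role of init_user_ns, the initial (host) user namespace. */
const struct toy_user_ns toy_init_user_ns = { 0 };

/*
 * Replacement for the implicit nodev: device nodes can only be opened
 * when the superblock is owned by the initial user namespace.
 */
bool toy_may_open_device(const struct toy_super_block *sb)
{
	assert(sb != NULL);
	return sb->s_user_ns == &toy_init_user_ns;
}

/*
 * sget()-style check: an existing superblock cannot be reused by a
 * mounter in a different user namespace; that attempt gets -EBUSY.
 */
int toy_check_sb_reuse(const struct toy_super_block *existing,
		       const struct toy_user_ns *mounter_ns)
{
	assert(existing != NULL);
	return existing->s_user_ns == mounter_ns ? 0 : -EBUSY;
}
```

With this model, a filesystem whose superblock is forced to the initial
namespace (the sget_userns(..., &init_user_ns) case) passes the device check
even when mounted from a child namespace, which is the side effect Seth
points out.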

Re: [PATCH 1/7] fs: Add user namesapace member to struct super_block

2015-08-06 Thread Stephen Smalley
On 08/06/2015 10:20 AM, Seth Forshee wrote:
> On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.fors...@canonical.com> writes:
>>
>>> On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote:
>>>> Seth Forshee <seth.fors...@canonical.com> writes:
>>>>
>>>>> Initially this will be used to eliminate the implicit MNT_NODEV
>>>>> flag for mounts from user namespaces. In the future it will also
>>>>> be used for translating ids and checking capabilities for
>>>>> filesystems mounted from user namespaces.
>>>>>
>>>>> s_user_ns is initialized in alloc_super() and is generally set to
>>>>> current_user_ns(). To avoid security and corruption issues, two
>>>>> additional mount checks are also added:
>>>>>
>>>>>  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>>>>>    in current_user_ns().
>>>>>
>>>>>  - sget() will fail with EBUSY when the filesystem it's looking
>>>>>    for is already mounted from another user namespace.
>>>>>
>>>>> proc needs some special handling here. The user namespace of
>>>>> current isn't appropriate when forking as a result of clone (2)
>>>>> with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable
>>>>> from within the new user namespace. Instead, the user namespace
>>>>> which owns the new pid namespace should be used. sget_userns() is
>>>>> added to allow passing of a user namespace other than that of
>>>>> current, and this is used by proc_mount(). sget() becomes a
>>>>> wrapper around sget_userns() which passes current_user_ns().
>>>>
>>>> From bits of the previous conversation.
>>>>
>>>> We need sget_userns(..., &init_user_ns) for sysfs.  The sysfs
>>>> xattrs can travel from one mount of sysfs to another via the sysfs
>>>> backing store.
>>>>
>>>> For tmpfs and any other filesystems we support mounting without
>>>> privilege that support xattrs, we need to identify them and
>>>> see if userspace is taking advantage of the ability to set
>>>> xattrs and file caps (unlikely).  If they are we need to call
>>>> sget_userns(..., &init_user_ns) on those filesystems as well.
>>>>
>>>> Possibly/Probably we should just do that for all of the interesting
>>>> filesystems to start with and then change back to an ordinary old sget
>>>> after we have done the testing and confirmed we will not be introducing
>>>> userspace regressions.
>>>
>>> I was reviewing everything in preparation for sending v2 patches, and I
>>> realized that doing this has an undesirable side effect. In patch 2 the
>>> implicit nodev is removed for unprivileged mounts, and instead s_user_ns
>>> is used to block opening devices in these mounts. When we set s_user_ns
>>> to &init_user_ns, it becomes possible to open device nodes from
>>> unprivileged mounts of these filesystems.
>>>
>>> This doesn't pose a real problem today. The only filesystems it will
>>> affect are sysfs, tmpfs, and ramfs (no others need s_user_ns =
>>> &init_user_ns for user namespace mounts), and none of these is a
>>> problem. sysfs is okay because kernfs doesn't (currently?) allow device
>>> nodes, and a user would require CAP_MKNOD to create any device nodes in
>>> a tmpfs or ramfs mount.
>>>
>>> But for sysfs in particular it does mean that we will need to make sure
>>> that there's no way that device nodes could start appearing in an
>>> unprivileged mount.
>>
>> Good point about nodev.  
>>
>> For tmpfs and ramfs and security labels the smack policy of allowing but
>> filtering security labels mean smack once it has those bits will not
>> care which user namespace ramfs and tmpfs live in.  The labels should
>> pretty much stay the same in any case.
> 
> Smack does care which namespace ramfs and tmpfs are in. With the patch
> I've got right now, if s_user_ns != &init_user_ns and the label of an
> inode does not match that of the root inode then
> security_inode_permission() will return EACCES.
> 
> So if something with CAP_MAC_ADMIN is changing security labels in such a
> mount, suddenly those inodes might become inaccessible. And while it may
> be unlikely that anyone is doing this it's impossible for me to prove
> that's the case.
> 
>> If the same class of handling will also apply to selinux and those are
>> the only two security modules that apply labels then we can leave tmpfs
>> and ramfs with the security labels of whomever mounted them.
> 
> For SELinux I now have a patch which applies mountpoint labeling to
> mounts for which s_user_ns != &init_user_ns. I'm less sure than with
> Smack how this behavior will differ from what happens today, but my
> understanding is that this means that the label of the mountpoint is
> used for all objects from that superblock. Afaik it does not have the
> Smack behavior of denying access to filesystem objects which have a
> different label in the backing store.
> 
>> For sysfs things get a little more interesting.  Assuming tmpfs and
>> ramfs don't need s_user_ns == &init_user_ns, sysfs may be fine operating
>> with possibly invalid security labels set on a different mount of
>> selinux.  (I am wondering now how all of these labels work in the
>> context of nfs).
>
> If someone was using Smack to label sysfs then a mount with s_user_ns !=
> &init_user_ns is going to leave inaccessible anything without the same
> label as the process which
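
The Smack behaviour Seth describes (deny access when the inode label differs
from the root inode label on a mount from a non-initial user namespace) can
be sketched as below. It is a simplified userspace model, assuming labels are
plain strings; toy_smack_sb and toy_smack_inode_permission() are hypothetical
names, and the normal Smack rule evaluation for init_user_ns mounts is not
modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-superblock state: who owns it and the root inode label. */
struct toy_smack_sb {
	bool owned_by_init_ns;
	const char *root_label;
};

/*
 * Returns 0 when access may proceed to the normal rule checks, or
 * -EACCES when the inode carries a label other than the root label
 * on a superblock that does not belong to the initial user namespace.
 */
int toy_smack_inode_permission(const struct toy_smack_sb *sb,
			       const char *inode_label)
{
	assert(sb != NULL && inode_label != NULL);

	if (sb->owned_by_init_ns)
		return 0;	/* ordinary Smack rules apply; not modelled */

	return strcmp(inode_label, sb->root_label) == 0 ? 0 : -EACCES;
}
```

This makes the regression risk concrete: if something with CAP_MAC_ADMIN
relabels a file inside such a mount, the label no longer matches the root
label and the file becomes inaccessible.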

Re: [PATCH 1/7] fs: Add user namesapace member to struct super_block

2015-08-06 Thread Stephen Smalley
On 08/06/2015 11:44 AM, Seth Forshee wrote:
> On Thu, Aug 06, 2015 at 10:51:16AM -0400, Stephen Smalley wrote:
>> On 08/06/2015 10:20 AM, Seth Forshee wrote:
>>> On Wed, Aug 05, 2015 at 04:19:03PM -0500, Eric W. Biederman wrote:
>>>> Seth Forshee <seth.fors...@canonical.com> writes:
>>>>
>>>>> On Wed, Jul 15, 2015 at 09:47:11PM -0500, Eric W. Biederman wrote:
>>>>>> Seth Forshee <seth.fors...@canonical.com> writes:
>>>>>>
>>>>>>> Initially this will be used to eliminate the implicit MNT_NODEV
>>>>>>> flag for mounts from user namespaces. In the future it will also
>>>>>>> be used for translating ids and checking capabilities for
>>>>>>> filesystems mounted from user namespaces.
>>>>>>>
>>>>>>> s_user_ns is initialized in alloc_super() and is generally set to
>>>>>>> current_user_ns(). To avoid security and corruption issues, two
>>>>>>> additional mount checks are also added:
>>>>>>>
>>>>>>>  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>>>>>>>    in current_user_ns().
>>>>>>>
>>>>>>>  - sget() will fail with EBUSY when the filesystem it's looking
>>>>>>>    for is already mounted from another user namespace.
>>>>>>>
>>>>>>> proc needs some special handling here. The user namespace of
>>>>>>> current isn't appropriate when forking as a result of clone (2)
>>>>>>> with CLONE_NEWPID|CLONE_NEWUSER, as it will make proc unmountable
>>>>>>> from within the new user namespace. Instead, the user namespace
>>>>>>> which owns the new pid namespace should be used. sget_userns() is
>>>>>>> added to allow passing of a user namespace other than that of
>>>>>>> current, and this is used by proc_mount(). sget() becomes a
>>>>>>> wrapper around sget_userns() which passes current_user_ns().
>>>>>>
>>>>>> From bits of the previous conversation.
>>>>>>
>>>>>> We need sget_userns(..., &init_user_ns) for sysfs.  The sysfs
>>>>>> xattrs can travel from one mount of sysfs to another via the sysfs
>>>>>> backing store.
>>>>>>
>>>>>> For tmpfs and any other filesystems we support mounting without
>>>>>> privilege that support xattrs, we need to identify them and
>>>>>> see if userspace is taking advantage of the ability to set
>>>>>> xattrs and file caps (unlikely).  If they are we need to call
>>>>>> sget_userns(..., &init_user_ns) on those filesystems as well.
>>>>>>
>>>>>> Possibly/Probably we should just do that for all of the interesting
>>>>>> filesystems to start with and then change back to an ordinary old sget
>>>>>> after we have done the testing and confirmed we will not be introducing
>>>>>> userspace regressions.
>>>>>
>>>>> I was reviewing everything in preparation for sending v2 patches, and I
>>>>> realized that doing this has an undesirable side effect. In patch 2 the
>>>>> implicit nodev is removed for unprivileged mounts, and instead s_user_ns
>>>>> is used to block opening devices in these mounts. When we set s_user_ns
>>>>> to &init_user_ns, it becomes possible to open device nodes from
>>>>> unprivileged mounts of these filesystems.
>>>>>
>>>>> This doesn't pose a real problem today. The only filesystems it will
>>>>> affect are sysfs, tmpfs, and ramfs (no others need s_user_ns =
>>>>> &init_user_ns for user namespace mounts), and none of these is a
>>>>> problem. sysfs is okay because kernfs doesn't (currently?) allow device
>>>>> nodes, and a user would require CAP_MKNOD to create any device nodes in
>>>>> a tmpfs or ramfs mount.
>>>>>
>>>>> But for sysfs in particular it does mean that we will need to make sure
>>>>> that there's no way that device nodes could start appearing in an
>>>>> unprivileged mount.
>>>>
>>>> Good point about nodev.
>>>>
>>>> For tmpfs and ramfs and security labels the smack policy of allowing but
>>>> filtering security labels mean smack once it has those bits will not
>>>> care which user namespace ramfs and tmpfs live in.  The labels should
>>>> pretty much stay the same in any case.
>>>
>>> Smack does care which namespace ramfs and tmpfs are in. With the patch
>>> I've got right now, if s_user_ns != &init_user_ns and the label of an
>>> inode does not match that of the root inode then
>>> security_inode_permission() will return EACCES.
>>>
>>> So if something with CAP_MAC_ADMIN is changing security labels in such a
>>> mount, suddenly those inodes might become inaccessible. And while it may
>>> be unlikely that anyone is doing this it's impossible for me to prove
>>> that's the case.
>>>
>>>> If the same class of handling will also apply to selinux and those are
>>>> the only two security modules that apply labels then we can leave tmpfs
>>>> and ramfs with the security labels of whomever mounted them.
>>>
>>> For SELinux I now have a patch which applies mountpoint labeling to
>>> mounts for which s_user_ns != &init_user_ns. I'm less sure than with
>>> Smack how this behavior will differ from what happens today, but my
>>> understanding is that this means that the label of the mountpoint is
>>> used for all objects from that superblock. Afaik it does not have the
>>> Smack behavior of denying access to filesystem objects which have a
>>> different label in the backing store.
>>>
>>>> For sysfs things get a little more interesting.  Assuming tmpfs and
>>>> ramfs don't need s_user_ns == &init_user_ns, sysfs may be fine operating
>>>> with possibly invalid security labels set on a different mount of
>>>> selinux.  (I am wondering now how all of these labels work in the
>>>> context of nfs).
>>>
>>> If someone was using Smack to label sysfs then a mount with s_user_ns

Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-30 Thread Stephen Smalley
On 07/24/2015 11:11 AM, Seth Forshee wrote:
> On Thu, Jul 23, 2015 at 11:23:31AM -0500, Seth Forshee wrote:
>> On Thu, Jul 23, 2015 at 11:36:03AM -0400, Stephen Smalley wrote:
>>> On 07/23/2015 10:39 AM, Seth Forshee wrote:
>>>> On Thu, Jul 23, 2015 at 09:57:20AM -0400, Stephen Smalley wrote:
>>>>> On 07/22/2015 04:40 PM, Stephen Smalley wrote:
>>>>>> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>>>>>>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>>>>>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>>>>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>>>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>>>>>>> Unprivileged users should not be able to supply security labels
>>>>>>>>>>> in filesystems, nor should they be able to supply security
>>>>>>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>>>>>>> options.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>>>>>>>
>>>>>>>>>> I think this is obsoleted by the subsequent discussion, but just
>>>>>>>>>> for the record: this patch would cause the files in the userns
>>>>>>>>>> mount to be left with the "unlabeled" label, and therefore under
>>>>>>>>>> typical policies, completely inaccessible to any process in a
>>>>>>>>>> confined domain.
>>>>>>>>>
>>>>>>>>> The right way to handle this for SELinux would be to automatically use
>>>>>>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>>>>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>>>>>>> from some related object (e.g. the block device file context, as in
>>>>>>>>> your patches for Smack).  That will cause SELinux to use that value
>>>>>>>>> instead of any xattr value from the filesystem and will cause attempts
>>>>>>>>> by userspace to set the security.selinux xattr to fail on that
>>>>>>>>> filesystem.  That is how SELinux normally deals with untrusted
>>>>>>>>> filesystems, except that it is normally specified as a mount option by
>>>>>>>>> a trusted mounting process, whereas in your case you need to
>>>>>>>>> automatically set it.
>>>>>>>>
>>>>>>>> Excellent, thank you for the advice. I'll start on this when I've
>>>>>>>> finished with Smack.
>>>>>>>
>>>>>>> Not tested, but something like this should work. Note that it should
>>>>>>> come after the call to security_fs_use() so we know whether SELinux
>>>>>>> would even try to use xattrs supplied by the filesystem in the first
>>>>>>> place.
>>>>>>>
>>>>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>>>>> index 564079c..84da3a2 100644
>>>>>>> --- a/security/selinux/hooks.c
>>>>>>> +++ b/security/selinux/hooks.c
>>>>>>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>>>>>>  			goto out;
>>>>>>>  		}
>>>>>>>  	}
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If this is a user namespace mount, no contexts are allowed
>>>>>>> +	 * on the command line and security labels must be ignored.
>>>>>>> +	 */
>>>>>>> +	if (sb->s_user_ns != &init_user_ns) {
>>>>>>> +		if (context_sid || fscontext_sid || rootcontext_sid ||
>>>>>>> +		    defcontext_sid) {
>>>>>>> +			rc = -EACCES;
>>>>>>> +			goto out;
>>>>>>> +		}
>>>>>>> +		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>>>>>>> +			struct block_device *bdev = sb->s_bdev;
>>>>>>> +			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>>>>>>> +			if (bdev) {
>>>>>>> +				struct inode_security_struct *isec = bdev->bd_inode;
>>>>>>
>>>>>> That should be bdev->bd_inode->i_security.
>>>>>
>>>>> Sorry, this won't work.  bd_inode is not the inode of the block device
>>>>> file that was passed to mount, and it isn't labeled in any way.  It will
>>>>> just be unlabeled.
>>>>>
>>>>> So I guess the only real option here as a fallback is
>>>>> sbsec->mntpoint_sid = current_sid().  Which isn't great either, as the
>>>>> only case where we currently assign task labels to files is for their
>>>>> /proc/pid inodes, and no current policy will therefore allow create
>>>>> permission to such files.
>>>>
>>>> Darn, you're right, that isn't the inode we want. There really doesn't
>>>> seem to be any way to get back to the one we want from the LSM, short of
>>>> adding a new hook.
>>>
>>> Maybe list_first_entry(&sb->s_bdev->bd_inodes, struct inode, i_devices)?
>>> Feels like a layering violation though...
>>
>> Yeah, and even though that probably works out to be the inode we want in
>> most cases I don't think we can be absolutely certain that it is. Maybe
>> there's some way we could walk the list and be sure we've found the
>> right inode, but I'm not seeing it.
>
> I guess we could do something like this (note that most of the changes
> here are just to give a version of blkdev_get_by_path which takes a
> struct path * so that the filename lookup doesn't have to be done
> twice). Basically add a new hook that informs the security module of the
> inode for the backing device file passed to mount and call that from
> mount_bdev. The security module could grab a reference to the inode and
> stash it away.
>
> Something else to note is that, as I have it here, the hook would end up
> getting called for every mount of a given block device, not just the
> first. So it's possible the security module could see the hook
Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-27 Thread Stephen Smalley
On 07/27/2015 03:32 PM, Hugh Dickins wrote:
> On Fri, 24 Jul 2015, Stephen Smalley wrote:
> 
>> The shm implementation internally uses shmem or hugetlbfs inodes
>> for shm segments.  As these inodes are never directly exposed to
>> userspace and only accessed through the shm operations which are
>> already hooked by security modules, mark the inodes with the
>> S_PRIVATE flag so that inode security initialization and permission
>> checking is skipped.
>>
>> This was motivated by the following lockdep warning:
>> Jul 22 14:36:40 fc23 kernel:
>> ==
>> Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking
>> dependency detected ]
>> Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
>> Tainted: GW
>> Jul 22 14:36:40 fc23 kernel:
>> ---
>> Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
>> Jul 22 14:36:40 fc23 kernel: (>rwsem){+.}, at:
>> [] shm_close+0x34/0x130
>> Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
>> Jul 22 14:36:40 fc23 kernel: (>mmap_sem){++}, at:
>> [] SyS_shmdt+0x4b/0x180
>> Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
>> Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in
>> reverse order) is:
>> Jul 22 14:36:40 fc23 kernel: #012-> #3 (>mmap_sem){++}:
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> lock_acquire+0xc7/0x270
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> __might_fault+0x7a/0xa0
>> Jul 22 14:36:40 fc23 kernel:   [] filldir+0x9e/0x130
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_readdir+0x1b4/0x330 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_file_readdir+0x2b/0x30 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> iterate_dir+0x97/0x130
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> SyS_getdents+0x91/0x120
>> Jul 22 14:36:40 fc23 kernel:   []
>> entry_SYSCALL_64_fastpath+0x12/0x76
>> Jul 22 14:36:40 fc23 kernel: #012-> #2 (_dir_ilock_class){.+}:
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> lock_acquire+0xc7/0x270
>> Jul 22 14:36:40 fc23 kernel:   []
>> down_read_nested+0x57/0xa0
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_ilock+0x167/0x350 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_attr_get+0xbd/0x190 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   []
>> xfs_xattr_get+0x3d/0x70 [xfs]
>> Jul 22 14:36:40 fc23 kernel:   []
>> generic_getxattr+0x4f/0x70
>> Jul 22 14:36:40 fc23 kernel:   []
>> inode_doinit_with_dentry+0x162/0x670
>> Jul 22 14:36:40 fc23 kernel:   []
>> sb_finish_set_opts+0xd9/0x230
>> Jul 22 14:36:40 fc23 kernel:   []
>> selinux_set_mnt_opts+0x35c/0x660
>> Jul 22 14:36:40 fc23 kernel:   []
>> superblock_doinit+0x77/0xf0
>> Jul 22 14:36:40 fc23 kernel:   []
>> delayed_superblock_init+0x10/0x20
>> Jul 22 14:36:40 fc23 kernel:   []
>> iterate_supers+0xb3/0x110
>> Jul 22 14:36:40 fc23 kernel:   []
>> selinux_complete_init+0x2f/0x40
>> Jul 22 14:36:40 fc23 kernel:   []
>> security_load_policy+0x103/0x600
>> Jul 22 14:36:40 fc23 kernel:   []
>> sel_write_load+0xc1/0x750
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> __vfs_write+0x37/0x100
>> Jul 22 14:36:40 fc23 kernel:   [] vfs_write+0xa9/0x1a0
>> Jul 22 14:36:40 fc23 kernel:   [] SyS_write+0x58/0xd0
>> Jul 22 14:36:40 fc23 kernel:   []
>> entry_SYSCALL_64_fastpath+0x12/0x76
>> Jul 22 14:36:40 fc23 kernel: #012-> #1 (>lock){+.+.+.}:
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> lock_acquire+0xc7/0x270
>> Jul 22 14:36:40 fc23 kernel:   []
>> mutex_lock_nested+0x7f/0x3e0
>> Jul 22 14:36:40 fc23 kernel:   []
>> inode_doinit_with_dentry+0xb9/0x670
>> Jul 22 14:36:40 fc23 kernel:   []
>> selinux_d_instantiate+0x1c/0x20
>> Jul 22 14:36:40 fc23 kernel:   []
>> security_d_instantiate+0x36/0x60
>> Jul 22 14:36:40 fc23 kernel:   [] 
>> d_instantiate+0x54/0x70
>> Jul 22 14:36:40 fc23 kernel:   []
>> __shmem_file_setup+0xdc/0x240
>> Jul 22 14:36:40 fc23 kernel:   []
>> shmem_file_setup+0x10/0x20
>> Jul 22 14:36:40 fc23 kernel:   [] newseg+0x290/0x3a0
>> Jul 22 14:36:40 fc23 kernel:   

Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-27 Thread Stephen Smalley
On 07/27/2015 03:32 PM, Hugh Dickins wrote:
 On Fri, 24 Jul 2015, Stephen Smalley wrote:
 
 The shm implementation internally uses shmem or hugetlbfs inodes
 for shm segments.  As these inodes are never directly exposed to
 userspace and only accessed through the shm operations which are
 already hooked by security modules, mark the inodes with the
 S_PRIVATE flag so that inode security initialization and permission
 checking is skipped.

 This was motivated by the following lockdep warning:
 Jul 22 14:36:40 fc23 kernel:
 ==
 Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking
 dependency detected ]
 Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
 Tainted: GW
 Jul 22 14:36:40 fc23 kernel:
 ---
 Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
 Jul 22 14:36:40 fc23 kernel: (ids-rwsem){+.}, at:
 [81385354] shm_close+0x34/0x130
 Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
 Jul 22 14:36:40 fc23 kernel: (mm-mmap_sem){++}, at:
 [81386bbb] SyS_shmdt+0x4b/0x180
 Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
 Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in
 reverse order) is:
 Jul 22 14:36:40 fc23 kernel: #012- #3 (mm-mmap_sem){++}:
 Jul 22 14:36:40 fc23 kernel:   [81109a07] 
 lock_acquire+0xc7/0x270
 Jul 22 14:36:40 fc23 kernel:   [81217baa] 
 __might_fault+0x7a/0xa0
 Jul 22 14:36:40 fc23 kernel:   [81284a1e] filldir+0x9e/0x130
 Jul 22 14:36:40 fc23 kernel:   [a019bb08]
 xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [a019c5b4]
 xfs_readdir+0x1b4/0x330 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [a019f38b]
 xfs_file_readdir+0x2b/0x30 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [812847e7] 
 iterate_dir+0x97/0x130
 Jul 22 14:36:40 fc23 kernel:   [81284d21] 
 SyS_getdents+0x91/0x120
 Jul 22 14:36:40 fc23 kernel:   [81871d2e]
 entry_SYSCALL_64_fastpath+0x12/0x76
 Jul 22 14:36:40 fc23 kernel: #012- #2 (xfs_dir_ilock_class){.+}:
 Jul 22 14:36:40 fc23 kernel:   [81109a07] 
 lock_acquire+0xc7/0x270
 Jul 22 14:36:40 fc23 kernel:   [81101e97]
 down_read_nested+0x57/0xa0
 Jul 22 14:36:40 fc23 kernel:   [a01b0e57]
 xfs_ilock+0x167/0x350 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [a01b10b8]
 xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [a014799d]
 xfs_attr_get+0xbd/0x190 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [a01c17ad]
 xfs_xattr_get+0x3d/0x70 [xfs]
 Jul 22 14:36:40 fc23 kernel:   [8129962f]
 generic_getxattr+0x4f/0x70
 Jul 22 14:36:40 fc23 kernel:   [8139ba52]
 inode_doinit_with_dentry+0x162/0x670
 Jul 22 14:36:40 fc23 kernel:   [8139cf69]
 sb_finish_set_opts+0xd9/0x230
 Jul 22 14:36:40 fc23 kernel:   [8139d66c]
 selinux_set_mnt_opts+0x35c/0x660
 Jul 22 14:36:40 fc23 kernel:   [8139ff97]
 superblock_doinit+0x77/0xf0
 Jul 22 14:36:40 fc23 kernel:   [813a0020]
 delayed_superblock_init+0x10/0x20
 Jul 22 14:36:40 fc23 kernel:   [81272d23]
 iterate_supers+0xb3/0x110
 Jul 22 14:36:40 fc23 kernel:   [813a4e5f]
 selinux_complete_init+0x2f/0x40
 Jul 22 14:36:40 fc23 kernel:   [813b47a3]
 security_load_policy+0x103/0x600
 Jul 22 14:36:40 fc23 kernel:   [813a6901]
 sel_write_load+0xc1/0x750
 Jul 22 14:36:40 fc23 kernel:   [8126e817] 
 __vfs_write+0x37/0x100
 Jul 22 14:36:40 fc23 kernel:   [8126f229] vfs_write+0xa9/0x1a0
 Jul 22 14:36:40 fc23 kernel:   [8126ff48] SyS_write+0x58/0xd0
 Jul 22 14:36:40 fc23 kernel:   [81871d2e]
 entry_SYSCALL_64_fastpath+0x12/0x76
 Jul 22 14:36:40 fc23 kernel: #012-> #1 (&isec->lock){+.+.+.}:
 Jul 22 14:36:40 fc23 kernel:   [81109a07] 
 lock_acquire+0xc7/0x270
 Jul 22 14:36:40 fc23 kernel:   [8186de8f]
 mutex_lock_nested+0x7f/0x3e0
 Jul 22 14:36:40 fc23 kernel:   [8139b9a9]
 inode_doinit_with_dentry+0xb9/0x670
 Jul 22 14:36:40 fc23 kernel:   [8139bf7c]
 selinux_d_instantiate+0x1c/0x20
 Jul 22 14:36:40 fc23 kernel:   [813955f6]
 security_d_instantiate+0x36/0x60
 Jul 22 14:36:40 fc23 kernel:   [81287c34] 
 d_instantiate+0x54/0x70
 Jul 22 14:36:40 fc23 kernel:   [8120111c]
 __shmem_file_setup+0xdc/0x240
 Jul 22 14:36:40 fc23 kernel:   [81201290]
 shmem_file_setup+0x10/0x20
 Jul 22 14:36:40 fc23 kernel:   [813856e0] newseg+0x290/0x3a0
 Jul 22 14:36:40 fc23 kernel:   [8137e278] ipcget+0x208/0x2d0
 Jul 22 14:36:40 fc23 kernel:   [81386074] SyS_shmget+0x54/0x70
 Jul 22 14:36:40 fc23 kernel:   [81871d2e

Re: [RFC][PATCH] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-24 Thread Stephen Smalley
On 07/23/2015 08:11 PM, Dave Chinner wrote:
> On Thu, Jul 23, 2015 at 12:28:33PM -0400, Stephen Smalley wrote:
>> The shm implementation internally uses shmem or hugetlbfs inodes
>> for shm segments.  As these inodes are never directly exposed to
>> userspace and only accessed through the shm operations which are
>> already hooked by security modules, mark the inodes with the
>> S_PRIVATE flag so that inode security initialization and permission
>> checking is skipped.
>>
>> This was motivated by the following lockdep warning:
>> ===
>> [ INFO: possible circular locking dependency detected ]
>> 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
>> ---
>> httpd/1597 is trying to acquire lock:
>> (&ids->rwsem){+.}, at: [81385354] shm_close+0x34/0x130
>> (&mm->mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180
>>   [81109a07] lock_acquire+0xc7/0x270
>>   [81217baa] __might_fault+0x7a/0xa0
>>   [81284a1e] filldir+0x9e/0x130
>>   [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
>>   [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs]
>>   [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs]
>>   [812847e7] iterate_dir+0x97/0x130
>>   [81284d21] SyS_getdents+0x91/0x120
>>   [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>>   [81109a07] lock_acquire+0xc7/0x270
>>   [81101e97] down_read_nested+0x57/0xa0
>>   [a01b0e57] xfs_ilock+0x167/0x350 [xfs]
>>   [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
>>   [a014799d] xfs_attr_get+0xbd/0x190 [xfs]
>>   [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs]
>>   [8129962f] generic_getxattr+0x4f/0x70
>>   [8139ba52] inode_doinit_with_dentry+0x162/0x670
>>   [8139cf69] sb_finish_set_opts+0xd9/0x230
>>   [8139d66c] selinux_set_mnt_opts+0x35c/0x660
>>   [8139ff97] superblock_doinit+0x77/0xf0
>>   [813a0020] delayed_superblock_init+0x10/0x20
>>   [81272d23] iterate_supers+0xb3/0x110
>>   [813a4e5f] selinux_complete_init+0x2f/0x40
>>   [813b47a3] security_load_policy+0x103/0x600
>>   [813a6901] sel_write_load+0xc1/0x750
>>   [8126e817] __vfs_write+0x37/0x100
>>   [8126f229] vfs_write+0xa9/0x1a0
>>   [8126ff48] SyS_write+0x58/0xd0
>>   [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>>   [81109a07] lock_acquire+0xc7/0x270
>>   [8186de8f] mutex_lock_nested+0x7f/0x3e0
>>   [8139b9a9] inode_doinit_with_dentry+0xb9/0x670
>>   [8139bf7c] selinux_d_instantiate+0x1c/0x20
>>   [813955f6] security_d_instantiate+0x36/0x60
>>   [81287c34] d_instantiate+0x54/0x70
>>   [8120111c] __shmem_file_setup+0xdc/0x240
>>   [81201290] shmem_file_setup+0x10/0x20
>>   [813856e0] newseg+0x290/0x3a0
>>   [8137e278] ipcget+0x208/0x2d0
>>   [81386074] SyS_shmget+0x54/0x70
>>   [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
>>   [81108df8] __lock_acquire+0x1a78/0x1d00
>>   [81109a07] lock_acquire+0xc7/0x270
>>   [8186efba] down_write+0x5a/0xc0
>>   [81385354] shm_close+0x34/0x130
>>   [812203a5] remove_vma+0x45/0x80
>>   [81222a30] do_munmap+0x2b0/0x460
>>   [81386c25] SyS_shmdt+0xb5/0x180
>>   [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
> 
> That's a completely screwed up stack trace. There are *4* syscall
> entry points with 4 separate, unrelated syscall chains on that
> stack trace, all starting at the same address. How is this a valid
> stack trace and not a lockdep bug of some kind?

Sorry, I mangled it when I tried to reformat it from Morten Stevens'
original report.  Fixed in v2.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-24 Thread Stephen Smalley
   CPU1
Jul 22 14:36:40 fc23 kernel:   
Jul 22 14:36:40 fc23 kernel:  lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel:
lock(&xfs_dir_ilock_class);
Jul 22 14:36:40 fc23 kernel:   lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel:  lock(&ids->rwsem);
Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK ***
Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597:
Jul 22 14:36:40 fc23 kernel: #0:  (&mm->mmap_sem){++}, at:
[81386bbb] SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: #012stack backtrace:
Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G
 W   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware
Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
05/20/2014
Jul 22 14:36:40 fc23 kernel:  6cb6fe9d
88019ff07c58 81868175
Jul 22 14:36:40 fc23 kernel:  82aea390
88019ff07ca8 81105903
Jul 22 14:36:40 fc23 kernel: 88019ff07c78 88019ff07d08
0001 8800b75108f0
Jul 22 14:36:40 fc23 kernel: Call Trace:
Jul 22 14:36:40 fc23 kernel: [81868175] dump_stack+0x4c/0x65
Jul 22 14:36:40 fc23 kernel: [81105903] print_circular_bug+0x1e3/0x250
Jul 22 14:36:40 fc23 kernel: [81108df8] __lock_acquire+0x1a78/0x1d00
Jul 22 14:36:40 fc23 kernel: [81220c33] ? unlink_file_vma+0x33/0x60
Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [81385354] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [8186efba] down_write+0x5a/0xc0
Jul 22 14:36:40 fc23 kernel: [81385354] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [81385354] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [812203a5] remove_vma+0x45/0x80
Jul 22 14:36:40 fc23 kernel: [81222a30] do_munmap+0x2b0/0x460
Jul 22 14:36:40 fc23 kernel: [81386bbb] ? SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: [81386c25] SyS_shmdt+0xb5/0x180
Jul 22 14:36:40 fc23 kernel: [81871d2e]
entry_SYSCALL_64_fastpath+0x12/0x76

Reported-by: Morten Stevens <mstev...@fedoraproject.org>
Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
This version only differs in the patch description, which restores
the original lockdep trace from Morten Stevens.  It was unfortunately
mangled in the prior version.

 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c            | 2 +-
 mm/shmem.c           | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf74df..973c24c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
 	inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
 	if (!inode)
 		goto out_dentry;
+	if (creat_flags == HUGETLB_SHMFS_INODE)
+		inode->i_flags |= S_PRIVATE;
 
 	file = ERR_PTR(-ENOMEM);
 	if (hugetlb_reserve_pages(inode, 0,
diff --git a/ipc/shm.c b/ipc/shm.c
index 06e5cf2..4aef24d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 		if  ((shmflg & SHM_NORESERVE) &&
 				sysctl_overcommit_memory != OVERCOMMIT_NEVER)
 			acctflag = VM_NORESERVE;
-		file = shmem_file_setup(name, size, acctflag);
+		file = shmem_kernel_file_setup(name, size, acctflag);
 	}
 	error = PTR_ERR(file);
 	if (IS_ERR(file))
diff --git a/mm/shmem.c b/mm/shmem.c
index 4caf8ed..dbe0c1e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3363,8 +3363,8 @@ put_path:
  * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
  * kernel internal.  There will be NO LSM permission checks against the
  * underlying inode.  So users of this interface must do LSM checks at a
- * higher layer.  The one user is the big_key implementation.  LSM checks
- * are provided at the key level rather than the inode level.
+ * higher layer.  The users are the big_key and shm implementations.  LSM
+ * checks are provided at the key or shm level rather than the inode.
  * @name: name for dentry (to be seen in /proc/<pid>/maps
  * @size: size to be set for the file
  * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size
-- 
2.1.0
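
For readers who want to see the effect of the flag in isolation, here is a minimal user-space sketch of the idea behind the patch: once an inode is marked private, the security hook wrapper returns early and the security module never labels or checks it. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in, and the flag value is chosen only for illustration; this models the shape of the kernel's IS_PRIVATE() test, not the real hook code.

```c
#include <assert.h>

/* Toy stand-in for the kernel's S_PRIVATE inode flag (value is illustrative). */
#define TOY_S_PRIVATE 0x200u

struct toy_inode {
	unsigned int i_flags;
	int lsm_checks;	/* how many times the pretend LSM examined this inode */
};

/* Same shape as the kernel's IS_PRIVATE() test: is the flag bit set? */
static int toy_is_private(const struct toy_inode *inode)
{
	return (inode->i_flags & TOY_S_PRIVATE) != 0;
}

/*
 * Pretend LSM hook wrapper. Private inodes return early, so the
 * security module never labels or checks them.
 */
static int toy_security_inode_permission(struct toy_inode *inode)
{
	if (toy_is_private(inode))
		return 0;
	inode->lsm_checks++;	/* a real LSM would consult policy here */
	return 0;
}

/* Models the hugetlbfs hunk: only shm-backed inodes become private. */
static void toy_setup_inode(struct toy_inode *inode, int is_shm_segment)
{
	inode->i_flags = 0;
	inode->lsm_checks = 0;
	if (is_shm_segment)
		inode->i_flags |= TOY_S_PRIVATE;
}
```

Calling toy_setup_inode() with is_shm_segment set and then toy_security_inode_permission() leaves lsm_checks at zero, which is the behavior the patch relies on: access control for shm segments stays with the shm-level hooks.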



[PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-24 Thread Stephen Smalley
:   [81109a07] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel:   [8186efba] down_write+0x5a/0xc0
Jul 22 14:36:40 fc23 kernel:   [81385354] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel:   [812203a5] remove_vma+0x45/0x80
Jul 22 14:36:40 fc23 kernel:   [81222a30] do_munmap+0x2b0/0x460
Jul 22 14:36:40 fc23 kernel:   [81386c25] SyS_shmdt+0xb5/0x180
Jul 22 14:36:40 fc23 kernel:   [81871d2e]
entry_SYSCALL_64_fastpath+0x12/0x76
Jul 22 14:36:40 fc23 kernel: #012other info that might help us debug this:
Jul 22 14:36:40 fc23 kernel: Chain exists of:#012  &ids->rwsem -->
&xfs_dir_ilock_class --> &mm->mmap_sem
Jul 22 14:36:40 fc23 kernel: Possible unsafe locking scenario:
Jul 22 14:36:40 fc23 kernel:   CPU0                    CPU1
Jul 22 14:36:40 fc23 kernel:
Jul 22 14:36:40 fc23 kernel:  lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel:
lock(&xfs_dir_ilock_class);
Jul 22 14:36:40 fc23 kernel:   lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel:  lock(&ids->rwsem);
Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK ***
Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597:
Jul 22 14:36:40 fc23 kernel: #0:  (&mm->mmap_sem){++}, at:
[81386bbb] SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: #012stack backtrace:
Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G
 W   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware
Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
05/20/2014
Jul 22 14:36:40 fc23 kernel:  6cb6fe9d
88019ff07c58 81868175
Jul 22 14:36:40 fc23 kernel:  82aea390
88019ff07ca8 81105903
Jul 22 14:36:40 fc23 kernel: 88019ff07c78 88019ff07d08
0001 8800b75108f0
Jul 22 14:36:40 fc23 kernel: Call Trace:
Jul 22 14:36:40 fc23 kernel: [81868175] dump_stack+0x4c/0x65
Jul 22 14:36:40 fc23 kernel: [81105903] print_circular_bug+0x1e3/0x250
Jul 22 14:36:40 fc23 kernel: [81108df8] __lock_acquire+0x1a78/0x1d00
Jul 22 14:36:40 fc23 kernel: [81220c33] ? unlink_file_vma+0x33/0x60
Jul 22 14:36:40 fc23 kernel: [81109a07] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [81385354] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [8186efba] down_write+0x5a/0xc0
Jul 22 14:36:40 fc23 kernel: [81385354] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [81385354] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [812203a5] remove_vma+0x45/0x80
Jul 22 14:36:40 fc23 kernel: [81222a30] do_munmap+0x2b0/0x460
Jul 22 14:36:40 fc23 kernel: [81386bbb] ? SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: [81386c25] SyS_shmdt+0xb5/0x180
Jul 22 14:36:40 fc23 kernel: [81871d2e]
entry_SYSCALL_64_fastpath+0x12/0x76

Reported-by: Morten Stevens <mstev...@fedoraproject.org>
Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
This version only differs in the patch description, which restores
the original lockdep trace from Morten Stevens.  It was unfortunately
mangled in the prior version.

 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c| 2 +-
 mm/shmem.c   | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf74df..973c24c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t 
size,
inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
if (!inode)
goto out_dentry;
+   if (creat_flags == HUGETLB_SHMFS_INODE)
+   inode->i_flags |= S_PRIVATE;
 
file = ERR_PTR(-ENOMEM);
if (hugetlb_reserve_pages(inode, 0,
diff --git a/ipc/shm.c b/ipc/shm.c
index 06e5cf2..4aef24d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct 
ipc_params *params)
if  ((shmflg & SHM_NORESERVE) &&
sysctl_overcommit_memory != OVERCOMMIT_NEVER)
acctflag = VM_NORESERVE;
-   file = shmem_file_setup(name, size, acctflag);
+   file = shmem_kernel_file_setup(name, size, acctflag);
}
error = PTR_ERR(file);
if (IS_ERR(file))
diff --git a/mm/shmem.c b/mm/shmem.c
index 4caf8ed..dbe0c1e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3363,8 +3363,8 @@ put_path:
  * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
  * kernel internal.  There will be NO LSM permission checks against the
  * underlying inode.  So users of this interface must do LSM checks at a
- * higher layer.  The one user is the big_key implementation.  LSM checks
- * are provided at the key level rather than the inode level.
+ * higher layer

[RFC][PATCH] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-23 Thread Stephen Smalley
The shm implementation internally uses shmem or hugetlbfs inodes
for shm segments.  As these inodes are never directly exposed to
userspace and only accessed through the shm operations which are
already hooked by security modules, mark the inodes with the
S_PRIVATE flag so that inode security initialization and permission
checking is skipped.

This was motivated by the following lockdep warning:
===
[ INFO: possible circular locking dependency detected ]
4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
---
httpd/1597 is trying to acquire lock:
(&ids->rwsem){+.}, at: [81385354] shm_close+0x34/0x130
(&mm->mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180
  [81109a07] lock_acquire+0xc7/0x270
  [81217baa] __might_fault+0x7a/0xa0
  [81284a1e] filldir+0x9e/0x130
  [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
  [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs]
  [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs]
  [812847e7] iterate_dir+0x97/0x130
  [81284d21] SyS_getdents+0x91/0x120
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
  [81109a07] lock_acquire+0xc7/0x270
  [81101e97] down_read_nested+0x57/0xa0
  [a01b0e57] xfs_ilock+0x167/0x350 [xfs]
  [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
  [a014799d] xfs_attr_get+0xbd/0x190 [xfs]
  [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs]
  [8129962f] generic_getxattr+0x4f/0x70
  [8139ba52] inode_doinit_with_dentry+0x162/0x670
  [8139cf69] sb_finish_set_opts+0xd9/0x230
  [8139d66c] selinux_set_mnt_opts+0x35c/0x660
  [8139ff97] superblock_doinit+0x77/0xf0
  [813a0020] delayed_superblock_init+0x10/0x20
  [81272d23] iterate_supers+0xb3/0x110
  [813a4e5f] selinux_complete_init+0x2f/0x40
  [813b47a3] security_load_policy+0x103/0x600
  [813a6901] sel_write_load+0xc1/0x750
  [8126e817] __vfs_write+0x37/0x100
  [8126f229] vfs_write+0xa9/0x1a0
  [8126ff48] SyS_write+0x58/0xd0
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
  [81109a07] lock_acquire+0xc7/0x270
  [8186de8f] mutex_lock_nested+0x7f/0x3e0
  [8139b9a9] inode_doinit_with_dentry+0xb9/0x670
  [8139bf7c] selinux_d_instantiate+0x1c/0x20
  [813955f6] security_d_instantiate+0x36/0x60
  [81287c34] d_instantiate+0x54/0x70
  [8120111c] __shmem_file_setup+0xdc/0x240
  [81201290] shmem_file_setup+0x10/0x20
  [813856e0] newseg+0x290/0x3a0
  [8137e278] ipcget+0x208/0x2d0
  [81386074] SyS_shmget+0x54/0x70
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
  [81108df8] __lock_acquire+0x1a78/0x1d00
  [81109a07] lock_acquire+0xc7/0x270
  [8186efba] down_write+0x5a/0xc0
  [81385354] shm_close+0x34/0x130
  [812203a5] remove_vma+0x45/0x80
  [81222a30] do_munmap+0x2b0/0x460
  [81386c25] SyS_shmdt+0xb5/0x180
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
Chain exists of:#012  &ids->rwsem --> &xfs_dir_ilock_class --> &mm->mmap_sem
Possible unsafe locking scenario:
  CPU0                    CPU1

 lock(&mm->mmap_sem);
 lock(&xfs_dir_ilock_class);
  lock(&mm->mmap_sem);
 lock(&ids->rwsem);
1 lock held by httpd/1597:
CPU: 7 PID: 1597 Comm: httpd Tainted: G W   
4.2.0-0.rc3.git0.1.fc24.x86_64+Hardware name: VMware, Inc. VMware Virtual 
Platform/440BX Desktop Reference Pla 6cb6fe9d 
88019ff07c58 81868175
 82aea390 88019ff07ca8 81105903
88019ff07c78 88019ff07d08 0001 8800b75108f0
Call Trace:
[81868175] dump_stack+0x4c/0x65
[81105903] print_circular_bug+0x1e3/0x250
[81108df8] __lock_acquire+0x1a78/0x1d00
[81220c33] ? unlink_file_vma+0x33/0x60
[81109a07] lock_acquire+0xc7/0x270
[81385354] ? shm_close+0x34/0x130
[8186efba] down_write+0x5a/0xc0
[81385354] ? shm_close+0x34/0x130
[81385354] shm_close+0x34/0x130
[812203a5] remove_vma+0x45/0x80
[81222a30] do_munmap+0x2b0/0x460
[81386bbb] ? SyS_shmdt+0x4b/0x180
[81386c25] SyS_shmdt+0xb5/0x180
[81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76

Reported-by: Morten Stevens <mstev...@fedoraproject.org>
Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c            | 2 +-
 mm/shmem.c           | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf74df..973c24c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t 
size,
inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
if (!inode)
goto out_dentry;
+   if (creat_flags == HUGETLB_SHMFS_INODE)
+   inode->i_flags |= S_PRIVATE;
 
file = ERR_PTR(-ENOMEM);
if (hugetlb_reserve_pages(inode, 0,
diff --git a/ipc/shm.c b/ipc/shm.c
index 06e5cf2..4aef24d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct 
ipc_params *params)
if  ((shmflg & SHM_NORESERVE) &&
sysctl_overcommit_memory != OVERCOMMIT_NEVER)
acctflag = VM_NORESERVE;
-   file = shmem_file_setup(name, size, acctflag);
+   file = shmem_kernel_file_setup(name, size, acctflag);
}
error = PTR_ERR(file);
if (IS_ERR(file))
diff --git a/mm/shmem.c 

Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-23 Thread Stephen Smalley
On 07/23/2015 10:39 AM, Seth Forshee wrote:
> On Thu, Jul 23, 2015 at 09:57:20AM -0400, Stephen Smalley wrote:
>> On 07/22/2015 04:40 PM, Stephen Smalley wrote:
>>> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>>>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>>>> Unprivileged users should not be able to supply security labels
>>>>>>>> in filesystems, nor should they be able to supply security
>>>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>>>> options.
>>>>>>>>
>>>>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>>>>
>>>>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>>>>> record: this patch would cause the files in the userns mount to be left
>>>>>>> with the "unlabeled" label, and therefore under typical policies,
>>>>>>> completely inaccessible to any process in a confined domain.
>>>>>>
>>>>>> The right way to handle this for SELinux would be to automatically use
>>>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>>>> from some related object (e.g. the block device file context, as in your
>>>>>> patches for Smack).  That will cause SELinux to use that value instead
>>>>>> of any xattr value from the filesystem and will cause attempts by
>>>>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>>>>  That is how SELinux normally deals with untrusted filesystems, except
>>>>>> that it is normally specified as a mount option by a trusted mounting
>>>>>> process, whereas in your case you need to automatically set it.
>>>>>
>>>>> Excellent, thank you for the advice. I'll start on this when I've
>>>>> finished with Smack.
>>>>
>>>> Not tested, but something like this should work. Note that it should
>>>> come after the call to security_fs_use() so we know whether SELinux
>>>> would even try to use xattrs supplied by the filesystem in the first place.
>>>>
>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>> index 564079c..84da3a2 100644
>>>> --- a/security/selinux/hooks.c
>>>> +++ b/security/selinux/hooks.c
>>>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block 
>>>> *sb,
>>>> goto out;
>>>> }
>>>> }
>>>> +
>>>> +   /*
>>>> +* If this is a user namespace mount, no contexts are allowed
>>>> +* on the command line and security labels must be ignored.
>>>> +*/
>>>> +   if (sb->s_user_ns != &init_user_ns) {
>>>> +   if (context_sid || fscontext_sid || rootcontext_sid ||
>>>> +   defcontext_sid) {
>>>> +   rc = -EACCES;
>>>> +   goto out;
>>>> +   }
>>>> +   if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>>>> +   struct block_device *bdev = sb->s_bdev;
>>>> +   sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>>>> +   if (bdev) {
>>>> +   struct inode_security_struct *isec =
>>>> bdev->bd_inode;
>>>
>>> That should be bdev->bd_inode->i_security.
>>
>> Sorry, this won't work.  bd_inode is not the inode of the block device
>> file that was passed to mount, and it isn't labeled in any way.  It will
>> just be unlabeled.
>>
>> So I guess the only real option here as a fallback is
>> sbsec->mntpoint_sid = current_sid().  Which isn't great either, as the
>> only case where we currently assign task labels to files is for their
>> /proc/pid inodes, and no current policy will therefore allow create
>> permission to such files.
> 
> Darn, you're right, that isn't the inode we want. There really doesn't
> seem to be any way to get back to the one we want from the LSM, short of
> adding a new hook.

Maybe list_first_entry(&sb->s_bdev->bd_inodes, struct inode, i_devices)?
Feels like a layering violation though...
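
To make the list_first_entry() idea concrete, here is a small user-space sketch of walking from a device's inode list to its first inode. The list helpers are re-implemented locally in the style of the kernel's list_head, and struct toy_inode and struct toy_bdev are hypothetical stand-ins for the real structures; only the field names i_devices and bd_inodes are borrowed from the suggestion above. It also shows why an emptiness check would be needed: list_first_entry() on an empty list returns a pointer computed from the list head itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* First element of a non-empty list, as the enclosing structure. */
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical stand-ins for struct inode and struct block_device. */
struct toy_inode {
	unsigned int sid;		/* pretend security label */
	struct list_head i_devices;	/* link in the device's inode list */
};

struct toy_bdev {
	struct list_head bd_inodes;	/* inodes that refer to this device */
};

/* Label of the first inode on the list, or fallback if the list is empty. */
static unsigned int toy_first_inode_sid(struct toy_bdev *bdev,
					unsigned int fallback)
{
	if (bdev->bd_inodes.next == &bdev->bd_inodes)
		return fallback;
	return list_first_entry(&bdev->bd_inodes, struct toy_inode,
				i_devices)->sid;
}
```

The sketch says nothing about which inode happens to be first on the real list or what locking protects it, which is part of why reaching into the list from the LSM feels like a layering violation.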



Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-23 Thread Stephen Smalley
On 07/22/2015 04:40 PM, Stephen Smalley wrote:
> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>> Unprivileged users should not be able to supply security labels
>>>>>> in filesystems, nor should they be able to supply security
>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>> options.
>>>>>>
>>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>>
>>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>>> record: this patch would cause the files in the userns mount to be left
>>>>> with the "unlabeled" label, and therefore under typical policies,
>>>>> completely inaccessible to any process in a confined domain.
>>>>
>>>> The right way to handle this for SELinux would be to automatically use
>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>> from some related object (e.g. the block device file context, as in your
>>>> patches for Smack).  That will cause SELinux to use that value instead
>>>> of any xattr value from the filesystem and will cause attempts by
>>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>>  That is how SELinux normally deals with untrusted filesystems, except
>>>> that it is normally specified as a mount option by a trusted mounting
>>>> process, whereas in your case you need to automatically set it.
>>>
>>> Excellent, thank you for the advice. I'll start on this when I've
>>> finished with Smack.
>>
>> Not tested, but something like this should work. Note that it should
>> come after the call to security_fs_use() so we know whether SELinux
>> would even try to use xattrs supplied by the filesystem in the first place.
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 564079c..84da3a2 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>> goto out;
>> }
>> }
>> +
>> +   /*
>> +* If this is a user namespace mount, no contexts are allowed
>> +* on the command line and security labels must be ignored.
>> +*/
>> +   if (sb->s_user_ns != &init_user_ns) {
>> +   if (context_sid || fscontext_sid || rootcontext_sid ||
>> +   defcontext_sid) {
>> +   rc = -EACCES;
>> +   goto out;
>> +   }
>> +   if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>> +   struct block_device *bdev = sb->s_bdev;
>> +   sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>> +   if (bdev) {
>> +   struct inode_security_struct *isec =
>> bdev->bd_inode;
> 
> That should be bdev->bd_inode->i_security.

Sorry, this won't work.  bd_inode is not the inode of the block device
file that was passed to mount, and it isn't labeled in any way.  It will
just be unlabeled.

So I guess the only real option here as a fallback is
sbsec->mntpoint_sid = current_sid().  Which isn't great either, as the
only case where we currently assign task labels to files is for their
/proc/pid inodes, and no current policy will therefore allow create
permission to such files.
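
A rough user-space sketch of the fallback just described, with the block device lookup dropped: for a mount outside the initial user namespace, reject any context options, and if the filesystem would otherwise use xattr labeling, switch it to mountpoint labeling with the mounting task's SID. All toy_ and TOY_ names are hypothetical stand-ins for the SELinux structures and constants, and the in_init_user_ns and task_sid parameters are assumptions standing in for the s_user_ns comparison and current_sid().

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for SELinux's labeling behaviors. */
enum toy_behavior {
	TOY_FS_USE_XATTR,
	TOY_FS_USE_MNTPOINT,
	TOY_FS_USE_OTHER,
};

struct toy_sbsec {
	enum toy_behavior behavior;
	unsigned int mntpoint_sid;
};

/* SIDs parsed from context= style mount options; 0 means "not given". */
struct toy_mnt_opts {
	unsigned int context_sid;
	unsigned int fscontext_sid;
	unsigned int rootcontext_sid;
	unsigned int defcontext_sid;
};

/*
 * Decide labeling for a mount. Returns 0 on success or -EACCES when an
 * unprivileged (non-init user namespace) mount supplies any context.
 */
static int toy_userns_mount_opts(struct toy_sbsec *sbsec,
				 const struct toy_mnt_opts *opts,
				 int in_init_user_ns, unsigned int task_sid)
{
	if (in_init_user_ns)
		return 0;	/* ordinary mounts take the normal path */

	if (opts->context_sid || opts->fscontext_sid ||
	    opts->rootcontext_sid || opts->defcontext_sid)
		return -EACCES;

	/* Ignore on-disk labels: label everything from the mounting task. */
	if (sbsec->behavior == TOY_FS_USE_XATTR) {
		sbsec->behavior = TOY_FS_USE_MNTPOINT;
		sbsec->mntpoint_sid = task_sid;
	}
	return 0;
}
```

Filesystems that never used xattr labeling keep their behavior unchanged in this sketch, matching the note earlier in the thread that the check belongs after security_fs_use() has picked the behavior.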

> 
>> +   sbsec->mntpoint_sid = isec->sid;
>> +   } else {
>> +   sbsec->mntpoint_sid = current_sid();
>> +   }
>> +   }
>> +   goto out_set_opts;
>> +   }
>> +
>> /* sets the context of the superblock for the fs being mounted. */
>> if (fscontext_sid) {
>> rc = may_context_mount_sb_relabel(fscontext_sid, sbsec,
>> cred);
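>> @@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>> sbsec->def_sid = defcontext_sid;
>> }
>>
>> +out_set_opts:
>> rc = sb_finish_set_opts(sb);
>>  out:
>> mutex_unlock(&sbsec->lock);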

Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-23 Thread Stephen Smalley
On 07/22/2015 04:40 PM, Stephen Smalley wrote:
 On 07/22/2015 04:25 PM, Stephen Smalley wrote:
 On 07/22/2015 12:14 PM, Seth Forshee wrote:
 On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
 On 07/16/2015 09:23 AM, Stephen Smalley wrote:
 On 07/15/2015 03:46 PM, Seth Forshee wrote:
 Unprivileged users should not be able to supply security labels
 in filesystems, nor should they be able to supply security
>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>> options.
>>>>>>
>>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>>
>>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>>> record: this patch would cause the files in the userns mount to be left
>>>>> with the "unlabeled" label, and therefore under typical policies,
>>>>> completely inaccessible to any process in a confined domain.
>>>>
>>>> The right way to handle this for SELinux would be to automatically use
>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>> from some related object (e.g. the block device file context, as in your
>>>> patches for Smack).  That will cause SELinux to use that value instead
>>>> of any xattr value from the filesystem and will cause attempts by
>>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>>  That is how SELinux normally deals with untrusted filesystems, except
>>>> that it is normally specified as a mount option by a trusted mounting
>>>> process, whereas in your case you need to automatically set it.
>>>
>>> Excellent, thank you for the advice. I'll start on this when I've
>>> finished with Smack.
>>
>> Not tested, but something like this should work. Note that it should
>> come after the call to security_fs_use() so we know whether SELinux
>> would even try to use xattrs supplied by the filesystem in the first place.
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 564079c..84da3a2 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>> goto out;
>> }
>> }
>> +
>> +   /*
>> +* If this is a user namespace mount, no contexts are allowed
>> +* on the command line and security labels must be ignored.
>> +*/
>> +   if (sb->s_user_ns != &init_user_ns) {
>> +   if (context_sid || fscontext_sid || rootcontext_sid ||
>> +   defcontext_sid) {
>> +   rc = -EACCES;
>> +   goto out;
>> +   }
>> +   if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>> +   struct block_device *bdev = sb->s_bdev;
>> +   sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>> +   if (bdev) {
>> +   struct inode_security_struct *isec =
>> bdev->bd_inode;
>
> That should be bdev->bd_inode->i_security.

Sorry, this won't work.  bd_inode is not the inode of the block device
file that was passed to mount, and it isn't labeled in any way.  It will
just be unlabeled.

So I guess the only real option here as a fallback is
sbsec->mntpoint_sid = current_sid().  Which isn't great either, as the
only case where we currently assign task labels to files is for their
/proc/pid inodes, and no current policy will therefore allow create
permission to such files.

>
>> +   sbsec->mntpoint_sid = isec->sid;
>> +   } else {
>> +   sbsec->mntpoint_sid = current_sid();
>> +   }
>> +   }
>> +   goto out_set_opts;
>> +   }
>> +
>> /* sets the context of the superblock for the fs being mounted. */
>> if (fscontext_sid) {
>> rc = may_context_mount_sb_relabel(fscontext_sid, sbsec,
>> cred);
>> @@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>> sbsec->def_sid = defcontext_sid;
>> }
>>
>> +out_set_opts:
>> rc = sb_finish_set_opts(sb);
>>  out:
>> mutex_unlock(&sbsec->lock);



Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-23 Thread Stephen Smalley
On 07/23/2015 10:39 AM, Seth Forshee wrote:
> On Thu, Jul 23, 2015 at 09:57:20AM -0400, Stephen Smalley wrote:
>> On 07/22/2015 04:40 PM, Stephen Smalley wrote:
>>> On 07/22/2015 04:25 PM, Stephen Smalley wrote:
>>>> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>>>>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>>>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>>>>> Unprivileged users should not be able to supply security labels
>>>>>>>> in filesystems, nor should they be able to supply security
>>>>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>>>>> and return EPERM if any contexts are supplied in the mount
>>>>>>>> options.
>>>>>>>>
>>>>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>>>>
>>>>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>>>>> record: this patch would cause the files in the userns mount to be left
>>>>>>> with the "unlabeled" label, and therefore under typical policies,
>>>>>>> completely inaccessible to any process in a confined domain.
>>>>>>
>>>>>> The right way to handle this for SELinux would be to automatically use
>>>>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>>>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>>>>> from some related object (e.g. the block device file context, as in your
>>>>>> patches for Smack).  That will cause SELinux to use that value instead
>>>>>> of any xattr value from the filesystem and will cause attempts by
>>>>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>>>>  That is how SELinux normally deals with untrusted filesystems, except
>>>>>> that it is normally specified as a mount option by a trusted mounting
>>>>>> process, whereas in your case you need to automatically set it.
>>>>>
>>>>> Excellent, thank you for the advice. I'll start on this when I've
>>>>> finished with Smack.
>>>>
>>>> Not tested, but something like this should work. Note that it should
>>>> come after the call to security_fs_use() so we know whether SELinux
>>>> would even try to use xattrs supplied by the filesystem in the first place.
>>>>
>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>> index 564079c..84da3a2 100644
>>>> --- a/security/selinux/hooks.c
>>>> +++ b/security/selinux/hooks.c
>>>> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>>> goto out;
>>>> }
>>>> }
>>>> +
>>>> +   /*
>>>> +* If this is a user namespace mount, no contexts are allowed
>>>> +* on the command line and security labels must be ignored.
>>>> +*/
>>>> +   if (sb->s_user_ns != &init_user_ns) {
>>>> +   if (context_sid || fscontext_sid || rootcontext_sid ||
>>>> +   defcontext_sid) {
>>>> +   rc = -EACCES;
>>>> +   goto out;
>>>> +   }
>>>> +   if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
>>>> +   struct block_device *bdev = sb->s_bdev;
>>>> +   sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
>>>> +   if (bdev) {
>>>> +   struct inode_security_struct *isec =
>>>> bdev->bd_inode;
>>>
>>> That should be bdev->bd_inode->i_security.
>>
>> Sorry, this won't work.  bd_inode is not the inode of the block device
>> file that was passed to mount, and it isn't labeled in any way.  It will
>> just be unlabeled.
>>
>> So I guess the only real option here as a fallback is
>> sbsec->mntpoint_sid = current_sid().  Which isn't great either, as the
>> only case where we currently assign task labels to files is for their
>> /proc/pid inodes, and no current policy will therefore allow create
>> permission to such files.
>
> Darn, you're right, that isn't the inode we want. There really doesn't
> seem to be any way to get back to the one we want from the LSM, short of
> adding a new hook.

Maybe list_first_entry(&sb->s_bdev->bd_inodes, struct inode, i_devices)?
Feels like a layering violation though...
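
For readers who have not met the idiom: list_first_entry() steps from a
list head to the structure that embeds the first list node. The sketch
below models that in userspace C. The macros are simplified
re-implementations rather than the kernel's, and toy_inode, toy_bdev and
toy_first_device_inode() are hypothetical names invented for
illustration. A real lookup through bd_inodes would also need locking,
which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's intrusive list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical toy types: only the fields the idiom needs. */
struct toy_inode {
	unsigned int sid;           /* stand-in for the inode's label */
	struct list_head i_devices; /* links the inode into the device list */
};

struct toy_bdev {
	struct list_head bd_inodes; /* inodes that refer to this device */
};

/* Return the first inode on the device's list, or NULL if it is empty. */
static struct toy_inode *toy_first_device_inode(struct toy_bdev *bdev)
{
	assert(bdev != NULL);
	if (list_empty(&bdev->bd_inodes))
		return NULL;
	return list_first_entry(&bdev->bd_inodes, struct toy_inode, i_devices);
}
```

With two toy inodes added via list_add_tail(), the helper returns the one
added first. On an empty list it returns NULL, a case the bare macro does
not guard against.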



[RFC][PATCH] ipc: Use private shmem or hugetlbfs inodes for shm segments.

2015-07-23 Thread Stephen Smalley
The shm implementation internally uses shmem or hugetlbfs inodes
for shm segments.  As these inodes are never directly exposed to
userspace and only accessed through the shm operations which are
already hooked by security modules, mark the inodes with the
S_PRIVATE flag so that inode security initialization and permission
checking is skipped.
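
As a rough illustration of the effect, here is a toy userspace model of a
hook wrapper that returns early for private inodes, so the security
module's per-inode setup (and the lock it takes) is never reached. The
flag value, struct layout and toy_* names are assumptions made for this
sketch; the kernel's own test is IS_PRIVATE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_S_PRIVATE 0x1u /* illustrative flag value, not the kernel's */

struct toy_inode {
	unsigned int i_flags;
	bool label_initialized; /* set once the security module has run */
	int lock_acquisitions;  /* how often the per-inode lock was taken */
};

static bool toy_is_private(const struct toy_inode *inode)
{
	return (inode->i_flags & TOY_S_PRIVATE) != 0;
}

/* Stand-in for the module's per-inode label setup, which takes a lock. */
static void toy_inode_doinit(struct toy_inode *inode)
{
	inode->lock_acquisitions++;
	inode->label_initialized = true;
}

/* Stand-in for the hook wrapper: private inodes never reach the module. */
static void toy_security_d_instantiate(struct toy_inode *inode)
{
	assert(inode != NULL);
	if (toy_is_private(inode))
		return;
	toy_inode_doinit(inode);
}
```

In this model an inode created with TOY_S_PRIVATE set keeps
lock_acquisitions at zero after the hook runs, while an ordinary inode
ends up with one acquisition and an initialized label.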

This was motivated by the following lockdep warning:
===
[ INFO: possible circular locking dependency detected ]
4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
---
httpd/1597 is trying to acquire lock:
(&ids->rwsem){+.}, at: [81385354] shm_close+0x34/0x130
(&mm->mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180
  [81109a07] lock_acquire+0xc7/0x270
  [81217baa] __might_fault+0x7a/0xa0
  [81284a1e] filldir+0x9e/0x130
  [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
  [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs]
  [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs]
  [812847e7] iterate_dir+0x97/0x130
  [81284d21] SyS_getdents+0x91/0x120
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
  [81109a07] lock_acquire+0xc7/0x270
  [81101e97] down_read_nested+0x57/0xa0
  [a01b0e57] xfs_ilock+0x167/0x350 [xfs]
  [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
  [a014799d] xfs_attr_get+0xbd/0x190 [xfs]
  [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs]
  [8129962f] generic_getxattr+0x4f/0x70
  [8139ba52] inode_doinit_with_dentry+0x162/0x670
  [8139cf69] sb_finish_set_opts+0xd9/0x230
  [8139d66c] selinux_set_mnt_opts+0x35c/0x660
  [8139ff97] superblock_doinit+0x77/0xf0
  [813a0020] delayed_superblock_init+0x10/0x20
  [81272d23] iterate_supers+0xb3/0x110
  [813a4e5f] selinux_complete_init+0x2f/0x40
  [813b47a3] security_load_policy+0x103/0x600
  [813a6901] sel_write_load+0xc1/0x750
  [8126e817] __vfs_write+0x37/0x100
  [8126f229] vfs_write+0xa9/0x1a0
  [8126ff48] SyS_write+0x58/0xd0
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
  [81109a07] lock_acquire+0xc7/0x270
  [8186de8f] mutex_lock_nested+0x7f/0x3e0
  [8139b9a9] inode_doinit_with_dentry+0xb9/0x670
  [8139bf7c] selinux_d_instantiate+0x1c/0x20
  [813955f6] security_d_instantiate+0x36/0x60
  [81287c34] d_instantiate+0x54/0x70
  [8120111c] __shmem_file_setup+0xdc/0x240
  [81201290] shmem_file_setup+0x10/0x20
  [813856e0] newseg+0x290/0x3a0
  [8137e278] ipcget+0x208/0x2d0
  [81386074] SyS_shmget+0x54/0x70
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
  [81108df8] __lock_acquire+0x1a78/0x1d00
  [81109a07] lock_acquire+0xc7/0x270
  [8186efba] down_write+0x5a/0xc0
  [81385354] shm_close+0x34/0x130
  [812203a5] remove_vma+0x45/0x80
  [81222a30] do_munmap+0x2b0/0x460
  [81386c25] SyS_shmdt+0xb5/0x180
  [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
Chain exists of:
  &ids->rwsem --> &xfs_dir_ilock_class --> &mm->mmap_sem
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&xfs_dir_ilock_class);
                               lock(&mm->mmap_sem);
  lock(&ids->rwsem);
1 lock held by httpd/1597:
CPU: 7 PID: 1597 Comm: httpd Tainted: G W 4.2.0-0.rc3.git0.1.fc24.x86_64+
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Pla
 6cb6fe9d 88019ff07c58 81868175
 82aea390 88019ff07ca8 81105903
 88019ff07c78 88019ff07d08 0001 8800b75108f0
Call Trace:
[81868175] dump_stack+0x4c/0x65
[81105903] print_circular_bug+0x1e3/0x250
[81108df8] __lock_acquire+0x1a78/0x1d00
[81220c33] ? unlink_file_vma+0x33/0x60
[81109a07] lock_acquire+0xc7/0x270
[81385354] ? shm_close+0x34/0x130
[8186efba] down_write+0x5a/0xc0
[81385354] ? shm_close+0x34/0x130
[81385354] shm_close+0x34/0x130
[812203a5] remove_vma+0x45/0x80
[81222a30] do_munmap+0x2b0/0x460
[81386bbb] ? SyS_shmdt+0x4b/0x180
[81386c25] SyS_shmdt+0xb5/0x180
[81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
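
To make the reported cycle easier to follow, the sketch below encodes the
"held, then taken" orderings visible in the trace as a small graph and
asks whether a new acquisition would close a loop. It is a toy model, not
how lockdep is implemented. The lock names are my reading of the trace
(in particular, ISEC_LOCK for the mutex taken in
inode_doinit_with_dentry() is an assumption), and all toy_* identifiers
are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { IDS_RWSEM, ISEC_LOCK, XFS_DIR_ILOCK, MMAP_SEM, NR_LOCKS };

/* held_then_taken[a][b] is true if lock b was acquired while holding a. */
struct toy_lock_graph {
	bool held_then_taken[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool toy_reaches(const struct toy_lock_graph *g, int from, int to,
			bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (g->held_then_taken[from][next] && !visited[next] &&
		    toy_reaches(g, next, to, visited))
			return true;
	}
	return false;
}

/*
 * Taking `taken` while holding `held` closes a cycle if the recorded
 * orderings already lead from `taken` back to `held`.
 */
static bool toy_would_deadlock(const struct toy_lock_graph *g, int held,
			       int taken)
{
	bool visited[NR_LOCKS] = { false };

	assert(g != NULL);
	return toy_reaches(g, taken, held, visited);
}
```

Recording the three existing orderings (XFS_DIR_ILOCK then MMAP_SEM,
ISEC_LOCK then XFS_DIR_ILOCK, IDS_RWSEM then ISEC_LOCK) makes
toy_would_deadlock(&g, MMAP_SEM, IDS_RWSEM) report a cycle. Dropping the
IDS_RWSEM then ISEC_LOCK edge, which is what skipping inode security
initialization for these inodes amounts to in this model, makes the same
query come back clean.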

Reported-by: Morten Stevens <mstev...@fedoraproject.org>
Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c            | 2 +-
 mm/shmem.c           | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs

Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-22 Thread Stephen Smalley
On 07/22/2015 08:46 AM, Morten Stevens wrote:
> 2015-06-17 13:45 GMT+02:00 Morten Stevens :
>> 2015-06-15 8:09 GMT+02:00 Daniel Wagner :
>>> On 06/14/2015 06:48 PM, Hugh Dickins wrote:
>>>> It appears that, at some point last year, XFS made directory handling
>>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>>> but that has been so for many years.
>>>>
>>>> Since those few lockdep traces that I've seen all implicated selinux,
>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>>
>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>>
>>>> Reported-and-tested-by: Prarit Bhargava
>>>> Reported-by: Daniel Wagner
>>>
>>> Reported-and-tested-by: Daniel Wagner 
>>>
>>> Sorry for the long delay. It took me a while to figure out my original
>>> setup. I could verify that this patch made the lockdep message go away
>>> on 4.0-rc6 and also on 4.1-rc8.
>>
>> Yes, it's also fixed for me after applying this patch to 4.1-rc8.
> 
> Here is another deadlock with the latest 4.2.0-rc3:
> 
> Jul 22 14:36:40 fc23 kernel:
> ==
> Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking
> dependency detected ]
> Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
> Tainted: GW
> Jul 22 14:36:40 fc23 kernel:
> ---
> Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
> Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+.}, at:
> [] shm_close+0x34/0x130
> Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
> Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++}, at:
> [] SyS_shmdt+0x4b/0x180
> Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
> Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in
> reverse order) is:
> Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++}:
> Jul 22 14:36:40 fc23 kernel:   [] 
> lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel:   [] 
> __might_fault+0x7a/0xa0
> Jul 22 14:36:40 fc23 kernel:   [] filldir+0x9e/0x130
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_readdir+0x1b4/0x330 [xfs]
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_file_readdir+0x2b/0x30 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [] iterate_dir+0x97/0x130
> Jul 22 14:36:40 fc23 kernel:   [] 
> SyS_getdents+0x91/0x120
> Jul 22 14:36:40 fc23 kernel:   []
> entry_SYSCALL_64_fastpath+0x12/0x76
> Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){.+}:
> Jul 22 14:36:40 fc23 kernel:   [] 
> lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel:   []
> down_read_nested+0x57/0xa0
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_ilock+0x167/0x350 [xfs]
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_attr_get+0xbd/0x190 [xfs]
> Jul 22 14:36:40 fc23 kernel:   []
> xfs_xattr_get+0x3d/0x70 [xfs]
> Jul 22 14:36:40 fc23 kernel:   []
> generic_getxattr+0x4f/0x70
> Jul 22 14:36:40 fc23 kernel:   []
> inode_doinit_with_dentry+0x162/0x670
> Jul 22 14:36:40 fc23 kernel:   []
> sb_finish_set_opts+0xd9/0x230
> Jul 22 14:36:40 fc23 kernel:   []
> selinux_set_mnt_opts+0x35c/0x660
> Jul 22 14:36:40 fc23 kernel:   []
> superblock_doinit+0x77/0xf0
> Jul 22 14:36:40 fc23 kernel:   []
> delayed_superblock_init+0x10/0x20
> Jul 22 14:36:40 fc23 kernel:   []
> iterate_supers+0xb3/0x110
> Jul 22 14:36:40 fc23 kernel:   []
> selinux_complete_init+0x2f/0x40
> Jul 22 14:36:40 fc23 kernel:   []
> security_load_policy+0x103/0x600
> Jul 22 14:36:40 fc23 kernel:   []
> sel_write_load+0xc1/0x750
> Jul 22 14:36:40 fc23 kernel:   [] __vfs_write+0x37/0x100
> Jul 22 14:36:40 fc23 kernel:   [] vfs_write+0xa9/0x1a0
> Jul 22 14:36:40 fc23 kernel:   [] SyS_write+0x58/0xd0
> Jul 22 14:36:40 fc23 kernel:   []
> entry_SYSCALL_64_fastpath+0x12/0x76
> Jul 22 14:36:40 fc23 kernel: #012-> #1 (&isec->lock){+.+.+.}:
> Jul 22 14:36:40 fc23 kernel:   [] 
> lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel:   []
> mutex_lock_nested+0x7f/0x3e0
> Jul 22 14:36:40 fc23 kernel:   []
> inode_doinit_with_dentry+0xb9/0x670
> Jul 22 14:36:40 fc23 kernel:   []
> selinux_d_instantiate+0x1c/0x20
> Jul 22 14:36:40 fc23 kernel:   []
> security_d_instantiate+0x36/0x60
> Jul 22 14:36:40 fc23 kernel:   [] 

Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-22 Thread Stephen Smalley
On 07/22/2015 04:25 PM, Stephen Smalley wrote:
> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>> Unprivileged users should not be able to supply security labels
>>>>> in filesystems, nor should they be able to supply security
>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>> and return EPERM if any contexts are supplied in the mount
>>>>> options.
>>>>>
>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>
>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>> record: this patch would cause the files in the userns mount to be left
>>>> with the "unlabeled" label, and therefore under typical policies,
>>>> completely inaccessible to any process in a confined domain.
>>>
>>> The right way to handle this for SELinux would be to automatically use
>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>> from some related object (e.g. the block device file context, as in your
>>> patches for Smack).  That will cause SELinux to use that value instead
>>> of any xattr value from the filesystem and will cause attempts by
>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>  That is how SELinux normally deals with untrusted filesystems, except
>>> that it is normally specified as a mount option by a trusted mounting
>>> process, whereas in your case you need to automatically set it.
>>
>> Excellent, thank you for the advice. I'll start on this when I've
>> finished with Smack.
> 
> Not tested, but something like this should work. Note that it should
> come after the call to security_fs_use() so we know whether SELinux
> would even try to use xattrs supplied by the filesystem in the first place.
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 564079c..84da3a2 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
> goto out;
> }
> }
> +
> +   /*
> +* If this is a user namespace mount, no contexts are allowed
> +* on the command line and security labels must be ignored.
> +*/
> +   if (sb->s_user_ns != &init_user_ns) {
> +   if (context_sid || fscontext_sid || rootcontext_sid ||
> +   defcontext_sid) {
> +   rc = -EACCES;
> +   goto out;
> +   }
> +   if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
> +   struct block_device *bdev = sb->s_bdev;
> +   sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
> +   if (bdev) {
> +   struct inode_security_struct *isec =
> bdev->bd_inode;

That should be bdev->bd_inode->i_security.

> +   sbsec->mntpoint_sid = isec->sid;
> +   } else {
> +   sbsec->mntpoint_sid = current_sid();
> +   }
> +   }
> +   goto out_set_opts;
> +   }
> +
> /* sets the context of the superblock for the fs being mounted. */
> if (fscontext_sid) {
> rc = may_context_mount_sb_relabel(fscontext_sid, sbsec,
> cred);
> @@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
> sbsec->def_sid = defcontext_sid;
> }
> 
> +out_set_opts:
> rc = sb_finish_set_opts(sb);
>  out:
> mutex_unlock(&sbsec->lock);


Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-22 Thread Stephen Smalley
On 07/22/2015 12:14 PM, Seth Forshee wrote:
> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>> Unprivileged users should not be able to supply security labels
>>>> in filesystems, nor should they be able to supply security
>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>> and return EPERM if any contexts are supplied in the mount
>>>> options.
>>>>
>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>
>>> I think this is obsoleted by the subsequent discussion, but just for the
>>> record: this patch would cause the files in the userns mount to be left
>>> with the "unlabeled" label, and therefore under typical policies,
>>> completely inaccessible to any process in a confined domain.
>>
>> The right way to handle this for SELinux would be to automatically use
>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>> from some related object (e.g. the block device file context, as in your
>> patches for Smack).  That will cause SELinux to use that value instead
>> of any xattr value from the filesystem and will cause attempts by
>> userspace to set the security.selinux xattr to fail on that filesystem.
>>  That is how SELinux normally deals with untrusted filesystems, except
>> that it is normally specified as a mount option by a trusted mounting
>> process, whereas in your case you need to automatically set it.
> 
> Excellent, thank you for the advice. I'll start on this when I've
> finished with Smack.

Not tested, but something like this should work. Note that it should
come after the call to security_fs_use() so we know whether SELinux
would even try to use xattrs supplied by the filesystem in the first place.

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 564079c..84da3a2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			struct block_device *bdev = sb->s_bdev;
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			if (bdev) {
+				struct inode_security_struct *isec = bdev->bd_inode;
+				sbsec->mntpoint_sid = isec->sid;
+			} else {
+				sbsec->mntpoint_sid = current_sid();
+			}
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);



Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-22 Thread Stephen Smalley
On 07/16/2015 09:23 AM, Stephen Smalley wrote:
> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>> Unprivileged users should not be able to supply security labels
>> in filesystems, nor should they be able to supply security
>> contexts in unprivileged mounts. For any mount where s_user_ns is
>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>> and return EPERM if any contexts are supplied in the mount
>> options.
>>
>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
> 
> I think this is obsoleted by the subsequent discussion, but just for the
> record: this patch would cause the files in the userns mount to be left
> with the "unlabeled" label, and therefore under typical policies,
> completely inaccessible to any process in a confined domain.

The right way to handle this for SELinux would be to automatically use
mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
specifying a context= mount option), with the sbsec->mntpoint_sid set
from some related object (e.g. the block device file context, as in your
patches for Smack).  That will cause SELinux to use that value instead
of any xattr value from the filesystem and will cause attempts by
userspace to set the security.selinux xattr to fail on that filesystem.
 That is how SELinux normally deals with untrusted filesystems, except
that it is normally specified as a mount option by a trusted mounting
process, whereas in your case you need to automatically set it.
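
To spell out the effect described above, here is a toy userspace model,
not the SELinux code itself. It assumes a simplified superblock security
struct, made-up behavior constants, and -EOPNOTSUPP as the error returned
when userspace tries to set the label on a mountpoint-labeled filesystem;
all toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative labeling behaviors; the numeric values are arbitrary. */
enum toy_behavior {
	TOY_FS_USE_XATTR,
	TOY_FS_USE_MNTPOINT,
	TOY_FS_USE_GENFS,
};

struct toy_sbsec {
	enum toy_behavior behavior;
	unsigned int mntpoint_sid; /* label for every inode under MNTPOINT */
};

/* Force mountpoint labeling on an untrusted mount that would use xattrs. */
static void toy_force_mntpoint(struct toy_sbsec *sbsec, unsigned int sid)
{
	assert(sbsec != NULL);
	if (sbsec->behavior == TOY_FS_USE_XATTR) {
		sbsec->behavior = TOY_FS_USE_MNTPOINT;
		sbsec->mntpoint_sid = sid;
	}
}

/* Label an inode would get: the on-disk xattr is ignored under MNTPOINT. */
static unsigned int toy_inode_sid(const struct toy_sbsec *sbsec,
				  unsigned int xattr_sid)
{
	if (sbsec->behavior == TOY_FS_USE_MNTPOINT)
		return sbsec->mntpoint_sid;
	return xattr_sid;
}

/* Userspace attempt to set security.selinux on a file of this mount. */
static int toy_setxattr_selinux(const struct toy_sbsec *sbsec)
{
	if (sbsec->behavior == TOY_FS_USE_MNTPOINT)
		return -EOPNOTSUPP; /* assumed error code for the sketch */
	return 0;
}
```

Before toy_force_mntpoint() runs, an inode carrying on-disk label 42 is
seen with label 42 and relabeling succeeds. Afterwards every inode
reports the forced label and relabeling is refused, while a filesystem
that never used xattrs is left alone.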

> 
>> ---
>>  security/selinux/hooks.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 459e71ddbc9d..eeb71e45ab82 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -732,6 +732,19 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>  !strcmp(sb->s_type->name, "pstore"))
>>  sbsec->flags |= SE_SBGENFS;
>>  
>> +/*
>> + * If this is a user namespace mount, no contexts are allowed
>> + * on the command line and security labels must be ignored.
>> + */
>> +if (sb->s_user_ns != &init_user_ns) {
>> +if (context_sid || fscontext_sid || rootcontext_sid ||
>> +defcontext_sid)
>> +return -EPERM;
>> +sbsec->behavior = SECURITY_FS_USE_NONE;
>> +goto out_set_opts;
>> +}
>> +
>> +
>>  if (!sbsec->behavior) {
>>  /*
>>   * Determine the labeling behavior to use for this
>> @@ -813,6 +826,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>>  sbsec->def_sid = defcontext_sid;
>>  }
>>  
>> +out_set_opts:
>>  rc = sb_finish_set_opts(sb);
>>  out:
>>  mutex_unlock(&sbsec->lock);


Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-22 Thread Stephen Smalley
On 07/22/2015 04:25 PM, Stephen Smalley wrote:
> On 07/22/2015 12:14 PM, Seth Forshee wrote:
>> On Wed, Jul 22, 2015 at 12:02:13PM -0400, Stephen Smalley wrote:
>>> On 07/16/2015 09:23 AM, Stephen Smalley wrote:
>>>> On 07/15/2015 03:46 PM, Seth Forshee wrote:
>>>>> Unprivileged users should not be able to supply security labels
>>>>> in filesystems, nor should they be able to supply security
>>>>> contexts in unprivileged mounts. For any mount where s_user_ns is
>>>>> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
>>>>> and return EPERM if any contexts are supplied in the mount
>>>>> options.
>>>>>
>>>>> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>
>>>>
>>>> I think this is obsoleted by the subsequent discussion, but just for the
>>>> record: this patch would cause the files in the userns mount to be left
>>>> with the "unlabeled" label, and therefore under typical policies,
>>>> completely inaccessible to any process in a confined domain.
>>>
>>> The right way to handle this for SELinux would be to automatically use
>>> mountpoint labeling (SECURITY_FS_USE_MNTPOINT, normally set by
>>> specifying a context= mount option), with the sbsec->mntpoint_sid set
>>> from some related object (e.g. the block device file context, as in your
>>> patches for Smack).  That will cause SELinux to use that value instead
>>> of any xattr value from the filesystem and will cause attempts by
>>> userspace to set the security.selinux xattr to fail on that filesystem.
>>>  That is how SELinux normally deals with untrusted filesystems, except
>>> that it is normally specified as a mount option by a trusted mounting
>>> process, whereas in your case you need to automatically set it.
>>
>> Excellent, thank you for the advice. I'll start on this when I've
>> finished with Smack.
>
> Not tested, but something like this should work. Note that it should
> come after the call to security_fs_use() so we know whether SELinux
> would even try to use xattrs supplied by the filesystem in the first place.
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 564079c..84da3a2 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -745,6 +745,30 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>  			goto out;
>  		}
>  	}
> +
> +	/*
> +	 * If this is a user namespace mount, no contexts are allowed
> +	 * on the command line and security labels must be ignored.
> +	 */
> +	if (sb->s_user_ns != &init_user_ns) {
> +		if (context_sid || fscontext_sid || rootcontext_sid ||
> +		    defcontext_sid) {
> +			rc = -EACCES;
> +			goto out;
> +		}
> +		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
> +			struct block_device *bdev = sb->s_bdev;
> +			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
> +			if (bdev) {
> +				struct inode_security_struct *isec = bdev->bd_inode;

That should be bdev->bd_inode->i_security.

> +				sbsec->mntpoint_sid = isec->sid;
> +			} else {
> +				sbsec->mntpoint_sid = current_sid();
> +			}
> +		}
> +		goto out_set_opts;
> +	}
> +
>  	/* sets the context of the superblock for the fs being mounted. */
>  	if (fscontext_sid) {
>  		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
> @@ -813,6 +837,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>  		sbsec->def_sid = defcontext_sid;
>  	}
>  
> +out_set_opts:
>  	rc = sb_finish_set_opts(sb);
>  out:
>  	mutex_unlock(&sbsec->lock);
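
For anyone who wants to poke at the logic outside the kernel, here is a
rough userspace model of just the user namespace branch, with the
i_security fix applied. This is only a sketch under stated assumptions:
the struct definitions are cut-down stand-ins for the kernel types, and
the names userns_mount_labeling(), from_init_userns and
have_context_opts are made up for illustration. It is not the patch
itself and has nothing to do with how the final series turned out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; field names mirror the patch. */
enum fs_behavior { FS_USE_XATTR = 1, FS_USE_MNTPOINT, FS_USE_NONE };

struct inode_security { uint32_t sid; };
struct inode { void *i_security; };
struct block_device { struct inode *bd_inode; };

struct superblock_security {
	enum fs_behavior behavior;
	uint32_t mntpoint_sid;
};

struct super_block {
	int from_init_userns;		/* models sb->s_user_ns == &init_user_ns */
	struct block_device *s_bdev;	/* NULL when there is no backing device */
	struct superblock_security sec;
};

/*
 * Models only the user namespace branch. Returns 0 on success, or
 * -EACCES if any context mount option was supplied (hypothetical flag
 * have_context_opts stands for the four *_sid checks).
 */
int userns_mount_labeling(struct super_block *sb, int have_context_opts,
			  uint32_t current_sid)
{
	if (sb->from_init_userns)
		return 0;		/* normal path, not modelled here */

	if (have_context_opts)
		return -EACCES;

	if (sb->sec.behavior == FS_USE_XATTR) {
		/* Stop trusting on-disk xattrs; label everything from one SID. */
		sb->sec.behavior = FS_USE_MNTPOINT;
		if (sb->s_bdev) {
			/* The label hangs off inode->i_security, not the inode. */
			struct inode_security *isec =
				sb->s_bdev->bd_inode->i_security;
			sb->sec.mntpoint_sid = isec->sid;
		} else {
			sb->sec.mntpoint_sid = current_sid;
		}
	}
	return 0;
}
```

Behaviors other than xattr labeling are left untouched, which matches
the reason for doing this after security_fs_use() has picked the
behavior.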
 


Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-22 Thread Stephen Smalley
On 07/22/2015 08:46 AM, Morten Stevens wrote:
> 2015-06-17 13:45 GMT+02:00 Morten Stevens <mstev...@fedoraproject.org>:
>> 2015-06-15 8:09 GMT+02:00 Daniel Wagner <w...@monom.org>:
>>> On 06/14/2015 06:48 PM, Hugh Dickins wrote:
>>>> It appears that, at some point last year, XFS made directory handling
>>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>>> but that has been so for many years.
>>>>
>>>> Since those few lockdep traces that I've seen all implicated selinux,
>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>>
>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>>
>>>> Reported-and-tested-by: Prarit Bhargava <pra...@redhat.com>
>>>> Reported-by: Daniel Wagner <w...@monom.org>
>>>
>>> Reported-and-tested-by: Daniel Wagner <w...@monom.org>
>>>
>>> Sorry for the long delay. It took me a while to figure out my original
>>> setup. I could verify that this patch made the lockdep message go away
>>> on 4.0-rc6 and also on 4.1-rc8.
>>
>> Yes, it's also fixed for me after applying this patch to 4.1-rc8.
>
> Here is another deadlock with the latest 4.2.0-rc3:
>
> Jul 22 14:36:40 fc23 kernel: ==
> Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking dependency detected ]
> Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: GW
> Jul 22 14:36:40 fc23 kernel: ---
> Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
> Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+.}, at: [81385354] shm_close+0x34/0x130
> Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
> Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++}, at: [81386bbb] SyS_shmdt+0x4b/0x180
> Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
> Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in reverse order) is:
> Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++}:
> Jul 22 14:36:40 fc23 kernel:   [81109a07] lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel:   [81217baa] __might_fault+0x7a/0xa0
> Jul 22 14:36:40 fc23 kernel:   [81284a1e] filldir+0x9e/0x130
> Jul 22 14:36:40 fc23 kernel:   [a019bb08] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [a019c5b4] xfs_readdir+0x1b4/0x330 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [a019f38b] xfs_file_readdir+0x2b/0x30 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [812847e7] iterate_dir+0x97/0x130
> Jul 22 14:36:40 fc23 kernel:   [81284d21] SyS_getdents+0x91/0x120
> Jul 22 14:36:40 fc23 kernel:   [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
> Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){.+}:
> Jul 22 14:36:40 fc23 kernel:   [81109a07] lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel:   [81101e97] down_read_nested+0x57/0xa0
> Jul 22 14:36:40 fc23 kernel:   [a01b0e57] xfs_ilock+0x167/0x350 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [a01b10b8] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [a014799d] xfs_attr_get+0xbd/0x190 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [a01c17ad] xfs_xattr_get+0x3d/0x70 [xfs]
> Jul 22 14:36:40 fc23 kernel:   [8129962f] generic_getxattr+0x4f/0x70
> Jul 22 14:36:40 fc23 kernel:   [8139ba52] inode_doinit_with_dentry+0x162/0x670
> Jul 22 14:36:40 fc23 kernel:   [8139cf69] sb_finish_set_opts+0xd9/0x230
> Jul 22 14:36:40 fc23 kernel:   [8139d66c] selinux_set_mnt_opts+0x35c/0x660
> Jul 22 14:36:40 fc23 kernel:   [8139ff97] superblock_doinit+0x77/0xf0
> Jul 22 14:36:40 fc23 kernel:   [813a0020] delayed_superblock_init+0x10/0x20
> Jul 22 14:36:40 fc23 kernel:   [81272d23] iterate_supers+0xb3/0x110
> Jul 22 14:36:40 fc23 kernel:   [813a4e5f] selinux_complete_init+0x2f/0x40
> Jul 22 14:36:40 fc23 kernel:   [813b47a3] security_load_policy+0x103/0x600
> Jul 22 14:36:40 fc23 kernel:   [813a6901] sel_write_load+0xc1/0x750
> Jul 22 14:36:40 fc23 kernel:   [8126e817] __vfs_write+0x37/0x100
> Jul 22 14:36:40 fc23 kernel:   [8126f229] vfs_write+0xa9/0x1a0
> Jul 22 14:36:40 fc23 kernel:   [8126ff48] SyS_write+0x58/0xd0
> Jul 22 14:36:40 fc23 kernel:   [81871d2e] entry_SYSCALL_64_fastpath+0x12/0x76
> Jul 22 14:36:40 fc23 kernel: #012-> #1 

Re: [PATCH 6/7] selinux: Ignore security labels on user namespace mounts

2015-07-16 Thread Stephen Smalley
On 07/15/2015 03:46 PM, Seth Forshee wrote:
> Unprivileged users should not be able to supply security labels
> in filesystems, nor should they be able to supply security
> contexts in unprivileged mounts. For any mount where s_user_ns is
> not init_user_ns, force the use of SECURITY_FS_USE_NONE behavior
> and return EPERM if any contexts are supplied in the mount
> options.
> 
> Signed-off-by: Seth Forshee <seth.fors...@canonical.com>

I think this is obsoleted by the subsequent discussion, but just for the
record: this patch would cause the files in the userns mount to be left
with the "unlabeled" label, and therefore under typical policies,
completely inaccessible to any process in a confined domain.

> ---
>  security/selinux/hooks.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 459e71ddbc9d..eeb71e45ab82 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -732,6 +732,19 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>   !strcmp(sb->s_type->name, "pstore"))
>   sbsec->flags |= SE_SBGENFS;
>  
> + /*
> +  * If this is a user namespace mount, no contexts are allowed
> +  * on the command line and security labels must be ignored.
> +  */
> + if (sb->s_user_ns != &init_user_ns) {
> + if (context_sid || fscontext_sid || rootcontext_sid ||
> + defcontext_sid)
> + return -EPERM;
> + sbsec->behavior = SECURITY_FS_USE_NONE;
> + goto out_set_opts;
> + }
> +
> +
>   if (!sbsec->behavior) {
>   /*
>* Determine the labeling behavior to use for this
> @@ -813,6 +826,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>   sbsec->def_sid = defcontext_sid;
>   }
>  
> +out_set_opts:
>   rc = sb_finish_set_opts(sb);
>  out:
>   mutex_unlock(&sbsec->lock);
> 



Re: [PATCH 0/7] Initial support for user namespace owned mounts

2015-07-16 Thread Stephen Smalley
On 07/15/2015 09:05 PM, Andy Lutomirski wrote:
> On Jul 15, 2015 3:34 PM, "Eric W. Biederman" <ebied...@xmission.com> wrote:
>>
>> Seth Forshee <seth.fors...@canonical.com> writes:
>>
>>> On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote:
>>>> Casey Schaufler <ca...@schaufler-ca.com> writes:
>>>>
>>>>> On 7/15/2015 12:46 PM, Seth Forshee wrote:
>>>>>> These are the first in a larger set of patches that I've been working on
>>>>>> (with help from Eric Biederman) to support mounting ext4 and fuse
>>>>>> filesystems from within user namespaces. I've pushed the full series to:
>>>>>>
>>>>>>   git://kernel.ubuntu.com/sforshee/linux.git userns-mounts
>>>>>>
>>>>>> Taking the series as a whole, the strategy is to handle as much of the
>>>>>> heavy lifting as possible in the vfs so the filesystems don't have to
>>>>>> handle weird edge cases. If you look at the full series you'll find that
>>>>>> the changes in ext4 to support user namespace mounts turn out to be
>>>>>> fairly minimal (fuse is a bit more complicated though as it must deal
>>>>>> with translating ids for a userspace process which is running in pid and
>>>>>> user namespaces).
>>>>>>
>>>>>> The patches I'm sending today lay some of the groundwork in the vfs and
>>>>>> related code. They fall into two broad groups:
>>>>>>
>>>>>>  1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are
>>>>>> pretty straightforward, and Eric has expressed interest in merging
>>>>>> these patches soon. Note that patch 2 won't apply cleanly without
>>>>>> Eric's noexec patches for proc and sys [1].
>>>>>>
>>>>>>  2. Patches 2-7 tighten down security for mounts with s_user_ns !=
>>>>>> &init_user_ns. This includes updates to how file caps and suid are
>>>>>> handled and LSM updates to ignore security labels on superblocks
>>>>>> from non-init namespaces.
>>>>>>
>>>>>> The LSM changes in particular may not be optimal, as I don't have a
>>>>>> lot of familiarity with this code, so I'd be especially appreciative
>>>>>> of review of these changes and suggestions on how to improve them.
>>>>>
>>>>> Lukasz Pawelczyk <l.pawelc...@samsung.com> proposed
>>>>> LSM support in user namespaces ([RFC] lsm: namespace hooks)
>>>>> that make a whole lot more sense than just turning off
>>>>> the option of using labels on files. Gutting the ability
>>>>> to use MAC in a namespace is a step down the road of
>>>>> making MAC and namespaces incompatible.
>>>>
>>>> This is not "turning off the option to use labels on files".
>>>>
>>>> This is supporting mounting filesystems like ext4 by unprivileged users
>>>> and not trusting the labels they set in the same way as we trust labels
>>>> on filesystems mounted by privileged users.
>>>>
>>>> The first step needs to be not trusting those labels and treating such
>>>> filesystems as filesystems without label support.  I hope that is what
>>>> Seth has implemented.
>>>>
>>>> In the long run we can do more interesting things with such filesystems
>>>> once the appropriate LSM policy is in place.
>>>
>>> Yes, this exactly. Right now it looks to me like the only safe thing to
>>> do with mounts from unprivileged users is to ignore the security labels,
>>> so that's what I'm trying to do with these changes. If there's some
>>> better thing to do, or some better way to do it, I'm more than happy to
>>> receive that feedback.
>>
>> Ugh.
>>
>> This made me realize that we have an interesting problem here.  An
>> unprivileged mount of tmpfs probably needs to have
>> s_user_ns == &init_user_ns.
>>
>> Otherwise we will break security labels on tmpfs for no good reason.
>> ramfs and sysfs also seem to have similar concerns.
>>
>> Because they have no backing store we can trust those filesystems with
>> security labels.  Plus for at least sysfs there is the security label
>> bleed through issue, that we need to make certain works.
>>
>> Perhaps these filesystems with trusted backing store need to call
>> "sget_userns(..., &init_user_ns)".
>>
>> If we don't get this right we will have significant regressions with
>> respect to security labels, and that is not ok.
> 
> That's only a problem if there's anyone who sets security labels on
> such a mount.  You need global caps to do that (I hope), which
> requires someone outside the userns to help, which means there's a
> good chance that literally no one does this.

Setting of security.selinux attributes is governed by SELinux permission
checks, not by capabilities.

Also, files are always assigned a label at creation time; a tmpfs inode
will be labeled based on its creator without any userspace entity ever
calling setxattr() at all.



Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code

2015-07-10 Thread Stephen Smalley
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
> On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
>> Originates from:
>>
>> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
>> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> Signed-off-by: Karol Lewandowski 
>> Signed-off-by: Paul Osmialowski 
>> ---
>>  ipc/kdbus/bus.c| 12 ++-
>>  ipc/kdbus/bus.h|  3 +++
>>  ipc/kdbus/connection.c | 54 
>> ++
>>  ipc/kdbus/connection.h |  4 
>>  ipc/kdbus/domain.c |  9 -
>>  ipc/kdbus/domain.h |  2 ++
>>  ipc/kdbus/endpoint.c   | 11 ++
>>  ipc/kdbus/names.c  | 11 ++
>>  ipc/kdbus/queue.c  | 30 ++--
>>  9 files changed, 124 insertions(+), 12 deletions(-)
>>
>>
> 
>> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
>> index 9993753..b85cdc7 100644
>> --- a/ipc/kdbus/connection.c
>> +++ b/ipc/kdbus/connection.c
>> @@ -31,6 +31,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "bus.h"
>>  #include "connection.h"
>> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep 
>> *ep, bool privileged,
>>  bool is_activator;
>>  bool is_monitor;
>>  struct kvec kvec;
>> +u32 sid, len;
>> +char *label;
>>  int ret;
>>  
>>  struct {
>> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct 
>> kdbus_ep *ep, bool privileged,
>>  }
>>  }
>>  
>> +security_task_getsecid(current, &sid);
>> +security_secid_to_secctx(sid, &label, &len);
>> +ret = security_kdbus_connect(conn, label, len);
>> +if (ret) {
>> +ret = -EPERM;
>> +goto exit_unref;
>> +}
> 
> This seems convoluted and expensive.  If you always want the label of
> the current task here, then why not just have security_kdbus_connect()
> internally extract the label of the current task?

Furthermore, why do we need a separate security field and copy of the
current label in the conn->security, when we already have
conn->cred->security available to us?

I don't think we need new security fields unless we are going to assign
some kind of object labeling to these structures separate from their
cred, and offhand I don't see why we would do that.
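
To make the suggestion concrete, a hook of roughly this shape would be
enough. This is a userspace sketch, not kdbus or LSM code: the structs
are cut down to the one field each that matters here, and
kdbus_connect_check(), bus_sid and the "labels must match" rule are
invented purely for illustration.

```c
#include <errno.h>
#include <stdint.h>

struct task_security { uint32_t sid; };		/* stand-in for the LSM blob */
struct cred { void *security; };		/* only the field used here */
struct kdbus_conn { const struct cred *cred; };	/* only the field used here */

/*
 * Hypothetical hook shape: it takes just the connection and reads the
 * label from conn->cred->security. The caller does no secid lookup and
 * no secid-to-string conversion, and nothing is copied into a second
 * security field on the connection.
 * bus_sid and the equality rule are a toy policy, not a real one.
 */
int kdbus_connect_check(const struct kdbus_conn *conn, uint32_t bus_sid)
{
	const struct task_security *tsec = conn->cred->security;

	return tsec->sid == bus_sid ? 0 : -EACCES;
}
```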


Re: kdbus: credential faking

2015-07-10 Thread Stephen Smalley
On 07/10/2015 12:48 PM, David Herrmann wrote:
> Hi
> 
> On Fri, Jul 10, 2015 at 4:47 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>> On 07/10/2015 09:43 AM, David Herrmann wrote:
>>> On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>>>> On 07/09/2015 06:22 PM, David Herrmann wrote:
>>>>> With dbus1, clients can ask the dbus-daemon for the seclabel of a peer
>>>>> they talk to. They're free to use this information for any purpose. On
>>>>> kdbus, we want to be compatible to dbus-daemon. Therefore, if a native
>>>>> client queries kdbus for the seclabel of a peer behind a proxy, we
>>>>> want that query to return the actual seclabel of the peer, not the
>>>>> seclabel of the proxy. Same applies to PIDS and CREDS.
>>>>>
>>>>> This faked metadata is never used by the kernel for any security
>>>>> decisions. Its sole purpose is to return them if a native kdbus
>>>>> client queries another peer. Furthermore, this information is never
>>>>> transmitted as send-time metadata (as it is, in no way, send-time
>>>>> metadata), but only if you explicitly query the connection-time
>>>>> metadata of a peer (KDBUS_CMD_CONN_INFO).
>>>>
>>>> I guess I don't understand the difference.  Is there a separate facility
>>>> for obtaining the send-time metadata that is not subject to credential
>>>> faking?
>>>
>>> Each message carries metadata of the sender, that was collected at the
>>> time of _SEND_. This metadata cannot be faked.
>>> Additionally (for introspection and dbus1 compat), kdbus allows peers
>>> to query metadata of other peers, that were collected at the time of
>>> _CONNECT_. Privileged peers can provide faked _connection_ metadata,
>>> which has the side-effect of suppressing send-time metadata.
>>> It is up to the receiver to request connection-metadata if a message
>>> did not carry send-time metadata. We do this, currently, only to
>>> support legacy dbus1 clients which do not support send-time metadata.
>>
>> So the "privileged" peer (which just means the bus owner, which can be
>> completely unprivileged from a typical DAC perspective) can both prevent
>> the receiver from getting the (real, unfakeable) send-time metadata and
>> supply arbitrary fake credentials for the connection metadata?  And the
> 
> (Limited to PIDS/CREDS/SECLABEL metadata, but) yes.
> 
> Note that this is all under the assumption that you never connect to a
> bus owned by someone else but you or root. Hence, a peer can only fake
> metadata, if it can also ptrace you.

If you don't enforce this assumption in kdbus, then you can't be sure
that it won't be violated by future userspace.

Also, the statement about ptrace doesn't hold when using SELinux or
other security modules.

>> legacy dbus1 clients (i.e. all current DBUS applications?) will always
>> use this potentially faked metadata.  Meanwhile, what about new dbus
>> clients?  What is the standard behavior for them when the send-time
>> metadata is suppressed?  Do they always fall back to the connection
>> metadata?
> 
> This is a decision user-space has to make. In sd-bus, if we trust the
> bus (root owned, or our own), we always fall back to connection
> metadata.

So the only benefit of the credentials in the send-time metadata is they
come for free rather than needing to be separately queried?  And aside
from credential faking (impersonation being a nicer name), when else
would they differ from the connection metadata?  If the program does a
setuid or something after creating the connection?
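
As I read the description above, the receiver-side choice amounts to
something like the following. This is only my paraphrase of the stated
sd-bus behavior as a userspace sketch; struct creds,
pick_sender_creds() and its parameters are hypothetical names, not
kdbus or sd-bus API. Note that the trust test lives entirely in the
receiver, which is exactly why nothing stops some other userspace from
skipping it.

```c
#include <stddef.h>
#include <stdint.h>

struct creds { uint32_t uid; uint32_t pid; };

/*
 * send_time: metadata attached to the message, NULL if it was suppressed.
 * conn_time: metadata from a connection-info query (possibly faked by
 *            the bus owner).
 * Returns the credentials the receiver would act on, or NULL if none
 * can be trusted.
 */
const struct creds *pick_sender_creds(const struct creds *send_time,
				      const struct creds *conn_time,
				      uint32_t bus_owner_uid, uint32_t my_uid)
{
	/* Send-time metadata cannot be faked, so prefer it when present. */
	if (send_time)
		return send_time;

	/* Fall back only on a bus owned by root or by ourselves. */
	if (bus_owner_uid == 0 || bus_owner_uid == my_uid)
		return conn_time;

	return NULL;
}
```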

>>>>> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In
>>>>> the kdbus security model, if you don't trust the bus-creator, you
>>>>> should not connect to the bus. A bus-creator can bypass kdbus
>>>>> policies, sniff on any transmission and modify bus behavior. It just
>>>>> seems logical to bind faked-metadata to the same privilege. However, I
>>>>> also have no strong feeling about that, if you place valid points. So
>>>>> please elaborate.
>>>>> But, please be aware that if we require privileges to fake metadata,
>>>>> then you need to have such privileges to provide a dbus1 proxy for
>>>>> your native bus on kdbus. In other words, users are able to create
>>>>> session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1
>>>>> proxy. This will have the net-effect of us requiring to run the proxy
>>>>> a

Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code

2015-07-10 Thread Stephen Smalley
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
> On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
>> Originates from:
>>
>> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
>> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> Signed-off-by: Karol Lewandowski 
>> Signed-off-by: Paul Osmialowski 
>> ---
>>  ipc/kdbus/bus.c| 12 ++-
>>  ipc/kdbus/bus.h|  3 +++
>>  ipc/kdbus/connection.c | 54 
>> ++
>>  ipc/kdbus/connection.h |  4 
>>  ipc/kdbus/domain.c |  9 -
>>  ipc/kdbus/domain.h |  2 ++
>>  ipc/kdbus/endpoint.c   | 11 ++
>>  ipc/kdbus/names.c  | 11 ++
>>  ipc/kdbus/queue.c  | 30 ++--
>>  9 files changed, 124 insertions(+), 12 deletions(-)
>>
>>
> 
>> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
>> index 9993753..b85cdc7 100644
>> --- a/ipc/kdbus/connection.c
>> +++ b/ipc/kdbus/connection.c
>> @@ -31,6 +31,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "bus.h"
>>  #include "connection.h"
>> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep 
>> *ep, bool privileged,
>>  bool is_activator;
>>  bool is_monitor;
>>  struct kvec kvec;
>> +u32 sid, len;
>> +char *label;
>>  int ret;
>>  
>>  struct {
>> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct 
>> kdbus_ep *ep, bool privileged,
>>  }
>>  }
>>  
>> +security_task_getsecid(current, &sid);
>> +security_secid_to_secctx(sid, &label, &len);
>> +ret = security_kdbus_connect(conn, label, len);
>> +if (ret) {
>> +ret = -EPERM;
>> +goto exit_unref;
>> +}
> 
> This seems convoluted and expensive.  If you always want the label of
> the current task here, then why not just have security_kdbus_connect()
> internally extract the label of the current task?
> 
>> @@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, 
>> struct kdbus_kmsg *kmsg)
>>  if (ret < 0)
>>  goto exit;
>>  
>> +ret = security_kdbus_talk(src, dst);
>> +if (ret) {
>> +ret = -EPERM;
>> +goto exit;
>> +}
> 
> Where does kdbus apply its uid-based or other restrictions on
> connections?  Why do we need to insert separate hooks into each of these
> functions?  Is there no central chokepoint already for permission
> checking that we can hook?

For example, why wouldn't you insert a single hook into
kdbus_conn_policy_talk() where they perform their DAC checking?
You would need to restructure it slightly to ensure that the security
hook is only called if it passes the DAC (privileged || uid_eq) check so
that we do not trigger MAC denials when DAC wouldn't have allowed it
anyway.  Also, kdbus_conn_policy_talk() takes a separate conn_creds
argument - that should be passed through to the hook as well.
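
The ordering I have in mind looks roughly like this. It is a
deliberately simplified userspace model, not the real
kdbus_conn_policy_talk(): the struct, policy_talk(),
security_kdbus_talk_model() and the toy "same label" MAC rule are all
assumptions for illustration, and whatever else the real function does
after the DAC test is not reproduced.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct cred { uint32_t uid; uint32_t sid; };

/* Counts MAC hook invocations so the ordering can be observed. */
static int mac_hook_calls;

/* Toy MAC policy: only identical labels may talk. */
static int security_kdbus_talk_model(const struct cred *src,
				     const struct cred *dst)
{
	mac_hook_calls++;
	return src->sid == dst->sid ? 0 : -EACCES;
}

/*
 * Single chokepoint: the DAC test runs first, and the MAC hook is only
 * consulted once DAC has allowed the operation, so a DAC failure never
 * produces a MAC denial. conn_creds is passed through to the hook.
 */
int policy_talk(const struct cred *conn_creds, bool privileged,
		const struct cred *dst)
{
	if (!privileged && conn_creds->uid != dst->uid)
		return -EPERM;

	return security_kdbus_talk_model(conn_creds, dst);
}
```

With this shape the per-function hooks in kdbus_conn_reply() and
friends would not be needed, assuming they all go through the same
policy check.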




Re: kdbus: credential faking

2015-07-10 Thread Stephen Smalley
On 07/10/2015 09:43 AM, David Herrmann wrote:
> Hi
> 
> On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>> On 07/09/2015 06:22 PM, David Herrmann wrote:
>>> To be clear, faking metadata has one use-case, and one use-case only:
>>> dbus1 compatibility
>>>
>>> In dbus1, clients connect to a unix-socket placed in the file-system
>>> hierarchy. To avoid breaking ABI for old clients, we support a
>>> unix-kdbus proxy. This proxy is called systemd-bus-proxyd. It is
>>> spawned once for each bus we proxy and simply remarshals messages from
>>> the client to kdbus and vice versa.
>>
>> Is this truly necessary?  Can't the distributions just update the client
>> side libraries to use kdbus if enabled and be done with it?  Doesn't
>> this proxy undo many of the benefits of using kdbus in the first place?
> 
> We need binary compatibility to dbus1. There're millions of
> applications and language bindings with dbus1 compiled in, which we
> cannot suddenly break.

So, are you saying that there are many applications that statically link
the dbus1 library implementation (thus the distributions can't just push
an updated shared library that switches from using the socket to using
kdbus), and that many of these applications are third party applications
not packaged by the distributions (thus the distributions cannot just do
a mass rebuild to update these applications too)?  Otherwise, I would
think that the use of a socket would just be an implementation detail
and you would be free to change it without affecting dbus1 library ABI
compatibility.

>>> With dbus1, clients can ask the dbus-daemon for the seclabel of a peer
>>> they talk to. They're free to use this information for any purpose. On
>>> kdbus, we want to be compatible to dbus-daemon. Therefore, if a native
>>> client queries kdbus for the seclabel of a peer behind a proxy, we
>>> want that query to return the actual seclabel of the peer, not the
>>> seclabel of the proxy. Same applies to PIDS and CREDS.
>>>
>>> This faked metadata is never used by the kernel for any security
>>> decisions. Its sole purpose is to return them if a native kdbus
>>> client queries another peer. Furthermore, this information is never
>>> transmitted as send-time metadata (as it is, in no way, send-time
>>> metadata), but only if you explicitly query the connection-time
>>> metadata of a peer (KDBUS_CMD_CONN_INFO).
>>
>> I guess I don't understand the difference.  Is there a separate facility
>> for obtaining the send-time metadata that is not subject to credential
>> faking?
> 
> Each message carries metadata of the sender, that was collected at the
> time of _SEND_. This metadata cannot be faked.
> Additionally (for introspection and dbus1 compat), kdbus allows peers
> to query metadata of other peers, that were collected at the time of
> _CONNECT_. Privileged peers can provide faked _connection_ metadata,
> which has the side-effect of suppressing send-time metadata.
> It is up to the receiver to request connection-metadata if a message
> did not carry send-time metadata. We do this, currently, only to
> support legacy dbus1 clients which do not support send-time metadata.

So the "privileged" peer (which just means the bus owner, which can be
completely unprivileged from a typical DAC perspective) can both prevent
the receiver from getting the (real, unfakeable) send-time metadata and
supply arbitrary fake credentials for the connection metadata?  And the
legacy dbus1 clients (i.e. all current DBUS applications?) will always
use this potentially faked metadata.  Meanwhile, what about new dbus
clients?  What is the standard behavior for them when the send-time
metadata is suppressed?  Do they always fall back to the connection
metadata?

>>> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In
>>> the kdbus security model, if you don't trust the bus-creator, you
>>> should not connect to the bus. A bus-creator can bypass kdbus
>>> policies, sniff on any transmission and modify bus behavior. It just
>>> seems logical to bind faked-metadata to the same privilege. However, I
>>> also have no strong feeling about that, if you place valid points. So
>>> please elaborate.
>>> But, please be aware that if we require privileges to fake metadata,
>>> then you need to have such privileges to provide a dbus1 proxy for
>>> your native bus on kdbus. In other words, users are able to create
>>> session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1
>>> proxy. This will have the net-effect of us requiring to run the proxy
>>> as root (which, I think, is worse than allowing bus-owners to fake
>>> _connection_ metadata).

[PATCH] selinux: fix mprotect PROT_EXEC regression caused by mm change

2015-07-10 Thread Stephen Smalley
commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup skip
security check and lockdep conflict with XFS") caused a regression for
SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings.  However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping.  On a
mmap, the security hook is passed a NULL file and knows it is dealing with
an anonymous mapping and therefore applies an execmem check and no file
checks.  On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check and
no execmem check.  Since the aforementioned commit now marks the shmem
zero inode with the S_PRIVATE flag, the file checks are disabled and
we have no checking at all on mprotect PROT_EXEC.  Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case.  This makes the mmap and mprotect checking consistent
for shared anonymous mappings, as well as for /dev/zero and ashmem.

Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
---
 security/selinux/hooks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6231081..564079c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3283,7 +3283,8 @@ static int file_map_prot_check(struct file *file, unsigned long prot, int shared)
 	int rc = 0;
 
 	if (default_noexec &&
-	    (prot & PROT_EXEC) && (!file || (!shared && (prot & PROT_WRITE)))) {
+	    (prot & PROT_EXEC) && (!file || IS_PRIVATE(file_inode(file)) ||
+				   (!shared && (prot & PROT_WRITE)))) {
 		/*
 		 * We are making executable an anonymous mapping or a
 		 * private file mapping that will also be writable.
-- 
2.1.0
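
For readers following the condition in the hunk above, the predicate can
be modelled in userspace as below.  This is only a sketch: struct
mock_file and needs_execmem_check() are hypothetical names, the
inode_private field stands in for IS_PRIVATE(file_inode(file)), and the
default_noexec test is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical model of a file; inode_private mirrors the S_PRIVATE flag. */
struct mock_file { bool inode_private; };

/*
 * Returns true when the mapping should get an execmem check rather than
 * a file execute check: anonymous mappings (NULL file), kernel-private
 * inodes such as the shmem zero inode, and private file mappings that
 * are both writable and executable.
 */
static bool needs_execmem_check(const struct mock_file *file,
				unsigned long prot, bool shared)
{
	if (!(prot & PROT_EXEC))
		return false;
	if (!file)
		return true;
	if (file->inode_private)
		return true;
	return !shared && (prot & PROT_WRITE);
}
```

A shared anonymous mapping therefore gets the same answer whether it is
seen at mmap time (NULL file) or at mprotect time (private shmem inode).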



Re: kdbus: credential faking

2015-07-10 Thread Stephen Smalley
On 07/10/2015 05:05 AM, David Herrmann wrote:
> Hi
> 
> On Fri, Jul 10, 2015 at 12:56 AM, Casey Schaufler
> <ca...@schaufler-ca.com> wrote:
>> On 7/9/2015 3:22 PM, David Herrmann wrote:
>>> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In
>>> the kdbus security model, if you don't trust the bus-creator, you
>>> should not connect to the bus.
>>
>> That's fine in a discretionary access control model, but
>> not in a mandatory access control model. The decision on
>> trust of the "other" guy is never up to the process, it's
>> up to the mandatory access control policy.
> 
> Exactly. So LSMs are free to use a hook to limit faking other user's
> credentials. But why does that have to affect the default (which, in
> the case of kdbus, is a dac model)?
> 
>>> A bus-creator can bypass kdbus
>>> policies, sniff on any transmission and modify bus behavior. It just
>>> seems logical to bind faked-metadata to the same privilege. However, I
>>> also have no strong feeling about that, if you place valid points. So
>>> please elaborate.
>>
>> Smack has to require CAP_MAC_ADMIN to allow a process to fake
>> Smack metadata. This is exactly what CAP_MAC_ADMIN is for.
>> Changing Smack metadata is considered a hugely dangerous activity.
> 
> I'm totally fine with dropping support to fake seclabels, if LSM
> developers see no need for it. I, certainly, will not insist on it.
> With that in mind, I'd prefer if we limit this discussion to faking 
> CREDS/PIDS.

Well, based on your use case, we actually do need support for faking
seclabels if we need support for faking credentials at all, because your
proxy needs to be able to fake all of the credentials in order to be
fully transparent and preserve compatibility.  So I don't think they can
be divorced from each other.

Regardless, we will definitely want a hook for controlling this ability
to fake credentials, and I think we would want to separately distinguish
each of the cases that you currently lump under your single privileged
boolean, as the ability to do one should not necessarily imply the
ability to do them all.



Re: kdbus: credential faking

2015-07-10 Thread Stephen Smalley
On 07/09/2015 06:22 PM, David Herrmann wrote:
> Hi
> 
> On Thu, Jul 9, 2015 at 8:26 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>> Hi,
>>
>> I have a concern with the support for faked credentials in kdbus, but
>> don't know enough about the original motivation or intended use case to
>> evaluate it concretely.  I raised this issue during the "kdbus for
>> 4.1-rc1" thread a while back but none of the kdbus maintainers
>> responded,
> 
> Sorry, some mails might have been gone unanswered in that huge thread.
> Please feel free to ping us about anything we didn't comment on. See
> below..
> 
>>and the one D-BUS maintainer who did respond said that there
>> is no API in dbus-daemon for faking client credentials, so this is not
>> something inherited from dbus-daemon or required for compatibility with it.
>>
>> First, I have doubts as to whether there should be any way to fake the
>> seclabel, no matter how "privileged" the caller.  Unless there is a
>> clear use case for that functionality, I would prefer to see it dropped
>> altogether.
>>
>> Second, IIUC, the ability to fake any portion of the credentials or pids
>> is granted if the caller either has CAP_IPC_OWNER or owns the bus (uid
>> match).  Clearly that isn't sufficient basis for seclabel faking, and it
>> seems questionable as to whether it should be sufficient for faking any
>> of the other credentials or pids.  Compare with e.g.
>> net/core/scm.c:scm_check_creds() logic for faking credentials on a Unix
>> domain socket, which requires CAP_SYS_ADMIN for faking pid, CAP_SETUID
>> for faking any of the uid fields, and CAP_SETGID for faking any of the
>> gid fields.
>>
>> Thanks for any light you can shed on the matter.
> 
> To be clear, faking metadata has one use-case, and one use-case only:
> dbus1 compatibility
> 
> In dbus1, clients connect to a unix-socket placed in the file-system
> hierarchy. To avoid breaking ABI for old clients, we support a
> unix-kdbus proxy. This proxy is called systemd-bus-proxyd. It is
> spawned once for each bus we proxy and simply remarshals messages from
> the client to kdbus and vice versa.

Is this truly necessary?  Can't the distributions just update the client
side libraries to use kdbus if enabled and be done with it?  Doesn't
this proxy undo many of the benefits of using kdbus in the first place?

> With dbus1, clients can ask the dbus-daemon for the seclabel of a peer
> they talk to. They're free to use this information for any purpose. On
> kdbus, we want to be compatible to dbus-daemon. Therefore, if a native
> client queries kdbus for the seclabel of a peer behind a proxy, we
> want that query to return the actual seclabel of the peer, not the
> seclabel of the proxy. Same applies to PIDS and CREDS.
> 
> This faked metadata is never used by the kernel for any security
> decisions. Its sole purpose is to return them if a native kdbus
> client queries another peer. Furthermore, this information is never
> transmitted as send-time metadata (as it is, in no way, send-time
> metadata), but only if you explicitly query the connection-time
> metadata of a peer (KDBUS_CMD_CONN_INFO).

I guess I don't understand the difference.  Is there a separate facility
for obtaining the send-time metadata that is not subject to credential
faking?

> Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In
> the kdbus security model, if you don't trust the bus-creator, you
> should not connect to the bus. A bus-creator can bypass kdbus
> policies, sniff on any transmission and modify bus behavior. It just
> seems logical to bind faked-metadata to the same privilege. However, I
> also have no strong feeling about that, if you place valid points. So
> please elaborate.
> But, please be aware that if we require privileges to fake metadata,
> then you need to have such privileges to provide a dbus1 proxy for
> your native bus on kdbus. In other words, users are able to create
> session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1
> proxy. This will have the net-effect of us requiring to run the proxy
> as root (which, I think, is worse than allowing bus-owners to fake
> _connection_ metadata).

Applications have a reasonable expectation that credentials supplied by
the kernel for a peer are trustworthy.  Allowing unprivileged users to
forge arbitrary credentials and pids seems fraught with peril.  You say
that one should never connect to a bus if you do not trust its creator.
 What mechanisms are provided to allow me to determine whether I trust
the bus creator before connecting?  Are those mechanisms automatically
employed by default?
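
For comparison, the per-field rule described above for
net/core/scm.c:scm_check_creds() can be sketched in userspace as follows.
This is an illustration, not the kernel function: struct task, struct
claimed and check_claimed_creds() are hypothetical, and the capability
booleans stand in for the kernel's capability checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the caller's real identity and capabilities. */
struct task {
	int pid;
	unsigned int uid, euid, suid;
	unsigned int gid, egid, sgid;
	bool cap_sys_admin, cap_setuid, cap_setgid;
};

/* Credentials the caller wants the peer to see. */
struct claimed { int pid; unsigned int uid, gid; };

/*
 * Each field is checked on its own: a value that matches the caller is
 * always fine; a faked value needs the capability for that field only.
 */
static int check_claimed_creds(const struct task *t, const struct claimed *c)
{
	bool pid_ok = c->pid == t->pid || t->cap_sys_admin;
	bool uid_ok = c->uid == t->uid || c->uid == t->euid ||
		      c->uid == t->suid || t->cap_setuid;
	bool gid_ok = c->gid == t->gid || c->gid == t->egid ||
		      c->gid == t->sgid || t->cap_setgid;

	return (pid_ok && uid_ok && gid_ok) ? 0 : -EPERM;
}
```

The point of the comparison is that no single "privileged" flag unlocks
all three fields at once.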


Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-10 Thread Stephen Smalley
On 07/10/2015 03:48 AM, Hugh Dickins wrote:
> On Thu, 9 Jul 2015, Stephen Smalley wrote:
>> On 07/09/2015 04:23 AM, Hugh Dickins wrote:
>>> On Wed, 8 Jul 2015, Stephen Smalley wrote:
>>>> On 07/08/2015 09:13 AM, Stephen Smalley wrote:
>>>>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hu...@google.com> wrote:
>>>>>> It appears that, at some point last year, XFS made directory handling
>>>>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>>>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>>>>> but that has been so for many years.
>>>>>>
>>>>>> Since those few lockdep traces that I've seen all implicated selinux,
>>>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>>>>
>>>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>>>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>>>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>>>
>>>>> This causes a regression for SELinux (please, in the future, cc
>>>>> selinux list and Paul Moore on SELinux-related changes).  In
>>>
>>> Surprised and sorry about that, yes, I should have Cc'ed.
>>>
>>>>> particular, this change disables SELinux checking of mprotect
>>>>> PROT_EXEC on shared anonymous mappings, so we lose the ability to
>>>>> control executable mappings.  That said, we are only getting that
>>>>> check today as a side effect of our file execute check on the tmpfs
>>>>> inode, whereas it would be better (and more consistent with the
>>>>> mmap-time checks) to apply an execmem check in that case, in which
>>>>> case we wouldn't care about the inode-based check.  However, I am
>>>>> unclear on how to correctly detect that situation from
>>>>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a
>>>>> non-NULL vma->vm_file so we treat it as a file execute check.  In
>>>>> contrast, if directly creating an anonymous shared mapping with
>>>>> PROT_EXEC via mmap(...PROT_EXEC...),  selinux_mmap_file is called with
>>>>> a NULL file and therefore we end up applying an execmem check.
>>>
>>> If you're willing to go forward with the change, rather than just call
>>> for an immediate revert of it, then I think the right way to detect
>>> the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)),
>>> wouldn't it?
>>
>> That seems misleading and might trigger execmem checks on non-shmem
>> inodes.  S_PRIVATE was originally introduced for fs-internal inodes that
>> are never directly exposed to userspace, originally for reiserfs xattr
>> inodes (reiserfs xattrs are internally implemented as their own files
>> that are hidden from userspace) and later also applied to anon inodes.
>> It would be better if we had an explicit way of testing that we are
>> dealing with an anonymous shared mapping in selinux_file_mprotect() ->
>> file_map_prot_check().
> 
> But how would any of those original S_PRIVATE inodes arrive at
> selinux_file_mprotect()?  Now we have added the anon shared mmap case
> which can arrive there, but the S_PRIVATE check seems just the right
> tool for the job of distinguishing those from the user-visible inodes.
> 
> I don't see how adding some other flag for this case would be better
> - though certainly I can see that adding an "anon shared shmem"
> comment on its use in that check would be helpful.
> 
> Or is there some further difficulty in this use of S_PRIVATE, beyond
> the mprotect case that you've mentioned?  Unless there is some further
> difficulty, duplicating all the code relating to S_PRIVATE for a
> differently named flag seems counter-productive to me.

S_PRIVATE is supposed to disable all security processing on the inode,
and often this is checked in the security framework
(security/security.c) even before we reach the SELinux hook and causes
an immediate return there.  In the case of mprotect, we do reach the
SELinux code since the hook is on the vma, not merely the inode, so we
could apply an execmem check in the SELinux code if IS_PRIVATE() instead
of file execute.

However, I was trying to figure out if the fact that S_PRIVATE also
would disable any read/write checking by SELinux on the inode could
potentially open up a bypass of security policy.  That would only be an
issue if the file returned by shmem_zero_setup() was ever linked to an
open file descriptor that could be inherited across a fork+exec or
passed across local socket IPC or binder IPC and thereby shared across
different security contexts. Uses of shmem_zero_setup() include mmap
MAP_ANONYMOUS|MAP_SHARED, drivers/staging/android/ashmem.c (from
ashmem_mmap if VM_SHARED), and drivers/char/mem.c (from mmap_zero if
VM_SHARED).  That all seems fine AFAICS.

> (There is a bool shmem_mapping(mapping) that could be used to confirm
> that the inode you're looking at indeed belongs to shmem; but of
> course that would say yes on all the user-visible shmem inodes too,
> so it wouldn't be a useful test on its own, and I don't

Re: kdbus: credential faking

2015-07-10 Thread Stephen Smalley
On 07/09/2015 06:22 PM, David Herrmann wrote:
 Hi
 
 On Thu, Jul 9, 2015 at 8:26 PM, Stephen Smalley s...@tycho.nsa.gov wrote:
 Hi,

 I have a concern with the support for faked credentials in kdbus, but
 don't know enough about the original motivation or intended use case to
 evaluate it concretely.  I raised this issue during the "kdbus for
 4.1-rc1" thread a while back but none of the kdbus maintainers
 responded,
 
 Sorry, some mails might have gone unanswered in that huge thread.
 Please feel free to ping us about anything we didn't comment on. See
 below..
 
 and the one D-BUS maintainer who did respond said that there
 is no API in dbus-daemon for faking client credentials, so this is not
 something inherited from dbus-daemon or required for compatibility with it.

 First, I have doubts as to whether there should be any way to fake the
 seclabel, no matter how "privileged" the caller.  Unless there is a
 clear use case for that functionality, I would prefer to see it dropped
 altogether.

 Second, IIUC, the ability to fake any portion of the credentials or pids
 is granted if the caller either has CAP_IPC_OWNER or owns the bus (uid
 match).  Clearly that isn't sufficient basis for seclabel faking, and it
 seems questionable as to whether it should be sufficient for faking any
 of the other credentials or pids.  Compare with e.g.
 net/core/scm.c:scm_check_creds() logic for faking credentials on a Unix
 domain socket, which requires CAP_SYS_ADMIN for faking pid, CAP_SETUID
 for faking any of the uid fields, and CAP_SETGID for faking any of the
 gid fields.

 Thanks for any light you can shed on the matter.
 
 To be clear, faking metadata has one use-case, and one use-case only:
 dbus1 compatibility
 
 In dbus1, clients connect to a unix-socket placed in the file-system
 hierarchy. To avoid breaking ABI for old clients, we support a
 unix-kdbus proxy. This proxy is called systemd-bus-proxyd. It is
 spawned once for each bus we proxy and simply remarshals messages from
 the client to kdbus and vice versa.

Is this truly necessary?  Can't the distributions just update the client
side libraries to use kdbus if enabled and be done with it?  Doesn't
this proxy undo many of the benefits of using kdbus in the first place?

 With dbus1, clients can ask the dbus-daemon for the seclabel of a peer
 they talk to. They're free to use this information for any purpose. On
 kdbus, we want to be compatible to dbus-daemon. Therefore, if a native
 client queries kdbus for the seclabel of a peer behind a proxy, we
 want that query to return the actual seclabel of the peer, not the
 seclabel of the proxy. Same applies to PIDS and CREDS.
 
 This faked metadata is never used by the kernel for any security
 decisions. Its sole purpose is to return them if a native kdbus
 client queries another peer. Furthermore, this information is never
 transmitted as send-time metadata (as it is, in no way, send-time
 metadata), but only if you explicitly query the connection-time
 metadata of a peer (KDBUS_CMD_CONN_INFO).

I guess I don't understand the difference.  Is there a separate facility
for obtaining the send-time metadata that is not subject to credential
faking?

 Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In
 the kdbus security model, if you don't trust the bus-creator, you
 should not connect to the bus. A bus-creator can bypass kdbus
 policies, sniff on any transmission and modify bus behavior. It just
 seems logical to bind faked-metadata to the same privilege. However, I
 also have no strong feeling about that, if you place valid points. So
 please elaborate.
 But, please be aware that if we require privileges to fake metadata,
 then you need to have such privileges to provide a dbus1 proxy for
 your native bus on kdbus. In other words, users are able to create
 session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1
 proxy. This will have the net-effect of us requiring to run the proxy
 as root (which, I think, is worse than allowing bus-owners to fake
 _connection_ metadata).

Applications have a reasonable expectation that credentials supplied by
the kernel for a peer are trustworthy.  Allowing unprivileged users to
forge arbitrary credentials and pids seems fraught with peril.  You say
that one should never connect to a bus if you do not trust its creator.
 What mechanisms are provided to allow me to determine whether I trust
the bus creator before connecting?  Are those mechanisms automatically
employed by default?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdbus: credential faking

2015-07-10 Thread Stephen Smalley
On 07/10/2015 12:48 PM, David Herrmann wrote:
 Hi
 
 On Fri, Jul 10, 2015 at 4:47 PM, Stephen Smalley s...@tycho.nsa.gov wrote:
 On 07/10/2015 09:43 AM, David Herrmann wrote:
 On Fri, Jul 10, 2015 at 3:25 PM, Stephen Smalley s...@tycho.nsa.gov wrote:
 On 07/09/2015 06:22 PM, David Herrmann wrote:
 With dbus1, clients can ask the dbus-daemon for the seclabel of a peer
 they talk to. They're free to use this information for any purpose. On
 kdbus, we want to be compatible to dbus-daemon. Therefore, if a native
 client queries kdbus for the seclabel of a peer behind a proxy, we
 want that query to return the actual seclabel of the peer, not the
 seclabel of the proxy. Same applies to PIDS and CREDS.

 This faked metadata is never used by the kernel for any security
 decisions. Its sole purpose is to return them if a native kdbus
 client queries another peer. Furthermore, this information is never
 transmitted as send-time metadata (as it is, in no way, send-time
 metadata), but only if you explicitly query the connection-time
 metadata of a peer (KDBUS_CMD_CONN_INFO).

 I guess I don't understand the difference.  Is there a separate facility
 for obtaining the send-time metadata that is not subject to credential
 faking?

 Each message carries metadata of the sender, that was collected at the
 time of _SEND_. This metadata cannot be faked.
 Additionally (for introspection and dbus1 compat), kdbus allows peers
 to query metadata of other peers, that were collected at the time of
 _CONNECT_. Privileged peers can provide faked _connection_ metadata,
 which has the side-effect of suppressing send-time metadata.
 It is up to the receiver to request connection-metadata if a message
 did not carry send-time metadata. We do this, currently, only to
 support legacy dbus1 clients which do not support send-time metadata.

 So the privileged peer (which just means the bus owner, which can be
 completely unprivileged from a typical DAC perspective) can both prevent
 the receiver from getting the (real, unfakeable) send-time metadata and
 supply arbitrary fake credentials for the connection metadata?  And the
 
 (Limited to PIDS/CREDS/SECLABEL metadata, but) yes.
 
 Note that this is all under the assumption that you never connect to a
 bus owned by someone else but you or root. Hence, a peer can only fake
 metadata, if it can also ptrace you.

If you don't enforce this assumption in kdbus, then you can't be sure
that it won't be violated by future userspace.

Also, the statement about ptrace doesn't hold when using SELinux or
other security modules.

 legacy dbus1 clients (i.e. all current DBUS applications?) will always
 use this potentially faked metadata.  Meanwhile, what about new dbus
 clients?  What is the standard behavior for them when the send-time
 metadata is suppressed?  Do they always fall back to the connection
 metadata?
 
 This is a decision user-space has to make. In sd-bus, if we trust the
 bus (root owned, or our own), we always fall back to connection
 metadata.

So the only benefit of the credentials in the send-time metadata is they
come for free rather than needing to be separately queried?  And aside
from credential faking (impersonation being a nicer name), when else
would they differ from the connection metadata?  If the program does a
setuid or something after creating the connection?
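
For readers following the thread, the receiver-side fallback described
above can be sketched in userspace C. This is only an illustration under
stated assumptions: the struct names, pick_creds() and bus_is_trusted()
are hypothetical and are not taken from kdbus or sd-bus. It models the
rule as quoted: use send-time metadata when the message carries it,
otherwise fall back to connection metadata only on a bus that is
root-owned or owned by the receiver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, reduced credential set for illustration. */
struct creds {
	uid_t uid;
	pid_t pid;
};

struct message {
	bool has_send_creds;		/* false when send-time metadata was suppressed */
	struct creds send_creds;	/* collected at send time */
};

struct peer_info {
	struct creds conn_creds;	/* collected, or supplied, at connect time */
};

/* Trust rule as quoted in the thread: root-owned bus, or our own. */
static bool bus_is_trusted(uid_t bus_owner, uid_t self)
{
	return bus_owner == 0 || bus_owner == self;
}

/* Returns the credentials a receiver would act on, or NULL if none. */
static const struct creds *pick_creds(const struct message *m,
				      const struct peer_info *p,
				      uid_t bus_owner, uid_t self)
{
	if (m->has_send_creds)
		return &m->send_creds;
	if (bus_is_trusted(bus_owner, self))
		return &p->conn_creds;	/* may have been supplied by the bus owner */
	return NULL;
}
```

In this model the second branch is the one under discussion: whenever it
is taken, the receiver acts on values that a bus owner was able to
supply.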

 Regarding requiring CAP_SYS_ADMIN, I don't really see the point. In
 the kdbus security model, if you don't trust the bus-creator, you
 should not connect to the bus. A bus-creator can bypass kdbus
 policies, sniff on any transmission and modify bus behavior. It just
 seems logical to bind faked-metadata to the same privilege. However, I
 also have no strong feeling about that, if you place valid points. So
 please elaborate.
 But, please be aware that if we require privileges to fake metadata,
 then you need to have such privileges to provide a dbus1 proxy for
 your native bus on kdbus. In other words, users are able to create
 session/user buses, but they need CAP_SYS_ADMIN to spawn the dbus1
 proxy. This will have the net-effect of us requiring to run the proxy
 as root (which, I think, is worse than allowing bus-owners to fake
 _connection_ metadata).

 Applications have a reasonable expectation that credentials supplied by
 the kernel for a peer are trustworthy.  Allowing unprivileged users to
 forge arbitrary credentials and pids seems fraught with peril.  You say
 that one should never connect to a bus if you do not trust its creator.
  What mechanisms are provided to allow me to determine whether I trust
 the bus creator before connecting?  Are those mechanisms automatically
 employed by default?

 Regarding the default security model (uid based), each bus is prefixed
 by the uid of the bus-owner. This is enforced by the kernel. Hence, a
 process cannot 'accidentally' connect to a bus of a user they don't
 trust.

 And how do they go about looking up / obtaining the destination bus name
 in the first

Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code

2015-07-10 Thread Stephen Smalley
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
> On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
>> Originates from:
>>
>> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
>> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> Signed-off-by: Karol Lewandowski <k.lewando...@samsung.com>
>> Signed-off-by: Paul Osmialowski <p.osmialo...@samsung.com>
>> ---
>>  ipc/kdbus/bus.c        | 12 ++-
>>  ipc/kdbus/bus.h        |  3 +++
>>  ipc/kdbus/connection.c | 54 ++
>>  ipc/kdbus/connection.h |  4
>>  ipc/kdbus/domain.c     |  9 -
>>  ipc/kdbus/domain.h     |  2 ++
>>  ipc/kdbus/endpoint.c   | 11 ++
>>  ipc/kdbus/names.c      | 11 ++
>>  ipc/kdbus/queue.c      | 30 ++--
>>  9 files changed, 124 insertions(+), 12 deletions(-)
>>
>> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
>> index 9993753..b85cdc7 100644
>> --- a/ipc/kdbus/connection.c
>> +++ b/ipc/kdbus/connection.c
>> @@ -31,6 +31,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/syscalls.h>
>>  #include <linux/uio.h>
>> +#include <linux/security.h>
>>  
>>  #include "bus.h"
>>  #include "connection.h"
>> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  	bool is_activator;
>>  	bool is_monitor;
>>  	struct kvec kvec;
>> +	u32 sid, len;
>> +	char *label;
>>  	int ret;
>>  
>>  	struct {
>> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  		}
>>  	}
>>  
>> +	security_task_getsecid(current, &sid);
>> +	security_secid_to_secctx(sid, &label, &len);
>> +	ret = security_kdbus_connect(conn, label, len);
>> +	if (ret) {
>> +		ret = -EPERM;
>> +		goto exit_unref;
>> +	}
>
> This seems convoluted and expensive.  If you always want the label of
> the current task here, then why not just have security_kdbus_connect()
> internally extract the label of the current task?
>
>> @@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, struct kdbus_kmsg *kmsg)
>>  	if (ret < 0)
>>  		goto exit;
>>  
>> +	ret = security_kdbus_talk(src, dst);
>> +	if (ret) {
>> +		ret = -EPERM;
>> +		goto exit;
>> +	}
>
> Where does kdbus apply its uid-based or other restrictions on
> connections?  Why do we need to insert separate hooks into each of these
> functions?  Is there no central chokepoint already for permission
> checking that we can hook?

For example, why wouldn't you insert a single hook into
kdbus_conn_policy_talk() where they perform their DAC checking?
You would need to restructure it slightly to ensure that the security
hook is only called if it passes the DAC (privileged || uid_eq) check so
that we do not trigger MAC denials when DAC wouldn't have allowed it
anyway.  Also, kdbus_conn_policy_talk() takes a separate conn_creds
argument - that should be passed through to the hook as well.
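
The ordering suggested above (DAC first, and the security hook only when
DAC allows the operation) can be sketched as a small userspace model.
Everything here is hypothetical: struct conn, mac_talk() and
policy_talk() are illustrative stand-ins, not kdbus or LSM code, and the
model ignores the kdbus policy database as well as the conn_creds
argument mentioned above.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified connection: owner uid plus a MAC label id. */
struct conn {
	uid_t uid;
	bool privileged;
	int label;
};

/* Counts MAC hook invocations so the ordering can be observed. */
static int mac_calls;

/* Hypothetical MAC policy: only identical labels may talk. */
static int mac_talk(const struct conn *src, const struct conn *dst)
{
	mac_calls++;
	return src->label == dst->label ? 0 : -EPERM;
}

static int policy_talk(const struct conn *src, const struct conn *dst)
{
	/* DAC first: a refusal here never reaches the MAC hook. */
	if (!src->privileged && src->uid != dst->uid)
		return -EPERM;

	/* DAC allowed it, so the security module gets the final say. */
	return mac_talk(src, dst);
}
```

With this shape a DAC refusal produces no MAC denial (and so no audit
record from the security module), which is the property the paragraph
above asks for.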




Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code

2015-07-10 Thread Stephen Smalley
On 07/08/2015 09:37 AM, Stephen Smalley wrote:
> On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
>> Originates from:
>>
>> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
>> commit: aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> Signed-off-by: Karol Lewandowski <k.lewando...@samsung.com>
>> Signed-off-by: Paul Osmialowski <p.osmialo...@samsung.com>
>> ---
>>  ipc/kdbus/bus.c        | 12 ++-
>>  ipc/kdbus/bus.h        |  3 +++
>>  ipc/kdbus/connection.c | 54 ++
>>  ipc/kdbus/connection.h |  4
>>  ipc/kdbus/domain.c     |  9 -
>>  ipc/kdbus/domain.h     |  2 ++
>>  ipc/kdbus/endpoint.c   | 11 ++
>>  ipc/kdbus/names.c      | 11 ++
>>  ipc/kdbus/queue.c      | 30 ++--
>>  9 files changed, 124 insertions(+), 12 deletions(-)
>>
>> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
>> index 9993753..b85cdc7 100644
>> --- a/ipc/kdbus/connection.c
>> +++ b/ipc/kdbus/connection.c
>> @@ -31,6 +31,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/syscalls.h>
>>  #include <linux/uio.h>
>> +#include <linux/security.h>
>>  
>>  #include "bus.h"
>>  #include "connection.h"
>> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  	bool is_activator;
>>  	bool is_monitor;
>>  	struct kvec kvec;
>> +	u32 sid, len;
>> +	char *label;
>>  	int ret;
>>  
>>  	struct {
>> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>>  		}
>>  	}
>>  
>> +	security_task_getsecid(current, &sid);
>> +	security_secid_to_secctx(sid, &label, &len);
>> +	ret = security_kdbus_connect(conn, label, len);
>> +	if (ret) {
>> +		ret = -EPERM;
>> +		goto exit_unref;
>> +	}
>
> This seems convoluted and expensive.  If you always want the label of
> the current task here, then why not just have security_kdbus_connect()
> internally extract the label of the current task?

Furthermore, why do we need a separate security field and copy of the
current label in the conn->security, when we already have
conn->cred->security available to us?

I don't think we need new security fields unless we are going to assign
some kind of object labeling to these structures separate from their
cred, and offhand I don't see why we would do that.
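
As a rough illustration of the point above, the following userspace
sketch reads the label through the connection's cred instead of keeping
a second copy on the connection. The structures and the hook_talk()
helper are hypothetical and heavily reduced; they only show that a hook
taking the two connections has everything it needs without a separate
security field or a label argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real kernel structures are far larger. */
struct cred {
	const char *security;		/* label attached by the security module */
};

struct conn {
	const struct cred *cred;	/* creds captured when the connection was made */
};

/* The label is reached through the cred, so conn needs no copy of its own. */
static const char *conn_label(const struct conn *conn)
{
	return conn->cred ? conn->cred->security : NULL;
}

/* Hypothetical hook: takes only the connections, no label passed by the caller. */
static int hook_talk(const struct conn *src, const struct conn *dst)
{
	const char *s = conn_label(src);
	const char *d = conn_label(dst);

	if (!s || !d)
		return -1;
	return strcmp(s, d) == 0 ? 0 : -1;
}
```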


kdbus: credential faking

2015-07-09 Thread Stephen Smalley
Hi,

I have a concern with the support for faked credentials in kdbus, but
don't know enough about the original motivation or intended use case to
evaluate it concretely.  I raised this issue during the "kdbus for
4.1-rc1" thread a while back but none of the kdbus maintainers
responded, and the one D-BUS maintainer who did respond said that there
is no API in dbus-daemon for faking client credentials, so this is not
something inherited from dbus-daemon or required for compatibility with it.

First, I have doubts as to whether there should be any way to fake the
seclabel, no matter how "privileged" the caller.  Unless there is a
clear use case for that functionality, I would prefer to see it dropped
altogether.

Second, IIUC, the ability to fake any portion of the credentials or pids
is granted if the caller either has CAP_IPC_OWNER or owns the bus (uid
match).  Clearly that isn't sufficient basis for seclabel faking, and it
seems questionable as to whether it should be sufficient for faking any
of the other credentials or pids.  Compare with e.g.
net/core/scm.c:scm_check_creds() logic for faking credentials on a Unix
domain socket, which requires CAP_SYS_ADMIN for faking pid, CAP_SETUID
for faking any of the uid fields, and CAP_SETGID for faking any of the
gid fields.

Thanks for any light you can shed on the matter.
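
For comparison, the per-field rule described above for Unix domain
sockets can be modelled in userspace C. This is a simplified sketch, not
the kernel function: struct caller, struct claimed_creds and
check_claimed_creds() are hypothetical names, capabilities are reduced
to booleans, and user-namespace mapping is ignored. Each claimed field
must either match one of the caller's own values or be covered by the
capability that governs that field.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the caller's real identity and capabilities. */
struct caller {
	pid_t pid;
	uid_t uid, euid, suid;
	gid_t gid, egid, sgid;
	bool cap_sys_admin;
	bool cap_setuid;
	bool cap_setgid;
};

/* Credentials the caller asks to have attached to a message. */
struct claimed_creds {
	pid_t pid;
	uid_t uid;
	gid_t gid;
};

static int check_claimed_creds(const struct caller *c,
			       const struct claimed_creds *want)
{
	/* pid: own pid, or CAP_SYS_ADMIN */
	bool pid_ok = want->pid == c->pid || c->cap_sys_admin;
	/* uid: any of the caller's own uids, or CAP_SETUID */
	bool uid_ok = want->uid == c->uid || want->uid == c->euid ||
		      want->uid == c->suid || c->cap_setuid;
	/* gid: any of the caller's own gids, or CAP_SETGID */
	bool gid_ok = want->gid == c->gid || want->gid == c->egid ||
		      want->gid == c->sgid || c->cap_setgid;

	return (pid_ok && uid_ok && gid_ok) ? 0 : -EPERM;
}
```

The contrast with the kdbus rule discussed in this message is that no
single condition (such as owning the bus) unlocks all fields at once.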


Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-09 Thread Stephen Smalley
On 07/09/2015 04:23 AM, Hugh Dickins wrote:
> On Wed, 8 Jul 2015, Stephen Smalley wrote:
>> On 07/08/2015 09:13 AM, Stephen Smalley wrote:
>>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins  wrote:
>>>> It appears that, at some point last year, XFS made directory handling
>>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>>> but that has been so for many years.
>>>>
>>>> Since those few lockdep traces that I've seen all implicated selinux,
>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>>
>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>
>>> This causes a regression for SELinux (please, in the future, cc
>>> selinux list and Paul Moore on SELinux-related changes).  In
> 
> Surprised and sorry about that, yes, I should have Cc'ed.
> 
>>> particular, this change disables SELinux checking of mprotect
>>> PROT_EXEC on shared anonymous mappings, so we lose the ability to
>>> control executable mappings.  That said, we are only getting that
>>> check today as a side effect of our file execute check on the tmpfs
>>> inode, whereas it would be better (and more consistent with the
>>> mmap-time checks) to apply an execmem check in that case, in which
>>> case we wouldn't care about the inode-based check.  However, I am
>>> unclear on how to correctly detect that situation from
>>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a
>>> non-NULL vma->vm_file so we treat it as a file execute check.  In
>>> contrast, if directly creating an anonymous shared mapping with
>>> PROT_EXEC via mmap(...PROT_EXEC...),  selinux_mmap_file is called with
>>> a NULL file and therefore we end up applying an execmem check.
> 
> If you're willing to go forward with the change, rather than just call
> for an immediate revert of it, then I think the right way to detect
> the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)),
> wouldn't it?

That seems misleading and might trigger execmem checks on non-shmem
inodes.  S_PRIVATE was originally introduced for fs-internal inodes that
are never directly exposed to userspace, originally for reiserfs xattr
inodes (reiserfs xattrs are internally implemented as their own files
that are hidden from userspace) and later also applied to anon inodes.
It would be better if we had an explicit way of testing that we are
dealing with an anonymous shared mapping in selinux_file_mprotect() ->
file_map_prot_check().
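
The interaction discussed in this thread can be summarised with a small
decision model. It is a simplified userspace sketch written from the
description above, not the SELinux source: exec_checks() and the CHECK_*
flags are hypothetical names, and the model ignores the read/write file
checks and any global noexec setting. It shows which checks would apply
when PROT_EXEC is requested, and why marking the backing inode S_PRIVATE
leaves a shared anonymous mapping with no check on mprotect().

```c
#include <stdbool.h>
#include <sys/mman.h>

enum {
	CHECK_EXECMEM      = 1 << 0,	/* process-level execmem check */
	CHECK_FILE_EXECUTE = 1 << 1,	/* execute check on the backing inode */
};

/*
 * has_file:      the mapping has a backing file (vma->vm_file non-NULL)
 * private_inode: that file's inode is marked S_PRIVATE
 * shared:        the mapping is MAP_SHARED
 */
static unsigned int exec_checks(bool has_file, bool private_inode,
				bool shared, int prot)
{
	unsigned int checks = 0;

	if (!(prot & PROT_EXEC))
		return 0;

	/* No file, or a private file mapping that is also writable. */
	if (!has_file || (!shared && (prot & PROT_WRITE)))
		checks |= CHECK_EXECMEM;

	/* Inode-based check; skipped entirely for S_PRIVATE inodes. */
	if (has_file && !private_inode)
		checks |= CHECK_FILE_EXECUTE;

	return checks;
}
```

In this model, mmap() of a shared anonymous mapping with PROT_EXEC sees
no file and gets the execmem check, while a later mprotect() on the same
kind of mapping sees the shmem file: it gets the file execute check
before the change and nothing at all once the inode is S_PRIVATE.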


Re: Linux 4.2-rc1

2015-07-08 Thread Stephen Smalley
On 07/08/2015 01:47 PM, Casey Schaufler wrote:
> On 7/8/2015 10:29 AM, Linus Torvalds wrote:
>> On Wed, Jul 8, 2015 at 10:17 AM, Linus Torvalds
>>  wrote:
>>> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>>>
>>> if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
>>> return -ENOMEM;
>>>
>>> and that "fw_priv->buf" pointer is NULL.
>>>
>>> However, I don't see anything that looks like it should have changed
>>> any of this since 4.1.
>> Looking at the other uses of "fw_priv->buf", they all check that
>> pointer for NULL. I see code like
>>
>> fw_buf = fw_priv->buf;
>> if (!fw_buf)
>> goto out;
>>
>> etc.
>>
>> Also, it looks like you need to hold the "fw_lock" to even look at
>> that pointer, since the buffer can get reallocated etc.
>>
>> So that uevent code really looks buggy. It just doesn't look like a
>> *new* bug to me. That code looks old, going back to 2012 and commit
>> 1244691c73b2.
> 
> There have been SELinux changes to kernfs for 4.2. William,
> you might want to have a look here.

What change are you referring to?  I see no SELinux-related changes to
kernfs in 4.2-rc1.





Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-08 Thread Stephen Smalley
On 07/08/2015 09:13 AM, Stephen Smalley wrote:
> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins  wrote:
>> It appears that, at some point last year, XFS made directory handling
>> changes which bring it into lockdep conflict with shmem_zero_setup():
>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>> but that has been so for many years.
>>
>> Since those few lockdep traces that I've seen all implicated selinux,
>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>
>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>> which cloned inode in mmap(), but if so, I cannot locate them now.
> 
> This causes a regression for SELinux (please, in the future, cc
> selinux list and Paul Moore on SELinux-related changes).  In
> particular, this change disables SELinux checking of mprotect
> PROT_EXEC on shared anonymous mappings, so we lose the ability to
> control executable mappings.  That said, we are only getting that
> check today as a side effect of our file execute check on the tmpfs
> inode, whereas it would be better (and more consistent with the
> mmap-time checks) to apply an execmem check in that case, in which
> case we wouldn't care about the inode-based check.  However, I am
> unclear on how to correctly detect that situation from
> selinux_file_mprotect() -> file_map_prot_check(), because we do have a
> non-NULL vma->vm_file so we treat it as a file execute check.  In
> contrast, if directly creating an anonymous shared mapping with
> PROT_EXEC via mmap(...PROT_EXEC...),  selinux_mmap_file is called with
> a NULL file and therefore we end up applying an execmem check.

Also, can you provide the lockdep traces that motivated this change?

> 
>>
>> Reported-and-tested-by: Prarit Bhargava 
>> Reported-by: Daniel Wagner 
>> Reported-by: Morten Stevens 
>> Signed-off-by: Hugh Dickins 
>> ---
>>
>>  mm/shmem.c |8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> --- 4.1-rc7/mm/shmem.c  2015-04-26 19:16:31.352191298 -0700
>> +++ linux/mm/shmem.c2015-06-14 09:26:49.461120166 -0700
>> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
>> struct file *file;
>> loff_t size = vma->vm_end - vma->vm_start;
>>
>> -   file = shmem_file_setup("dev/zero", size, vma->vm_flags);
>> +   /*
>> +* Cloning a new file under mmap_sem leads to a lock ordering 
>> conflict
>> +* between XFS directory reading and selinux: since this file is only
>> +* accessible to the user through its mapping, use S_PRIVATE flag to
>> +* bypass file security, in the same way as 
>> shmem_kernel_file_setup().
>> +*/
>> +   file = __shmem_file_setup("dev/zero", size, vma->vm_flags, 
>> S_PRIVATE);
>> if (IS_ERR(file))
>> return PTR_ERR(file);
>>
> 
> 



Re: [RFC 4/8] lsm: smack: smack callbacks for kdbus security hooks

2015-07-08 Thread Stephen Smalley
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
> This adds implementation of three smack callbacks sitting behind kdbus
> security hooks as proposed by Karol Lewandowski.
> 
> Originates from:
> 
> git://git.infradead.org/users/pcmoore/selinux (branch: working-kdbus)
> commit: fc3505d058c001fe72a6f66b833e0be5b2d118f3
> 
> https://github.com/lmctl/linux.git (branch: kdbus-lsm-v4.for-systemd-v212)
> commit: 103c26fd27d1ec8c32d85dd3d85681f936ac66fb
> 
> Signed-off-by: Karol Lewandowski 
> Signed-off-by: Paul Osmialowski 
> ---
>  security/smack/smack_lsm.c | 68 
> ++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328..033b756 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -41,6 +41,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "smack.h"
>  
>  #define TRANS_TRUE   "TRUE"
> @@ -3336,6 +3337,69 @@ static int smack_setprocattr(struct task_struct *p, char *name,
>  }
>  
>  /**
> + * smack_kdbus_connect - Set the security blob for a KDBus connection
> + * @conn: the connection
> + * @secctx: smack label
> + * @seclen: smack label length
> + *
> + * Returns 0
> + */
> +static int smack_kdbus_connect(struct kdbus_conn *conn,
> +const char *secctx, u32 seclen)
> +{
> + struct smack_known *skp;
> +
> + if (secctx && seclen > 0)
> + skp = smk_import_entry(secctx, seclen);
> + else
> + skp = smk_of_current();
> + conn->security = skp;
> +
> + return 0;
> +}
> +
> +/**
> + * smack_kdbus_conn_free - Clear the security blob for a KDBus connection
> + * @conn: the connection
> + *
> + * Clears the blob pointer
> + */
> +static void smack_kdbus_conn_free(struct kdbus_conn *conn)
> +{
> + conn->security = NULL;
> +}
> +
> +/**
> + * smack_kdbus_talk - Smack access on KDBus
> + * @src: source kdbus connection
> + * @dst: destination kdbus connection
> + *
> + * Return 0 if a subject with the smack of sock could access
> + * an object with the smack of other, otherwise an error code
> + */
> +static int smack_kdbus_talk(const struct kdbus_conn *src,
> + const struct kdbus_conn *dst)
> +{
> + struct smk_audit_info ad;
> + struct smack_known *sskp = src->security;
> + struct smack_known *dskp = dst->security;
> + int ret;
> +
> + BUG_ON(sskp == NULL);
> + BUG_ON(dskp == NULL);
> +
> + if (smack_privileged(CAP_MAC_OVERRIDE))
> + return 0;
> +
> + smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NONE);
> +
> + ret = smk_access(sskp, dskp, MAY_WRITE, &ad);
> + if (ret)
> + return ret;
> + return 0;
> +}
> +
> +/**
>   * smack_unix_stream_connect - Smack access on UDS
>   * @sock: one sock
>   * @other: the other sock
> @@ -4393,6 +4457,10 @@ struct security_hook_list smack_hooks[] = {
>   LSM_HOOK_INIT(inode_notifysecctx, smack_inode_notifysecctx),
>   LSM_HOOK_INIT(inode_setsecctx, smack_inode_setsecctx),
>   LSM_HOOK_INIT(inode_getsecctx, smack_inode_getsecctx),
> +
> + LSM_HOOK_INIT(kdbus_connect, smack_kdbus_connect),
> + LSM_HOOK_INIT(kdbus_conn_free, smack_kdbus_conn_free),
> + LSM_HOOK_INIT(kdbus_talk, smack_kdbus_talk),
>  };

If Smack only truly needs 3 hooks, then it raises the question of why
there are so many other hooks defined.  Are the other hooks just to
support finer-grained distinctions, or is Smack's coverage incomplete?



Re: [RFC 5/8] kdbus: use LSM hooks in kdbus code

2015-07-08 Thread Stephen Smalley
On 07/08/2015 06:25 AM, Paul Osmialowski wrote:
> Originates from:
> 
> https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
> commit: aa0885489d19be92fa41c6f0a71df28763228a40
> 
> Signed-off-by: Karol Lewandowski <k.lewando...@samsung.com>
> Signed-off-by: Paul Osmialowski <p.osmialo...@samsung.com>
> ---
>  ipc/kdbus/bus.c        | 12 ++-
>  ipc/kdbus/bus.h        |  3 +++
>  ipc/kdbus/connection.c | 54 ++
>  ipc/kdbus/connection.h |  4 
>  ipc/kdbus/domain.c     |  9 -
>  ipc/kdbus/domain.h     |  2 ++
>  ipc/kdbus/endpoint.c   | 11 ++
>  ipc/kdbus/names.c      | 11 ++
>  ipc/kdbus/queue.c      | 30 ++--
>  9 files changed, 124 insertions(+), 12 deletions(-)
> 
>

> diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
> index 9993753..b85cdc7 100644
> --- a/ipc/kdbus/connection.c
> +++ b/ipc/kdbus/connection.c
> @@ -31,6 +31,7 @@
>  #include <linux/slab.h>
>  #include <linux/syscalls.h>
>  #include <linux/uio.h>
> +#include <linux/security.h>
>  
>  #include "bus.h"
>  #include "connection.h"
> @@ -73,6 +74,8 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>   bool is_activator;
>   bool is_monitor;
>   struct kvec kvec;
> + u32 sid, len;
> + char *label;
>   int ret;
>  
>   struct {
> @@ -222,6 +225,14 @@ static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep, bool privileged,
>   }
>   }
>  
> + security_task_getsecid(current, &sid);
> + security_secid_to_secctx(sid, &label, &len);
> + ret = security_kdbus_connect(conn, label, len);
> + if (ret) {
> + ret = -EPERM;
> + goto exit_unref;
> + }

This seems convoluted and expensive.  If you always want the label of
the current task here, then why not just have security_kdbus_connect()
internally extract the label of the current task?
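
A minimal sketch of that alternative, written as a userspace mock rather
than real kernel code. Every name here (mock_task, mock_conn,
mock_current, mock_security_kdbus_connect) is hypothetical and exists
only for illustration; the point is just that the hook takes the
connection alone and looks up the subject label itself, so the caller
does no secid-to-secctx conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for kernel types; none of this is real kernel API. */
struct mock_task {
	const char *label;	/* security label of the task */
};

struct mock_conn {
	const char *security;	/* per-connection security blob */
};

/* Hypothetical equivalent of "current"; the caller of the sketch sets it. */
static struct mock_task *mock_current;

/*
 * Hypothetical hook: receives only the connection and derives the label
 * from the current task internally. Returns 0 on success, -1 if there is
 * no labeled subject to take the label from.
 */
static int mock_security_kdbus_connect(struct mock_conn *conn)
{
	assert(conn != NULL);

	if (mock_current == NULL || mock_current->label == NULL)
		return -1;

	conn->security = mock_current->label;
	return 0;
}
```

With this shape the call site shrinks to a single call plus the error
check, and each security module is free to store whatever internal
representation it prefers.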

> @@ -1107,6 +1119,12 @@ static int kdbus_conn_reply(struct kdbus_conn *src, struct kdbus_kmsg *kmsg)
>   if (ret < 0)
>   goto exit;
>  
> + ret = security_kdbus_talk(src, dst);
> + if (ret) {
> + ret = -EPERM;
> + goto exit;
> + }

Where does kdbus apply its uid-based or other restrictions on
connections?  Why do we need to insert separate hooks into each of these
functions?  Is there no central chokepoint already for permission
checking that we can hook?

> diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
> index d1ffe90..1f91d39 100644
> --- a/ipc/kdbus/connection.h
> +++ b/ipc/kdbus/connection.h
> @@ -19,6 +19,7 @@
>  #include <linux/kref.h>
>  #include <linux/lockdep.h>
>  #include <linux/path.h>
> +#include <uapi/linux/kdbus.h>
>  
>  #include "limits.h"
>  #include "metadata.h"
> @@ -73,6 +74,7 @@ struct kdbus_kmsg;
>   * @names_queue_list:Well-known names this connection waits for
>   * @privileged:  Whether this connection is privileged on the bus
>   * @faked_meta:  Whether the metadata was faked on HELLO
> + * @security:LSM security blob
>   */
>  struct kdbus_conn {
>   struct kref kref;
> @@ -113,6 +115,8 @@ struct kdbus_conn {
>  
>   bool privileged:1;
>   bool faked_meta:1;
> +
> + void *security;
>  };

Unless I missed it, you may have missed the most important thing of all:
controlling kdbus's notion of "privileged".  kdbus sets privileged to
true if the process has CAP_IPC_OWNER or the process euid matches the
uid of the bus creator, and then it allows those processes to do many
dangerous things, including monitoring all traffic, impersonating
credentials, pids, or seclabel, etc.
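
To make the concern concrete, here is a rough userspace sketch of the
rule as described above, plus one way a security module could be given
a veto over it. The types and names (mock_cred, mock_dac_privileged,
mock_lsm_privileged_hook, mock_conn_privileged, mock_deny_all) are all
hypothetical; this is not the kdbus code and not an existing LSM hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock credentials; field names are illustrative only. */
struct mock_cred {
	unsigned int euid;
	bool cap_ipc_owner;	/* stands in for a CAP_IPC_OWNER check */
};

/* The uid/capability-only rule described in the text. */
static bool mock_dac_privileged(const struct mock_cred *c,
				unsigned int bus_creator_uid)
{
	return c->cap_ipc_owner || c->euid == bus_creator_uid;
}

/* Hypothetical policy callback: 0 means allow, non-zero means deny. */
typedef int (*mock_lsm_privileged_hook)(const struct mock_cred *c);

/*
 * A connection is privileged only if the uid/capability rule grants it
 * AND the security module (if one is registered) does not object.
 */
static bool mock_conn_privileged(const struct mock_cred *c,
				 unsigned int bus_creator_uid,
				 mock_lsm_privileged_hook hook)
{
	if (!mock_dac_privileged(c, bus_creator_uid))
		return false;
	return hook == NULL || hook(c) == 0;
}

/* Example policy that never grants privileged status. */
static int mock_deny_all(const struct mock_cred *c)
{
	(void)c;
	return -1;
}
```

Without some hook of this kind, a euid match alone is enough to obtain
the monitoring and impersonation abilities listed above, regardless of
MAC policy.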

I don't believe we should ever permit impersonating seclabel information.


Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-08 Thread Stephen Smalley
On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hu...@google.com> wrote:
> It appears that, at some point last year, XFS made directory handling
> changes which bring it into lockdep conflict with shmem_zero_setup():
> it is surprising that mmap() can clone an inode while holding mmap_sem,
> but that has been so for many years.
>
> Since those few lockdep traces that I've seen all implicated selinux,
> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
> v3.13's commit c7277090927a ("security: shmem: implement kernel private
> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>
> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
> which cloned inode in mmap(), but if so, I cannot locate them now.

This causes a regression for SELinux (please, in the future, cc
selinux list and Paul Moore on SELinux-related changes).  In
particular, this change disables SELinux checking of mprotect
PROT_EXEC on shared anonymous mappings, so we lose the ability to
control executable mappings.  That said, we are only getting that
check today as a side effect of our file execute check on the tmpfs
inode, whereas it would be better (and more consistent with the
mmap-time checks) to apply an execmem check in that case, in which
case we wouldn't care about the inode-based check.  However, I am
unclear on how to correctly detect that situation from
selinux_file_mprotect() -> file_map_prot_check(), because we do have a
non-NULL vma->vm_file so we treat it as a file execute check.  In
contrast, if directly creating an anonymous shared mapping with
PROT_EXEC via mmap(...PROT_EXEC...),  selinux_mmap_file is called with
a NULL file and therefore we end up applying an execmem check.
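
For reference, the two userspace paths being contrasted can be
exercised with something like the following. This is only a sketch for
Linux; the function names are made up for illustration, and the program
cannot observe which SELinux permission is checked on each path, only
whether the call is allowed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Path 1: create a shared anonymous mapping read/write, then request
 * PROT_EXEC afterwards via mprotect(). The mapping is backed by a
 * kernel-created shmem file, so vma->vm_file is non-NULL at mprotect time.
 *
 * Returns -1 if the initial mmap() fails; otherwise returns 0 and stores
 * 0 in *mprotect_errno if mprotect() was allowed, or its errno if not.
 */
static int exec_via_mprotect(size_t len, int *mprotect_errno)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	*mprotect_errno = 0;
	if (mprotect(p, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
		*mprotect_errno = errno;

	munmap(p, len);
	return 0;
}

/*
 * Path 2: request PROT_EXEC directly at mmap() time, where the hook is
 * called with a NULL file. Returns 0 if allowed, otherwise the errno.
 */
static int exec_via_mmap(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return errno;

	munmap(p, len);
	return 0;
}
```

Under a policy that denies execmem, one would expect path 2 to fail;
with the S_PRIVATE change applied, path 1 is no longer checked at all.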

>
> Reported-and-tested-by: Prarit Bhargava <pra...@redhat.com>
> Reported-by: Daniel Wagner <w...@monom.org>
> Reported-by: Morten Stevens <mstev...@fedoraproject.org>
> Signed-off-by: Hugh Dickins <hu...@google.com>
> ---
>
>  mm/shmem.c |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> --- 4.1-rc7/mm/shmem.c  2015-04-26 19:16:31.352191298 -0700
> +++ linux/mm/shmem.c    2015-06-14 09:26:49.461120166 -0700
> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
> struct file *file;
> loff_t size = vma->vm_end - vma->vm_start;
>
> -   file = shmem_file_setup("dev/zero", size, vma->vm_flags);
> +   /*
> +* Cloning a new file under mmap_sem leads to a lock ordering conflict
> +* between XFS directory reading and selinux: since this file is only
> +* accessible to the user through its mapping, use S_PRIVATE flag to
> +* bypass file security, in the same way as shmem_kernel_file_setup().
> +*/
> +   file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
> if (IS_ERR(file))
> return PTR_ERR(file);
>

