Re: [PATCH] audit: return on memory error to avoid null pointer dereference

2018-02-21 Thread Eric Paris
I think if we went back and looked at history we'd see that all of the
code originally had none of the if(!ab) checks after allocation and
they just sorta slowly crept in over time. I prefer this pattern, but
it used to be the opposite everywhere.


On Wed, 2018-02-21 at 19:02 -0500, Paul Moore wrote:
> On Wed, Feb 21, 2018 at 6:49 PM, Paul Moore wrote:
> > On Wed, Feb 21, 2018 at 4:30 AM, Richard Guy Briggs wrote:
> > > If there is a memory allocation error when trying to change an
> > > audit kernel feature value, the ignored allocation error will
> > > trigger a NULL pointer dereference oops on subsequent use of that
> > > pointer.  Return instead.
> > > 
> > > Passes audit-testsuite.
> > > See: https://github.com/linux-audit/audit-kernel/issues/76
> > > Signed-off-by: Richard Guy Briggs 
> > > ---
> > >  kernel/audit.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > 
> > Thanks, merged.
> > 
> > In the future a "[PATCH v2]" prefix would be appreciated for patches
> > like this, it makes things a little easier in my inbox.
> 
> After merging this I went through all the other callers to see if
> they suffered the same mistake and everyone except for IMA was
> checking the returned pointer for NULL.  Upon looking at the IMA code,
> and the audit code which is called, I realized we are actually "ok"
> as audit_log_task_info(), audit_log_format(), audit_log_end(), and
> others all check for a NULL audit_buffer at the very top of the
> functions. I'm going to leave this patch merged, it's a good practice
> after all, but I don't believe that unpatched systems are in any
> danger of oops'ing here.
> 
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 5c25449..2de74be 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -1059,6 +1059,8 @@ static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature
> > >  		return;
> > >  
> > >  	ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
> > > +	if (!ab)
> > > +		return;
> > >  	audit_log_task_info(ab, current);
> > >  	audit_log_format(ab, " feature=%s old=%u new=%u old_lock=%u new_lock=%u res=%d",
> > >  			 audit_feature_names[which], !!old_feature, !!new_feature,
> > > --
> > > 1.8.3.1
> 
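Paul's point is that the audit logging helpers tolerate a NULL buffer, so the unchecked return was harmless in practice, while the explicit check is still the clearer habit. A minimal userspace sketch of that pattern follows; `struct demo_buffer` and the `demo_*` functions are hypothetical stand-ins that only mirror the NULL handling described above, not the kernel's audit API.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct audit_buffer. */
struct demo_buffer {
	char text[256];
	size_t len;
};

/* Like audit_log_start(): may return NULL when allocation fails. */
static struct demo_buffer *demo_log_start(int simulate_oom)
{
	if (simulate_oom)
		return NULL;
	return calloc(1, sizeof(struct demo_buffer)); /* NULL on real OOM too */
}

/* Like audit_log_format(): a NULL buffer is silently ignored. */
static void demo_log_format(struct demo_buffer *ab, const char *fmt, ...)
{
	va_list args;
	size_t room;
	int n;

	if (!ab)
		return;
	room = sizeof(ab->text) - ab->len;
	va_start(args, fmt);
	n = vsnprintf(ab->text + ab->len, room, fmt, args);
	va_end(args);
	if (n < 0)
		return;
	/* vsnprintf() truncates; never advance past the terminating NUL */
	ab->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Like audit_log_end(): NULL-safe; copies the record out and frees it. */
static int demo_log_end(struct demo_buffer *ab, char *out, size_t outsz)
{
	if (!ab)
		return 0;
	snprintf(out, outsz, "%s", ab->text);
	free(ab);
	return 1;
}

/* Caller shaped like audit_log_feature_change() after the patch. */
static int demo_feature_change(int simulate_oom, char *out, size_t outsz)
{
	struct demo_buffer *ab = demo_log_start(simulate_oom);

	if (!ab) /* the early return the patch adds */
		return 0;
	demo_log_format(ab, "feature=%s old=%u new=%u", "demo_feature", 0u, 1u);
	return demo_log_end(ab, out, outsz);
}
```

Without the early return, the caller would still be safe here because every helper checks for NULL first; the check just makes that intent visible at the call site.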
> 


Re: RFC(v2): Audit Kernel Container IDs

2017-12-11 Thread Eric Paris
On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
> On 12/9/2017 2:20 AM, Mickaël Salaün wrote:

> > What about automatically creating and assigning an ID to a process
> > when it enters a namespace different from its parent process's?
> > This delegates the (permission) responsibility to the use of
> > namespaces (e.g. /proc/sys/user/max_* limit).
> 
> That gets ugly when you have a container that uses user, filesystem,
> network and whatever else namespaces. If all containers used the same
> set of namespaces I think this would be a fine idea, but they don't.
> 
> > One interesting side effect of this approach would be to be able to
> > identify which processes are in the same set of namespaces, even if
> > not spawned from the container but entered after its creation (i.e.
> > using setns), by creating container IDs as a (deterministic)
> > checksum from the /proc/self/ns/* IDs.
> > 
> > Since the concern is to identify a container, I think the ability
> > to audit the switch from one container ID to another is enough. I
> > don't think we need nested IDs.
> 
> Because a container doesn't have to use namespaces to be a container
> you still need a mechanism for a process to declare that it is in
> fact in a container, and to identify the container.

I like the idea but I'm still tossing it around in my head (and
thinking about Casey's statement too). Let's say we have a 'docker-like'
container with pid=100, netns=X, userns=Y, mountns=Z. If I'm on the host
in all init namespaces and I run
  nsenter -t 100 -n ip link set eth0 promisc on
How should this be logged? Did this command run in its own 'container'
unrelated to the 'docker-like' container?

-Eric
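A minimal sketch of the deterministic checksum idea quoted above, assuming the ID is a hash over the /proc/<pid>/ns/* link targets (which read like "net:[4026531992]"). FNV-1a, the fixed list of namespace types, and all function names are assumptions made for illustration; nothing in the thread specifies them.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define FNV1A_OFFSET 14695981039346656037ULL
#define FNV1A_PRIME  1099511628211ULL

/* 64-bit FNV-1a over a NUL-terminated string, continuing from h. */
static uint64_t fnv1a_update(uint64_t h, const char *s)
{
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= FNV1A_PRIME;
	}
	return h;
}

/* Hash an ordered list of namespace link targets.  A '\n' separator
 * keeps {"ab","c"} and {"a","bc"} from hashing the same input bytes. */
static uint64_t ns_set_id(const char *const *links, size_t n)
{
	uint64_t h = FNV1A_OFFSET;

	for (size_t i = 0; i < n; i++) {
		h = fnv1a_update(h, links[i]);
		h = fnv1a_update(h, "\n");
	}
	return h;
}

/* Read /proc/<pid>/ns/<type> for a fixed list of types and hash them.
 * Returns 0 on success, -1 if any link cannot be read. */
static int proc_ns_set_id(const char *pid, uint64_t *id)
{
	static const char *const types[] = { "ipc", "mnt", "net", "pid", "user", "uts" };
	enum { NTYPES = sizeof(types) / sizeof(types[0]) };
	char bufs[NTYPES][64];
	const char *links[NTYPES];
	char path[128];

	for (size_t i = 0; i < NTYPES; i++) {
		ssize_t len;

		snprintf(path, sizeof(path), "/proc/%s/ns/%s", pid, types[i]);
		len = readlink(path, bufs[i], sizeof(bufs[i]) - 1);
		if (len < 0)
			return -1;
		bufs[i][len] = '\0'; /* readlink() does not NUL-terminate */
		links[i] = bufs[i];
	}
	*id = ns_set_id(links, NTYPES);
	return 0;
}
```

Under this scheme, the nsenter example yields a set with the container's netns but the host's userns and mountns, so it hashes to an ID that matches neither the host nor the 'docker-like' container. That is exactly the ambiguity raised above.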


Re: [PATCH 4/4] kernel:audit.c fixed a coding style issue

2017-02-28 Thread Eric Paris
On Tue, 2017-02-28 at 21:49 +, Joan Jani wrote:
> This patch fixes the following checkpatch.pl warning
>  WARNING: Block comments use a trailing */ on a separate line
> 
> like
> 
> kernel/audit.c:135: WARNING: Block comments use a trailing */ on a
> separate line
> kernel/audit.c:170: WARNING: Block comments use a trailing */ on a
> separate line
> kernel/audit.c:174: WARNING: Block comments use a trailing */ on a
> separate line
> kernel/audit.c:181: WARNING: Block comments use a trailing */ on a
> 
> and some more style. No changes to code
> 
> Signed-off-by: Joan Jani 
> ---
>  kernel/audit.c | 53 ++
> ---
>  1 file changed, 34 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index e794544..62d90d9 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -70,7 +70,8 @@
>  #include "audit.h"
>  
>  /* No auditing will take place until audit_initialized ==
> AUDIT_INITIALIZED.
> - * (Initialization happens after skb_init is called.) */
> + * (Initialization happens after skb_init is called.)
> + */
>  #define AUDIT_DISABLED   -1
>  #define AUDIT_UNINITIALIZED  0
>  #define AUDIT_INITIALIZED	1
> @@ -100,11 +101,13 @@ static __u32	audit_nlk_portid;
>  
>  /* If audit_rate_limit is non-zero, limit the rate of sending audit
> records
>   * to that number per second.  This prevents DoS attacks, but
> results in
> - * audit records being dropped. */
> + * audit records being dropped.
> + */

After Linus's polite request around comment style I think I'm going to
have to disagree with these parts of the patch...

http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html




>  static u32   audit_rate_limit;
>  
>  /* Number of outstanding audit_buffers allowed.
> - * When set to zero, this means unlimited. */
> + * When set to zero, this means unlimited.
> + */
>  static u32   audit_backlog_limit = 64;
>  #define AUDIT_BACKLOG_WAIT_TIME (60 * HZ)
>  static u32   audit_backlog_wait_time = AUDIT_BACKLOG_WAIT_TIME;
> @@ -115,7 +118,7 @@ pid_t audit_sig_pid = -1;
>  u32  audit_sig_sid = 0;
>  
>  /* Records can be lost in several ways:
> -   0) [suppressed in audit_alloc]
> + * 0) [suppressed in audit_alloc]
> 1) out of memory in audit_log_start [kmalloc of struct
> audit_buffer]
> 2) out of memory in audit_log_move [alloc_skb]
> 3) suppressed due to audit_rate_limit
> @@ -132,7 +135,8 @@ struct list_head
> audit_inode_hash[AUDIT_INODE_BUCKETS];
>  
>  /* The audit_freelist is a list of pre-allocated audit buffers (if
> more
>   * than AUDIT_MAXFREE are in use, the audit buffer is freed instead
> of
> - * being placed on the freelist). */
> + * being placed on the freelist).
> + */
>  static DEFINE_SPINLOCK(audit_freelist_lock);
>  static int      audit_freelist_count;
>  static LIST_HEAD(audit_freelist);
> @@ -167,18 +171,21 @@ DEFINE_MUTEX(audit_cmd_mutex);
>  
>  /* AUDIT_BUFSIZ is the size of the temporary buffer used for
> formatting
>   * audit records.  Since printk uses a 1024 byte buffer, this buffer
> - * should be at least that large. */
> + * should be at least that large.
> + */
>  #define AUDIT_BUFSIZ 1024
>  
>  /* AUDIT_MAXFREE is the number of empty audit_buffers we keep on the
> - * audit_freelist.  Doing so eliminates many kmalloc/kfree calls. */
> + * audit_freelist.  Doing so eliminates many kmalloc/kfree calls.
> + */
>  #define AUDIT_MAXFREE  (2*NR_CPUS)
>  
>  /* The audit_buffer is used when formatting an audit record.  The
> caller
>   * locks briefly to get the record off the freelist or to allocate
> the
>   * buffer, and locks briefly to send the buffer to the netlink layer
> or
>   * to place it on a transmit queue.  Multiple audit_buffers can be
> in
> - * use simultaneously. */
> + * use simultaneously.
> + */
>  struct audit_buffer {
>   struct list_head list;
>   struct sk_buff   *skb;  /* formatted skb ready to
> send */
> @@ -227,7 +234,8 @@ static inline int audit_rate_check(void)
>   unsigned long   elapsed;
>   int retval     = 0;
>  
> - if (!audit_rate_limit) return 1;
> + if (!audit_rate_limit)
> + return 1;
>  
>   spin_lock_irqsave(&lock, flags);
>   if (++messages < audit_rate_limit) {
> @@ -253,7 +261,7 @@ static inline int audit_rate_check(void)
>   * Emit at least 1 message per second, even if audit_rate_check is
>   * throttling.
>   * Always increment the lost messages counter.
> -*/
> + */
>  void audit_log_lost(const char *message)
>  {
>   static unsigned long	last_msg = 0;
> @@ -350,6 +358,7 @@ static int audit_set_backlog_wait_time(u32
> timeout)
>  static int audit_set_enabled(u32 state)
>  {
>   int rc;
> +
>   if (state > AUDIT_LOCKED)
>   return -EINVAL;
>  
> @@ -402,7 +411,8 @@ static void kauditd_printk_skb(struct sk_buff
> *skb)
>  static void kauditd_hold_skb(struct sk_buff *skb)
>  {
>   /* at this point

Re: [PATCH 00/46] SELinux: Fine-tuning for several function implementations

2017-01-16 Thread Eric Paris


All of the patches look good to me except most of those which change
the handling of `rc=`. I have a personal style preference for

rc = -ENOMEM;
val = kalloc();
if (!val)
  goto err;

vs

val = kalloc();
if (!val) {
  rc = -ENOMEM;
  goto err;
}

because it saves 1 line and I think the compiler does the right/same
thing. If there is a preference among the people actively developing
SELinux (like I said, I'm now irrelevant), I guess they win.

But certainly a big +1 from me for the array allocation and sizeof()
changes.

-Eric
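A userspace sketch of the two `rc` placements Eric compares, using malloc() in place of the kernel allocator. The function names and the injectable allocator are hypothetical. Both versions return the same values on both paths; whether a given compiler emits identical code for them is not something this sketch demonstrates.

```c
#include <errno.h>
#include <stdlib.h>

/* Allocator hook so a test can force the failure path (hypothetical). */
typedef void *(*alloc_fn)(size_t);

/* Style Eric prefers: set rc before the call that may fail. */
static int setup_rc_first(alloc_fn alloc, void **out)
{
	int rc;
	void *val;

	rc = -ENOMEM;
	val = alloc(32);
	if (!val)
		goto err;

	*out = val;
	return 0;
err:
	return rc;
}

/* Style several of the patches move toward: set rc only on failure. */
static int setup_rc_in_branch(alloc_fn alloc, void **out)
{
	int rc;
	void *val;

	val = alloc(32);
	if (!val) {
		rc = -ENOMEM;
		goto err;
	}

	*out = val;
	return 0;
err:
	return rc;
}

/* Always-failing allocator used to exercise the error path. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}
```

The trade-off is readability rather than behavior: the first style is one line shorter per check, while the second keeps the error code next to the failure it describes.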

On Sun, 2017-01-15 at 15:55 +0100, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sun, 15 Jan 2017 15:15:14 +0100
> 
> Several update suggestions were taken into account
> from static source code analysis.
> 
> Markus Elfring (46):
>   Use kmalloc_array() in cond_init_bool_indexes()
>   Delete an unnecessary return statement in cond_compute_av()
>   Improve size determinations in four functions
>   Use kmalloc_array() in hashtab_create()
>   Adjust four checks for null pointers
>   Use kcalloc() in policydb_index()
>   Delete unnecessary variable assignments in policydb_index()
>   Delete an unnecessary return statement in policydb_destroy()
>   Delete an error message for a failed memory allocation in policydb_read()
>   Move some assignments for the variable "rc" in policydb_read()
>   Return directly after a failed next_entry() in genfs_read()
>   Move assignments for two pointers in genfs_read()
>   Move four assignments for the variable "rc" in genfs_read()
>   One function call less in genfs_read() after null pointer detection
>   One check and function call less in genfs_read() after error detection
>   Move two assignments for the variable "rc" in filename_trans_read()
>   Delete an unnecessary variable assignment in filename_trans_read()
>   One function call less in filename_trans_read() after error detection
>   Return directly after a failed next_entry() in range_read()
>   Move four assignments for the variable "rc" in range_read()
>   Two function calls less in range_read() after error detection
>   Delete an unnecessary variable initialisation in range_read()
>   Move an assignment for a pointer in range_read()
>   Return directly after a failed kzalloc() in cat_read()
>   Return directly after a failed kzalloc() in sens_read()
>   Improve another size determination in sens_read()
>   Move an assignment for the variable "rc" in sens_read()
>   Return directly after a failed kzalloc() in user_read()
>   Return directly after a failed kzalloc() in type_read()
>   Return directly after a failed kzalloc() in role_read()
>   Move an assignment for the variable "rc" in role_read()
>   Return directly after a failed kzalloc() in class_read()
>   Move an assignment for the variable "rc" in class_read()
>   Return directly after a failed kzalloc() in common_read()
>   Return directly after a failed kzalloc() in perm_read()
>   Move an assignment for the variable "rc" in mls_read_range_helper()
>   Move an assignment for the variable "rc" in policydb_load_isids()
>   One function call less in five functions after null pointer detection
>   Move two assignments for the variable "rc" in ocontext_read()
>   Return directly after a failed kzalloc() in roles_init()
>   Move two assignments for the variable "rc" in roles_init()
>   One function call less in roles_init() after error detection
>   Use kmalloc_array() in sidtab_init()
>   Adjust two checks for null pointers
>   Use common error handling code in sidtab_insert()
>   Use seq_puts() in sel_avc_stats_seq_show()
> 
>  security/selinux/selinuxfs.c  |   8 +-
>  security/selinux/ss/conditional.c |  14 +--
>  security/selinux/ss/hashtab.c |  10 +-
>  security/selinux/ss/policydb.c| 255 --
> 
>  security/selinux/ss/sidtab.c  |  22 ++--
>  5 files changed, 157 insertions(+), 152 deletions(-)
> 


Re: [PATCH V2] audit: log 32-bit socketcalls

2017-01-13 Thread Eric Paris
On Fri, 2017-01-13 at 10:06 -0500, Richard Guy Briggs wrote:
> On 2017-01-13 09:42, Eric Paris wrote:
> > On Fri, 2017-01-13 at 04:51 -0500, Richard Guy Briggs wrote:


> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index 9d4443f..43d8003 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -387,6 +387,18 @@ static inline int audit_socketcall(int nargs, unsigned long *args)
> > >   return __audit_socketcall(nargs, args);
> > >   return 0;
> > >  }
> > > +static inline int audit_socketcall_compat(int nargs, u32 *args)
> > > +{
> > > + if (unlikely(!audit_dummy_context())) {
> > 
> > I've always hated these likely/unlikely. Mostly because I think they
> > are so often wrong. I believe this says that you compiled audit in but
> > you expect it to be explicitly disabled. While that is (recently) true
> > in Fedora I highly doubt that's true on the vast majority of systems
> > that have audit compiled in.
> 
> It has been argued that audit should have pretty much no performance
> impact if it is not in use and that if it is, we're willing to take the
> more significant overhead of the rest of the code for the sake of one
> test to determine whether or not to follow this code path.

Ok, I can buy that argument. Not sure it's where I would have settled,
but it does make sense. I'll obviously defer to Paul on what he wants
out of style. I always assume the compiler is brilliant and write
stupid code, but your logic is sound there too.

You can/should pretend I said nothing.
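For reference, the kernel's likely()/unlikely() wrap GCC's __builtin_expect(), which only influences branch layout and static prediction. The program's behavior is identical either way, so a wrong hint costs some performance but never correctness. A small sketch, where `demo_audit_active` is a hypothetical stand-in for `!audit_dummy_context()`:

```c
#include <stdbool.h>

/* Same shape as the kernel's definitions in <linux/compiler.h>
 * (GCC/Clang builtin; the hint affects code layout only). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for !audit_dummy_context(): true when this
 * task actually has audit work to do. */
static bool demo_audit_active;

static int hinted(int nargs)
{
	if (unlikely(demo_audit_active))
		return nargs; /* slow path: would record the arguments */
	return 0;
}

static int unhinted(int nargs)
{
	if (demo_audit_active)
		return nargs;
	return 0;
}
```

Whether the hint pays off depends on how often the audited path is actually taken on real systems, which is the profiling question left open above.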


Re: [PATCH V2] audit: log 32-bit socketcalls

2017-01-13 Thread Eric Paris
On Fri, 2017-01-13 at 04:51 -0500, Richard Guy Briggs wrote:
> 32-bit socketcalls were not being logged by audit on x86_64 systems.
> Log them.  This is basically a duplicate of the call from
> net/socket.c:sys_socketcall(), but it addresses the impedance mismatch
> between 32-bit userspace process and 64-bit kernel audit.
> 
> See: https://github.com/linux-audit/audit-kernel/issues/14
> 
> Signed-off-by: Richard Guy Briggs 
> 
> --
> v2:
>    Move work to audit_socketcall_compat() and use audit_dummy_context().
> ---
>  include/linux/audit.h |   16 
>  net/compat.c  |   15 +--
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 9d4443f..43d8003 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -387,6 +387,18 @@ static inline int audit_socketcall(int nargs, unsigned long *args)
>   return __audit_socketcall(nargs, args);
>   return 0;
>  }
> +static inline int audit_socketcall_compat(int nargs, u32 *args)
> +{
> + if (unlikely(!audit_dummy_context())) {

I've always hated these likely/unlikely. Mostly because I think they
are so often wrong. I believe this says that you compiled audit in but
you expect it to be explicitly disabled. While that is (recently) true
in Fedora I highly doubt that's true on the vast majority of systems
that have audit compiled in.

I think all of the likely/unlikely need to just be abandoned, but at
least don't add more? It certainly wouldn't be the first time I was
wrong, and I haven't profiled it. But the function would definitely
look better if coded as:

static inline int audit_socketcall_compat(int nargs, u32 *args)
{
	int i;
	unsigned long a[AUDITSC_ARGS];

	if (audit_dummy_context())
		return 0;

	[...]
}

> + int i;
> + unsigned long a[AUDITSC_ARGS];
> +
> +	for (i=0; i<nargs; i++)
> +		a[i] = (unsigned long)args[i];
> + return __audit_socketcall(nargs, a);
> + }
> + return 0;
> +}
>  static inline int audit_sockaddr(int len, void *addr)
>  {
>   if (unlikely(!audit_dummy_context()))
> @@ -513,6 +525,10 @@ static inline int audit_socketcall(int nargs, unsigned long *args)
>  {
>   return 0;
>  }
> +static inline int audit_socketcall_compat(int nargs, u32 *args)
> +{
> + return 0;
> +}
>  static inline void audit_fd_pair(int fd1, int fd2)
>  { }
>  static inline int audit_sockaddr(int len, void *addr)
> diff --git a/net/compat.c b/net/compat.c
> index 1cd2ec0..f0404d4 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -781,14 +782,24 @@ COMPAT_SYSCALL_DEFINE5(recvmmsg, int, fd, struct compat_mmsghdr __user *, mmsg,
>  
>  COMPAT_SYSCALL_DEFINE2(socketcall, int, call, u32 __user *, args)
>  {
> + unsigned int len;
>   int ret;
> - u32 a[6];
> + u32 a[AUDITSC_ARGS];
>   u32 a0, a1;
>  
>   if (call < SYS_SOCKET || call > SYS_SENDMMSG)
>   return -EINVAL;
> - if (copy_from_user(a, args, nas[call]))
> + len = nas[call];
> + if (len > sizeof(a))
> + return -EINVAL;
> +
> + if (copy_from_user(a, args, len))
>   return -EFAULT;
> +
> + ret = audit_socketcall_compat(len/sizeof(a[0]), a);
> + if (ret)
> + return ret;
> +
>   a0 = a[0];
>   a1 = a[1];
>  
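The net/compat.c hunk above bounds-checks the per-call byte count against the on-stack array before copying, then hands the 32-bit arguments to audit_socketcall_compat(), which widens each one to unsigned long. A userspace sketch of just that logic follows, with memcpy() standing in for copy_from_user(). `demo_nas` and `demo_compat_args()` are hypothetical; only AUDITSC_ARGS mirrors a real kernel constant.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define AUDITSC_ARGS 6 /* matches the kernel's value in <linux/audit.h> */

/* Hypothetical byte counts per call number, modeled on net/compat.c's
 * nas[] table; entry 4 is deliberately oversized to trip the check. */
static const unsigned char demo_nas[] = { 0, 3 * 4, 2 * 4, 6 * 4, 7 * 4 };

/* Copy the 32-bit arguments, bounds-check the length, and widen them to
 * unsigned long as audit_socketcall_compat() does before auditing. */
static int demo_compat_args(int call, const uint32_t *user_args,
			    unsigned long *wide, int *nargs)
{
	uint32_t a[AUDITSC_ARGS];
	unsigned int len;
	int i;

	if (call < 1 || call >= (int)sizeof(demo_nas))
		return -EINVAL;
	len = demo_nas[call];
	if (len > sizeof(a))       /* never overrun the on-stack array */
		return -EINVAL;
	memcpy(a, user_args, len); /* stands in for copy_from_user() */

	*nargs = (int)(len / sizeof(a[0]));
	for (i = 0; i < *nargs; i++)
		wide[i] = (unsigned long)a[i]; /* zero-extends each u32 */
	return 0;
}
```

Widening through `unsigned long` zero-extends, so a 32-bit argument of 0xffffffff is audited as 4294967295 rather than being sign-extended.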


Re: [PATCH] Honor mmap_min_addr with the actual minimum

2016-05-11 Thread Eric Paris
On Wed, 2016-05-11 at 14:54 +0200, Hector Marco-Gisbert wrote:
> 
> On 21/04/16 at 00:12, Kees Cook wrote:
> > On Tue, Apr 19, 2016 at 11:55 AM, Hector Marco-Gisbert wrote:
> > > > On Wed, Apr 6, 2016 at 12:07 PM, Hector Marco-Gisbert wrote:
> > > > > The minimum address that a process is allowed to mmap when
> > > > > LSM is enabled is 0x10000 (65536). This value is tunable and
> > > > > exported via /proc/sys/vm/mmap_min_addr, but the exported value
> > > > > is not the actual minimum.
> > > > 
> > > > I think this is working as intended already, based on the commit
> > > > log for 788084aba2ab7348257597496befcbccabdc98a3
> > > > 
> > > > See cap_mmap_addr (which uses dac_mmap_min_addr) vs SELinux's hook
> > > > (which uses CONFIG_LSM_MMAP_MIN_ADDR), and everything else (that
> > > > uses mmap_min_addr).
> > > > 
> > > > Without CONFIG_LSM_MMAP_MIN_ADDR, dac_mmap_min_addr always ==
> > > > mmap_min_addr.
> > > > 
> > > > With CONFIG_LSM_MMAP_MIN_ADDR, dac_mmap_min_addr can be less than
> > > > mmap_min_addr, but mmap_min_addr will always be at least
> > > > CONFIG_LSM_MMAP_MIN_ADDR.
> > > > 
> > > > Eric may be able to shed more light on this...
> > > > 
> > > > -Kees
> > > 
> > > Ok, I see your point, but it seems that the minimum address that a
> > > process is allowed to map is mmap_min_addr and not
> > > dac_mmap_min_addr. This is because mmap_min_addr can be seen as
> > > max(dac_mmap_min_addr, CONFIG_LSM_MMAP_MIN_ADDR), which is correct
> > > (the minimum allowed address), but /proc/sys/vm/mmap_min_addr
> > > contains dac_mmap_min_addr, which is not the minimum.
> > > 
> > > For example, if we set CONFIG_LSM_MMAP_MIN_ADDR to 65536 and
> > > /proc/sys/vm/mmap_min_addr to 4096, then assuming that
> > > selinux_mmap_addr() denies permission (returns != 0), the minimum
> > > allowed address is 65536, not 4096. The mmap check is done in
> > > security_mmap_addr(addr) in mm/mmap.c. It seems that we are
> > > exporting dac_mmap_min_addr instead of the actual minimum.
> > > 
> > > Is this behavior intended? Am I missing something here?
> 
> Yes, the sysctl is reporting the dac value.
> 
> I think the meaning of the exported mmap_min_addr value was changed
> in the commit you pointed to. A new variable was added
> (dac_mmap_min_addr) and it replaced mmap_min_addr as the data behind
> the sysctl, but the exported name (/proc/sys/vm/mmap_min_addr) was
> not changed:
> 
> .procname = "mmap_min_addr",
> - .data = &mmap_min_addr,
> + .data = &dac_mmap_min_addr,
> 
> This can be confusing since the returned value is not the expected
> one (the minimum value according to sysctl/vm.txt) but
> dac_mmap_min_addr. So if we need to export the dac value we can do
> it, but it would be desirable not to change the meaning of this
> exported value.
> 
> Maybe by renaming /proc/sys/vm/mmap_min_addr to
> /proc/sys/vm/dac_mmap_min_addr and adding a read-only
> /proc/sys/vm/mmap_min_addr?

This breaks scripts which are currently setting mmap_min_addr (like
wine on ubuntu I think?). Seems like a non-starter.

You're trying to represent multiple values in a single value. It just
isn't possible. You could expose lsm_mmap_min_addr RO in another sysctl
(not sure of other places we expose kernel configs like that, but you
could).

I wouldn't say the meaning of mmap_min_addr changed, we just grew a new
(underdocumented) lsm_mmap_min_addr. mmap_min_addr continued to be
controlled by and controlling exactly the same thing.

dac_mmap_min_addr is controlled by capabilities.
lsm_mmap_min_addr is controlled by your LSM.

You can expose those 2 values. But it would be up to each process to
know how to use them. A process might be able to avoid the dac check
but not the mac check (aka a root process) or a process might be able
to avoid the mac check but not the dac check (wine).

No single value can represent this. The best you could do is expose the
lsm/mac value, but I'm not sure I see the value. All you are doing is
telling exploit authors exactly how high they have to put their nasty
bits...
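Eric's point can be made concrete with a minimal userspace sketch of the two independent gates. `has_cap` stands in for the CAP_SYS_RAWIO check in cap_mmap_addr(), and `has_mac_perm` stands in for the LSM decision (for SELinux, the memprotect mmap_zero permission). The function names and flags are hypothetical. effective_min_addr() only mirrors the max() relationship described earlier in the thread.

```c
#include <stdbool.h>

/* mmap_min_addr as described in the thread:
 * max(dac_mmap_min_addr, CONFIG_LSM_MMAP_MIN_ADDR). */
static unsigned long effective_min_addr(unsigned long dac_min,
					unsigned long lsm_min)
{
	return dac_min > lsm_min ? dac_min : lsm_min;
}

/* Two independent gates: each one can be passed or failed separately. */
static bool mmap_addr_allowed(unsigned long addr, unsigned long dac_min,
			      unsigned long lsm_min, bool has_cap,
			      bool has_mac_perm)
{
	if (addr < dac_min && !has_cap)      /* DAC: capability gate */
		return false;
	if (addr < lsm_min && !has_mac_perm) /* MAC: LSM policy gate */
		return false;
	return true;
}
```

With dac_min = 4096 and lsm_min = 65536, a root process without the MAC permission is refused at 8192, while a process granted only the MAC permission may map at 8192. That is below the 65536 a single "effective minimum" would report, so no one number describes both processes.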

> 
> If ok, I could send a patch.
> 
> In any case, I think we should update the doc (sysctl/vm.txt).
> 
> All these issues came to light because we are working on a new ASLR
> for userspace, and for testing it would be easier if we knew where the
> VMA starts (this can be changed at runtime and it affects the
> available entropy).
> 
> 
> Best,
> Hector.
> 
> > 
> > I think it is -- the minimum is correct, it's just that the sysctl
> > may be reporting the dac value. Eric, are you able to chime in on
> > this?
> > 
> > -Kees
> > 
> > > 
> > > Thanks,
> > > Hector.
> > > 
> > > > 
> > > > > 
> > > > > It can be easily checked in a system typing:
> > > > > 
> > > > > $ cat /proc/sys/vm/mmap_min_addr
> > > > > 4096    # <= Incorrect, it should be 65536
> > > > > 
> > > >

Re: [PATCH] audit: Don't spam logs with SECCOMP_KILL/RET_ERRNO by default

2016-04-11 Thread Eric Paris
Just an FYI: originally the idea was to follow the pattern of logging
set by core dumps; see kernel/auditsc.c::audit_core_dumps(), which is
gated by audit_enabled but nothing else. I believe at that time the
only option was kill, which meant that, much like the core dumper, spam
was not a likely result given the initiator is killed.

I'm all for a way to shut up unsolicited audit messages, especially
seccomp with errno or trap. I think it would be best to default 'KILL'
to on and everything else to off. I'm not so sure a sysctl is the right
way though. Enabling more forms of 'seccomp audit' should really be a
part of the audit policy.

(p.s. I think the action should be part of the seccomp message, as
right now all we know is that Andi's message isn't KILL since the
sig=0)

-Eric


On Mon, 2016-04-11 at 09:30 -0400, Paul Moore wrote:
> On Mon, Apr 11, 2016 at 12:13 AM, Andi Kleen wrote:
> > 
> > From: Andi Kleen 
> > 
> > When I run chrome on my opensuse system every time I open
> > a new tab the system log is spammed with:
> > 
> > audit[16857]: SECCOMP auid=1000 uid=1000 gid=100 ses=1 pid=16857
> > comm="chrome" exe="/opt/google/chrome/chrome" sig=0 arch=c000003e
> > syscall=273 compat=0 ip=0x7fe27c11a444 code=0x50000
> > 
> > This happens because chrome uses SECCOMP for its sandbox,
> > and for some reason always reaches a SECCOMP_KILL or more likely
> > SECCOMP_RET_ERRNO in the rule set.
> > 
> > The seccomp auditing was originally added by Eric with
> > 
> > commit 85e7bac33b8d5edafc4e219c7dfdb3d48e0b4e31
> > Author: Eric Paris 
> > Date:   Tue Jan 3 14:23:05 2012 -0500
> > 
> > seccomp: audit abnormal end to a process due to seccomp
> > 
> > The audit system likes to collect information about
> > processes that end
> > abnormally (SIGSEGV) as this may me useful intrusion
> > detection information.
> > This patch adds audit support to collect information when
> > seccomp
> > forces a task to exit because of misbehavior in a similar
> > way.
> > 
> > I don't have any other syscall auditing enabled, just the standard
> > user space auditing used by the systemd and PAM userland. So basic
> > auditing is always enabled, but no other kernel auditing.
> > 
> > Add a sysctl to enable this unconditional behavior with default
> > to off. This replaces an earlier patch that simply checked
> > whether syscall auditing was on, but Paul Moore preferred
> > this more elaborate approach.
> > 
> > Signed-off-by: Andi Kleen 
> > ---
> >  Documentation/sysctl/kernel.txt |  9 +
> >  include/linux/audit.h   |  4 +++-
> >  kernel/seccomp.c|  4 
> >  kernel/sysctl.c | 11 +++
> >  4 files changed, 27 insertions(+), 1 deletion(-)
> Quick response as I'm traveling the next few days and
> time/connectivity will be spotty ... thanks for sending an updated
> patch, some initial thoughts:
> 
> * My thinking was that the sysctl knob could be a threshold value
>   such that setting it to 0x00030000 would only log TRAP and KILL.
> * With the sysctl tunable defaulting to no-logging there is no need to
>   check for audit_enabled; further, checking for audit_enabled would
>   prevent logging to dmesg/syslog, which I believe is valuable (you
>   may not).
> * A bit nitpicky, but considering the possibility of logging to
>   dmesg/syslog when auditing is disabled, I think
>   "seccomp-log-threshold" or similar would be a better sysctl name.
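Paul's threshold idea can be read as a comparison on the action bits of the filter's return value, since a smaller action value means a more severe outcome (SECCOMP_RET_KILL is 0 and SECCOMP_RET_TRAP is 0x00030000 in the uapi header of that era). Below is a minimal sketch under that reading. demo_should_log() and the DEMO_ constants are hypothetical copies. The default "log nothing" setting Paul mentions is not modeled; it would need a sentinel value, because a threshold of 0 still logs KILL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Action values as in the uapi <linux/seccomp.h> of that era;
 * a smaller action value means a more severe outcome. */
#define DEMO_RET_KILL   0x00000000U
#define DEMO_RET_TRAP   0x00030000U
#define DEMO_RET_ERRNO  0x00050000U
#define DEMO_RET_TRACE  0x7ff00000U
#define DEMO_RET_ALLOW  0x7fff0000U
#define DEMO_RET_ACTION 0x7fff0000U /* mask selecting the action bits */

/* Hypothetical threshold check: log any action at least as severe as
 * `threshold`.  The low 16 data bits (e.g. the errno) are ignored. */
static bool demo_should_log(uint32_t ret, uint32_t threshold)
{
	return (ret & DEMO_RET_ACTION) <= threshold;
}
```

With the threshold set to TRAP, an ERRNO result like the one in Andi's Chrome log falls above the threshold and is not logged, while KILL and TRAP still are.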
> 
> > 
> > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> > index 57653a4..abc6ef9 100644
> > --- a/Documentation/sysctl/kernel.txt
> > +++ b/Documentation/sysctl/kernel.txt
> > @@ -21,6 +21,7 @@ show up in /proc/sys/kernel:
> >  - acct
> >  - acpi_video_flags
> >  - auto_msgmni
> > +- audit_log_seccomp
> >  - bootloader_type   [ X86 only ]
> >  - bootloader_version[ X86 only ]
> >  - callhome  [ S390 only ]
> > @@ -129,6 +130,14 @@ upon memory add/remove or upon ipc namespace creation/removal.
> >  Echoing "1" into this file enabled msgmni automatic recomputing.
> >  Echoing "0" turned it off. auto_msgmni default value was 1.
> > 
> > +==
> > +
> > +audit_log_seccomp
> > +
> > +When this variable is set to 1 every SECCOMP_KILL/SECCOMP_RET_ERRNO
> > +results in an audit log

Re: [PATCH] inotify: hide internal kernel bits from fdinfo

2015-09-21 Thread Eric Paris
Acked-by: Eric Paris 

On Mon, 2015-09-21 at 11:45 -0700, Dave Hansen wrote:
> From: Dave Hansen 
> 
> There was a report that my patch:
> 
>   inotify: actually check for invalid bits in sys_inotify_add_watch()
> 
> broke CRIU.
> 
> The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/*
> to figure out how to rebuild inotify watches and then passes those
> flags directly back in to the inotify API.  One of those flags
> (FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the
> inotify API.  It is used inside the kernel to _implement_ inotify
> but it is not and has never been part of the API.
> 
> My patch above ensured that we only allow bits which are part of
> the API (IN_ALL_EVENTS).  This broke CRIU.
> 
> FS_EVENT_ON_CHILD is really internal to the kernel.  It is set
> _anyway_ on all inotify marks.  So, CRIU was really just trying
> to set a bit that was already set.
> 
> This patch hides that bit from fdinfo.  CRIU will not see the
> bit, not try to set it, and should work as before.  We should not
> have been exposing this bit in the first place, so this is a good
> patch independent of the CRIU problem.
> 
> Signed-off-by: Dave Hansen 
> Reported-by: Andrey Wagin 
> Cc: Andrew Morton 
> Cc: Cyrill Gorcunov 
> Cc: xe...@parallels.com
> Cc: Eric Paris 
> Cc: j...@johnmccutchan.com
> Cc: rl...@rlove.org
> Cc: linux-kernel@vger.kernel.org
> ---
> 
>  b/fs/notify/fdinfo.c |9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff -puN fs/notify/fdinfo.c~fdinfo-mask fs/notify/fdinfo.c
> --- a/fs/notify/fdinfo.c~fdinfo-mask	2015-09-21 10:24:01.031864268 -0700
> +++ b/fs/notify/fdinfo.c	2015-09-21 10:25:04.335723826 -0700
> @@ -82,9 +82,16 @@ static void inotify_fdinfo(struct seq_fi
>  	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
>  	inode = igrab(mark->inode);
>  	if (inode) {
> +		/*
> +		 * IN_ALL_EVENTS represents all of the mask bits
> +		 * that we expose to userspace.  There is at
> +		 * least one bit (FS_EVENT_ON_CHILD) which is
> +		 * used only internally to the kernel.
> +		 */
> +		u32 mask = mark->mask & IN_ALL_EVENTS;
>  		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
>  			   inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
> -			   mark->mask, mark->ignored_mask);
> +			   mask, mark->ignored_mask);
>  		show_mark_fhandle(m, inode);
>  		seq_putc(m, '\n');
>  		iput(inode);
> _
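A userspace sketch of the masking the patch applies before printing. Anything outside IN_ALL_EVENTS is dropped, so FS_EVENT_ON_CHILD never reaches fdinfo. The helper and the DEMO_FS_EVENT_ON_CHILD copy of the kernel-internal value are hypothetical. The output line is simplified and omits sdev and ignored_mask.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

/* Kernel-internal fsnotify flag (value from fsnotify_backend.h); it is
 * not part of the inotify API exposed by <sys/inotify.h>. */
#define DEMO_FS_EVENT_ON_CHILD 0x08000000U

/* Simplified fdinfo line: drops sdev/ignored_mask, keeps the masking. */
static int demo_fdinfo_line(char *buf, size_t n, unsigned int wd,
			    unsigned long ino, uint32_t mark_mask)
{
	uint32_t mask = mark_mask & IN_ALL_EVENTS; /* hide internal bits */

	return snprintf(buf, n, "inotify wd:%x ino:%lx mask:%x",
			wd, ino, (unsigned int)mask);
}
```

A mark holding IN_MODIFY plus the internal flag is then reported as plain `mask:2`, which CRIU can pass back to inotify_add_watch() unchanged.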
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] inotify: actually check for invalid bits in sys_inotify_add_watch()

2015-09-09 Thread Eric Paris
Looks fine to me. And usually akpm picks them up these days.

On Wed, 2015-09-09 at 14:59 -0700, Dave Hansen wrote:
> On 06/30/2015 10:36 AM, Dave Hansen wrote:
> > From: Dave Hansen 
> > 
> > The comment here says that it is checking for invalid bits.  But,
> > the mask is *actually* checking to ensure that _any_ valid bit
> > is set, which is quite different.
> > 
> > Add the actual check which was intended.  Retain the existing
> > check because it actually does something useful: ensure that some
> > inotify bits are being added to the watch.  Plus, this is
> > existing behavior which would be nice to preserve.
> > 
> > I did a quick sniff test that inotify functions and that my
> > 'inotify-tools' package passes 'make check'.
> 
> Did anybody have any comments on this patch?  Who picks up inotify
> patches?
> 
> >  b/fs/notify/inotify/inotify_user.c |3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff -puN fs/notify/inotify/inotify_user.c~inotify-EINVAL-on-invalid-bit fs/notify/inotify/inotify_user.c
> > --- a/fs/notify/inotify/inotify_user.c~inotify-EINVAL-on-invalid-bit	2015-06-26 13:33:30.277219285 -0700
> > +++ b/fs/notify/inotify/inotify_user.c	2015-06-26 13:35:19.026122033 -0700
> > @@ -707,6 +707,9 @@ SYSCALL_DEFINE3(inotify_add_watch, int,
> >  	unsigned flags = 0;
> >  
> >  	/* don't allow invalid bits: we don't want flags set */
> > +	if (unlikely(mask & ~ALL_INOTIFY_BITS))
> > +		return -EINVAL;
> > +	/* require at least one valid bit set in the mask */
> >  	if (unlikely(!(mask & ALL_INOTIFY_BITS)))
> >  		return -EINVAL;
> >  
> > _
> > 
> 
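The two checks in the patch reject different inputs. The new one rejects any bit outside the valid set, and the existing one rejects a mask with no valid bit at all. Below is a userspace sketch using glibc's <sys/inotify.h> constants. DEMO_VALID_BITS is an assumed subset of the kernel's ALL_INOTIFY_BITS, not its real definition, and the function name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Assumed subset of the kernel's ALL_INOTIFY_BITS (illustration only). */
#define DEMO_VALID_BITS \
	(IN_ALL_EVENTS | IN_ONLYDIR | IN_DONT_FOLLOW | IN_MASK_ADD | IN_ONESHOT)

static int demo_check_watch_mask(uint32_t mask)
{
	/* don't allow invalid bits (the check the patch adds) */
	if (mask & ~DEMO_VALID_BITS)
		return -EINVAL;
	/* require at least one valid bit set in the mask (existing check) */
	if (!(mask & DEMO_VALID_BITS))
		return -EINVAL;
	return 0;
}
```

For example, a mask of IN_MODIFY | 0x08000000 (the value of the kernel-internal FS_EVENT_ON_CHILD flag) is now rejected with -EINVAL. That is the situation CRIU ran into when it replayed masks read from fdinfo.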


Re: [PATCH V1] audit: add warning that an old auditd may be starved out by a new auditd

2015-09-08 Thread Eric Paris
This is already going to be in the audit log, right? We're going to
send a CONFIG_CHANGE record with old_pid == the existing auditd. I bet
it gets delivered to the old auditd.

But why is this a printk(KERN_WARNING)?

On Mon, 2015-09-07 at 12:48 -0400, Richard Guy Briggs wrote:
> Nothing prevents a new auditd starting up and replacing a valid
> audit_pid when an old auditd is still running, effectively starving out
> the old auditd since audit_pid no longer points to the old valid auditd.
> 
> There isn't an easy way to detect if an old auditd is still running on
> the existing audit_pid other than attempting to send a message to see if
> it fails.  If no message to auditd has been attempted since auditd died
> unnaturally or got killed, audit_pid will still indicate it is alive.
> 
> Signed-off-by: Richard Guy Briggs 
> ---
> Note: Would it be too bold to actually block the registration of a new
> auditd if the netlink_getsockbyportid() call succeeded?  Would other
> checks be appropriate?
> 
>  kernel/audit.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 18cdfe2..1fa1e0d 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -872,6 +872,11 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  		if (s.mask & AUDIT_STATUS_PID) {
>  			int new_pid = s.pid;
>  
> +			if (audit_pid && new_pid &&
> +			    !IS_ERR(netlink_getsockbyportid(audit_sock, audit_nlk_portid)))
> +				pr_warn("auditd replaced by new auditd before normal shutdown: "
> +					"(old)audit_pid=%d (by)pid=%d new_pid=%d",
> +					audit_pid, pid, new_pid);
>  			if ((!new_pid) && (task_tgid_vnr(current) != audit_pid))
>  				return -EACCES;
>  			if (audit_enabled != AUDIT_OFF)


Re: Linux Firmware Signing

2015-09-01 Thread Eric Paris
On Mon, 2015-08-31 at 22:52 -0400, Paul Moore wrote:
> On Fri, Aug 28, 2015 at 10:03 PM, Luis R. Rodriguez 
> wrote:
> > On Fri, Aug 28, 2015 at 06:26:05PM -0400, Paul Moore wrote:
> > > On Fri, Aug 28, 2015 at 7:20 AM, Roberts, William C
> > >  wrote:
> > > > Even triggered updates make sense, since you can at least have
> > > > some form of trust of where that binary policy came from.
> > > 
> > > It isn't always that simple, see my earlier comments about
> > > customization and manipulation by the policy loading tools.
> > 
> > If the customization of the data is done in kernel then the kernel
> > can *first* verify the file's signature prior to doing any data
> > modification. If userspace does the modification then the signature
> > stuff won't work unless the tool will have access to the MOK and can
> > sign it pre-flight to the kernel selinuxfs.
> 
> Yes, userspace does the modification.
> 
> > > > Huh, not following? Perhaps I am not following what you're
> > > > laying down here.
> > > > 
> > > > Right now there is no signing on the SELinux policy file. We
> > > > should be able to just use the firmware signing APIs as is (I have
> > > > not looked on linux-next yet) to unpack the blob.
> > > 
> > > I haven't looked at the existing fw signing hook in any detail to be
> > > able to comment on its use as a policy verification hook.  As long as
> > > we preserve backwards compatibility and don't introduce a new
> > > mechanism/API for loading SELinux policy I doubt I would have any
> > > objections.
> > 
> > You'd just have to implement a permissive model as we are with the
> > fw signing. No radical customizations, except one thing to note is
> > that on the fw signing side of things we're going to have the
> > signature of the file *detached* in a separate file. I think what
> > you're alluding to is the issue of where that signature would be
> > stuffed in the SELinux policy file, and it's correct that you'd need
> > to address that. You could just borrow the kernel's model and reader /
> > sucker that strips out the signature. Another possibility would be two
> > files, but then I guess you'd need a trigger to annotate both are in
> > place.
> 
> Yes, there are lots of ways we could solve the signed policy format
> issue, I just don't have one in mind at this moment.  Also, to be
> honest, there are enough limitations to signing SELinux policies that
> this isn't very high on my personal SELinux priority list.

Hard for me to argue with your priorities.

I will point out for others interested that userspace does usually need to
munge policy. It's typically only needed when the policy on disk is, say,
v35, the toolchain understands v35+, but the kernel only understands
v34. The userspace tools will downgrade the policy before loading
(shoving in) the blob.  If the kernel understands v35 and the policy is
v35, you can (I think) actually use cat to load the policy.

So certainly this is a perfectly reasonable restriction on some
systems, but we have quite often run into users who don't update their
kernel but do update their userspace, and any signing would be pretty
much impossible for them...
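A rough sketch of the decision described above, assuming integer policy versions and a hypothetical helper `plan_policy_load()`. It also assumes the tools downgrade straight to the kernel's maximum (selinuxfs reports that maximum in its `policyvers` file). The point it illustrates: whenever a rewrite happens, the blob the kernel sees is no longer the file that was signed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of deciding how to load a policy file. */
struct load_plan {
	int version;	/* format version of the blob written to the kernel */
	bool rewritten;	/* true if userspace had to munge (downgrade) it */
};

/*
 * file_version: version of the policy file on disk.
 * kernel_max:   newest policy version the running kernel accepts.
 * Assumes the tools downgrade straight to kernel_max when needed.
 */
static struct load_plan plan_policy_load(int file_version, int kernel_max)
{
	struct load_plan p;

	if (file_version <= kernel_max) {
		/* Loadable as-is (even with cat): a signature over the file still matches. */
		p.version = file_version;
		p.rewritten = false;
	} else {
		/* New userspace, old kernel: the loaded blob differs from the signed file. */
		p.version = kernel_max;
		p.rewritten = true;
	}
	return p;
}
```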


Re: [PATCH V6 4/4] audit: avoid double copying the audit_exe path string

2015-07-16 Thread Eric Paris
I have to admit, I'm partial to not merging this (with the other
patches).  Changing object lifetimes in what I seem to remember is
long-standing code (auditfilter, not auditexe) seems to me like something
we really would want to be git bisectable, not mushed in with an unrelated
feature addition. But it ain't my tree   :)

-Eric

On Thu, 2015-07-16 at 22:01 -0400, Richard Guy Briggs wrote:
> On 15/07/16, Paul Moore wrote:
> > On Tuesday, July 14, 2015 11:50:26 AM Richard Guy Briggs wrote:
> > > Make this interface consistent with watch and filter key, avoiding
> > > the extra string copy and simply consuming the new string pointer.
> > > 
> > > Signed-off-by: Richard Guy Briggs 
> > > ---
> > >  kernel/audit_exe.c  |8 ++--
> > >  kernel/audit_fsnotify.c |9 +
> > >  kernel/auditfilter.c|2 +-
> > >  3 files changed, 8 insertions(+), 11 deletions(-)
> > 
> > Merge this patch too; there is no reason why this needs to be its
> > own patch.
> 
> I wanted to keep this patch separate until it is well understood and
> accepted rather than mix it in.
> 
> I'm fine merging it if you prefer.
> 
> > > diff --git a/kernel/audit_exe.c b/kernel/audit_exe.c
> > > index 75ad4f2..09e4eb4 100644
> > > --- a/kernel/audit_exe.c
> > > +++ b/kernel/audit_exe.c
> > > @@ -27,11 +27,15 @@ int audit_dupe_exe(struct audit_krule *new, struct audit_krule *old)
> > >  	struct audit_fsnotify_mark *audit_mark;
> > >  	char *pathname;
> > >  
> > > -	pathname = audit_mark_path(old->exe);
> > > +	pathname = kstrdup(audit_mark_path(old->exe), GFP_KERNEL);
> > > +	if (!pathname)
> > > +		return -ENOMEM;
> > >  
> > >  	audit_mark = audit_alloc_mark(new, pathname, strlen(pathname));
> > > -	if (IS_ERR(audit_mark))
> > > +	if (IS_ERR(audit_mark)) {
> > > +		kfree(pathname);
> > >  		return PTR_ERR(audit_mark);
> > > +	}
> > >  	new->exe = audit_mark;
> > >  
> > >  	return 0;
> > > diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
> > > index a4e7b16..e57e08a 100644
> > > --- a/kernel/audit_fsnotify.c
> > > +++ b/kernel/audit_fsnotify.c
> > > @@ -94,7 +94,6 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
> > >  	struct dentry *dentry;
> > >  	struct inode *inode;
> > >  	unsigned long ino;
> > > -	char *local_pathname;
> > >  	dev_t dev;
> > >  	int ret;
> > >  
> > > @@ -115,21 +114,15 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
> > >  		ino = dentry->d_inode->i_ino;
> > >  	}
> > >  
> > > -	audit_mark = ERR_PTR(-ENOMEM);
> > > -	local_pathname = kstrdup(pathname, GFP_KERNEL);
> > > -	if (!local_pathname)
> > > -		goto out;
> > > -
> > >  	audit_mark = kzalloc(sizeof(*audit_mark), GFP_KERNEL);
> > >  	if (unlikely(!audit_mark)) {
> > > -		kfree(local_pathname);
> > >  		audit_mark = ERR_PTR(-ENOMEM);
> > >  		goto out;
> > >  	}
> > >  
> > >  	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_free_mark);
> > >  	audit_mark->mark.mask = AUDIT_FS_EVENTS;
> > > -	audit_mark->path = local_pathname;
> > > +	audit_mark->path = pathname;
> > >  	audit_mark->ino = ino;
> > >  	audit_mark->dev = dev;
> > >  	audit_mark->rule = krule;
> > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> > > index f65c97f..f46ed69 100644
> > > --- a/kernel/auditfilter.c
> > > +++ b/kernel/auditfilter.c
> > > @@ -559,8 +559,8 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
> > >  			entry->rule.buflen += f->val;
> > >  
> > >  			audit_mark = audit_alloc_mark(&entry->rule, str, f->val);
> > > -			kfree(str);
> > >  			if (IS_ERR(audit_mark)) {
> > > +				kfree(str);
> > >  				err = PTR_ERR(audit_mark);
> > >  				goto exit_free;
> > >  			}
> > 
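A user-space sketch of the ownership convention this patch moves to. The names `struct mark`, `mark_alloc()` and `mark_dup()` are hypothetical stand-ins for `audit_fsnotify_mark`, `audit_alloc_mark()` and `audit_dupe_exe()`, and malloc/strdup replace kzalloc/kstrdup. The allocator consumes the string only on success; on failure the caller still owns it and frees it.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct audit_fsnotify_mark. */
struct mark {
	char *path;	/* owned by the mark once creation succeeds */
};

/*
 * Consumes 'path' on success only. On failure the caller still owns it,
 * like audit_alloc_mark() after this patch.
 */
static struct mark *mark_alloc(char *path)
{
	struct mark *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;	/* path NOT consumed */
	m->path = path;		/* take the pointer; no second copy */
	return m;
}

static void mark_free(struct mark *m)
{
	if (!m)
		return;
	free(m->path);
	free(m);
}

/* Caller side, as in audit_dupe_exe(): copy once, hand over, free on error. */
static struct mark *mark_dup(const struct mark *old)
{
	char *path = strdup(old->path);
	struct mark *m;

	if (!path)
		return NULL;
	m = mark_alloc(path);
	if (!m)
		free(path);	/* allocation failed: the copy is still ours */
	return m;
}
```

The design choice is the one debated in the thread: it saves a copy, but it changes who frees the string, which is why reviewers wanted it bisectable on its own.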


Re: [RFCv2][PATCH 1/7] fs: optimize inotify/fsnotify code for unwatched files

2015-06-24 Thread Eric Paris
On Wed, 2015-06-24 at 17:16 -0700, Dave Hansen wrote:
> From: Dave Hansen 
> 
> I have a _tiny_ microbenchmark that sits in a loop and writes
> single bytes to a file.  Writing one byte to a tmpfs file is
> around 2x slower than reading one byte from a file, which is a
> _bit_ more than I expected.  This is a dumb benchmark, but I think
> it's hard to deny that write() is a hot path and we should avoid
> unnecessary overhead there.
> 
> I did a 'perf record' of 30-second samples of read and write.
> The top item in a diffprofile is srcu_read_lock() from
> fsnotify().  There are active inotify fd's from systemd, but
> nothing is actually listening to the file or its part of
> the filesystem.
> 
> I *think* we can avoid taking the srcu_read_lock() for the
> common case where there are no actual marks on the file.
> This means that there will both be nothing to notify for
> *and* implies that there is no need for clearing the ignore
> mask.
> 
> This patch gave a 13.8% speedup in writes/second on my test,
> which is an improvement from the 10.8% that I saw with the
> last version.
> 
> Signed-off-by: Dave Hansen 
> Cc: Andrew Morton 
> Cc: Jan Kara 
> Cc: Al Viro 
> Cc: Eric Paris 
> Cc: John McCutchan 
> Cc: Robert Love 
> Cc: Tim Chen 
> Cc: Andi Kleen 
> Cc: linux-kernel@vger.kernel.org
> ---
> 
>  b/fs/notify/fsnotify.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff -puN fs/notify/fsnotify.c~optimize-fsnotify fs/notify/fsnotify.c
> --- a/fs/notify/fsnotify.c~optimize-fsnotify	2015-06-24 17:14:34.573109264 -0700
> +++ b/fs/notify/fsnotify.c	2015-06-24 17:14:34.576109399 -0700
> @@ -213,6 +213,16 @@ int fsnotify(struct inode *to_tell, __u3
>  	    !(test_mask & to_tell->i_fsnotify_mask) &&
>  	    !(mnt && test_mask & mnt->mnt_fsnotify_mask))
>  		return 0;
> +	/*
> +	 * Optimization: srcu_read_lock() has a memory barrier which can
> +	 * be expensive.  It protects walking the *_fsnotify_marks lists.
> +	 * However, if we do not walk the lists, we do not have to do
> +	 * SRCU because we have no references to any objects and do not
> +	 * need SRCU to keep them "alive".
> +	 */
> +	if (!to_tell->i_fsnotify_marks.first &&
> +	    (!mnt || !mnt->mnt_fsnotify_marks.first))
> +		return 0;

two useless peeps from the old peanut gallery of long lost

1) should you actually move this check up before the IN_MODIFY check?
This seems like it would be by far the most common case, and you'd save
yourself a bunch of useless conditionals/bit operations.

2) do you want to use hlist_empty(&to_tell->i_fsnotify_marks) instead,
for readability? (It is static inline, so the compiled code is the
same.)

It is fine as it is. Don't know how much you want to try to bikeshed...
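A user-space sketch of suggestion 2), using minimal copies of the kernel's `hlist_head`/`hlist_node` layout and hypothetical reduced structs in place of `struct inode` and the mount. The kernel's own `hlist_empty()` performs the same `!h->first` test (recent kernels wrap the load in `READ_ONCE()`), so the helper only changes readability. Because the check returns before any list walk, it could also sit ahead of the mask tests, per suggestion 1).

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space copies of the kernel's hlist layout. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline int hlist_empty(const struct hlist_head *h)
{
	return !h->first;
}

/* Hypothetical reduced versions of the inode and mount mark lists. */
struct inode_marks { struct hlist_head i_fsnotify_marks; };
struct mount_marks { struct hlist_head mnt_fsnotify_marks; };

/* Nonzero when there is nothing to walk, so SRCU can be skipped. */
static int no_marks(const struct inode_marks *to_tell,
		    const struct mount_marks *mnt)
{
	return hlist_empty(&to_tell->i_fsnotify_marks) &&
	       (!mnt || hlist_empty(&mnt->mnt_fsnotify_marks));
}
```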



Re: [PATCH v2] selinux: reduce locking overhead in inode_free_security()

2015-06-13 Thread Eric Paris
On Sat, 2015-06-13 at 10:35 +0300, Yury wrote:
> 
> On 13.06.2015 01:35, Waiman Long wrote:
> > On 06/12/2015 08:31 AM, Stephen Smalley wrote:
> > > On 06/12/2015 02:26 AM, Raghavendra K T wrote:
> > > > On 06/12/2015 03:01 AM, Waiman Long wrote:
> > > > > The inode_free_security() function just took the superblock's
> > > > > isec_lock before checking and trying to remove the inode security
> > > > > struct from the linked list. In many cases, the list was empty and
> > > > > so the lock taking is wasteful as no useful work is done. On
> > > > > multi-socket systems with a large number of CPUs, there can also be
> > > > > a fair amount of spinlock contention on the isec_lock if many tasks
> > > > > are exiting at the same time.
> > > > > 
> > > > > This patch changes the code to check the state of the list first
> > > > > before taking the lock and attempting to dequeue it. As this
> > > > > function is called indirectly from __destroy_inode(), there can't be
> > > > > another instance of inode_free_security() running on the same inode.
> > > > > 
> > > > > Signed-off-by: Waiman Long
> > > > > ---
> > > > >security/selinux/hooks.c |   15 ---
> > > > >1 files changed, 12 insertions(+), 3 deletions(-)
> > > > > 
> > > > > v1->v2:
> > > > >- Take out the second list_empty() test inside the lock.
> > > > > 
> > > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > > > index 7dade28..e5cdad7 100644
> > > > > --- a/security/selinux/hooks.c
> > > > > +++ b/security/selinux/hooks.c
> > > > > @@ -254,10 +254,19 @@ static void inode_free_security(struct inode *inode)
> > > > >  	struct inode_security_struct *isec = inode->i_security;
> > > > >  	struct superblock_security_struct *sbsec = inode->i_sb->s_security;
> > > > >  
> > > > > -	spin_lock(&sbsec->isec_lock);
> > > > > -	if (!list_empty(&isec->list))
> > > > > +	/*
> > > > > +	 * As not all inode security structures are in a list, we check for
> > > > > +	 * empty list outside of the lock to make sure that we won't waste
> > > > > +	 * time taking a lock doing nothing. As inode_free_security() is
> > > > > +	 * being called indirectly from __destroy_inode(), there is no way
> > > > > +	 * there can be two or more concurrent calls. So doing the list_empty()
> > > > > +	 * test outside the loop should be safe.
> > > > > +	 */
> > > > > +	if (!list_empty(&isec->list)) {
> > > > > +		spin_lock(&sbsec->isec_lock);
> > > > >  		list_del_init(&isec->list);
> > > > Stupid question,
> > > > 
> > > > I need to take a look at list_del_init() code, but it can so happen
> > > > that if !list_empty() check could happen simultaneously, then
> > > > serially two list_del_init() can happen.
> > > > 
> > > > is that not a problem()?
> > > Hmm...I suppose that's possible (sb_finish_set_opts and
> > > inode_free_security could both perform the list_del_init).  Ok, we'll
> > > stay with the first version.
> > > 
> > 
> > Actually, list_del_init() can be applied twice with no harm being
> > done. The first list_del_init() will set list->next = list->prev =
> > list. The second one will do the same thing and so it should be
> > safe.
> > 
> > Cheers,
> > Longman
> > 
> 
> Hello, Waiman!
> 
> First, a minor point.
> For me, moving the line 'if (!list_empty(&isec->list))' out of the lock
> is not justified merely because 'inode_free_security' is called only
> from '__destroy_inode'. You cannot rely on that in the future. Rather,
> it's possible because an empty list is invariant under 'list_del_init',
> as you noted here. In fact, you could call 'list_del_init'
> unconditionally here, and the condition is only an optimization to
> decrease lock contention. So I'd like to ask you to reflect that in
> your comment.
> 
> Second, a less minor point.
> Now that you access the list element outside of the lock, why don't you
> use 'list_empty_careful' instead of 'list_empty'? It may eliminate a
> possible race between, say, 'list_add' and 'list_empty', and costs you
> virtually nothing.

Agree, the comment isn't really accurate. list_empty() outside of the
lock is safe because there is only one place one can ever get onto the
list. If you are already off (as most inodes will be!) the lock and
remove would be completely useless.
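A user-space sketch of the unlocked-peek, locked-removal pattern being discussed, with a pthread mutex standing in for `isec_lock` and minimal copies of the kernel list helpers; `free_security()` is a hypothetical stand-in for `inode_free_security()`. It is only correct under the assumptions argued in this thread: nothing can add the entry back concurrently, and `list_del_init()` on an already-removed entry is harmless.

```c
#include <assert.h>
#include <pthread.h>

/* Minimal copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *l)
{
	l->next = l->prev = l;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Stand-in for sbsec->isec_lock. */
static pthread_mutex_t isec_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for inode_free_security(). */
static void free_security(struct list_head *entry)
{
	/* Unlocked peek: most entries were never on the list, so skip the lock. */
	if (!list_empty(entry)) {
		pthread_mutex_lock(&isec_lock);
		list_del_init(entry);	/* harmless if someone else removed it first */
		pthread_mutex_unlock(&isec_lock);
	}
}
```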

list_empty_careful() is not safe against list_add().

http://marc.info/?l=git-commits-head&m=107277005829348

I'm not even really sure what it is safe/useful for, but the comment
does seem like it would be fine for our case. I guess it might be
appropriate with the other task calling list_del_init().  In this case,
I don't believe we care to sync at all (especially si

Re: [PATCH v2] selinux: reduce locking overhead in inode_free_security()

2015-06-12 Thread Eric Paris
On Fri, 2015-06-12 at 08:31 -0400, Stephen Smalley wrote:
> On 06/12/2015 02:26 AM, Raghavendra K T wrote:
> > On 06/12/2015 03:01 AM, Waiman Long wrote:
> > > The inode_free_security() function just took the superblock's
> > > isec_lock before checking and trying to remove the inode security
> > > struct from the linked list. In many cases, the list was empty and
> > > so the lock taking is wasteful as no useful work is done. On
> > > multi-socket systems with a large number of CPUs, there can also be
> > > a fair amount of spinlock contention on the isec_lock if many tasks
> > > are exiting at the same time.
> > > 
> > > This patch changes the code to check the state of the list first
> > > before taking the lock and attempting to dequeue it. As this
> > > function is called indirectly from __destroy_inode(), there can't be
> > > another instance of inode_free_security() running on the same inode.
> > > 
> > > Signed-off-by: Waiman Long 
> > > ---
> > >   security/selinux/hooks.c |   15 ---
> > >   1 files changed, 12 insertions(+), 3 deletions(-)
> > > 
> > > v1->v2:
> > >   - Take out the second list_empty() test inside the lock.
> > > 
> > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > index 7dade28..e5cdad7 100644
> > > --- a/security/selinux/hooks.c
> > > +++ b/security/selinux/hooks.c
> > > @@ -254,10 +254,19 @@ static void inode_free_security(struct inode *inode)
> > >  	struct inode_security_struct *isec = inode->i_security;
> > >  	struct superblock_security_struct *sbsec = inode->i_sb->s_security;
> > >  
> > > -	spin_lock(&sbsec->isec_lock);
> > > -	if (!list_empty(&isec->list))
> > > +	/*
> > > +	 * As not all inode security structures are in a list, we check for
> > > +	 * empty list outside of the lock to make sure that we won't waste
> > > +	 * time taking a lock doing nothing. As inode_free_security() is
> > > +	 * being called indirectly from __destroy_inode(), there is no way
> > > +	 * there can be two or more concurrent calls. So doing the list_empty()
> > > +	 * test outside the loop should be safe.
> > > +	 */
> > > +	if (!list_empty(&isec->list)) {
> > > +		spin_lock(&sbsec->isec_lock);
> > >  		list_del_init(&isec->list);
> > 
> > Stupid question,
> > 
> > I need to take a look at list_del_init() code, but it can so happen
> > that if !list_empty() check could happen simultaneously, then
> > serially two list_del_init() can happen.
> > 
> > is that not a problem()?
> 
> Hmm...I suppose that's possible (sb_finish_set_opts and
> inode_free_security could both perform the list_del_init).  Ok, we'll
> stay with the first version.

Wait, can't you list_del_init() an already list_del_init'd object?
Isn't that a big difference between list_del() and list_del_init() ?
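A user-space sketch answering that question, using minimal copies of the kernel's list helpers. `POISON1`/`POISON2` are stand-in values for the kernel's `LIST_POISON1`/`LIST_POISON2`. `list_del()` leaves poison in the removed entry, so a second `list_del()` would dereference junk; `list_del_init()` leaves the entry pointing at itself, so a second call rewrites the same self-pointers and changes nothing.

```c
#include <assert.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Stand-in poison values, not the kernel's exact LIST_POISON constants. */
#define POISON1 ((struct list_head *)(uintptr_t)0x100)
#define POISON2 ((struct list_head *)(uintptr_t)0x122)

static void init_list_head(struct list_head *l)
{
	l->next = l->prev = l;
}

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void __list_del(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

/* Unlink, then poison: calling it again on the same entry is a bug. */
static void list_del(struct list_head *e)
{
	__list_del(e->prev, e->next);
	e->next = POISON1;
	e->prev = POISON2;
}

/* Unlink, then self-point: calling it again only touches the entry itself. */
static void list_del_init(struct list_head *e)
{
	__list_del(e->prev, e->next);
	init_list_head(e);
}
```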



Re: [PATCH] Selinux/hooks.c: Fix a NULL pointer dereference caused by semop()

2015-01-20 Thread Eric Paris
What kernel version was this?  Didn't we have this problem and solve it
upstream some time ago? IPC could be allocated with a valid security
context, then the IPC would be freed.  The isec was freed synchronously,
but then the IPC could stick around until some RCU grace period ended or
some usage flag got to 0, and only then did it get freed...

This seems so familiar, but it was a while ago.

On Tue, 2015-01-20 at 16:01 -0500, Stephen Smalley wrote:
> On 01/20/2015 01:49 PM, Manfred Spraul wrote:
> > Hi,
> > 
> > On 01/20/2015 03:10 PM, Stephen Smalley wrote:
> >> On 01/20/2015 04:18 AM, Ethan Zhao wrote:
> >>> A NULL pointer dereference was observed as following panic:
> >>>
> >>> BUG: unable to handle kernel NULL pointer dereference at (null)
> >>> IP: [] ipc_has_perm+0x4b/0x60
> >>> ...
> >>> Process opcmon (pid: 30712, threadinfo 880237f2a000,
> >>> task 88022ac70e40)
> >>> Stack:
> >>> 880237f2bc04 01020953 880237f2bce8
> >>> 8125818e
> >>> 0001 37f78004 880237f2bcd8
> >>> 81273619
> >>> 880237f2bce8 8126e3e6 880237f2bf68
> >>> 8125c206
> >>> Call Trace:
> >>> [] ? ipcperms+0xae/0x110
> >>> [] selinux_sem_semop+0x19/0x20
> >>> [] security_sem_semop+0x16/0x20
> >>> [] sys_semtimedop+0x346/0x750
> >>> [] ? handle_pte_fault+0x1dc/0x200
> >>> [] ? __do_page_fault+0x280/0x500
> >>> [] ? __lock_release+0x90/0x1b0
> >>> [] ? __do_page_fault+0x280/0x500
> >>> [] ? up_read+0x23/0x40
> >>> [] ? __do_page_fault+0x280/0x500
> >>> [] ? might_fault+0x5c/0xb0
> >>> [] ? sys_newuname+0x66/0xf0
> >>> [] ? __lock_release+0x90/0x1b0
> >>> [] ? sys_newuname+0x66/0xf0
> >>> [] ? sysret_check+0x22/0x5d
> >>> [] sys_semop+0x10/0x20
> >>> [] system_call_fastpath+0x16/0x1b
> >>> Code: b8 00 00 48 8b 80 48 06 00 00 41 8b 54 24 40 4c 8d
> >>> 45 d0 89 d9 45 31 c9 48 8b 40 70 8b 78 04 49 8b 44 24 60 c6 45 d0 04
> >>> 89 55 d8
> >>> <0f> b7 10 8b 70 04 e8 0a dc ff ff 48 83 c4 20 5b 41 5c c9 c3 90
> >>> RIP  [] ipc_has_perm+0x4b/0x60
> >>> RSP 
> >>> CR2: 
> >>>
> >>> The root cause is that semtimedop() was called after semget() without
> >>> checking its return value in process opcmon, and semget() failed the
> >>> permission check in avc_has_perm(), after which sem_perm->security was
> >>> freed as shown below:
> >>>
> >>>   sys_semget()
> >>>    ->newary()
> >>>      ->security_sem_alloc()
> >>>        ->sem_alloc_security()
> >>>          selinux_sem_alloc_security()
> >>>            ->ipc_alloc_security() {
> >>>              ->rc = avc_has_perm()
> >>>                if (rc) {
> >>>                    ipc_free_security(&sma->sem_perm);
> >>>                    return rc;
> >> We free the security structure here to avoid a memory leak on a
> >> failed/denied semaphore set creation.  In this situation, we return an
> >> error to the caller (ultimately to newary), it does an
> >> ipc_rcu_putref(sma, ipc_rcu_free), and it returns an error to the
> >> caller.  Thus, it never calls ipc_addid() and the semaphore set is not
> >> created.  So how then can you call semtimedop() on it?
> > My only idea would be a race of semtimedop() with IPC_RMID:
> > If an RCU grace period happens between sem_obtain_object_check() and the
> > ipc_has_perm() call, then the observed NULL pointer assignment would happen.
> 
> We only free and clear the ipc_perms->security field on a failure during
> newary() -> security_sem_alloc(), in which case we fail with an error
> before the ipc_addid() call has occurred, or during sem_rcu_free() ->
> security_sem_free() just prior to calling ipc_rcu_free().   So I don't
> see how ipc_perms->security can be NULL in ipc_has_perm().  We could rcu
> free the ipc_perms->security field but I don't see why that would be
> correct/necessary.
> 




Re: fanotify bug on gdb -- hard crash

2014-12-30 Thread Eric Paris
On Mon, 2014-12-29 at 13:06 +0800, ivo welch wrote:
> thank you, eric.  will do.  I read up on it above and now understand it 
> better.

Great, let us know if it keeps giving you trouble!

> the example in the man page seems somewhat misfortunate.  I would use
> an example that does not, by default, lock up the user system.
> (perhaps add a second example with the _PERM feature that shows how it
> responds.)

The link you gave does respond and allow permissions:

    if (metadata->fd >= 0) {

        /* Handle open permission event */

        if (metadata->mask & FAN_OPEN_PERM) {
            printf("FAN_OPEN_PERM: ");

            /* Allow file to be opened */

            response.fd = metadata->fd;
            response.response = FAN_ALLOW;
            write(fd, &response,
                  sizeof(struct fanotify_response));
        }

That's the key bit of the example...  If you use gdb and never get to
there, you are in a bit of trouble, I agree!
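The same answer step, factored into a small helper as a sketch; `answer_perm_event()` is a hypothetical name, while `struct fanotify_response`, `FAN_ALLOW` and `FAN_DENY` come from `<sys/fanotify.h>`. Whatever the verdict, the listener must write one response per permission event, otherwise the task doing the open stays blocked.

```c
#include <assert.h>
#include <sys/fanotify.h>
#include <unistd.h>

/*
 * Write one verdict for a permission event back to the fanotify fd.
 * Returns 0 on success, -1 if the full response could not be written.
 */
static int answer_perm_event(int fan_fd,
			     const struct fanotify_event_metadata *md,
			     int allow)
{
	struct fanotify_response resp;

	resp.fd = md->fd;	/* identifies which pending open we answer */
	resp.response = allow ? FAN_ALLOW : FAN_DENY;
	if (write(fan_fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp))
		return -1;
	return 0;
}
```

In a real event loop the caller would still `close(md->fd)` afterwards, as the man page example does.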



Re: fanotify bug on gdb -- hard crash

2014-12-28 Thread Eric Paris
Change FAN_OPEN_PERM to FAN_OPEN.

If you have any more deadlocks, please let us know. Once you understand
the difference between the two, let us know if there are any more
problems...
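For the read-only logging ivo describes, a minimal sketch under these assumptions: a notification-class group (`FAN_CLASS_NOTIF`) and plain `FAN_OPEN`/`FAN_CLOSE_WRITE` events, so the kernel never waits for the listener. The helper names are hypothetical, it needs `CAP_SYS_ADMIN`, and it only watches the mount containing `/`.

```c
#define _GNU_SOURCE	/* for O_LARGEFILE */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Notification events only: nothing in this mask waits for a reply. */
static uint64_t logging_mask(void)
{
	return FAN_OPEN | FAN_CLOSE_WRITE;
}

/* Print the path behind each event and close every event fd. */
static void log_events(char *buf, ssize_t len)
{
	struct fanotify_event_metadata *md = (struct fanotify_event_metadata *)buf;

	for (; FAN_EVENT_OK(md, len); md = FAN_EVENT_NEXT(md, len)) {
		char link[64], path[PATH_MAX];
		ssize_t n;

		if (md->vers != FANOTIFY_METADATA_VERSION)
			return;		/* ABI mismatch: stop parsing */
		if (md->fd < 0)
			continue;	/* e.g. queue overflow, no file attached */
		snprintf(link, sizeof(link), "/proc/self/fd/%d", md->fd);
		n = readlink(link, path, sizeof(path) - 1);
		if (n >= 0) {
			path[n] = '\0';
			printf("%s %s\n",
			       (md->mask & FAN_OPEN) ? "open" : "close_write", path);
		}
		close(md->fd);
	}
}

/* Needs CAP_SYS_ADMIN; loops until read() fails or returns 0. */
int run_open_logger(void)
{
	char buf[4096]
		__attribute__((aligned(__alignof__(struct fanotify_event_metadata))));
	ssize_t len;
	int fd = fanotify_init(FAN_CLOEXEC | FAN_CLASS_NOTIF,
			       O_RDONLY | O_LARGEFILE);

	if (fd == -1)
		return -1;
	if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT, logging_mask(),
			  AT_FDCWD, "/") == -1) {
		close(fd);
		return -1;
	}
	while ((len = read(fd, buf, sizeof(buf))) > 0)
		log_events(buf, len);
	close(fd);
	return 0;
}
```

Because no permission event is ever requested, stepping through this in gdb or stopping it with ^Z only delays the log; other processes keep opening files normally.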

-Eric

On Mon, 2014-12-29 at 08:13 +0800, ivo welch wrote:
> 
> 
> I really don't know what I am doing.  however, the code is really not
> mine, but verbatim from the man-page example,
> http://man7.org/linux/man-pages/man7/fanotify.7.html .  I had more
> code below my code, but had whittled down the example to illustrate
> where my system locks up.
> 
> 
> I wonder if there could be safeguards in the call to avoid crashing
> the system.  I know fanotify is playing with fire, but should it
> incapacitate the system in this way?
> 
> 
> in the end, all I want to do is log each and every file-open operation
> asap.  I want to do read-only probing.  could someone please point me
> to a correct example or facility if the manpage is wrong.
> 
> 
> /iaw
> 
> 
> 
> 
> Ivo Welch (ivo.we...@gmail.com)
> http://www.ivo-welch.info/
> J. Fred Weston Professor of Finance
> Anderson School at UCLA, C519
> Director, UCLA Anderson Fink Center for Finance and Investments
> Free Finance Textbook, http://book.ivo-welch.info/
> Editor, Critical Finance Review,
> http://www.critical-finance-review.org/
>  
> 
> On Mon, Dec 29, 2014 at 7:13 AM, Eric Paris  wrote:
> Why are you setting FAN_OPEN_PERM and then not responding to perm
> requests? Of course the system is going to appear locked, until you
> start responding to open events, remove that mark, or close the
> fanotify fd...
> 
> -Eric
> 
> On Fri, 2014-12-26 at 19:40 +0100, Heinrich Schuchardt wrote:
> > Hello Ivo,
> >
> > On 26.12.2014 15:45, ivo welch wrote:
> > > I am not a kernel developer, so forgive the intrusion.
> > >
> > > I suspect I have found either a bug in gdb (less likely) or a bug in
> > > fanotify (more likely).  it is replicable, and the code is almost
> > > unchanged from the example in the fanotify man page.  to trigger it,
> > > all an su needs to do is to step through the program below with gdb
> > > 7.8.1 'n' command, and linux locks up the system pretty hard (reboot
> > > required).  I have confirmed the replicability of this issue on a
> > > clean arch 2014.12.01 3.17.6-1 system and on a clean ubuntu 14.10
> > > system, both VMs created just to check it.  /iaw
> > >
> > >
> > > #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
> > > #include 
> > > #include 
> > > #include 
> > > #include 
> > > #include 
> > > #include 
> > >
> > > int main(int argc, char *argv[]) {
> > >    int fd;
> > >    fd = fanotify_init(FAN_CLOEXEC | FAN_CLASS_CONTENT | FAN_NONBLOCK,
> > > O_RDONLY | O_LARGEFILE);
> > >    if (fd == -1) exit(1);
> > >    fprintf(stderr, "calling fanotify_mark: fd=%d\n", fd);
> > >    if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_OPEN_PERM |
> > > FAN_CLOSE_WRITE, -1, "/") == -1) exit(2);
> > >    fprintf(stderr, "in gdb step through with 'n' for repeat.\n");
> > >    fprintf(stderr, "  (and sometimes otherwise), a ^C works, but a ^Z
> > > and then ^C does not.\n");
> > > }
> >
> > I was not able to reproduce your problem according to your description
> > with Ubuntu 14.10.
> >
> > I ran a Ubuntu 14.04 amd64 LiveImage in a VM and compiled your example with
> > gcc -g -o test test.c
> >
> > The gdb version in Ubuntu 14.10 is 7.4 and not 7.8.1. The kernel version
> > is 3.13.
> >
> > root@ubuntu:/home/ubuntu/temp# gdb ./test
> > GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
> > Copyright (C) 2012 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> > This is free soft

Re: fanotify bug on gdb -- hard crash

2014-12-28 Thread Eric Paris
Why are you setting FAN_OPEN_PERM and then not responding to perm
requests? Of course the system is going to appear locked, until you
start responding to open events, remove that mark, or close the fanotify
fd...

-Eric

On Fri, 2014-12-26 at 19:40 +0100, Heinrich Schuchardt wrote:
> Hello Ivo,
> 
> On 26.12.2014 15:45, ivo welch wrote:
> > I am not a kernel developer, so forgive the intrusion.
> >
> > I suspect I have found either a bug in gdb (less likely) or a bug in
> > fanotify (more likely).  it is replicable, and the code is almost
> > unchanged from the example in the fanotify man page.  to trigger it,
> > all an su needs to do is to step through the program below with gdb
> > 7.8.1 'n' command, and linux locks up the system pretty hard (reboot
> > required).  I have confirmed the replicability of this issue on a
> > clean arch 2014.12.01 3.17.6-1 system and on a clean ubuntu 14.10
> > system, both VMs created just to check it.  /iaw
> >
> >
> > #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > int main(int argc, char *argv[]) {
> >int fd;
> >fd = fanotify_init(FAN_CLOEXEC | FAN_CLASS_CONTENT | FAN_NONBLOCK,
> > O_RDONLY | O_LARGEFILE);
> >if (fd == -1) exit(1);
> >fprintf(stderr, "calling fanotify_mark: fd=%d\n", fd);
> >if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_OPEN_PERM |
> > FAN_CLOSE_WRITE, -1, "/") == -1) exit(2);
> >fprintf(stderr, "in gdb step through with 'n' for repeat.\n");
> >fprintf(stderr, "  (and sometimes otherwise), a ^C works, but a ^Z
> > and then ^C does not.\n");
> > }
> 
> I was not able to reproduce your problem according to your description 
> with Ubuntu 14.10.
> 
> I ran a Ubuntu 14.04 amd64 LiveImage in a VM and compiled your example with
> gcc -g -o test test.c
> 
> The gdb version in Ubuntu 14.10 is 7.4 and not 7.8.1. The kernel version 
> is 3.13.
> 
> root@ubuntu:/home/ubuntu/temp# gdb ./test
> GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from /home/ubuntu/temp/test...done.
> (gdb) break main
> Breakpoint 1 at 0x400693: file test.c, line 10.
> (gdb) run
> Starting program: /home/ubuntu/temp/test
> warning: no loadable sections found in added symbol-file system-supplied 
> DSO at 0x77ffa000
> 
> Breakpoint 1, main (argc=1, argv=0x7fffe638) at test.c:10
> 10          fd = fanotify_init(FAN_CLOEXEC | FAN_CLASS_CONTENT | FAN_NONBLOCK,
> (gdb) n
> 12          if (fd == -1) exit(1);
> (gdb) n
> 13          fprintf(stderr, "calling fanotify_mark: fd=%d\n", fd);
> (gdb) n
> calling fanotify_mark: fd=7
> 14          if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_OPEN_PERM |
> (gdb) n
> 16          fprintf(stderr, "in gdb step through with 'n' for repeat.\n");
> (gdb) n
> in gdb step through with 'n' for repeat.
> 17          fprintf(stderr, "  (and sometimes otherwise), a ^C works, but a ^Z and then ^C does not.\n");
> (gdb) n
>   (and sometimes otherwise), a ^C works, but a ^Z and then ^C does not.
> 18        }
> (gdb) n
> 0x77a3b78d in __libc_start_main () from 
> /lib/x86_64-linux-gnu/libc.so.6
> (gdb) n
> Single stepping until exit from function __libc_start_main,
> which has no line number information.
> [Inferior 1 (process 4423) exited with code 0110]
> (gdb) n
> The program is not being run.
> (gdb) q
> root@ubuntu:/home/ubuntu/temp#
> 
> >
> > I don't know who else to tell this.  I hope this report is useful, if
> > someone competent can confirm it.  /iaw
> 
> Bug reports for the Linux kernel should be addressed to the maintainer. 
> You can find him in the MAINTAINERS file of the linux source.
> 
> See
> https://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html
> https://bugzilla.kernel.org/
> 
> Before reporting a bug it is worthwhile to check if the problem also 
> occurs with the current kernel version (as of today 3.18.1 or 3.19-rc1).
> 
>  > PS: Is there an alternative to fanotify to avoid this?  I want to
>  > learn of all file-open requests on a ro device.
>  > 
> 
> The fanotify API is the right choice. Inotify is an alternative but 
> requires marking all directories.
> 
> For your task you can use the code provided at
> https://launchpad.net/fatrace
> 
> Best regards
> 
> Heinrich Schuchardt
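Heinrich notes that inotify works only if every directory gets its own watch. The following is a minimal sketch of what that entails, walking a tree with nftw() and adding an IN_OPEN watch per directory. `watch_tree()` and `add_dir_watch()` are hypothetical helpers, not part of any library.

```c
#define _GNU_SOURCE            /* for nftw() and FTW_PHYS */
#include <ftw.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* nftw() callbacks get no user pointer, so the inotify fd and the running
 * count live in file-scope variables of this hypothetical helper. */
static int watch_fd = -1;
static int watch_count;

static int add_dir_watch(const char *path, const struct stat *sb,
                         int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (typeflag != FTW_D)
        return 0;                       /* only directories need a watch */
    if (inotify_add_watch(watch_fd, path, IN_OPEN) < 0) {
        perror(path);
        return -1;                      /* non-zero stops the walk */
    }
    watch_count++;
    return 0;
}

/* Adds an IN_OPEN watch to root and every directory below it.
 * Returns the number of watches added, or -1 on error. */
int watch_tree(int ifd, const char *root)
{
    watch_fd = ifd;
    watch_count = 0;
    if (nftw(root, add_dir_watch, 16, FTW_PHYS) != 0)
        return -1;
    return watch_count;
}
```

Directories created after the walk are not covered until the caller also handles IN_CREATE and adds watches for them. This is part of why a single mount-wide fanotify mark is the simpler fit for "learn of all file-open requests on a device".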


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the 

Re: linux-next 20141216 BUG: sleeping function called from invalid context at mm/slab.c:2849

2014-12-18 Thread Eric Paris
On Thu, 2014-12-18 at 13:44 -0500, Richard Guy Briggs wrote:
> On 14/12/18, Eric Paris wrote:
> > On Thu, 2014-12-18 at 12:46 -0500, Richard Guy Briggs wrote:
> > > On 14/12/18, Eric Paris wrote:
> > > > On Thu, 2014-12-18 at 11:45 -0500, valdis.kletni...@vt.edu wrote:
> > > > > On Tue, 16 Dec 2014 20:09:54 -0500, Valdis Kletnieks said:
> > > > > > Spotted these two while booting single-user on 20141216.  20141208
> > > > > > doesn't throw these, so it's something in the last week or so..
> > > > > 
> > > > > Gaah!  Turns out that 20141208 *is* susceptible - it had been booting
> > > > > just fine for several days, but it went around the bend, apparently due
> > > > > to a userspace or initrd change.
> > > > 
> > > > $5 says you updated systemd?
> > > > 
> > > > Richard?
> > > 
> > > Ok, so if you are correct, then either we justify dropping the lock (I
> > > assume the one common to both BUG reports [sig->cred_guard_mutex]),
> > > or we make yet another queue we were hoping to avoid...
> > > 
> > > It would also be good to narrow it down to a rule that triggers this.
> > 
> > I thought the first message was enough to find the problem, but:
> > 
> > static void kauditd_send_multicast_skb(struct sk_buff *skb)
> > {
> > ...
> > nlmsg_multicast(sock, copy, 0, AUDIT_NLGRP_READLOG, GFP_KERNEL);
> > ...
> > }
> > 
> > Since kauditd_send_multicast_skb() gets called in audit_log_end(), which
> > can come from any context (aka even a sleeping context) you can't use
> > GFP_KERNEL.  The audit_buffer knows what context it should use.  So pass
> > that down and use that.
> 
> Ok, that looks more obvious now...  We just need to change the internal
> interface to kauditd_send_multicast_skb() to accept an audit_buffer
> instead of just the skb and use the gfp_mask value from there instead of
> using our own...
> 
> Thanks, Eric.

I'd suggest just sending the GFP type, not the whole audit_buffer, but
that's up to you.
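The bug pattern is a helper that hard-codes a sleeping allocation mask even though its callers may be in atomic context. The following is a userspace model of the fix Eric describes, passing only the caller's gfp mask down. Everything here (`toy_gfp_t`, `TOY_GFP_*`, `toy_alloc()` and the send functions) is a hypothetical stand-in, not kernel code. The counter plays the role of the might_sleep() check that produced the "BUG: sleeping function called from invalid context" messages.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins: NOT the kernel's gfp_t values, just two flags
 * modelling "may sleep" versus "must not sleep". */
typedef unsigned int toy_gfp_t;
#define TOY_GFP_ATOMIC 0x1u
#define TOY_GFP_KERNEL 0x2u            /* this allocation may sleep */

static bool toy_in_atomic;             /* e.g. caller holds a spinlock */
static int  toy_might_sleep_bugs;      /* "BUG: sleeping function called..." */

/* Models the might_sleep() check an allocator does for sleeping masks. */
static void *toy_alloc(size_t n, toy_gfp_t gfp)
{
    if ((gfp & TOY_GFP_KERNEL) && toy_in_atomic)
        toy_might_sleep_bugs++;
    return malloc(n);
}

/* Copies msg with the given mask, the way skb_copy() copies the skb. */
static void toy_copy_and_send(const char *msg, toy_gfp_t gfp)
{
    size_t len = strlen(msg) + 1;
    char *copy = toy_alloc(len, gfp);
    if (copy) {
        memcpy(copy, msg, len);
        free(copy);                    /* "sent": nothing else to model */
    }
}

/* Before: the helper hard-codes a sleeping mask, whatever the context. */
static void toy_send_multicast_hardcoded(const char *msg)
{
    toy_copy_and_send(msg, TOY_GFP_KERNEL);
}

/* After: the caller, which knows its context (the audit_buffer's mask),
 * passes only the gfp mask down. */
static void toy_send_multicast(const char *msg, toy_gfp_t gfp_mask)
{
    toy_copy_and_send(msg, gfp_mask);
}
```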

-Eric



Re: linux-next 20141216 BUG: sleeping function called from invalid context at mm/slab.c:2849

2014-12-18 Thread Eric Paris
On Thu, 2014-12-18 at 12:46 -0500, Richard Guy Briggs wrote:
> On 14/12/18, Eric Paris wrote:
> > On Thu, 2014-12-18 at 11:45 -0500, valdis.kletni...@vt.edu wrote:
> > > On Tue, 16 Dec 2014 20:09:54 -0500, Valdis Kletnieks said:
> > > > Spotted these two while booting single-user on 20141216.  20141208
> > > > doesn't throw these, so it's something in the last week or so..
> > > 
> > > Gaah!  Turns out that 20141208 *is* susceptible - it had been booting
> > > just fine for several days, but it went around the bend, apparently due
> > > to a userspace or initrd change.
> > 
> > $5 says you updated systemd?
> > 
> > Richard?
> 
> Ok, so if you are correct, then either we justify dropping the lock (I
> assume the one common to both BUG reports [sig->cred_guard_mutex]),
> or we make yet another queue we were hoping to avoid...
> 
> It would also be good to narrow it down to a rule that triggers this.

I thought the first message was enough to find the problem, but:

static void kauditd_send_multicast_skb(struct sk_buff *skb)
{
...
nlmsg_multicast(sock, copy, 0, AUDIT_NLGRP_READLOG, GFP_KERNEL);
...
}

Since kauditd_send_multicast_skb() gets called in audit_log_end(), which
can come from any context (aka even a sleeping context) you can't use
GFP_KERNEL.  The audit_buffer knows what context it should use.  So pass
that down and use that.

-Eric

> 
> > > egrep 'BUG|Linux vers' from my syslog:
> > > 
> > > Dec  9 12:19:53 turing-police kernel: [0.00] Linux version 
> > > 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> > > 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 
> > > 2014
> ...
> > > Dec 12 19:42:30 turing-police kernel: [0.00] Linux version 
> > > 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> > > 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 
> > > 2014
> > > Dec 12 20:00:39 turing-police kernel: [ 1109.635328] BUG: sleeping 
> > > function called from invalid context at mm/slab.c:2849
> ...
> > > Dec 12 20:42:47 turing-police kernel: [ 3633.863552] BUG: sleeping 
> > > function called from invalid context at mm/slab.c:2849
> > > Dec 12 20:51:33 turing-police kernel: [0.00] Linux version 
> > > 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> > > 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 
> > > 2014
> > > Dec 12 21:51:04 turing-police kernel: [ 3587.132867] BUG: sleeping 
> > > function called from invalid context at mm/slab.c:2849
> ...
> > > I need to figure out what changed around 7:30PM on the 12th.
> 
> - RGB
> 
> --
> Richard Guy Briggs 
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, 
> Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545




Re: linux-next 20141216 BUG: sleeping function called from invalid context at mm/slab.c:2849

2014-12-18 Thread Eric Paris
On Thu, 2014-12-18 at 11:45 -0500, valdis.kletni...@vt.edu wrote:
> On Tue, 16 Dec 2014 20:09:54 -0500, Valdis Kletnieks said:
> 
> > Spotted these two while booting single-user on 20141216.  20141208
> > doesn't throw these, so it's something in the last week or so..
> 
> Gaah!  Turns out that 20141208 *is* susceptible - it had been booting
> just fine for several days, but it went around the bend, apparently due
> to a userspace or initrd change.

$5 says you updated systemd?

Richard?

> egrep 'BUG|Linux vers' from my syslog:
> 
> Dec  9 12:19:53 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec  9 21:19:53 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 10 12:39:45 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 10 20:56:28 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 11 10:46:49 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 11 23:53:10 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 12 11:13:19 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 12 19:26:24 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 12 19:33:32 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 12 19:42:30 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 12 20:00:39 turing-police kernel: [ 1109.635328] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 20:00:43 turing-police kernel: [ 1113.680912] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 20:33:15 turing-police kernel: [ 3062.345461] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 20:37:48 turing-police kernel: [ 3335.788891] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 20:41:57 turing-police kernel: [ 3584.265255] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 20:42:47 turing-police kernel: [ 3633.863552] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 20:51:33 turing-police kernel: [0.00] Linux version 
> 3.18.0-next-20141208 (sou...@turing-police.cc.vt.edu) (gcc version 4.9.2 
> 20141101 (Red Hat 4.9.2-1) (GCC) ) #27 SMP PREEMPT Mon Dec 8 22:20:07 EST 2014
> Dec 12 21:51:04 turing-police kernel: [ 3587.132867] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 22:20:01 turing-police kernel: [ 5322.313024] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 12 23:06:00 turing-police kernel: [ 8077.463289] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> Dec 13 00:00:05 turing-police kernel: [11318.405826] BUG: sleeping function 
> called from invalid context at mm/slab.c:2849
> 
> I need to figure out what changed around 7:30PM on the 12th.
> 




Re: linux-next 20141216 BUG: sleeping function called from invalid context at mm/slab.c:2849

2014-12-16 Thread Eric Paris
I haven't looked into it, but I'd place my first bet on the audit
multicast code...

Richard?

On Tue, 2014-12-16 at 20:09 -0500, Valdis Kletnieks wrote:
> Not sure who's to blame here, but I'm tending towards selinux based on
> who was holding the locks...
> 
> Spotted these two while booting single-user on 20141216.  20141208
> doesn't throw these, so it's something in the last week or so..
> 
> Tossed it twice - once for /sbin/sulogin, and then a second time for 
> /bin/bash.
> 
> [   34.061285] BUG: sleeping function called from invalid context at 
> mm/slab.c:2849
> [   34.062863] in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
> [   34.064416] 2 locks held by sulogin/885:
> [   34.064418]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: 
> [] prepare_bprm_creds+0x28/0x8b
> [   34.064428]  #1:  (tty_files_lock){+.+.+.}, at: [] 
> selinux_bprm_committing_creds+0x55/0x22b
> [   34.064438] CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 
> #30
> [   34.064440] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 
> 06/20/2014
> [   34.064442]  880223744f10 88022410f9b8 916ba529 
> 0375
> [   34.064447]  880223744f10 88022410f9e8 91063185 
> 0006
> [   34.064452]     
> 88022410fa38
> [   34.064457] Call Trace:
> [   34.064463]  [] dump_stack+0x50/0xa8
> [   34.064467]  [] ___might_sleep+0x1b6/0x1be
> [   34.064472]  [] __might_sleep+0x119/0x128
> [   34.064477]  [] 
> cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
> [   34.064480]  [] kmem_cache_alloc+0x43/0x1c9
> [   34.064484]  [] __alloc_skb+0x42/0x1a3
> [   34.064488]  [] skb_copy+0x3e/0xa3
> [   34.064492]  [] audit_log_end+0x83/0x100
> [   34.064496]  [] ? avc_audit_pre_callback+0x103/0x103
> [   34.064500]  [] common_lsm_audit+0x441/0x450
> [   34.064503]  [] slow_avc_audit+0x63/0x67
> [   34.064506]  [] avc_has_perm+0xca/0xe3
> [   34.064510]  [] inode_has_perm+0x5a/0x65
> [   34.064514]  [] selinux_bprm_committing_creds+0x98/0x22b
> [   34.064519]  [] security_bprm_committing_creds+0xe/0x10
> [   34.064522]  [] install_exec_creds+0xe/0x79
> [   34.064527]  [] load_elf_binary+0xe36/0x10d7
> [   34.064542]  [] search_binary_handler+0x81/0x18c
> [   34.064545]  [] do_execveat_common.isra.31+0x4e3/0x7b7
> [   34.064548]  [] do_execve+0x1f/0x21
> [   34.064552]  [] SyS_execve+0x25/0x29
> [   34.064557]  [] stub_execve+0x69/0xa0
> 
> [   48.826654] BUG: sleeping function called from invalid context at 
> mm/slab.c:2849
> [   48.829282] in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: bash
> [   48.829284] 2 locks held by bash/885:
> [   48.829297]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: 
> [] prepare_bprm_creds+0x28/0x8b
> [   48.829307]  #1:  (&(&newf->file_lock)->rlock){+.+.+.}, at: 
> [] iterate_fd+0x34/0x11c
> [   48.829310] CPU: 3 PID: 885 Comm: bash Not tainted 3.18.0-next-20141216 #30
> [   48.829311] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 
> 06/20/2014
> [   48.829317]  880223744f10 88022410f928 916ba529 
> 0375
> [   48.829321]  880223744f10 88022410f958 91063185 
> 0002
> [   48.829325]     
> 88022410f9a8
> [   48.829327] Call Trace:
> [   48.829333]  [] dump_stack+0x50/0xa8
> [   48.829338]  [] ___might_sleep+0x1b6/0x1be
> [   48.829341]  [] __might_sleep+0x119/0x128
> [   48.829347]  [] 
> cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
> [   48.829350]  [] kmem_cache_alloc+0x43/0x1c9
> [   48.829356]  [] __alloc_skb+0x42/0x1a3
> [   48.829360]  [] skb_copy+0x3e/0xa3
> [   48.829367]  [] audit_log_end+0x83/0x100
> [   48.829372]  [] ? avc_audit_pre_callback+0x103/0x103
> [   48.829377]  [] common_lsm_audit+0x441/0x450
> [   48.829381]  [] slow_avc_audit+0x63/0x67
> [   48.829386]  [] avc_has_perm+0xca/0xe3
> [   48.829391]  [] ? selinux_file_permission+0x9b/0x9b
> [   48.829395]  [] file_has_perm+0x6d/0x7c
> [   48.829400]  [] match_file+0x2e/0x3b
> [   48.829404]  [] iterate_fd+0xf4/0x11c
> [   48.829409]  [] selinux_bprm_committing_creds+0xd0/0x22b
> [   48.829415]  [] security_bprm_committing_creds+0xe/0x10
> [   48.829419]  [] install_exec_creds+0xe/0x79
> [   48.829426]  [] load_elf_binary+0xe36/0x10d7
> [   48.829431]  [] search_binary_handler+0x81/0x18c
> [   48.829435]  [] do_execveat_common.isra.31+0x4e3/0x7b7
> [   48.829462]  [] do_execve+0x1f/0x21
> [   48.829466]  [] SyS_execve+0x25/0x29
> [   48.829472]  [] stub_execve+0x69/0xa0
> 




Re: sparc: Clashing values for O_PATH and FMODE_NONOTIFY?

2014-11-20 Thread Eric Paris
On Thu, 2014-11-20 at 12:12 +, David Drysdale wrote:
> [+linux-fsdevel, without the typo this time]
> 
> On Wed, Nov 19, 2014 at 8:30 PM, David Miller  wrote:
> > From: David Drysdale 
> > Date: Tue, 18 Nov 2014 13:13:51 +
> >
> >> Hi folks,
> >>
> >> It looks like the value for O_PATH on sparc:
> >>
> >>   arch/sparc/include/uapi/asm/fcntl.h:37:#define O_PATH 0x100
> >>
> >> clashes with the arch-independent value for __FMODE_NONOTIFY:
> >>
> >>   include/linux/fs.h:137:#define FMODE_NONOTIFY ((__force 
> >> fmode_t)0x100)
> >>   include/linux/fs.h:2764:#define __FMODE_NONOTIFY ((__force int)
> >> FMODE_NONOTIFY)
> >>
> >> and they are both in the same numbering space, as indicated by the
> >> comment at the top of include/uapi/asm-generic/fcntl.h and the use in
> >> fs/notify/fanotify/fanotify_user.c:715.
> >>
> >> Presumably this could theoretically cause problems (no notifications for
> >> O_PATH files on SPARC?), so would it be a good idea to renumber
> >> FMODE_NONOTIFY?  (I *think* that value is entirely kernel-internal.)
> >>
> >> Given that this has happened before (12ed2e36c98aec6c4155 "fanotify:
> >> FMODE_NONOTIFY and __O_SYNC in sparc conflict") it would probably
> >> also be a good idea to add __FMODE_NONOTIFY to the uniqueness check in
> >> fs/fcntl.c:fcntl_init().
> >>
> >> Thoughts?
> >
> > I think you will need to change the internal value, to not clash with
> > the sparc exported one, for sure.
> 
> Well, I was sort of hoping someone else might volunteer to make the
> change :-) --  I don't use fanotify (or sparc for that matter), I just
> happened to notice the clash in passing.
> 
> But I'm happy to have a go, although I can't test much.  It would be
> good to hear from the fanotify maintainers first, though -- Eric?

It's totally internal, and it was picked not to clash with anything.  I
don't know how to keep it from happening in the future.
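David's suggestion was to add the flag to the uniqueness check in fcntl_init(). A bit-count test is one way such a check can catch overlaps. If the flags are pairwise disjoint, OR-ing them together sets exactly as many bits as the flags have in total. The sketch below is a hypothetical userspace helper, not the kernel's code, and `__builtin_popcount()` assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if no two flags share a set bit.  Works for multi-bit
 * masks too: disjoint masks OR together without losing any bits. */
bool flags_disjoint(const unsigned int *flags, size_t n)
{
    unsigned int all = 0;
    int total_bits = 0;

    for (size_t i = 0; i < n; i++) {
        all |= flags[i];
        total_bits += __builtin_popcount(flags[i]);
    }
    return __builtin_popcount(all) == total_bits;
}
```

Evaluated over constants at build time, the same comparison would make a new clash fail the build instead of silently disabling notifications.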



Re: [PATCH] audit_tree: keep inode pinned

2014-11-04 Thread Eric Paris
[adding paul and richard]

On Tue, 2014-11-04 at 11:27 +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi 
> 
> Audit rules disappear when an inode they watch is evicted from the cache.
> This is likely not what we want.
> 
> The guilty commit is "fsnotify: allow marks to not pin inodes in core",
> which didn't take into account that audit_tree adds watches with a zero
> mask.
> 
> Adding any mask should fix this.
> 
> Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core")
> Signed-off-by: Miklos Szeredi 
> Cc: sta...@vger.kernel.org # 2.6.36+
> ---
>  kernel/audit_tree.c |1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -154,6 +154,7 @@ static struct audit_chunk *alloc_chunk(i
>   chunk->owners[i].index = i;
>   }
>   fsnotify_init_mark(&chunk->mark, audit_tree_destroy_watch);
> + chunk->mark.mask = FS_IN_IGNORED;
>   return chunk;
>  }
>  




Re: [PATCH] i386/audit: stop scribbling on the stack frame

2014-10-27 Thread Eric Paris
On Mon, 2014-10-27 at 21:52 +0100, Thomas Gleixner wrote:
> On Sun, 26 Oct 2014, Richard Guy Briggs wrote:
> > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> > index b553ed8..344b63f 100644
> > --- a/arch/x86/kernel/entry_32.S
> > +++ b/arch/x86/kernel/entry_32.S
> > @@ -447,15 +447,14 @@ sysenter_exit:
> >  sysenter_audit:
> > testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
> > jnz syscall_trace_entry
> > -   addl $4,%esp
> > -   CFI_ADJUST_CFA_OFFSET -4
> > -   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
> > -   movl %edx,(%esp)/* 4th arg: 3rd syscall arg */
> > -   /* %ecx already in %ecx3rd arg: 2nd syscall arg */
> > -   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
> > -   /* %eax already in %eax1st arg: syscall number */
> > +   /* movl PT_ECX(%esp), %ecx  already set, a1: 3nd arg to audit */
> > +   /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to 
> > audit */
> > +   pushl_cfi %esi  /* a3: 5th arg */
> > +   pushl_cfi %edx  /* a2: 4th arg */
> > +   movl %ebx, %edx /* ebx/a0: 2nd arg to audit */
> > call __audit_syscall_entry
> > -   pushl_cfi %ebx
> > +   popl_cfi %ecx /* get that remapped edx off the stack */
> > +   popl_cfi %ecx /* get that remapped esi off the stack */
> 
> Why use pop instead of simply adjusting esp and CFI by 8?

Certainly seems like a good idea for RGB's perf improvement patch to go
on top of -tip urgent.

-Eric




Re: [PATCH] i386/audit: stop scribbling on the stack frame

2014-10-27 Thread Eric Paris
On Mon, 2014-10-27 at 10:02 -0700, H. Peter Anvin wrote:
> On 10/27/2014 06:55 AM, Eric Paris wrote:
> > My patch was already committed to the -tip urgent branch.  I believe any
> > optimization should be based on that branch, Richard.  If you are trying
> > to wrangle every bit of speed out of this, should you
> > 
> > push %esi;
> > push %edi;
> > CFI_ADJUST_CFA_OFFSET 8
> > call __audit_syscall_entry
> > pop;
> > pop;
> > CFI_ADJUST_CFA_OFFSET -8
> > 
> > Instead of using the pushl_cfi and popl_cfi macros?
> > 
> > I wrote my patch to be obviously correct, but agree there are certainly
> > some speedups possible.
> > 
> 
> Uh... not only is that plain wrong (the CFI should be adjusted after
> each instruction that changes the stack pointer),

Sure, things would be screwed up between the two pushes.

>  but what the heck is
> wrong with using the macros?

I was asking whether that would save an instruction or two by consolidating
the CFI update, and if so whether that tradeoff would be worth it, given how
often this code runs.
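hpa's objection is about unwind correctness. The CFA offset recorded by the CFI directives has to match the real stack depth at every instruction boundary where execution can be interrupted. The toy model below counts the boundaries where the two disagree, comparing the per-push macro style with the consolidated sequence proposed above. `struct toy_insn` and `cfi_mismatches()` are hypothetical and are not a real unwinder.

```c
#include <stddef.h>

/* One step of a toy instruction stream: how far the instruction moves
 * %esp, and how much CFA offset the CFI directive after it records. */
struct toy_insn {
    int sp_delta;    /* bytes pushed (+) or popped (-) */
    int cfi_delta;   /* CFI_ADJUST_CFA_OFFSET amount emitted after it */
};

/* Counts instruction boundaries at which the recorded CFA offset differs
 * from the real stack depth; an unwinder stopped there would compute the
 * wrong frame address. */
size_t cfi_mismatches(const struct toy_insn *seq, size_t n)
{
    int real = 0, recorded = 0;
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        real += seq[i].sp_delta;
        recorded += seq[i].cfi_delta;
        if (real != recorded)
            bad++;
    }
    return bad;
}
```

CFI directives emit only unwind-table data, not executable instructions, so consolidating them would not make this hot path any faster. The macros that pair each push or pop with its adjustment cost nothing at run time.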

> 
>   -hpa
> 
> 
> 




Re: [PATCH] i386/audit: stop scribbling on the stack frame

2014-10-27 Thread Eric Paris
My patch was already committed to the -tip urgent branch.  I believe any
optimization should be based on that branch, Richard.  If you are trying
to wrangle every bit of speed out of this, should you

push %esi;
push %edi;
CFI_ADJUST_CFA_OFFSET 8
call __audit_syscall_entry
pop;
pop;
CFI_ADJUST_CFA_OFFSET -8

Instead of using the pushl_cfi and popl_cfi macros?

I wrote my patch to be obviously correct, but agree there are certainly
some speedups possible.

-Eric

On Sun, 2014-10-26 at 22:34 -0400, Richard Guy Briggs wrote:
> git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
> It was writing over %esp/pt_regs semi-randomly on i686 with the expected
> "system can't boot" results.  As noted in:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=85277
> 
> This patch stops fscking with pt_regs.  Instead it sets up the registers
> for the call to __audit_syscall_entry in the most obvious conceivable
> way.  It then does just a tiny tiny touch of magic.  We need to get what
> started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
> as a pair of pushes using the values still in those registers.
> 
> After the call to __audit_syscall_entry all we need to do is get that
> now useless junk off the stack (pair of pops) and reload %eax with the
> original syscall so other stuff can keep going about its business.
> 
> Reported-by: Paulo Zanoni 
> Signed-off-by: Eric Paris 
> Signed-off-by: Richard Guy Briggs 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-au...@redhat.com
> 
> ---
> On 14/10/25, Thomas Gleixner wrote:
> > Why are we grabbing that from the stack? AFAICT all arguments are in
> > the registers still.
> 
> Right, re-arranging the instructions slightly to avoid overwriting %edx
> with %ebx before needing it to push onto the stack, how does this look?
> 
>  arch/x86/kernel/entry_32.S | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> index b553ed8..344b63f 100644
> --- a/arch/x86/kernel/entry_32.S
> +++ b/arch/x86/kernel/entry_32.S
> @@ -447,15 +447,14 @@ sysenter_exit:
>  sysenter_audit:
>   testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
>   jnz syscall_trace_entry
> - addl $4,%esp
> - CFI_ADJUST_CFA_OFFSET -4
> - movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
> - movl %edx,(%esp)/* 4th arg: 3rd syscall arg */
> - /* %ecx already in %ecx3rd arg: 2nd syscall arg */
> - movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
> - /* %eax already in %eax1st arg: syscall number */
> + /* movl PT_ECX(%esp), %ecx  already set, a1: 3nd arg to audit */
> + /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to 
> audit */
> + pushl_cfi %esi  /* a3: 5th arg */
> + pushl_cfi %edx  /* a2: 4th arg */
> + movl %ebx, %edx /* ebx/a0: 2nd arg to audit */
>   call __audit_syscall_entry
> - pushl_cfi %ebx
> + popl_cfi %ecx /* get that remapped edx off the stack */
> + popl_cfi %ecx /* get that remapped esi off the stack */
>   movl PT_EAX(%esp),%eax  /* reload syscall number */
>   jmp sysenter_do_call
>  
> 
> - RGB
> 




[tip:x86/urgent] i386/audit: stop scribbling on the stack frame

2014-10-24 Thread tip-bot for Eric Paris
Commit-ID:  26c2d2b39128adba276d140eefa2745591b88536
Gitweb: http://git.kernel.org/tip/26c2d2b39128adba276d140eefa2745591b88536
Author: Eric Paris 
AuthorDate: Thu, 23 Oct 2014 00:04:03 -0400
Committer:  H. Peter Anvin 
CommitDate: Fri, 24 Oct 2014 13:27:56 -0700

i386/audit: stop scribbling on the stack frame

git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686  with the expected
"system can't boot" results.  As noted in:

https://bugs.freedesktop.org/show_bug.cgi?id=85277

This patch stops fscking with pt_regs.  Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way.  It then does just a tiny tiny touch of magic.  We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
as a pair of pushes.

After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about its business.

Reported-by: Paulo Zanoni 
Signed-off-by: Eric Paris 
Link: 
http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-epa...@redhat.com
Cc: Richard Guy Briggs 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/entry_32.S | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index b553ed8..344b63f 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -447,15 +447,14 @@ sysenter_exit:
 sysenter_audit:
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
jnz syscall_trace_entry
-   addl $4,%esp
-   CFI_ADJUST_CFA_OFFSET -4
-   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
-   movl %edx,(%esp)/* 4th arg: 3rd syscall arg */
-   /* %ecx already in %ecx3rd arg: 2nd syscall arg */
-   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
-   /* %eax already in %eax1st arg: syscall number */
+   /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to 
audit */
+   movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
+   /* movl PT_ECX(%esp), %ecx  already set, a1: 3nd arg to audit */
+   pushl_cfi PT_ESI(%esp)  /* a3: 5th arg */
+   pushl_cfi PT_EDX+4(%esp)/* a2: 4th arg */
call __audit_syscall_entry
-   pushl_cfi %ebx
+   popl_cfi %ecx /* get that remapped edx off the stack */
+   popl_cfi %ecx /* get that remapped esi off the stack */
movl PT_EAX(%esp),%eax  /* reload syscall number */
jmp sysenter_do_call
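In this version the two stack arguments are pushed from their saved pt_regs slots. The second push uses PT_EDX+4(%esp) because the first push has already moved %esp down by 4. The toy stack below replays both pushes and checks that the 4th and 5th arguments land at 0(%esp) and 4(%esp), as the commit message says. `struct toy_stack`, `push_from()` and the `TOY_PT_*` offsets are hypothetical stand-ins, and the slot offsets are assumed rather than taken from the real pt_regs layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical slot offsets standing in for PT_ECX/PT_EDX/PT_ESI. */
enum { TOY_PT_ECX = 4, TOY_PT_EDX = 8, TOY_PT_ESI = 12 };

/* Downward-growing 32-bit stack; esp is a byte offset into mem[]. */
struct toy_stack {
    uint8_t  mem[64];
    uint32_t esp;
};

/* Read/write the 32-bit word at off(%esp). */
static uint32_t load32(const struct toy_stack *s, uint32_t off)
{
    uint32_t v;
    memcpy(&v, &s->mem[s->esp + off], sizeof v);
    return v;
}

static void store32(struct toy_stack *s, uint32_t off, uint32_t v)
{
    memcpy(&s->mem[s->esp + off], &v, sizeof v);
}

/* pushl off(%esp): the source address uses %esp as it was before this
 * push; then %esp drops by 4 and the value lands at 0(%esp). */
static void push_from(struct toy_stack *s, uint32_t off)
{
    uint32_t v = load32(s, off);
    s->esp -= 4;
    store32(s, 0, v);
}
```

Without the +4, the second push would read the slot one word below PT_EDX (the ECX slot in this model). RGB's later version avoids the offset arithmetic entirely by pushing %esi and %edx directly. As Thomas pointed out, those values are still in the registers.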
 


Re: [PATCH] i386/audit: stop scribbling on the stack frame

2014-10-23 Thread Eric Paris
On Thu, 2014-10-23 at 12:20 -0700, Andy Lutomirski wrote:
> On Thu, Oct 23, 2014 at 12:15 PM, Eric Paris  wrote:
> > On Thu, 2014-10-23 at 11:39 -0700, Andy Lutomirski wrote:
> >> On 10/22/2014 09:04 PM, Eric Paris wrote:
> >> > git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
> >> > It was writing over %esp/pt_regs semi-randomly on i686  with the expected
> >> > "system can't boot" results.  As noted in:
> >> >
> >> > https://bugs.freedesktop.org/show_bug.cgi?id=85277
> >> >
> >> > This patch stops fscking with pt_regs.  Instead it sets up the registers
> >> > for the call to __audit_syscall_entry in the most obvious conceivable
> >> > way.  It then does just a tiny tiny touch of magic.  We need to get what
> >> > started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
> >> > as a pair of pushes.
> >> >
> >> > After the call to __audit_syscall_entry all we need to do is get that
> >> > now useless junk off the stack (pair of pops) and reload %eax with the
> >> > original syscall so other stuff can keep going about its business.
> >> >
> >> > Signed-off-by: Eric Paris 
> >> > Cc: Thomas Gleixner 
> >> > Cc: Ingo Molnar 
> >> > Cc: "H. Peter Anvin" 
> >> > Cc: x...@kernel.org
> >> > Cc: linux-kernel@vger.kernel.org
> >> > Cc: linux-au...@redhat.com
> >> > ---
> >> >  arch/x86/kernel/entry_32.S | 15 +++
> >> >  1 file changed, 7 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> >> > index f9e3fab..fb01d22 100644
> >> > --- a/arch/x86/kernel/entry_32.S
> >> > +++ b/arch/x86/kernel/entry_32.S
> >> > @@ -447,15 +447,14 @@ sysenter_exit:
> >> >  sysenter_audit:
> >> > testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
> >> > jnz syscall_trace_entry
> >> > -   addl $4,%esp
> >> > -   CFI_ADJUST_CFA_OFFSET -4
> >> > -   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
> >> > -   movl %edx,(%esp)    /* 4th arg: 3rd syscall arg */
> >> > -   /* %ecx already in %ecx  3rd arg: 2nd syscall arg */
> >> > -   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
> >> > -   /* %eax already in %eax  1st arg: syscall number */
> >> > +   /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to audit */
> >> > +   movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
> >> > +   /* movl PT_ECX(%esp), %ecx  already set, a1: 3rd arg to audit */
> >> > +   pushl_cfi PT_ESI(%esp)  /* a3: 5th arg */
> >> > +   pushl_cfi PT_EDX+4(%esp) /* a2: 4th arg */
> >> > call __audit_syscall_entry
> >> > -   pushl_cfi %ebx
> >> > +   popl_cfi %ecx /* get that remapped edx off the stack */
> >> > +   popl_cfi %ecx /* get that remapped esi off the stack */
> >> > movl PT_EAX(%esp),%eax  /* reload syscall number */
> >> > jmp sysenter_do_call
> >> >
> >> >
> >>
> >> This looks reasonably likely to be correct, but this code is complicated
> >> and now even slower.
> >
> > I guess I could just use push/pop and do the CFI_ADJUST_CFA_OFFSET by
> > hand.  But I figured this was reasonable enough...
> >
> 
> I'm not complaining about your new assembly in particular.  There's
> just too much assembly in there in general.
> 
> But I feel like I'm missing something in the new code.  Aren't you
> corrupting ecx with those popl_cfi insns?

After the call to __audit_syscall_entry, aren't they already polluted?
Isn't that the reason we need to reload EAX?  You can verify this leaves
things in a similar state (although slightly differently polluted) to
before it got screwed up.  Here is the diff between before the breakage
and what I propose we do now.

(I admit I don't understand how the pushl_cfi %ebx wasn't messing up
PT_EBX)

/me anxiously awaits x86 guy to tell me how dumb I am

$ git diff a17c8b54dc738c4fda31e8be0302cd131a04c19f -- arch/x86/kernel/entry_32.S
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 0d0c9d4..fb01d22 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -447,16 +447,14 @@ sysenter_exit:
 s

Re: [PATCH] i386/audit: stop scribbling on the stack frame

2014-10-23 Thread Eric Paris
On Thu, 2014-10-23 at 15:30 -0400, Eric Paris wrote:
> On Thu, 2014-10-23 at 12:20 -0700, Andy Lutomirski wrote:
> > On Thu, Oct 23, 2014 at 12:15 PM, Eric Paris  wrote:
> > > On Thu, 2014-10-23 at 11:39 -0700, Andy Lutomirski wrote:
> > >> On 10/22/2014 09:04 PM, Eric Paris wrote:
> > >> > git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
> > >> > It was writing over %esp/pt_regs semi-randomly on i686  with the expected
> > >> > "system can't boot" results.  As noted in:
> > >> >
> > >> > https://bugs.freedesktop.org/show_bug.cgi?id=85277
> > >> >
> > >> > This patch stops fscking with pt_regs.  Instead it sets up the registers
> > >> > for the call to __audit_syscall_entry in the most obvious conceivable
> > >> > way.  It then does just a tiny tiny touch of magic.  We need to get what
> > >> > started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
> > >> > as a pair of pushes.
> > >> >
> > >> > After the call to __audit_syscall_entry all we need to do is get that
> > >> > now useless junk off the stack (pair of pops) and reload %eax with the
> > >> > original syscall so other stuff can keep going about its business.
> > >> >
> > >> > Signed-off-by: Eric Paris 
> > >> > Cc: Thomas Gleixner 
> > >> > Cc: Ingo Molnar 
> > >> > Cc: "H. Peter Anvin" 
> > >> > Cc: x...@kernel.org
> > >> > Cc: linux-kernel@vger.kernel.org
> > >> > Cc: linux-au...@redhat.com
> > >> > ---
> > >> >  arch/x86/kernel/entry_32.S | 15 +++
> > >> >  1 file changed, 7 insertions(+), 8 deletions(-)
> > >> >
> > >> > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> > >> > index f9e3fab..fb01d22 100644
> > >> > --- a/arch/x86/kernel/entry_32.S
> > >> > +++ b/arch/x86/kernel/entry_32.S
> > >> > @@ -447,15 +447,14 @@ sysenter_exit:
> > >> >  sysenter_audit:
> > >> > testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
> > >> > jnz syscall_trace_entry
> > >> > -   addl $4,%esp
> > >> > -   CFI_ADJUST_CFA_OFFSET -4
> > >> > -   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
> > >> > -   movl %edx,(%esp)    /* 4th arg: 3rd syscall arg */
> > >> > -   /* %ecx already in %ecx  3rd arg: 2nd syscall arg */
> > >> > -   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
> > >> > -   /* %eax already in %eax  1st arg: syscall number */
> > >> > +   /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to audit */
> > >> > +   movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
> > >> > +   /* movl PT_ECX(%esp), %ecx  already set, a1: 3rd arg to audit */
> > >> > +   pushl_cfi PT_ESI(%esp)  /* a3: 5th arg */
> > >> > +   pushl_cfi PT_EDX+4(%esp) /* a2: 4th arg */
> > >> > call __audit_syscall_entry
> > >> > -   pushl_cfi %ebx
> > >> > +   popl_cfi %ecx /* get that remapped edx off the stack */
> > >> > +   popl_cfi %ecx /* get that remapped esi off the stack */
> > >> > movl PT_EAX(%esp),%eax  /* reload syscall number */
> > >> > jmp sysenter_do_call
> > >> >
> > >> >
> > >>
> > >> This looks reasonably likely to be correct, but this code is complicated
> > >> and now even slower.
> > >
> > > I guess I could just use push/pop and do the CFI_ADJUST_CFA_OFFSET by
> > > hand.  But I figured this was reasonable enough...
> > >
> > 
> > I'm not complaining about your new assembly in particular.  There's
> > just too much assembly in there in general.
> > 
> > But I feel like I'm missing something in the new code.  Aren't you
> > corrupting ecx with those popl_cfi insns?
> 
> After the call to __audit_syscall_entry, aren't they already polluted?
> Isn't that the reason we need to reload EAX?

Well, I guess EAX is special...

>   You can verify 

Re: [PATCH] i386/audit: stop scribbling on the stack frame

2014-10-23 Thread Eric Paris
On Thu, 2014-10-23 at 11:39 -0700, Andy Lutomirski wrote:
> On 10/22/2014 09:04 PM, Eric Paris wrote:
> > git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
> > It was writing over %esp/pt_regs semi-randomly on i686  with the expected
> > "system can't boot" results.  As noted in:
> > 
> > https://bugs.freedesktop.org/show_bug.cgi?id=85277
> > 
> > This patch stops fscking with pt_regs.  Instead it sets up the registers
> > for the call to __audit_syscall_entry in the most obvious conceivable
> > way.  It then does just a tiny tiny touch of magic.  We need to get what
> > started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
> > as a pair of pushes.
> > 
> > After the call to __audit_syscall_entry all we need to do is get that
> > now useless junk off the stack (pair of pops) and reload %eax with the
> > original syscall so other stuff can keep going about its business.
> > 
> > Signed-off-by: Eric Paris 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: x...@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-au...@redhat.com
> > ---
> >  arch/x86/kernel/entry_32.S | 15 +++
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> > index f9e3fab..fb01d22 100644
> > --- a/arch/x86/kernel/entry_32.S
> > +++ b/arch/x86/kernel/entry_32.S
> > @@ -447,15 +447,14 @@ sysenter_exit:
> >  sysenter_audit:
> > testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
> > jnz syscall_trace_entry
> > -   addl $4,%esp
> > -   CFI_ADJUST_CFA_OFFSET -4
> > -   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
> > -   movl %edx,(%esp)    /* 4th arg: 3rd syscall arg */
> > -   /* %ecx already in %ecx  3rd arg: 2nd syscall arg */
> > -   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
> > -   /* %eax already in %eax  1st arg: syscall number */
> > +   /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to audit */
> > +   movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
> > +   /* movl PT_ECX(%esp), %ecx  already set, a1: 3rd arg to audit */
> > +   pushl_cfi PT_ESI(%esp)  /* a3: 5th arg */
> > +   pushl_cfi PT_EDX+4(%esp) /* a2: 4th arg */
> > call __audit_syscall_entry
> > -   pushl_cfi %ebx
> > +   popl_cfi %ecx /* get that remapped edx off the stack */
> > +   popl_cfi %ecx /* get that remapped esi off the stack */
> > movl PT_EAX(%esp),%eax  /* reload syscall number */
> > jmp sysenter_do_call
> >  
> > 
> 
> This looks reasonably likely to be correct, but this code is complicated
> and now even slower.

I guess I could just use push/pop and do the CFI_ADJUST_CFA_OFFSET by
hand.  But I figured this was reasonable enough...

> How hard would it be to just delete it and replace it with a
> straightforward two-phase trace invocation a la x86_64?

For me?  Hard.



[PATCH] i386/audit: stop scribbling on the stack frame

2014-10-22 Thread Eric Paris
git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686  with the expected
"system can't boot" results.  As noted in:

https://bugs.freedesktop.org/show_bug.cgi?id=85277

This patch stops fscking with pt_regs.  Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way.  It then does just a tiny tiny touch of magic.  We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
as a pair of pushes.

After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about its business.

Signed-off-by: Eric Paris 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-au...@redhat.com
---
 arch/x86/kernel/entry_32.S | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index f9e3fab..fb01d22 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -447,15 +447,14 @@ sysenter_exit:
 sysenter_audit:
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
jnz syscall_trace_entry
-   addl $4,%esp
-   CFI_ADJUST_CFA_OFFSET -4
-   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
-   movl %edx,(%esp)    /* 4th arg: 3rd syscall arg */
-   /* %ecx already in %ecx  3rd arg: 2nd syscall arg */
-   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
-   /* %eax already in %eax  1st arg: syscall number */
+   /* movl PT_EAX(%esp), %eax  already set, syscall number: 1st arg to audit */
+   movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
+   /* movl PT_ECX(%esp), %ecx  already set, a1: 3rd arg to audit */
+   pushl_cfi PT_ESI(%esp)  /* a3: 5th arg */
+   pushl_cfi PT_EDX+4(%esp) /* a2: 4th arg */
call __audit_syscall_entry
-   pushl_cfi %ebx
+   popl_cfi %ecx /* get that remapped edx off the stack */
+   popl_cfi %ecx /* get that remapped esi off the stack */
movl PT_EAX(%esp),%eax  /* reload syscall number */
jmp sysenter_do_call
 
-- 
1.9.3
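A small userspace simulation may make the difference between the two sequences concrete. It is a sketch under stated assumptions, not kernel code: it assumes the i386 struct pt_regs layout of this era (ebx, ecx, edx, esi at offsets 0, 4, 8 and 12, i.e. PT_EBX..PT_ESI), and that, because the kernel is built with -mregparm=3, __audit_syscall_entry() takes its first three arguments in %eax, %edx and %ecx and its 4th and 5th at 0(%esp) and 4(%esp), as the patch comments say. struct machine and all helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed i386 struct pt_regs offsets of this era: bx, cx, dx, si, ... */
enum { PT_EBX = 0, PT_ECX = 4, PT_EDX = 8, PT_ESI = 12 };

#define STACK_BYTES 64u
#define FRAME       32u /* byte offset of the simulated pt_regs frame */

struct machine {
	uint32_t mem[STACK_BYTES / 4]; /* simulated stack, one entry per word */
	uint32_t esp;                  /* byte offset into mem */
};

static uint32_t load(const struct machine *m, uint32_t addr)
{
	assert(addr % 4 == 0 && addr < STACK_BYTES);
	return m->mem[addr / 4];
}

static void store(struct machine *m, uint32_t addr, uint32_t val)
{
	assert(addr % 4 == 0 && addr < STACK_BYTES);
	m->mem[addr / 4] = val;
}

/* pushl disp(%esp): the source address is formed before %esp drops by 4. */
static void push_mem(struct machine *m, uint32_t disp)
{
	uint32_t val = load(m, m->esp + disp);

	m->esp -= 4;
	store(m, m->esp, val);
}

/* Saved syscall args a0..a3 (0xa0..0xa3) in the frame; %esp points at it. */
static void setup(struct machine *m)
{
	memset(m, 0, sizeof(*m));
	m->esp = FRAME;
	store(m, FRAME + PT_EBX, 0xa0);
	store(m, FRAME + PT_ECX, 0xa1);
	store(m, FRAME + PT_EDX, 0xa2);
	store(m, FRAME + PT_ESI, 0xa3);
}

/* Sequence from b4f0d37; %edx/%esi/%ebx still hold a2/a3/a0. */
static void old_sequence(struct machine *m)
{
	m->esp += 4;                /* addl $4,%esp: %esp now at PT_ECX     */
	store(m, m->esp + 4, 0xa3); /* movl %esi,4(%esp): overwrites PT_EDX */
	store(m, m->esp, 0xa2);     /* movl %edx,(%esp): overwrites PT_ECX  */
	/* call __audit_syscall_entry */
	m->esp -= 4;                /* pushl %ebx: %ebx is callee-saved,    */
	store(m, m->esp, 0xa0);     /* so PT_EBX gets its own value back    */
}

/* Fixed sequence: copy a2/a3 below the frame instead of into it. */
static void new_sequence(struct machine *m, uint32_t *arg4, uint32_t *arg5)
{
	push_mem(m, PT_ESI);         /* a3: 5th arg                         */
	push_mem(m, PT_EDX + 4);     /* a2: 4th arg; +4 for the first push  */
	*arg4 = load(m, m->esp);     /* callee's 4th arg, at 0(%esp)        */
	*arg5 = load(m, m->esp + 4); /* callee's 5th arg, at 4(%esp)        */
	/* call __audit_syscall_entry */
	m->esp += 8;                 /* the two popl %ecx                   */
}
```

After setup(), old_sequence() leaves a2 in the PT_ECX slot and a3 in the PT_EDX slot, so the saved syscall arguments are scribbled over, while new_sequence() hands the callee a2/a3 at 0(%esp)/4(%esp) and leaves the frame untouched with %esp restored. The model also suggests why the old trailing pushl %ebx did not visibly corrupt PT_EBX: %ebx is callee-saved in the i386 C calling convention, so it presumably still holds the saved value. By the same convention %eax, %ecx and %edx are caller-clobbered, so popping into %ecx after the call loses nothing, assuming sysenter_do_call relies only on %eax, which is why %eax is reloaded.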



Re: Regression: audit: x86: drop arch from __audit_syscall_entry() interface

2014-10-22 Thread Eric Paris
On Wed, 2014-10-22 at 14:43 -0700, H. Peter Anvin wrote:
> On 10/22/2014 02:38 PM, Eric Paris wrote:
> > 
> > It was sent, numerous times, to the x86 list for reviews, and lived in
> > -next for 2 complete devel cycles without a complaint.  I'm trying to
> > get an i386 system to test a fix.  But yes, it's total crap.
> > 
> 
> You don't need an i386 system -- you can install an i386 distro on an
> x86-64 system, or in KVM.

So I might still be an idiot, because I still haven't gotten a working
kernel.  But I can't get Linus' latest to not panic, even without
CONFIG_AUDITSYSCALL.  I kept blaming myself for not fixing this problem,
but reverting the patch, as the reporter did, didn't give me bootable
kernels either.

I just jumped back in time and am looking to get anything I build to
boot...



Re: Regression: audit: x86: drop arch from __audit_syscall_entry() interface

2014-10-22 Thread Eric Paris
On Wed, 2014-10-22 at 14:43 -0700, H. Peter Anvin wrote:
> On 10/22/2014 02:38 PM, Eric Paris wrote:
> > 
> > It was sent, numerous times, to the x86 list for reviews, and lived in
> > -next for 2 complete devel cycles without a complaint.  I'm trying to
> > get an i386 system to test a fix.  But yes, it's total crap.
> > 
> 
> You don't need an i386 system -- you can install an i386 distro on an
> x86-64 system, or in KVM.

I'm currently building on i386 on KVM.  That's what I meant by "get"



Re: Regression: audit: x86: drop arch from __audit_syscall_entry() interface

2014-10-22 Thread Eric Paris
On Wed, 2014-10-22 at 23:36 +0200, Thomas Gleixner wrote:
> On Wed, 22 Oct 2014, Eric Paris wrote:
> 
> > That's really serious.  Looking now.
> 
> Indeed it's serious. And it's even more serious as this masterpiece of
> assembly wreckage was pulled in via your tree w/o having an acked-by
> from one of the x86 maintainers.
> 
> > On Wed, 2014-10-22 at 16:08 -0200, Paulo Zanoni wrote:
> > > commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a
> > > Author: Richard Guy Briggs
> > > Date:   Tue Mar 4 10:38:06 2014 -0500
> > > audit: x86: drop arch from __audit_syscall_entry() interface
> > > 
> > > According to our QA, their i386 machine doesn't boot anymore. I tried
> > > to write my own revert for the patch, asked QA to test, and they
> > > confirmed it "solves" the problem.
> 
> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> index 0d0c9d4ab6d5..f9e3fabc8716 100644
> --- a/arch/x86/kernel/entry_32.S
> +++ b/arch/x86/kernel/entry_32.S
> @@ -449,12 +449,11 @@ sysenter_audit:
>   jnz syscall_trace_entry
>   addl $4,%esp
>   CFI_ADJUST_CFA_OFFSET -4
> - /* %esi already in 8(%esp) 6th arg: 4th syscall arg */
> - /* %edx already in 4(%esp) 5th arg: 3rd syscall arg */
> - /* %ecx already in 0(%esp) 4th arg: 2nd syscall arg */
> - movl %ebx,%ecx  /* 3rd arg: 1st syscall arg */
> - movl %eax,%edx  /* 2nd arg: syscall number */
> - movl $AUDIT_ARCH_I386,%eax  /* 1st arg: audit arch */
> + movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
> + movl %edx,(%esp)    /* 4th arg: 3rd syscall arg */
> 
> Blindly overwriting the stack which holds the syscall arguments is
> really a brilliant way to ensure security.

It was sent, numerous times, to the x86 list for reviews, and lived in
-next for 2 complete devel cycles without a complaint.  I'm trying to
get an i386 system to test a fix.  But yes, it's total crap.

-Eric




Re: Regression: audit: x86: drop arch from __audit_syscall_entry() interface

2014-10-22 Thread Eric Paris
That's really serious.  Looking now.

On Wed, 2014-10-22 at 16:08 -0200, Paulo Zanoni wrote:
> Hi
> 
> (Cc'ing everybody mentioned in the original patch)
> 
> I work for Intel, on our Linux Graphics driver - aka i915.ko - and our
> QA team recently reported a regression on:
> 
> commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a
> Author: Richard Guy Briggs
> Date:   Tue Mar 4 10:38:06 2014 -0500
> audit: x86: drop arch from __audit_syscall_entry() interface
> 
> According to our QA, their i386 machine doesn't boot anymore. I tried
> to write my own revert for the patch, asked QA to test, and they
> confirmed it "solves" the problem.
> 
> Here are the details of QA's bug report:
> https://bugs.freedesktop.org/show_bug.cgi?id=85277 .
> 
> The trees our QA tests are the development trees from i915.ko:
> http://cgit.freedesktop.org/drm-intel?h=drm-intel-fixes .
> 
> I tried searching for other bug reports on the same patch, but
> couldn't find any. Forgive me if this bug was already reported.
> 
> Feel free to continue this discussion on the bugzilla report if you want.
> 
> Thanks,
> Paulo
> 




Re: [RFC][PATCH] audit: log join and part events to the read-only multicast log socket

2014-10-21 Thread Eric Paris
On Tue, 2014-10-21 at 17:08 -0400, Richard Guy Briggs wrote:
> On 14/10/21, Steve Grubb wrote:
> > On Tuesday, October 07, 2014 03:03:14 PM Eric Paris wrote:
> > > On Tue, 2014-10-07 at 14:23 -0400, Richard Guy Briggs wrote:
> > > > Log the event when a client attempts to connect to the netlink audit
> > > > multicast socket, requiring CAP_AUDIT_READ capability, binding to the
> > > > AUDIT_NLGRP_READLOG group.  Log the disconnect too.
> > > >
> > > > 
> > > >
> > > > Sample output:
> > > > time->Tue Oct  7 14:15:19 2014
> > > > type=UNKNOWN[1348] msg=audit(1412705719.316:117): auid=0 uid=0 gid=0 ses=1 pid=3552 comm="audit-multicast" exe="/home/rgb/rgb/git/audit-multicast-listen/audit-multicast-listen" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 group=0 op=connect res=1
> > > > 
> > > >
> > > > Signed-off-by: Richard Guy Briggs 
> > > > ---
> > > > For some reason unbind isn't being called on disconnect.  I suspect missing
> > > > plumbing in netlink.  Investigation needed...
> > > >
> > > > 
> > > >  include/uapi/linux/audit.h |    1 +
> > > >  kernel/audit.c             |   46 ++-
> > > >  2 files changed, 45 insertions(+), 2 deletions(-)
> > > > 
> > > >
> > > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > > > index 4d100c8..7fa6e8f 100644
> > > > --- a/include/uapi/linux/audit.h
> > > > +++ b/include/uapi/linux/audit.h
> > > > @@ -110,6 +110,7 @@
> > > >
> > > >  #define AUDIT_SECCOMP          1326    /* Secure Computing event */
> > > >  #define AUDIT_PROCTITLE        1327    /* Proctitle emit event */
> > > >  #define AUDIT_FEATURE_CHANGE   1328    /* audit log listing feature changes */
> > > > +#define AUDIT_EVENT_LISTENER   1348    /* task joined multicast read socket */
> > > >  
> > > >  #define AUDIT_AVC              1400    /* SE Linux avc denial or grant */
> > > >  #define AUDIT_SELINUX_ERR      1401    /* Internal SE Linux Errors */
> > > >
> > > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > > index 53bb39b..74c81a7 100644
> > > > --- a/kernel/audit.c
> > > > +++ b/kernel/audit.c
> > > > @@ -1108,13 +1108,54 @@ static void audit_receive(struct sk_buff  *skb)
> > > >
> > > >   mutex_unlock(&audit_cmd_mutex);
> > > >  }
> > > >  
> > > >
> > > > +static void audit_log_bind(int group, char *op, int err)
> > > > +{
> > > > + struct audit_buffer *ab;
> > > > + char comm[sizeof(current->comm)];
> > > > + struct mm_struct *mm = current->mm;
> > > > +
> > > > + ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_EVENT_LISTENER);
> > > > + if (!ab)
> > > > + return;
> > > > +
> > > > + audit_log_format(ab, "auid=%d",
> > > > +  from_kuid(&init_user_ns, audit_get_loginuid(current)));
> > > > + audit_log_format(ab, " uid=%d",
> > > > +  from_kuid(&init_user_ns, current_uid()));
> > > > + audit_log_format(ab, " gid=%d",
> > > > +  from_kgid(&init_user_ns, current_gid()));
> > > > + audit_log_format(ab, " ses=%d", audit_get_sessionid(current));
> > > > + audit_log_format(ab, " pid=%d", task_pid_nr(current));
> > > > + audit_log_format(ab, " comm=");
> > > > + audit_log_untrustedstring(ab, get_task_comm(comm, current));
> > > > + if (mm) {
> > > > + down_read(&mm->mmap_sem);
> > > > + if (mm->exe_file)
> > > > + audit_log_d_path(ab, " exe=", &mm->exe_file->f_path);
> > > > + up_read(&mm->mmap_sem);
> > > > + } else 
> > > > + audit_log_format(ab, " exe=(null)");
> > > > + audit_log_t

Re: [PATCH V5 0/5] audit by executable name

2014-10-21 Thread Eric Paris
On Tue, 2014-10-21 at 17:56 -0400, Paul Moore wrote:

> * Change the audit_status.version field comment in include/uapi/linux/audit.h
> to "/* audit functionality bitmap */", or similar.  We can't really change the
> structure now, but the comment is fair game.

Trying to think how to do this with a #define so you can rename it, but
"version" is pretty darn generic to preprocess.  You could make it a
union, so userspace code can use a sane name.

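A minimal sketch of the union idea, assuming a C11 compiler (anonymous unions, static_assert). struct status_sketch and the feature_bitmap name are hypothetical and deliberately much smaller than the real struct audit_status; the point is only that both names alias the same 32 bits, so the layout seen by existing binaries does not change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, cut-down status struct.  The anonymous union keeps the
 * legacy field name and the ABI layout, and adds a descriptive alias.
 */
struct status_sketch {
	uint32_t mask;
	uint32_t enabled;
	union {
		uint32_t version;        /* legacy name, still used by old code */
		uint32_t feature_bitmap; /* same 32 bits, clearer name          */
	};
	uint32_t pid;
};

/* Both names must share one offset, otherwise the ABI would change. */
static_assert(offsetof(struct status_sketch, version) ==
	      offsetof(struct status_sketch, feature_bitmap),
	      "union members must alias");
static_assert(sizeof(struct status_sketch) == 4 * sizeof(uint32_t),
	      "adding the alias must not change the struct size");
```

Old userspace keeps compiling against version, new code can say feature_bitmap, and a value written through one name reads back through the other. A #define rename, by contrast, would rewrite every occurrence of the token version that the preprocessor sees.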
> 
> * Change AUDIT_VERSION_LATEST to a bitmask instead of a number.  For example,
> it should be 3 given the current code, not 2.  In a perfect world this
> wouldn't even be in the uapi header, but it is, so we need to keep it updated.
> Bumping it higher should be backwards compatible.

Getting 1 without 2 is actually hard to accomplish as I remember, but
yes, you're right, I missed that.  It should be 3.

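To illustrate why a bitmask handles the backporting question where a single number does not, here is a sketch with hypothetical bit names (FEATURE_V1, FEATURE_V2, FEATURE_EXE); only the values 1, 2 and 3 come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit names: each independently backportable feature gets one bit. */
#define FEATURE_V1  0x1u /* first feature ("version 1")           */
#define FEATURE_V2  0x2u /* second feature ("version 2")          */
#define FEATURE_EXE 0x4u /* e.g. audit by executable, once merged */

/* With both existing features present, "latest" is 1 | 2 == 3, not 2. */
#define FEATURE_LATEST (FEATURE_V1 | FEATURE_V2)
static_assert(FEATURE_LATEST == 3u, "latest is the OR of all current bits");

/* Does the kernel's reported bitmap include every bit in 'want'? */
static int kernel_has(uint32_t reported, uint32_t want)
{
	return (reported & want) == want;
}
```

A curated kernel that backported only audit-by-exe on top of the first feature would report 0x5: kernel_has(0x5, FEATURE_EXE) is true and kernel_has(0x5, FEATURE_V2) is false, a combination a single incrementing version number cannot express.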
> Can anyone think of anything else that might be affected by this?

No one uses this stuff, just change it.




Re: [PATCH] audit: add Paul Moore to the MAINTAINERS entry

2014-10-21 Thread Eric Paris
On Mon, 2014-10-20 at 12:23 -0400, Paul Moore wrote:
> After a long stint maintaining the audit tree, Eric asked me to step
> in and handle the day-to-day management of the audit tree.  We should
> also update the linux-audit mailing list entry to better reflect
> current usage.
> 
> Signed-off-by: Paul Moore 

Acked-by: Eric Paris 

> ---
>  MAINTAINERS |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c2066f4..86c24fd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1689,10 +1689,11 @@ S:   Supported
>  F:   drivers/scsi/esas2r
>  
>  AUDIT SUBSYSTEM
> +M:   Paul Moore 
>  M:   Eric Paris 
> -L:   linux-au...@redhat.com (subscribers-only)
> +L:   linux-au...@redhat.com (moderated for non-subscribers)
>  W:   http://people.redhat.com/sgrubb/audit/
> -T:   git git://git.infradead.org/users/eparis/audit.git
> +T:   git git://git.infradead.org/users/pcmoore/audit
>  S:   Maintained
>  F:   include/linux/audit.h
>  F:   include/uapi/linux/audit.h
> 




Re: [PATCH 2/2] fs: Support compiling out sendfile

2014-10-21 Thread Eric Paris
On Tue, 2014-10-21 at 10:18 -0700, j...@joshtriplett.org wrote:
> On Tue, Oct 21, 2014 at 08:37:00AM -0700, H. Peter Anvin wrote:
> > On 10/20/2014 02:48 PM, Pieter Smith wrote:
> > > Many embedded systems will not need this syscall, and omitting it
> > > saves space.  Add a new EXPERT config option CONFIG_SENDFILE_SYSCALL
> > > (default y) to support compiling it out.
> > 
> > 
> > I believe these options ought to be CONFIG_SYSCALL_*
> > 
> 
> I agree.  I think people started using CONFIG_*_SYSCALL because of
> things like AUDITSYSCALL 

AUDITSYSCALL audits syscalls.  It doesn't actually implement any
syscalls.  You are right about SYSFS_SYSCALL though...



Re: [PATCH V5 0/5] audit by executable name

2014-10-20 Thread Eric Paris
On Mon, 2014-10-20 at 16:25 -0400, Steve Grubb wrote:
> On Thursday, October 02, 2014 11:06:51 PM Richard Guy Briggs wrote:
> > This is a part of Peter Moody, my and Eric Paris' work to implement
> > audit by executable name.
> 
> Does this patch set define an AUDIT_VERSION_SOMETHING and then set 
> AUDIT_VERSION_LATEST to it? If not, I need one to tell if the kernel supports 
> it when issuing commands. Also, if it's conceivable that kernels may pick and 
> choose what features could be backported to a curated kernel, should 
> AUDIT_VERSION_ be a number that is incremented or a bit mask?

Right now the value is 2. So this is your last hope if you want to make
it a bitmask. I'll leave that up to Paul/Richard to (over)design.

Support for audit by exe should probably be noted somehow, especially since
audit_netlink_ok() sucks and returns EINVAL for unknown message types. We
wouldn't need the version bump if that returned EOPNOTSUPP and
userspace could actually tell what was going on...

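The difference between the two error codes matters to a userspace probe. Here is a sketch, with hypothetical names (enum probe_result, classify()), of how a tool might interpret the error from an audit netlink request; it assumes the caller already has the negative errno from the kernel's reply and knows whether the kernel reports unknown message types as EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>

/* The whole point: the two errors must be distinguishable. */
static_assert(EOPNOTSUPP != EINVAL, "distinct errno values");

enum probe_result {
	PROBE_OK,          /* request accepted                         */
	PROBE_UNSUPPORTED, /* kernel does not know this message type   */
	PROBE_BAD_REQUEST, /* known type, but the payload was rejected */
	PROBE_AMBIGUOUS,   /* cannot tell which of the two it was      */
	PROBE_OTHER,       /* anything else, e.g. -EPERM               */
};

/*
 * Hypothetical: interpret the negative errno from an audit netlink reply.
 * 'unknown_is_eopnotsupp' says whether this kernel is known to report
 * unknown message types as -EOPNOTSUPP rather than -EINVAL.
 */
static enum probe_result classify(int err, int unknown_is_eopnotsupp)
{
	if (err == 0)
		return PROBE_OK;
	if (err == -EOPNOTSUPP)
		return PROBE_UNSUPPORTED;
	if (err == -EINVAL)
		return unknown_is_eopnotsupp ? PROBE_BAD_REQUEST : PROBE_AMBIGUOUS;
	return PROBE_OTHER;
}
```

With EINVAL for everything, a tool cannot tell an unknown message type from a malformed known one, hence the need for a version or feature bit; a distinct EOPNOTSUPP would make the probe self-describing.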
> 
> -Steve
> 
> 
> > Please see the accompanying userspace patch:
> > https://www.redhat.com/archives/linux-audit/2014-May/msg00019.html
> > The userspace interface is not expected to change appreciably unless
> > something important has been overlooked.  Setting and deleting rules works
> > as expected.
> > 
> > If the path does not exist at rule creation time, it will be re-evaluated
> > every time there is a change to the parent directory at which point the
> > change in device and inode will be noted.
> > 
> > 
> > Here's a sample run:
> > 
> > # /usr/local/sbin/auditctl -a always,exit -F dir=/tmp -F exe=/bin/touch -F key=touch_tmp
> > # /usr/local/sbin/ausearch --start recent -k touch_tmp
> > time->Mon Jun 30 14:15:06 2014
> > type=CONFIG_CHANGE msg=audit(1404152106.683:149): auid=0 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="add_rule" key="touch_tmp" list=4 res=1
> > 
> > # /usr/local/sbin/auditctl -l
> > -a always,exit -S all -F dir=/tmp -F exe=/bin/touch -F key=touch_tmp
> > 
> > # touch /tmp/test
> > 
> > # /usr/local/sbin/ausearch --start recent -k touch_tmp
> > time->Wed Jul  2 12:18:47 2014
> > type=UNKNOWN[1327] msg=audit(1404317927.319:132): proctitle=746F756368002F746D702F74657374
> > type=PATH msg=audit(1404317927.319:132): item=1 name="/tmp/test" inode=25997 dev=00:20 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
> > type=PATH msg=audit(1404317927.319:132): item=0 name="/tmp/" inode=11144 dev=00:20 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT
> > type=CWD msg=audit(1404317927.319:132):  cwd="/root"
> > type=SYSCALL msg=audit(1404317927.319:132): arch=c000003e syscall=2 success=yes exit=3 a0=7a403dd5 a1=941 a2=1b6 a3=34b65b2c6c items=2 ppid=4321 pid=6436 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="touch" exe="/usr/bin/touch" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="touch_tmp"
> > 
> > 
> > Revision history:
> > v5: Revert patch "Let audit_free_rule() take care of calling
> > audit_remove_mark()." since it caused a group mark deadlock.
> > 
> > v4: Re-order and squash down fixups
> > Fix audit_dup_exe() to copy pathname string before calling
> > audit_alloc_mark().
> > 
> > v3: Rationalize and rename some function names and clean up get/put and free
> > code. Rename several "watch" references to "mark".
> > Rename audit_remove_rule() to audit_remove_mark_rule().
> > Let audit_free_rule() take care of calling audit_remove_mark().
> > Put audit_alloc_mark() arguments in same order as watch, tree and inode.
> > Move the access to the entry for audit_match_signal() to the beginning of
> > the function in case the entry found is the same one passed in. This will
> > enable it to be used by audit_remove_mark_rule().
> > https://www.redhat.com/archives/linux-audit/2014-July/msg0.html
> > 
> > v2: Misguided attempt to add in audit_exe similar to watches
> > https://www.redhat.com/archives/linux-audit/2014-June/msg00066.html
> > 
> > v1.5: eparis' switch to fsnotify
> > https://www.redhat.com/archives/linux-audit/2014-May/msg00046.html
> > https://www.redhat.com/archives/linux-audit/2014-May/msg00066.html
> > 
> > v1: Change to path interface instead of inode
> > htt

[GIT PULL] Audit changes for 3.18

2014-10-15 Thread Eric Paris
So this change across a whole bunch of arches really solves one basic
problem.  We want to audit when seccomp is killing a process.  seccomp
hooks in before the audit syscall entry code.  audit_syscall_entry took
as an argument the arch of the given syscall.  Since the arch is part of
what makes a syscall number meaningful it's an important part of the
record, but it isn't available when seccomp shoots the syscall...

For most arches we have a better way to get the arch (syscall_get_arch()).
So the solution was twofold:  implement syscall_get_arch() on every arch
with audit support that didn't already have it, and use syscall_get_arch()
in the seccomp audit code.  Having syscall_get_arch() everywhere meant the
arch argument was useless, and we could get rid of it for the typical
syscall entry.

This of course results in a couple of merge conflicts.  Both are pretty
easy: x86_64 appears to have removed the assembly we were editing and does
it in C code in arch/x86/kernel/ptrace.c::do_audit_syscall_entry().  The
arm conflict is also obvious.

The other changes inside the audit system aren't grand: fixed some
records that had invalid spaces, better locking around the task comm
field, removed some dead functions and structs, made some things
static.  Really minor stuff.

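The core idea, deriving the arch from the current task instead of threading it through audit_syscall_entry(), can be sketched in userspace C. Only the two AUDIT_ARCH values (0x40000003 for i386, 0xc000003e for x86_64) are real; struct task_sketch, sketch_syscall_get_arch() and format_kill_record() are hypothetical, and the kernel's actual syscall_get_arch() is per-arch code whose signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* AUDIT_ARCH encoding: ELF machine number | 64-bit flag | little-endian flag. */
#define SK_ARCH_64BIT 0x80000000u
#define SK_ARCH_LE    0x40000000u
#define SK_EM_386     3u  /* ELF machine number for i386   */
#define SK_EM_X86_64  62u /* ELF machine number for x86-64 */
#define SK_AUDIT_ARCH_I386   (SK_EM_386 | SK_ARCH_LE)
#define SK_AUDIT_ARCH_X86_64 (SK_EM_X86_64 | SK_ARCH_64BIT | SK_ARCH_LE)

static_assert(SK_AUDIT_ARCH_I386 == 0x40000003u, "matches AUDIT_ARCH_I386");
static_assert(SK_AUDIT_ARCH_X86_64 == 0xc000003eu, "matches AUDIT_ARCH_X86_64");

/* Hypothetical task: all we need is whether it is making a 32-bit syscall. */
struct task_sketch {
	bool is_32bit;
};

/* Stand-in for syscall_get_arch(): the arch comes from the task, not a caller arg. */
static uint32_t sketch_syscall_get_arch(const struct task_sketch *t)
{
	return t->is_32bit ? SK_AUDIT_ARCH_I386 : SK_AUDIT_ARCH_X86_64;
}

/* A seccomp-style record can include arch=... without it being passed down. */
static int format_kill_record(char *buf, size_t len,
			      const struct task_sketch *t, int nr)
{
	return snprintf(buf, len, "arch=%x syscall=%d",
			(unsigned int)sketch_syscall_get_arch(t), nr);
}
```

Because the lookup needs nothing but the task, a seccomp kill path that runs before the audit syscall-entry hook can still log arch=, which is the problem described above.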

The following changes since commit 19583ca584d6f574384e17fe7613dfaeadcdc4a6:

  Linux 3.16 (2014-08-03 15:25:02 -0700)

are available in the git repository at:

  git://git.infradead.org/users/eparis/audit.git master

for you to fetch changes up to 2991dd2b0117e864f394c826af6df144206ce0db:

  audit: rename audit_log_remove_rule to disambiguate for trees (2014-10-10 15:30:25 -0400)


AKASHI Takahiro (1):
  arm64: audit: Add audit hook in syscall_trace_enter/exit()

Burn Alting (1):
  audit: invalid op= values for rules

Eric Paris (11):
  audit: drop unused struct audit_rule definition
  SH: define syscall_get_arch() for superh
  UM: implement syscall_get_arch()
  Alpha: define syscall_get_arch()
  ARCH: AUDIT: implement syscall_get_arch for all arches
  ARCH: AUDIT: audit_syscall_entry() should not require the arch
  audit: fix build error when asm/syscall.h does not exist
  sparc: simplify syscall_get_arch()
  sparc: implement is_32bit_task
  audit: arm64: Remove the audit arch argument to audit_syscall_entry
  audit: WARN if audit_rule_change called illegally

Fabian Frederick (1):
  kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]

Guenter Roeck (1):
  next: openrisc: Fix build

Richard Guy Briggs (15):
  syscall.h: fix doc text for syscall_get_arch()
  audit: __audit_syscall_entry: ignore arch arg and call syscall_get_arch() directly
  audit: add arch field to seccomp event log
  audit: x86: drop arch from __audit_syscall_entry() interface
  audit: reduce scope of audit_net_id
  audit: reduce scope of audit_log_fcaps
  audit: use atomic_t to simplify audit_serial()
  audit: use union for audit_field values since they are mutually exclusive
  audit: set nlmsg_len for multicast messages.
  audit: correct AUDIT_GET_FEATURE return message type
  audit: remove open_arg() function that is never used
  audit: get comm using lock to avoid race in string printing
  audit: put rule existence check in canonical order
  audit: cull redundancy in audit_rule_change
  audit: rename audit_log_remove_rule to disambiguate for trees

Stephen Rothwell (1):
  sparc: properly conditionalize use of TIF_32BIT

 arch/alpha/include/asm/syscall.h        | 11 +++
 arch/alpha/kernel/ptrace.c              |  2 +-
 arch/arm/kernel/ptrace.c                |  4 +--
 arch/arm64/kernel/ptrace.c              |  7 +
 arch/ia64/include/asm/syscall.h         |  6
 arch/ia64/kernel/ptrace.c               |  2 +-
 arch/microblaze/include/asm/syscall.h   |  5 +++
 arch/microblaze/kernel/ptrace.c         |  3 +-
 arch/mips/include/asm/syscall.h         |  2 +-
 arch/mips/kernel/ptrace.c               |  4 +--
 arch/openrisc/include/asm/syscall.h     |  5 +++
 arch/openrisc/include/uapi/asm/elf.h    |  3 +-
 arch/openrisc/kernel/ptrace.c           |  3 +-
 arch/parisc/include/asm/syscall.h       | 11 +++
 arch/parisc/kernel/ptrace.c             |  9 ++
 arch/powerpc/include/asm/syscall.h      |  6
 arch/powerpc/kernel/ptrace.c            |  7 ++---
 arch/s390/kernel/ptrace.c               |  4 +--
 arch/sh/include/asm/syscall_32.h        | 10 ++
 arch/sh/include/asm/syscall_64.h        | 14 +
 arch/sh/kernel/ptrace_32.c              | 14 +
 arch/sh/kernel/ptrace_64.c              | 17 +-
 arch/sparc/include/asm/syscall.h        |  7 +
 arch/sparc/include/asm/thread_info_32.h |  2 ++
 arch/sparc/include/asm/thread_info_64.h |  2 ++
 arch/sparc/kernel/ptrace_64.c           |  9 ++
 arch/um/kernel/ptrace.c  

Re: [PATCH 5/7] audit: remove redundant watch refcount

2014-10-10 Thread Eric Paris
Having a hard time convincing myself of the next 2...  Doesn't mean
they're wrong or bad, but my brain isn't seeing it today...

On Thu, 2014-10-02 at 22:05 -0400, Richard Guy Briggs wrote:
> Remove extra layer of audit_{get,put}_watch() calls.
> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/audit_watch.c |5 +
>  kernel/auditfilter.c |7 ---
>  2 files changed, 1 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index f209448..c707afb 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -203,7 +203,6 @@ int audit_to_watch(struct audit_krule *krule, char *path, int len, u32 op)
>   if (IS_ERR(watch))
>   return PTR_ERR(watch);
>  
> - audit_get_watch(watch);
>   krule->watch = watch;
>  
>   return 0;
> @@ -306,7 +305,6 @@ static void audit_update_watch(struct audit_parent *parent,
>* new watch.
>*/
>   audit_put_watch(nentry->rule.watch);
> - audit_get_watch(nwatch);
>   nentry->rule.watch = nwatch;
>   list_add(&nentry->rule.rlist, &nwatch->rules);
>   list_add_rcu(&nentry->list, &audit_inode_hash[h]);
> @@ -392,8 +390,7 @@ static void audit_add_to_parent(struct audit_krule *krule,
>  
>   watch_found = 1;
>  
> - /* put krule's and initial refs to temporary watch */
> - audit_put_watch(watch);
> + /* put krule's ref to temporary watch */
>   audit_put_watch(watch);
>  
>   audit_get_watch(w);
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index facd704..5675916 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -563,8 +563,6 @@ exit_nofree:
>   return entry;
>  
>  exit_free:
> - if (entry->rule.watch)
> - audit_put_watch(entry->rule.watch); /* matches initial get */
>   if (entry->rule.tree)
>   audit_put_tree(entry->rule.tree); /* that's the temporary one */
>   audit_free_rule(entry);
> @@ -942,8 +940,6 @@ static inline int audit_add_rule(struct audit_entry *entry)
>   return 0;
>  
>  error:
> - if (watch)
> - audit_put_watch(watch); /* tmp watch, matches initial get */
>   return err;
>  }
>  
> @@ -951,7 +947,6 @@ error:
>  static inline int audit_del_rule(struct audit_entry *entry)
>  {
>   struct audit_entry  *e;
> - struct audit_watch *watch = entry->rule.watch;
>   struct audit_tree *tree = entry->rule.tree;
>   struct list_head *list;
>   int ret = 0;
> @@ -992,8 +987,6 @@ static inline int audit_del_rule(struct audit_entry *entry)
>   mutex_unlock(&audit_filter_mutex);
>  
>  out:
> - if (watch)
> - audit_put_watch(watch); /* match initial get */
>   if (tree)
>   audit_put_tree(tree);   /* that's the temporary one */
>  

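One way to check a refcount cleanup like this is to count gets and puts on each path. The sketch below is a userspace model of the "existing watch found" path in audit_add_to_parent(); struct watch, get_watch(), put_watch() and the two merge functions are hypothetical. Before the patch, krule took an extra reference and that path dropped two. After it, krule simply owns the initial reference and the path drops one. Both versions end in the same state. The patch is only correct if every other exit path (exit_free:, error:, out:) also loses exactly one put in the same way.

```c
#include <assert.h>

/* Hypothetical refcounted object standing in for struct audit_watch. */
struct watch { int count; int freed; };

static void get_watch(struct watch *w) { w->count++; }

static void put_watch(struct watch *w)
{
	if (--w->count == 0)
		w->freed = 1;	/* the kernel would free the watch here */
}

/* Before: allocation gives 1 ref, audit_to_watch() takes a 2nd for krule,
 * and the "existing watch found" path drops both. */
static void merge_path_before(struct watch *tmp)
{
	get_watch(tmp);		/* audit_to_watch(): krule's extra ref */
	put_watch(tmp);		/* krule's ref */
	put_watch(tmp);		/* initial ref */
}

/* After: krule owns the initial ref, so one put suffices. */
static void merge_path_after(struct watch *tmp)
{
	put_watch(tmp);		/* krule's (formerly "initial") ref */
}
```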

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/7] audit: optimize add to parent skipping needless search and consuming parent ref

2014-10-10 Thread Eric Paris
On Thu, 2014-10-02 at 22:05 -0400, Richard Guy Briggs wrote:
> When parent has just been created there is no need to search for the parent in
> the list.  Add a parameter to skip the search

Since the parent was just allocated, and thus has an empty list, this
"search" is just as fast as the check against 'new' and doesn't
complicate things...

>  and consume the parent reference
> no matter what happens.

Now the refcnt change...  I guess it's personal taste, but I don't
like it at all.  If audit_add_watch() always gets a reference to the
parent, the code is a whole lot easier to read if we always put that
refcnt in the same function.  I don't like sub functions that
consume my ref...  Especially since that makes it a whole lot less
obvious in audit_add_watch() when I'm allowed to use parent and when I'm
not...

So I'm not going to apply this patch.  I don't believe it improves
things...
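The ownership rule argued for above is that the function which takes a reference is the one that drops it. A userspace sketch of that rule follows. struct parent, get_parent(), put_parent(), add_to_parent() and add_watch() are hypothetical stand-ins, not the audit_watch.c code. Note also that for a freshly allocated parent the extra 'new' flag buys nothing: walking an empty list with list_for_each_entry() executes zero iterations.

```c
#include <assert.h>

/* Hypothetical refcounted parent, not the kernel's struct audit_parent. */
struct parent { int refs; int nwatches; };

static void get_parent(struct parent *p) { p->refs++; }
static void put_parent(struct parent *p) { p->refs--; }

/* The callee only borrows the caller's reference. It may take its own
 * long-lived ref (one per attached watch) but never drops the caller's. */
static void add_to_parent(struct parent *p, int already_has_watch)
{
	if (!already_has_watch) {
		get_parent(p);	/* ref owned by the newly attached watch */
		p->nwatches++;
	}
}

/* The caller takes a ref, may use p anywhere in between, and drops that
 * same ref before returning. */
static void add_watch(struct parent *p, int already_has_watch)
{
	get_parent(p);		/* like audit_find_parent()/audit_init_parent() */
	add_to_parent(p, already_has_watch);
	/* p is still safe to use here: our own reference is still held */
	put_parent(p);		/* matching put in the same function */
}
```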

> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/audit_watch.c |   23 +++
>  1 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index ad9c168..f209448 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -372,15 +372,20 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
>  }
>  
>  /* Associate the given rule with an existing parent.
> - * Caller must hold audit_filter_mutex. */
> + * Caller must hold audit_filter_mutex.
> + * Consumes parent reference. */
>  static void audit_add_to_parent(struct audit_krule *krule,
> - struct audit_parent *parent)
> + struct audit_parent *parent,
> + int new)
>  {
>   struct audit_watch *w, *watch = krule->watch;
>   int watch_found = 0;
>  
>   BUG_ON(!mutex_is_locked(&audit_filter_mutex));
>  
> + if (new)
> + goto not_found;
> +
>   list_for_each_entry(w, &parent->watches, wlist) {
>   if (strcmp(watch->path, w->path))
>   continue;
> @@ -396,12 +401,15 @@ static void audit_add_to_parent(struct audit_krule *krule,
>   break;
>   }
>  
> +not_found:
>   if (!watch_found) {
> - audit_get_parent(parent);
>   watch->parent = parent;
>  
>   list_add(&watch->wlist, &parent->watches);
> - }
> + } else
> + /* match get in audit_find_parent or audit_init_parent */
> + audit_put_parent(parent);
> +
>   list_add(&krule->rlist, &watch->rules);
>  }
>  
> @@ -413,6 +421,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
>   struct audit_parent *parent;
>   struct path parent_path;
>   int h, ret = 0;
> + int new = 0;
>  
>   mutex_unlock(&audit_filter_mutex);
>  
> @@ -433,12 +442,10 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list)
>   ret = PTR_ERR(parent);
>   goto error;
>   }
> + new = 1;
>   }
>  
> - audit_add_to_parent(krule, parent);
> -
> - /* match get in audit_find_parent or audit_init_parent */
> - audit_put_parent(parent);
> + audit_add_to_parent(krule, parent, new);
>  
>   h = audit_hash_ino((u32)watch->ino);
>   *list = &audit_inode_hash[h];




Re: [PATCH 3/7] audit: eliminate string copy for new tree rules

2014-10-10 Thread Eric Paris
On Thu, 2014-10-02 at 22:05 -0400, Richard Guy Briggs wrote:
> New tree rules copy the path twice and discard the intermediary copy.
> 
> This saves one pointer at the expense of one path string copy.
> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/audit_tree.c  |9 +
>  kernel/auditfilter.c |5 +++--
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index bd418c4..ace72ed 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -17,7 +17,7 @@ struct audit_tree {
>   struct list_head list;
>   struct list_head same_root;
>   struct rcu_head head;
> - char pathname[];
> + char *pathname;
>  };
>  
>  struct audit_chunk {
> @@ -70,11 +70,11 @@ static LIST_HEAD(prune_list);
>  
>  static struct fsnotify_group *audit_tree_group;
>  
> -static struct audit_tree *alloc_tree(const char *s)
> +static struct audit_tree *alloc_tree(char *s)
>  {
>   struct audit_tree *tree;
>  
> - tree = kmalloc(sizeof(struct audit_tree) + strlen(s) + 1, GFP_KERNEL);
> + tree = kmalloc(sizeof(struct audit_tree), GFP_KERNEL);
>   if (tree) {
>   atomic_set(&tree->count, 1);
>   tree->goner = 0;
> @@ -83,7 +83,7 @@ static struct audit_tree *alloc_tree(const char *s)
>   INIT_LIST_HEAD(&tree->list);
>   INIT_LIST_HEAD(&tree->same_root);
>   tree->root = NULL;
> - strcpy(tree->pathname, s);
> + tree->pathname = s;
>   }
>   return tree;
>  }
> @@ -96,6 +96,7 @@ static inline void get_tree(struct audit_tree *tree)
>  static inline void put_tree(struct audit_tree *tree)
>  {
>   if (atomic_dec_and_test(&tree->count))
> + kfree(tree->pathname);
>   kfree_rcu(tree, head);
>  }

Why does the tree need to be freed after an RCU grace period but the
pathname can be freed immediately?  What's the locking/access that makes
this safe?

>  
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index e3378a4..facd704 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -534,9 +534,10 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
>   entry->rule.buflen += f->val;
>  
>   err = audit_make_tree(&entry->rule, str, f->op);
> - kfree(str);
> - if (err)
> + if (err) {
> + kfree(str);
>   goto exit_free;
> + }
>   break;
>   case AUDIT_INODE:
>   err = audit_to_inode(&entry->rule, f);




Re: [PATCH 2/7] audit: cull redundancy in audit_rule_change

2014-10-10 Thread Eric Paris
On Thu, 2014-10-02 at 22:05 -0400, Richard Guy Briggs wrote:
> Re-factor audit_rule_change() to reduce the amount of code redundancy and
> simplify the logic.
> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/auditfilter.c |   20 +++-
>  1 files changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index 4a11697..e3378a4 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -1064,30 +1064,24 @@ int audit_rule_change(int type, __u32 portid, int seq, void *data,
>   int err = 0;
>   struct audit_entry *entry;
>  
> + entry = audit_data_to_entry(data, datasz);
> + if (IS_ERR(entry))
> + return PTR_ERR(entry);
> +
>   switch (type) {
>   case AUDIT_ADD_RULE:
> - entry = audit_data_to_entry(data, datasz);
> - if (IS_ERR(entry))
> - return PTR_ERR(entry);
> -
>   err = audit_add_rule(entry);
>   audit_log_rule_change("add_rule", &entry->rule, !err);
> - if (err)
> - audit_free_rule(entry);
>   break;
>   case AUDIT_DEL_RULE:
> - entry = audit_data_to_entry(data, datasz);
> - if (IS_ERR(entry))
> - return PTR_ERR(entry);
> -
>   err = audit_del_rule(entry);
>   audit_log_rule_change("remove_rule", &entry->rule, !err);
> - audit_free_rule(entry);
>   break;
> - default:
> - return -EINVAL;

I left the default case and made it:

err = -EINVAL;
WARN_ON(1);

Seemed like better defensive coding

>   }
>  
> + if (err || type == AUDIT_DEL_RULE)
> + audit_free_rule(entry);
> +
>   return err;
>  }
>  

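With the default case kept as described, audit_rule_change() ends up shaped roughly like the userspace model below. The stubs fake_add_rule(), fake_del_rule() and fake_free_rule() and the WARN_ON() stand-in are hypothetical. audit_data_to_entry() and audit_log_rule_change() are left out; only the switch and the single free at the end mirror the patch. An unknown type now returns -EINVAL, warns, and still frees the entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Message types as in include/uapi/linux/audit.h. */
#define AUDIT_ADD_RULE 1011
#define AUDIT_DEL_RULE 1012

/* Userspace stand-in for the kernel's WARN_ON(): report and count. */
static int warnings;
#define WARN_ON(cond)                                          \
	do {                                                   \
		if (cond) {                                    \
			fprintf(stderr, "WARNING at %s:%d\n",  \
				__FILE__, __LINE__);           \
			warnings++;                            \
		}                                              \
	} while (0)

struct entry { int freed; };

/* Hypothetical stubs for audit_add_rule()/audit_del_rule()/audit_free_rule(). */
static int add_result;
static int fake_add_rule(struct entry *e) { (void)e; return add_result; }
static int fake_del_rule(struct entry *e) { (void)e; return 0; }
static void fake_free_rule(struct entry *e) { e->freed = 1; }

static int rule_change(int type, struct entry *entry)
{
	int err;

	switch (type) {
	case AUDIT_ADD_RULE:
		err = fake_add_rule(entry);
		break;
	case AUDIT_DEL_RULE:
		err = fake_del_rule(entry);
		break;
	default:
		err = -EINVAL;
		WARN_ON(1);	/* callers are expected to filter the type */
		break;
	}

	/* A successfully added entry now belongs to the rule list; in every
	 * other case this function still owns it and must free it. */
	if (err || type == AUDIT_DEL_RULE)
		fake_free_rule(entry);

	return err;
}
```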



Re: [RFC][PATCH] audit: log join and part events to the read-only multicast log socket

2014-10-07 Thread Eric Paris
On Tue, 2014-10-07 at 14:23 -0400, Richard Guy Briggs wrote:
> Log the event when a client attempts to connect to the netlink audit multicast
> socket, requiring CAP_AUDIT_READ capability, binding to the 
> AUDIT_NLGRP_READLOG
> group.  Log the disconnect too.
> 
> Sample output:
> time->Tue Oct  7 14:15:19 2014
> type=UNKNOWN[1348] msg=audit(1412705719.316:117): auid=0 uid=0 gid=0 ses=1 pid=3552 comm="audit-multicast" exe="/home/rgb/rgb/git/audit-multicast-listen/audit-multicast-listen" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 group=0 op=connect res=1
> 
> Signed-off-by: Richard Guy Briggs 
> ---
> For some reason unbind isn't being called on disconnect.  I suspect missing
> plumbing in netlink.  Investigation needed...
> 
>  include/uapi/linux/audit.h |1 +
>  kernel/audit.c |   46 ++-
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 4d100c8..7fa6e8f 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -110,6 +110,7 @@
>  #define AUDIT_SECCOMP        1326  /* Secure Computing event */
>  #define AUDIT_PROCTITLE      1327  /* Proctitle emit event */
>  #define AUDIT_FEATURE_CHANGE 1328  /* audit log listing feature changes */
> +#define AUDIT_EVENT_LISTENER 1348  /* task joined multicast read socket */
>  
>  #define AUDIT_AVC            1400  /* SE Linux avc denial or grant */
>  #define AUDIT_SELINUX_ERR    1401  /* Internal SE Linux Errors */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 53bb39b..74c81a7 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1108,13 +1108,54 @@ static void audit_receive(struct sk_buff  *skb)
>   mutex_unlock(&audit_cmd_mutex);
>  }
>  
> +static void audit_log_bind(int group, char *op, int err)
> +{
> + struct audit_buffer *ab;
> + char comm[sizeof(current->comm)];
> + struct mm_struct *mm = current->mm;
> +
> + ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_EVENT_LISTENER);
> + if (!ab)
> + return;
> +
> + audit_log_format(ab, "auid=%d",
> + from_kuid(&init_user_ns, audit_get_loginuid(current)));
> + audit_log_format(ab, " uid=%d",
> +  from_kuid(&init_user_ns, current_uid()));
> + audit_log_format(ab, " gid=%d",
> +  from_kgid(&init_user_ns, current_gid()));
> + audit_log_format(ab, " ses=%d", audit_get_sessionid(current));
> + audit_log_format(ab, " pid=%d", task_pid_nr(current));
> + audit_log_format(ab, " comm=");
> + audit_log_untrustedstring(ab, get_task_comm(comm, current));
> + if (mm) {
> + down_read(&mm->mmap_sem);
> + if (mm->exe_file)
> + audit_log_d_path(ab, " exe=", &mm->exe_file->f_path);
> + up_read(&mm->mmap_sem);
> + } else 
> + audit_log_format(ab, " exe=(null)");
> + audit_log_task_context(ab); /* subj= */

super crazy yuck.  audit_log_task_info() ??

> + audit_log_format(ab, " group=%d", group);

group seems like too easily confused a name.

> + audit_log_format(ab, " op=%s", op);
> + audit_log_format(ab, " res=%d", !err);
> + audit_log_end(ab);
> +}
> +
>  /* Run custom bind function on netlink socket group connect or bind requests. */
>  static int audit_bind(int group)
>  {
> + int err = 0;
> +
>   if (!capable(CAP_AUDIT_READ))
> - return -EPERM;
> + err = -EPERM;
> + audit_log_bind(group, "connect", err);
> + return err;
> +}
>  
> - return 0;
> +static void audit_unbind(int group)
> +{
> + audit_log_bind(group, "disconnect", 0);
>  }
>  
>  static int __net_init audit_net_init(struct net *net)
> @@ -1124,6 +1165,7 @@ static int __net_init audit_net_init(struct net *net)
>   .bind   = audit_bind,
>   .flags  = NL_CFG_F_NONROOT_RECV,
>   .groups = AUDIT_NLGRP_MAX,
> + .unbind = audit_unbind,
>   };
>  
>   struct audit_net *aunet = net_generic(net, audit_net_id);

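The restructuring suggested above can be sketched in userspace. The common subject fields come from one shared helper, which plays the role audit_log_task_info() plays in the kernel, and the bind record appends only its own fields. struct buf, buf_format(), struct task_fields, log_task_fields(), log_bind() and bind_group() are all hypothetical. exe= and subj= are omitted. The nl-mcgrp= key only illustrates a less ambiguous name than group=.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for struct audit_buffer. */
struct buf { char s[256]; size_t len; };

static void buf_format(struct buf *b, const char *fmt, ...)
{
	size_t room = sizeof(b->s) - b->len;	/* always >= 1 */
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(b->s + b->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	/* on truncation vsnprintf stores room - 1 characters */
	b->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Hypothetical snapshot of the fields audit_log_bind() open-codes. */
struct task_fields { int auid, uid, gid, ses, pid; const char *comm; };

/* One shared helper instead of repeating this in every record type. */
static void log_task_fields(struct buf *b, const struct task_fields *t)
{
	buf_format(b, "auid=%d uid=%d gid=%d ses=%d pid=%d comm=\"%s\"",
		   t->auid, t->uid, t->gid, t->ses, t->pid, t->comm);
}

static void log_bind(struct buf *b, const struct task_fields *t,
		     int group, const char *op, int err)
{
	log_task_fields(b, t);
	buf_format(b, " nl-mcgrp=%d op=%s res=%d", group, op, !err);
}

/* Same decision as the patch's audit_bind(): log the attempt either way. */
static int bind_group(struct buf *b, const struct task_fields *t,
		      int group, int has_cap)
{
	int err = has_cap ? 0 : -EPERM;

	log_bind(b, t, group, "connect", err);
	return err;
}
```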



Re: [PATCH v2] next: openrisc: Fix build

2014-09-26 Thread Eric Paris
Would you like me to carry this in the audit tree, since I'm the one who
broke it?

-Eric

On Fri, 2014-09-26 at 09:05 -0700, Guenter Roeck wrote:
> openrisc:defconfig fails to build in next-20140926 with the following error.
> 
> In file included from arch/openrisc/kernel/signal.c:31:0:
> ./arch/openrisc/include/asm/syscall.h: In function 'syscall_get_arch':
> ./arch/openrisc/include/asm/syscall.h:77:9: error: 'EM_OPENRISC' undeclared
> 
> Fix by moving EM_OPENRISC to include/uapi/linux/elf-em.h.
> 
> Fixes: ce5d112827e5 ("ARCH: AUDIT: implement syscall_get_arch for all arches")
> Cc: Eric Paris 
> Cc: Stefan Kristiansson 
> Cc: Geert Uytterhoeven 
> Cc: Stephen Rothwell 
> Signed-off-by: Guenter Roeck 
> ---
> v2: Only move EM_OPENRISC.
> 
> Another possible solution for the problem would be to include asm/elf.h
> in arch/openrisc/kernel/signal.c. I had actually submitted a patch with
> that fix back in August (maybe that is where I remembered the problem from).
> Wonder what happened with that patch.
> 
> Would it make sense to drop EM_OR32 and replace it with EM_OPENRISC where
> it is used ? binutils seems to suggest that EM_OPENRISC is the "official"
> definition.
> 
>  arch/openrisc/include/uapi/asm/elf.h | 3 +--
>  include/uapi/linux/elf-em.h  | 1 +
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/openrisc/include/uapi/asm/elf.h b/arch/openrisc/include/uapi/asm/elf.h
> index f02ea58..8884276 100644
> --- a/arch/openrisc/include/uapi/asm/elf.h
> +++ b/arch/openrisc/include/uapi/asm/elf.h
> @@ -55,9 +55,8 @@ typedef elf_greg_t elf_gregset_t[ELF_NGREG];
>  /* A placeholder; OR32 does not have fp support yes, so no fp regs for now. */
>  typedef unsigned long elf_fpregset_t;
>  
> -/* This should be moved to include/linux/elf.h */
> +/* EM_OPENRISC is defined in linux/elf-em.h */
>  #define EM_OR32 0x8472
> -#define EM_OPENRISC 92 /* OpenRISC 32-bit embedded processor */
>  
>  /*
>   * These are used to set parameters in the core dumps.
> diff --git a/include/uapi/linux/elf-em.h b/include/uapi/linux/elf-em.h
> index 01529bd..aa90bc9 100644
> --- a/include/uapi/linux/elf-em.h
> +++ b/include/uapi/linux/elf-em.h
> @@ -32,6 +32,7 @@
>  #define EM_V850  87  /* NEC v850 */
>  #define EM_M32R  88  /* Renesas M32R */
>  #define EM_MN10300   89  /* Panasonic/MEI MN10300, AM33 */
> +#define EM_OPENRISC 92 /* OpenRISC 32-bit embedded processor */
>  #define EM_BLACKFIN 106 /* ADI Blackfin Processor */
>  #define EM_TI_C6000  140 /* TI C6X DSPs */
>  #define EM_AARCH64   183 /* ARM 64 bit */

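The define has to be globally visible because of where the build error points: openrisc's syscall_get_arch(). A simplified model of that function follows. The value 92 comes from the diff above. Two things are assumptions: that AUDIT_ARCH_OPENRISC in include/uapi/linux/audit.h expands to plain EM_OPENRISC (32-bit big-endian, so no __AUDIT_ARCH_* flags), and the function name fake_syscall_get_arch(), which is hypothetical.

```c
#include <assert.h>

/* include/uapi/linux/elf-em.h after the patch (value from the diff above). */
#define EM_OPENRISC 92	/* OpenRISC 32-bit embedded processor */

/* Assumed to match include/uapi/linux/audit.h: no 64-bit or LE flag bits. */
#define AUDIT_ARCH_OPENRISC (EM_OPENRISC)

/* Simplified model of syscall_get_arch() in asm/syscall.h. It only compiles
 * where EM_OPENRISC is visible, e.g. signal.c, which never included asm/elf.h. */
static inline int fake_syscall_get_arch(void)
{
	return AUDIT_ARCH_OPENRISC;
}
```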



Re: linux-next: New build failures in Sep 25 tree

2014-09-26 Thread Eric Paris
On Fri, 2014-09-26 at 06:32 -0700, Guenter Roeck wrote:
> On 09/26/2014 12:59 AM, Stefan Kristiansson wrote:
> > On Fri, Sep 26, 2014 at 08:30:57AM +0200, Geert Uytterhoeven wrote:
> >> Hi Günther,
> >>
> >> [cc openrisc]
> >>
> >> On Thu, Sep 25, 2014 at 10:25 PM, Guenter Roeck  wrote:
> >>> New build failures:
> >>
> >>> openrisc-defconfig
> >>>
> >>> In file included from arch/openrisc/kernel/signal.c:31:0:
> >>> ./arch/openrisc/include/asm/syscall.h: In function 'syscall_get_arch':
> >>> ./arch/openrisc/include/asm/syscall.h:77:9: error: 'EM_OPENRISC' undeclared
> >>
> >> That's not a new one. It's been failing for half a year or so.
> >>
> >> If you only see it now, that means something else got fixed ;-)
> >>
> >
> > From what I can see, it's caused by
> > ce5d112827e5 ("ARCH: AUDIT: implement syscall_get_arch for all arches")
> > that got (re?)introduced two days ago.
> >
> > To me it seems that the problem is that EM_OPENRISC is
> > missing in include/uapi/linux/elf-em.h, but if that's the case,
> > I think microblaze have the same problem with that patch applied.
> >
> 
> Microblaze builds for some reason. But, yes, that commit is from March.
> Maybe that is where I remember it from.
> 
> You are right, it builds if I add the define. I'll submit a patch to fix
> the problem if the resulting image builds.
> 
> Guenter

I had no idea you were having build problems for so long!!  I'm so
sorry!  The tree was out of -next for a while: we ran into some arm
problems and I was going on vacation, so I pulled it from -next.  It
just went back into -next (after I dealt with arm), and that is why you
saw the build failure again on the 25th.

I guess the choices are moving EM_OPENRISC public like the other ones
(which I believe you sent a patch for) or maybe you can include
arch/openrisc/include/uapi/asm/elf.h inside
arch/openrisc/include/asm/syscall.h so you still get to keep that
definition 'private'.  I have no idea if such an include would give you
a circular mess.  That happens all too often at the arch level.



Re: [PATCH] Security: List corruption occured during file system automation test

2014-08-13 Thread Eric Paris
Do you have a backtrace?

On Wed, Aug 13, 2014 at 8:30 AM, Al Viro  wrote:
> On Wed, Aug 13, 2014 at 05:04:13PM +0530, shivnanda...@samsung.com wrote:
>> From: Shivnandan Kumar 
>>
>> List element was freed by  inode_free_security and then it uses rcu
>> element to point inode_free_rcu, since it inside a union so it
>> shares memory, sb_finish_set_opts now also try to free list element,
>
> How in hell does it find that element?


Re: [PATCH] CAPABILITIES: remove undefined caps from all processes

2014-07-23 Thread Eric Paris
On Wed, 2014-07-23 at 13:46 -0700, Andy Lutomirski wrote:
> On 07/23/2014 12:36 PM, Eric Paris wrote:
> > This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744
> > plus fixing it a different way...
> 
> You sent something like this a couple days ago.  What changed?

Right when I sent it I knew I'd forgotten to do the -v2 type stuff.

The new portions are fixes 3 and 4 below, which consist of masking
unknown caps in sys_capset() and when executing files with unknown
filecaps.

-Eric

> --Andy
> 
> > 
> > We found, when trying to run an application from an application which
> > had dropped privs, that the kernel does security checks on undefined
> > capability bits.  This was ESPECIALLY difficult to debug as those
> > undefined bits are hidden from /proc/$PID/status.
> > 
> > Consider a root application which drops all capabilities from ALL 4
> > capability sets.  We assume that, since the application is going to set
> > eff/perm/inh from an array, it will clear not only the defined caps
> > up to CAP_LAST_CAP, but also the higher 28ish bits which are
> > undefined future capabilities.
> > 
> > The BSET gets cleared differently: it is cleared one bit at a
> > time.  The problem here is that security/commoncap.c::cap_task_prctl()
> > actually checks the validity of a capability being read.  So any task
> > which attempts to 'read all things set in bset' followed by 'unset all
> > things set in bset' will not even attempt to unset the undefined bits
> > higher than CAP_LAST_CAP.
> > 
> > So the 'parent' will look something like:
> > CapInh: 
> > CapPrm: 
> > CapEff: 
> > CapBnd: ffc0
> > 
> > All of this 'should' be fine, given that these are undefined bits that
> > aren't supposed to have anything to do with permissions.  But they do...
> > 
> > So let's now consider a task which cleared the eff/perm/inh completely
> > and cleared all of the valid caps in the bset (but not the invalid caps
> > it couldn't read out of the kernel).  We know that this is exactly what
> > the libcap-ng library does and what the go capabilities library does.
> > They both leave you in the above situation if you try to clear all of
> > your capabilities from all 4 sets.  If that root task calls execve()
> > the child task will pick up all caps not blocked by the bset.  The bset
> > however does not block bits higher than CAP_LAST_CAP.  So now the child
> > task has bits in eff which are not in the parent.  These are
> > 'meaningless' undefined bits, but still bits which the parent doesn't
> > have.
> > 
> > The problem is now in cred_cap_issubset() (or any operation which does a
> > subset test) as the child, while a subset for valid cap bits, is not a
> > subset for invalid cap bits!  So during commit_creds() we mark the child
> > as not dumpable, given that it is 'more priv' than its parent.  It
> > also means the parent cannot ptrace the child, among other stupidity.
> > 
> > The solution here:
> > 1) stop hiding capability bits in status
> > This makes debugging easier!
> > 
> > 2) stop giving any task undefined capability bits.  It's simple: if you
> > don't put those invalid bits in CAP_FULL_SET you won't get them in init
> > and you won't get them in any other task either.
> > This fixes the cap_issubset() tests and resulting fallout (which
> > made the init task in a docker container untraceable among other
> > things)
> > 
> > 3) mask out undefined bits when sys_capset() is called as it might use
> > ~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
> > This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.
> > 
> > 4) mask out undefined bits when we read a file capability off of disk as
> > again likely all bits are set in the xattr for forward/backward
> > compatibility.
> > This lets 'setcap all+pe /bin/bash; /bin/bash' run
> > 
> > Signed-off-by: Eric Paris 
> > Cc: Andrew Vagin 
> > Cc: Andrew G. Morgan 
> > Cc: Serge E. Hallyn 
> > Cc: Kees Cook 
> > Cc: Steve Grubb 
> > Cc: Dan Walsh 
> > Cc: sta...@vger.kernel.org
> > ---
> >  fs/proc/array.c| 11 +--
> >  include/linux/capability.h |  5 -
> >  kernel/audit.c |  2 +-
> >  kernel/capability.c|  4 
> >  security/commoncap.c   | 

[PATCH] CAPABILITIES: remove undefined caps from all processes

2014-07-23 Thread Eric Paris
This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744
plus fixing it a different way...

We found, when trying to run an application from an application which
had dropped privs, that the kernel does security checks on undefined
capability bits.  This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.

Consider a root application which drops all capabilities from ALL 4
capability sets.  We assume that, since the application is going to set
eff/perm/inh from an array, it will clear not only the defined caps
up to CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.

The BSET gets cleared differently: it is cleared one bit at a
time.  The problem here is that security/commoncap.c::cap_task_prctl()
actually checks the validity of a capability being read.  So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.

So the 'parent' will look something like:
CapInh: 
CapPrm: 
CapEff: 
CapBnd: ffc0

All of this 'should' be fine, given that these are undefined bits that
aren't supposed to have anything to do with permissions.  But they do...

So let's now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel).  We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in the above situation if you try to clear all of
your capabilities from all 4 sets.  If that root task calls execve()
the child task will pick up all caps not blocked by the bset.  The bset
however does not block bits higher than CAP_LAST_CAP.  So now the child
task has bits in eff which are not in the parent.  These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.

The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits!  So during commit_creds() we mark the child
as not dumpable, given that it is 'more priv' than its parent.  It
also means the parent cannot ptrace the child, among other stupidity.
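The subset failure can be reproduced with plain bit arithmetic. This userspace sketch assumes CAP_LAST_CAP is 37 (CAP_AUDIT_READ), so the top word has 6 defined bits and 26 undefined ones. It models execve() by root very loosely as "full set AND bounding set". cap_t, cap_is_subset() and exec_as_root() are hypothetical names; only the word-by-word subset test corresponds to what cap_issubset() does.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit words, like kernel_cap_t with _KERNEL_CAPABILITY_U32S == 2. */
typedef struct { uint32_t cap[2]; } cap_t;

/* Assumption: CAP_LAST_CAP == 37, so bits 38..63 name no capability. */
#define CAP_LAST_CAP    37
#define CAP_TO_MASK(x)  (1u << ((x) & 31))
#define UPPER_VALID     (CAP_TO_MASK(CAP_LAST_CAP + 1) - 1)	/* 0x3f */

/* a is a subset of set iff no bit of a lies outside set, word by word. */
static int cap_is_subset(cap_t a, cap_t set)
{
	for (int i = 0; i < 2; i++)
		if (a.cap[i] & ~set.cap[i])
			return 0;
	return 1;
}

/* Grossly simplified execve() by root: whatever the full set offers,
 * limited by the bounding set. */
static cap_t exec_as_root(cap_t full_set, cap_t bset)
{
	cap_t r = { { full_set.cap[0] & bset.cap[0],
		      full_set.cap[1] & bset.cap[1] } };
	return r;
}
```

With the old all-ones full set, a parent holding nothing but still carrying the unreadable high bits in its bset produces a child that is "not a subset" of it.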

The solution here:
1) stop hiding capability bits in status
This makes debugging easier!

2) stop giving any task undefined capability bits.  It's simple: if you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.
This fixes the cap_issubset() tests and resulting fallout (which
made the init task in a docker container untraceable among other
things)

3) mask out undefined bits when sys_capset() is called as it might use
~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.

4) mask out undefined bits when we read a file capability off of disk as
again likely all bits are set in the xattr for forward/backward
compatibility.
This lets 'setcap all+pe /bin/bash; /bin/bash' run
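Fixes 2 through 4 all come down to one mask on the top capability word. This is a minimal userspace sketch, again assuming CAP_LAST_CAP is 37. CAP_LAST_U32 and CAP_LAST_U32_VALID_MASK are modeled on the names the capability.h hunk appears to add, but that part of the diff is not shown here, so treat them as assumptions. cap_mask_undefined() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t cap[2]; } cap_t;

/* Assumption: CAP_LAST_CAP == 37 (CAP_AUDIT_READ). */
#define CAP_LAST_CAP            37
#define CAP_LAST_U32            1
#define CAP_TO_MASK(x)          (1u << ((x) & 31))
/* Defined bits of the top word: 0..(CAP_LAST_CAP & 31), i.e. 0x3f here. */
#define CAP_LAST_U32_VALID_MASK (CAP_TO_MASK(CAP_LAST_CAP + 1) - 1)

/* Fix 2: the full set carries no undefined bits, so neither does init nor
 * anything that inherits from it. */
static const cap_t cap_full_set = { { 0xffffffffu, CAP_LAST_U32_VALID_MASK } };

/* Fixes 3 and 4: whatever arrives via capset() or a file capability xattr
 * (often all ones for forward compatibility), drop bits naming no capability. */
static cap_t cap_mask_undefined(cap_t c)
{
	c.cap[CAP_LAST_U32] &= CAP_LAST_U32_VALID_MASK;
	return c;
}
```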

Signed-off-by: Eric Paris 
Cc: Andrew Vagin 
Cc: Andrew G. Morgan 
Cc: Serge E. Hallyn 
Cc: Kees Cook 
Cc: Steve Grubb 
Cc: Dan Walsh 
Cc: sta...@vger.kernel.org
---
 fs/proc/array.c            | 11 +--
 include/linux/capability.h |  5 -
 kernel/audit.c             |  2 +-
 kernel/capability.c        |  4
 security/commoncap.c       |  3 +++
 5 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 64db2bc..3e1290b 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -297,15 +297,11 @@ static void render_cap_t(struct seq_file *m, const char *header,
seq_puts(m, header);
CAP_FOR_EACH_U32(__capi) {
seq_printf(m, "%08x",
-  a->cap[(_KERNEL_CAPABILITY_U32S-1) - __capi]);
+  a->cap[CAP_LAST_U32 - __capi]);
}
seq_putc(m, '\n');
 }
 
-/* Remove non-existent capabilities */
-#define NORM_CAPS(v) (v.cap[CAP_TO_INDEX(CAP_LAST_CAP)] &= \
-   CAP_TO_MASK(CAP_LAST_CAP + 1) - 1)
-
 static inline void task_cap(struct seq_file *m, struct task_struct *p)
 {
const struct cred *cred;
@@ -319,11 +315,6 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
cap_bset= cred->cap_bset;
rcu_read_unlock();
 
-   NORM_CAPS(cap_inheritable);
-   NORM_CAPS(cap_permitted);
-   NORM_CAPS(cap_effective);
-   NORM_CAPS(cap_bset);
-
render_cap_t(m, "CapInh:\t", &cap_inheritable);

[PATCH] CAPABILITIES: remove undefined caps from all processes

2014-07-21 Thread Eric Paris
This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744
plus fixing it a different way...

We found, when trying to run an application from an application which
had dropped privs, that the kernel does security checks on undefined
capability bits.  This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.

Consider a root application which drops all capabilities from ALL 4
capability sets.  We assume, since the application is going to set
eff/perm/inh from an array, that it will clear not only the defined caps
up to CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.

The BSET gets cleared differently.  Instead it is cleared one bit at a
time.  The problem here is that in security/commoncap.c::cap_task_prctl()
we actually check the validity of a capability being read.  So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.
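
The prctl() behavior can be modeled in a few lines. A simplified sketch, assuming a 64-bit bounding set and a hypothetical CAP_LAST_CAP of 37; cap_valid() here only imitates the kernel check described above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: hypothetical CAP_LAST_CAP of 37 and a 64-bit bounding set. */
#define CAP_LAST_CAP 37

/* Stand-in for the validity check applied to PR_CAPBSET_READ /
 * PR_CAPBSET_DROP: capability numbers above CAP_LAST_CAP are rejected. */
static bool cap_valid(int cap)
{
    return cap >= 0 && cap <= CAP_LAST_CAP;
}

/* "Drop everything" as a library experiences it: try every bit, but only
 * the ones the kernel accepts actually get cleared. */
static uint64_t drop_all_readable(uint64_t bset)
{
    for (int cap = 0; cap < 64; cap++) {
        if (!cap_valid(cap))
            continue;               /* rejected, so the undefined bit stays set */
        bset &= ~(UINT64_C(1) << cap);
    }
    return bset;
}
```

Starting from a full bounding set, every defined bit is cleared while bits 38 through 63 survive, which is the kind of CapBnd value shown next.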

So the 'parent' will look something like:
CapInh: 
CapPrm: 
CapEff: 
CapBnd: ffc0

All of this 'should' be fine, given that these are undefined bits that
aren't supposed to have anything to do with permissions.  But they do...

So let's now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel).  We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in that above situation if you try to clear all of
your capabilities from all 4 sets.  If that root task calls execve(),
the child task will pick up all caps not blocked by the bset.  The bset,
however, does not block bits higher than CAP_LAST_CAP.  So now the child
task has bits in eff which are not in the parent.  These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.

The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits!  So during commit_creds() we now mark the
child as not dumpable, given that it is 'more priv' than its parent.  It
also means the parent cannot ptrace the child, and other stupidity.
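
The whole chain reduces to bit arithmetic. A simplified sketch, assuming a 64-bit capability set, a hypothetical CAP_LAST_CAP of 37, and a root execve() with no file capabilities and an empty inheritable set (the real cap_bprm_set_creds() handling has more cases). It compares a "full set" of ~0 with one limited to defined bits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_LAST_CAP 37   /* assumption: hypothetical highest defined capability */
#define CAP_DEFINED ((UINT64_C(1) << (CAP_LAST_CAP + 1)) - 1)

/* Same rule as the kernel's cap_issubset(): every bit of a must also be in set. */
static bool cap_issubset64(uint64_t a, uint64_t set)
{
    return (a & ~set) == 0;
}

/* Simplified permitted set after execve() by root with no inheritable caps:
 * the "full" file set filtered through the bounding set. */
static uint64_t root_exec_permitted(uint64_t full_set, uint64_t bset)
{
    return full_set & bset;
}
```

With a full set of ~0 the child ends up holding only undefined bits, yet that alone makes it "not a subset" of its parent. Limiting the full set to defined bits leaves the child with nothing, and the subset test passes.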

The solution here is 2 things.
1) stop hiding capability bits in status
We hide those upper bits, which meant I couldn't spot this issue.
2) stop giving any task undefined capability bits.  It's simple: if you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.

Signed-off-by: Eric Paris 
Cc: Andrew Vagin 
Cc: Andrew G. Morgan 
Cc: Serge E. Hallyn 
Cc: Kees Cook 
Cc: Steve Grubb 
Cc: Dan Walsh 
Cc: sta...@kernel.org
---
 fs/proc/array.c| 9 -
 include/linux/capability.h | 2 +-
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 64db2bc..d882018 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -302,10 +302,6 @@ static void render_cap_t(struct seq_file *m, const char *header,
seq_putc(m, '\n');
 }
 
-/* Remove non-existent capabilities */
-#define NORM_CAPS(v) (v.cap[CAP_TO_INDEX(CAP_LAST_CAP)] &= \
-   CAP_TO_MASK(CAP_LAST_CAP + 1) - 1)
-
 static inline void task_cap(struct seq_file *m, struct task_struct *p)
 {
const struct cred *cred;
@@ -319,11 +315,6 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
cap_bset= cred->cap_bset;
rcu_read_unlock();
 
-   NORM_CAPS(cap_inheritable);
-   NORM_CAPS(cap_permitted);
-   NORM_CAPS(cap_effective);
-   NORM_CAPS(cap_bset);
-
render_cap_t(m, "CapInh:\t", &cap_inheritable);
render_cap_t(m, "CapPrm:\t", &cap_permitted);
render_cap_t(m, "CapEff:\t", &cap_effective);
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 84b13ad..1c36782 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -79,7 +79,7 @@ extern const kernel_cap_t __cap_init_eff_set;
 #else /* HAND-CODED capability initializers */
 
 # define CAP_EMPTY_SET ((kernel_cap_t){{ 0, 0 }})
-# define CAP_FULL_SET ((kernel_cap_t){{ ~0, ~0 }})
+# define CAP_FULL_SET ((kernel_cap_t){{ ~0, CAP_TO_MASK(CAP_LAST_CAP + 1) -1 }})
 # define CAP_FS_SET   ((kernel_cap_t){{ CAP_FS_MASK_B0 \
| CAP_TO_MASK(CAP_LINUX_IMMUTABLE), \
CAP_FS_MASK_B1 } })
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
> On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
> > On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> > > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > > > receive system calls from any of the x86 architectures, regardless of
> > > > > > how the program is invoked?  (This is unusual, so normally denying those
> > > > > > "alien" calls is the right thing to do.)
> > > > > 
> > > > > I obviously can't speak for all seccomp users, but libseccomp handles this
> > > > > by checking the seccomp_data->arch value at the start of the filter and
> > > > > killing (by default) any non-native architectures.  If you want, you can
> > > > > change this default behavior or add support for other architectures (e.g.
> > > > > create a filter that allows both x86-64 and x32 but disallows x86, or any
> > > > > combination of the three for that matter).
> > > > 
> > > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > > > interface is crap...  Since seccomp_data->arch can't distinguish between
> > > > X32 and X86_64.  If I write a seccomp filter which says
> > > > 
> > > > KILL arch != x86_64
> > > > KILL init_module
> > > > ALLOW everything else
> > > > 
> > > > I can still call init_module, I just have to use the X32 variant.
> > > > 
> > > > If libseccomp is translating:
> > > > 
> > > > KILL arch != x86_64 into:
> > > > 
> > > > KILL arch != x86_64
> > > > KILL syscall_nr >= 2000
> > > > 
> > > > That's just showing how dumb the kernel interface is...   Good for you
> > > > guys, but the kernel is just being dumb   :)
> > > 
> > > You're not going to hear me ever say that I like how the x32 ABI was done,
> > > it is a real mess from a seccomp filter point of view and we have to do
> > > some nasty stuff in libseccomp to make it all work correctly (see my
> > > comments on the libseccomp-devel list regarding my severe displeasure
> > > over x32), but what's done is done.
> > > 
> > > I think it's too late to change the x32 seccomp filter ABI.
> > 
> > So we have a security interface that is damn near impossible to get
> > right.  Perfect.
> 
> What?  Having to do two comparisons instead of one is "damn near impossible"? 
>  
> I think that might be a bit of an overreaction don't you think?

Actually no.  How can a normal userspace application coder POSSIBLY know
this?  Find this thread on an e-mail list, by accident?  
> 
> > I think this explains exactly why I support this idea.  Make X32 look
> > like everyone else ...
> 
> You do realize that this patch set makes x32 the odd man out by having 
> syscall_get_nr() return a different syscall number than what was used to make 
> the syscall?  I don't understand how that makes "x32 look like everyone else".

Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
audit.  No problem.  We don't need to strip it in syscall_get_nr().
I'll gladly concede that part of the patch series.

But given an x86_64 kernel, a seccomp filter writer has to know about X32
and how to write rules to block the X32 ABI.  And I stick with my
assessment that x32 + seccomp is darn near impossible for a normal
developer to handle.

Heck, even chromium took months to realize that x32 was a weird beast.
And they got it wrong on their first try.  Their original implementation
didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
still not sure it does the right thing.  And they are the EXPERTS.  They
wrote seccomp!

> > Honestly, how many people are using seccomp on X32 and would be horribly
> > pissed if we just fixed it?
> 
> Okay, please stop suggesting we break the x32 kernel/user interface to 
> workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
> this is audit's problem.

No one is asking to break X32 to fix audit.  Audit can handle itse

Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > receive system calls from any of the x86 architectures, regardless of
> > > > how the program is invoked?  (This is unusual, so normally denying those
> > > > "alien" calls is the right thing to do.)
> > > 
> > > I obviously can't speak for all seccomp users, but libseccomp handles this
> > > by checking the seccomp_data->arch value at the start of the filter and
> > > killing (by default) any non-native architectures.  If you want, you can
> > > change this default behavior or add support for other architectures (e.g.
> > > create a filter that allows both x86-64 and x32 but disallows x86, or any
> > > combination of the three for that matter).
> > 
> > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > interface is crap...  Since seccomp_data->arch can't distinguish between
> > X32 and X86_64.  If I write a seccomp filter which says
> > 
> > KILL arch != x86_64
> > KILL init_module
> > ALLOW everything else
> > 
> > I can still call init_module, I just have to use the X32 variant.
> > 
> > If libseccomp is translating:
> > 
> > KILL arch != x86_64 into:
> > 
> > KILL arch != x86_64
> > KILL syscall_nr >= 2000
> > 
> > That's just showing how dumb the kernel interface is...   Good for you
> > guys, but the kernel is just being dumb   :)
> 
> You're not going to hear me ever say that I like how the x32 ABI was done, it 
> is a real mess from a seccomp filter point of view and we have to do some 
> nasty stuff in libseccomp to make it all work correctly (see my comments on 
> the libseccomp-devel list regarding my severe displeasure over x32), but 
> what's done is done.
> 
> I think it's too late to change the x32 seccomp filter ABI.

So we have a security interface that is damn near impossible to get
right.  Perfect.

I think this explains exactly why I support this idea.  Make X32 look
like everyone else and put these custom horrific hacks in seccomp if we
are unwilling to 'do it right'.

Honestly, how many people are using seccomp on X32 and would be horribly
pissed if we just fixed it?


Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter

2014-07-11 Thread Eric Paris
On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > Incidentally: do seccomp users know that on an x86-64 system you can
> > receive system calls from any of the x86 architectures, regardless of
> > how the program is invoked?  (This is unusual, so normally denying those
> > "alien" calls is the right thing to do.)
> 
> I obviously can't speak for all seccomp users, but libseccomp handles this by 
> checking the seccomp_data->arch value at the start of the filter and killing 
> (by default) any non-native architectures.  If you want, you can change this 
> default behavior or add support for other architectures (e.g. create a filter 
> that allows both x86-64 and x32 but disallows x86, or any combination of the 
> three for that matter).

Maybe libseccomp does some HORRIFIC contortions under the hood, but the
interface is crap...  Since seccomp_data->arch can't distinguish between
X32 and X86_64.  If I write a seccomp filter which says

KILL arch != x86_64
KILL init_module
ALLOW everything else

I can still call init_module, I just have to use the X32 variant.

If libseccomp is translating:

KILL arch != x86_64 into:

KILL arch != x86_64
KILL syscall_nr >= 2000

That's just showing how dumb the kernel interface is...   Good for you
guys, but the kernel is just being dumb   :)
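
The bypass can be reproduced without touching BPF by modeling the filter's decisions in plain C. A sketch, assuming x86-64 values for AUDIT_ARCH_X86_64 (0xC000003E), __X32_SYSCALL_BIT (0x40000000) and init_module (175); fake_seccomp_data is a hypothetical stand-in for struct seccomp_data, and the "x32-aware" rule is one possible fix, not necessarily what libseccomp emits:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values, copied here so the sketch is self-contained. */
#define ARCH_X86_64      0xC000003Eu   /* AUDIT_ARCH_X86_64 */
#define X32_SYSCALL_BIT  0x40000000u   /* __X32_SYSCALL_BIT */
#define NR_INIT_MODULE   175u          /* shared by the x86-64 and x32 tables */

/* Hypothetical stand-in for the two struct seccomp_data fields used here. */
struct fake_seccomp_data { uint32_t nr; uint32_t arch; };

enum verdict { ALLOW, KILL };

/* The rule set from the mail: x32 calls report arch == x86_64, so they pass. */
static enum verdict naive_filter(struct fake_seccomp_data d)
{
    if (d.arch != ARCH_X86_64)
        return KILL;
    if (d.nr == NR_INIT_MODULE)
        return KILL;
    return ALLOW;
}

/* One extra comparison: reject the whole x32 number range explicitly. */
static enum verdict x32_aware_filter(struct fake_seccomp_data d)
{
    if (d.arch != ARCH_X86_64)
        return KILL;
    if (d.nr >= X32_SYSCALL_BIT)
        return KILL;
    if (d.nr == NR_INIT_MODULE)
        return KILL;
    return ALLOW;
}
```

The x32 call reports arch x86_64 and a number of 175 | 0x40000000, so it slips past a rule that only matches 175; the extra range comparison closes that hole.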


Re: [PATCH 1/2] auditsc: audit_krule mask accesses need bounds checking

2014-06-10 Thread Eric Paris
On Mon, 2014-06-09 at 16:36 -0700, Linus Torvalds wrote:
> On Mon, Jun 9, 2014 at 3:56 PM, Andy Lutomirski  wrote:
> >
> > In this particular case, it's my patch, and I've never sent you a pull
> > request.  I sort of assumed that secur...@kernel.org magically caused
> > acknowledged fixes to end up in your tree.  I'm not sure what I'm
> > supposed to do here.
> >
> > Maybe the confusion is because Eric resent the patch?
> 
> So I saw the patch twice in email, but neither time did I get the
> feeling that I should apply it. The first time Eric responded to it,
> so the maintainer clearly knew about it and was reacting to it, so I
> ignored it. The second time Eric resent it as email to various people
> and lists, and I didn't react to it because I expected that was again
> just for discussion.
> 
> So I'm not blaming you as much as Eric.

No, it's good to blame me.  I was trying to deal with it as fast as I
could, since I was already trying to ignore my computer before I got
married last weekend and took the last week off.  When I got back
yesterday I realized you hadn't picked it up, and it was on my list of
things to try to handle today.  I think both 1 and 2 should be applied
to your tree, although only #1 is really an absolutely critical issue.

>  If a maintainer expects me to
> pick it up from the email (rather than his usual git pulls), I want
> that maintainer to *say* so. Because otherwise, as mentioned, I expect
> it to come through the maintainer tree as usual.
> 
>   Linus


Re: [PATCH 1/1] inotify: bug 77111 - fix reusage of watch descriptors

2014-06-09 Thread Eric Paris
This 'bug' feels very theoretical to me.  For about 3 kernel releases,
back when inotify was rewritten on top of fsnotify, it intentionally
reused wd's.  So instead of a MAX_INT wrap, all you had to do was a
single create/destroy/create to get reuse.  Almost every utility
survived...  But we did have 2 things 'misbehave': udev and restorecond
(an SELinux utility).  Both were rewritten to handle reuse, but then we
stopped reusing wd's because obviously it had broken userspace...

I'm also not all that worried about the 'long lived daemon' comment
since it wasn't until about 4 kernels ago (the idr_alloc_cyclic work
from jlayton) that the idr COULD loop.  And I'd never seen a complaint
that anyone hit the max.  So any looping seems unlikely.
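
The two allocation policies can be compared with a toy allocator. A sketch with a hypothetical 4-id space (the real code uses the kernel idr with a much larger range): lowest-free allocation hands a just-freed wd straight back, roughly the immediate reuse described above, while cyclic allocation reuses it only after wrapping past every other id.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ID 4   /* hypothetical: tiny so the wrap is visible */

struct toy_idr {
    bool used[MAX_ID + 1];   /* ids 1..MAX_ID; slot 0 left unused */
    int next;                /* last id handed out by the cyclic allocator */
};

/* Lowest free id first: a just-freed id is returned again immediately. */
static int alloc_lowest(struct toy_idr *r)
{
    for (int id = 1; id <= MAX_ID; id++) {
        if (!r->used[id]) {
            r->used[id] = true;
            return id;
        }
    }
    return -1;   /* id space exhausted */
}

/* Cyclic: start just after the last id handed out, wrapping around once. */
static int alloc_cyclic(struct toy_idr *r)
{
    for (int i = 0; i < MAX_ID; i++) {
        int id = (r->next + i) % MAX_ID + 1;
        if (!r->used[id]) {
            r->used[id] = true;
            r->next = id;
            return id;
        }
    }
    return -1;   /* id space exhausted */
}
```

With only four ids the wrap happens after three more allocations; with the real range it takes far longer, which is why looping has rarely been seen in practice.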

In any case...   I'm so scared of changing any object lifetime in this
code.  It's just really complex!

What happens with this patch if I close(inotify_fd)?  We
obviously can't write the IN_IGNORED event to userspace, so we don't free
the mark/idr entry?  Ever?  I'm pretty sure it'll BUG()...

What happens if you cause an IGNORED event, don't read it, then
close(inotify_fd)...

Also, if the copy to userspace fails, we NEVER free the mark?  That'll
BUG() eventually, I'm pretty sure...

Also, by leaving the mark around until the IN_IGNORED has been sent to
userspace, can't we keep generating events after the IN_IGNORED?

I'm not sure this patch is helping, but maybe I'm not seeing it right...

I do think a mention of potential reuse in the man page is
appropriate...

On Mon,  2 Jun 2014 20:03:42 +0200
Heinrich Schuchardt  wrote:

> Without the patch inotify watch descriptors may be reused by
> inotify_add_watch before all events for the previous usage
> of the watch descriptor have been read.
> 
> With the patch watch descriptors are removed from the idr only
> after the IN_IGNORED event has been read.
> 
> The sequence of some static routines is rearranged.
> 
> The significant change is moving the call of
> inotify_remove_from_idr from inotify_ignored_and_remove_idr
> to copy_event_to_user, and renaming inotify_ignored_and_remove_idr
> to inotify_ignored.
> 
> cf.
> https://bugzilla.kernel.org/show_bug.cgi?id=77111
> 
> Signed-off-by: Heinrich Schuchardt 
> ---
>  fs/notify/inotify/inotify.h  |   4 +-
>  fs/notify/inotify/inotify_fsnotify.c |   2 +-
>  fs/notify/inotify/inotify_user.c | 257 ++-
>  3 files changed, 135 insertions(+), 128 deletions(-)
> 
> diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
> index ed855ef..596c513 100644
> --- a/fs/notify/inotify/inotify.h
> +++ b/fs/notify/inotify/inotify.h
> @@ -20,8 +20,8 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse)
>  return container_of(fse, struct inotify_event_info, fse);
>  }
>  
> -extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
> - struct fsnotify_group *group);
> +extern void inotify_ignored(struct fsnotify_mark *fsn_mark,
> + struct fsnotify_group *group);
>  extern int inotify_handle_event(struct fsnotify_group *group,
>   struct inode *inode,
>   struct fsnotify_mark *inode_mark,
> diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
> index 43ab1e1..68729dd 100644
> --- a/fs/notify/inotify/inotify_fsnotify.c
> +++ b/fs/notify/inotify/inotify_fsnotify.c
> @@ -122,7 +122,7 @@ int inotify_handle_event(struct fsnotify_group *group,
>  
>  static void inotify_freeing_mark(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group)
>  {
> - inotify_ignored_and_remove_idr(fsn_mark, group);
> + inotify_ignored(fsn_mark, group);
>  }
>  
>  /*
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index 78a2ca3..7a354b0 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -164,115 +164,6 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
>  return event;
>  }
>  
> -/*
> - * Copy an event to user space, returning how much we copied.
> - *
> - * We already checked that the event size is smaller than the
> - * buffer we had in "get_one_event()" above.
> - */
> -static ssize_t copy_event_to_user(struct fsnotify_group *group,
> -   struct fsnotify_event *fsn_event,
> -   char __user *buf)
> -{
> - struct inotify_event inotify_event;
> - struct inotify_event_info *event;
> - size_t event_size = sizeof(struct inotify_event);
> - size_t name_len;
> - size_t pad_name_len;
> -
> - pr_debug("%s: group=%p event=%p\n", __func__, group, fsn_event);
> -
> - event = INOTIFY_E(fsn_event);
> - name_len = event->name_len;
> - /*
> -  * round up name length so it is a multiple of event_size
> -  * plus an extra byte for the terminating '\0'.
> -  */
> - pad_n

[PATCH 1/2] auditsc: audit_krule mask accesses need bounds checking

2014-05-28 Thread Eric Paris
From: Andy Lutomirski 

Fixes an easy DoS and possible information disclosure.

This does nothing about the broken state of x32 auditing.

eparis: This only matters if the admin has enabled auditd and has
specifically loaded audit rules.  This bug has been around since before
git.  Wow...

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
Signed-off-by: Eric Paris 
---
 kernel/auditsc.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 254ce20..842f58a 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -728,6 +728,22 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
return AUDIT_BUILD_CONTEXT;
 }
 
+static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
+{
+   int word, bit;
+
+   if (val > 0x)
+   return false;
+
+   word = AUDIT_WORD(val);
+   if (word >= AUDIT_BITMASK_SIZE)
+   return false;
+
+   bit = AUDIT_BIT(val);
+
+   return rule->mask[word] & bit;
+}
+
 /* At syscall entry and exit time, this filter is called if the
  * audit_state is not low enough that auditing cannot take place, but is
  * also not high enough that we already know we have to write an audit
@@ -745,11 +761,8 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
 
rcu_read_lock();
if (!list_empty(list)) {
-   int word = AUDIT_WORD(ctx->major);
-   int bit  = AUDIT_BIT(ctx->major);
-
list_for_each_entry_rcu(e, list, list) {
-   if ((e->rule.mask[word] & bit) == bit &&
+   if (audit_in_mask(&e->rule, ctx->major) &&
audit_filter_rules(tsk, &e->rule, ctx, NULL,
   &state, false)) {
rcu_read_unlock();
@@ -769,20 +782,16 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
 static int audit_filter_inode_name(struct task_struct *tsk,
   struct audit_names *n,
   struct audit_context *ctx) {
-   int word, bit;
int h = audit_hash_ino((u32)n->ino);
struct list_head *list = &audit_inode_hash[h];
struct audit_entry *e;
enum audit_state state;
 
-   word = AUDIT_WORD(ctx->major);
-   bit  = AUDIT_BIT(ctx->major);
-
if (list_empty(list))
return 0;
 
list_for_each_entry_rcu(e, list, list) {
-   if ((e->rule.mask[word] & bit) == bit &&
+   if (audit_in_mask(&e->rule, ctx->major) &&
audit_filter_rules(tsk, &e->rule, ctx, n, &state, false)) {
ctx->current_state = state;
return 1;
-- 
1.9.0
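
The effect of the two checks is easiest to see on concrete numbers. A standalone sketch, with toy_rule and toy_in_mask() as hypothetical stand-ins for struct audit_krule and audit_in_mask(), and a 32-bit upper limit chosen to match the cast inside AUDIT_WORD():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Adapted from the uapi audit.h definitions (1u instead of 1 to keep the
 * shift unsigned); 64 words x 32 bits covers syscall numbers 0..2047. */
#define AUDIT_BITMASK_SIZE 64
#define AUDIT_WORD(nr) ((uint32_t)((nr) / 32))
#define AUDIT_BIT(nr)  (1u << ((nr) - AUDIT_WORD(nr) * 32))

struct toy_rule { uint32_t mask[AUDIT_BITMASK_SIZE]; };   /* hypothetical stand-in */

/* Bounds-checked lookup in the spirit of audit_in_mask(). */
static bool toy_in_mask(const struct toy_rule *rule, unsigned long long val)
{
    if (val > UINT32_MAX)              /* AUDIT_WORD() would truncate this */
        return false;
    uint32_t word = AUDIT_WORD(val);
    if (word >= AUDIT_BITMASK_SIZE)    /* past the end of mask[] */
        return false;
    return rule->mask[word] & AUDIT_BIT(val);
}
```

Without the word check, an x32-style number such as 0x40000000 | 59 would index mask[] millions of words past its 64 entries, which is the out-of-bounds read the patch closes. Without the first check, a value like (1ULL << 37) | 59 would be truncated by AUDIT_WORD() and alias onto an unrelated word.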


[PATCH 2/2] audit: do not select HAVE_ARCH_AUDITSYSCALL on x32

2014-05-28 Thread Eric Paris
When x32 was introduced, it assumed that it would get syscall audit for
free (since it works for x86 and x86_64).  However, the audit system
assumed that the syscall table has fewer than 2048 entries.  This is not
the case for x32 (or it is, kinda sorta: the x32 table itself is small,
but every x32 syscall number has __X32_SYSCALL_BIT set, putting it far
above 2048).

Since syscall audit does not work on x32, stop selecting it.

Signed-off-by: Eric Paris 
Cc: Andy Lutomirski 
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 56f47ca..e11c4da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,7 +125,7 @@ config X86
select RTC_LIB
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
-   select HAVE_ARCH_AUDITSYSCALL
+   select HAVE_ARCH_AUDITSYSCALL if !X86_X32
 
 config INSTRUCTION_DECODER
def_bool y
-- 
1.9.0


Re: [PATCH v2 2/2] audit: Mark CONFIG_AUDITSYSCALL BROKEN and update help text

2014-05-28 Thread Eric Paris
On Wed, 2014-05-28 at 19:40 -0700, Andy Lutomirski wrote:
> On Wed, May 28, 2014 at 7:09 PM, Eric Paris  wrote:
> > NAK
> >
> > On Wed, 2014-05-28 at 18:44 -0700, Andy Lutomirski wrote:
> >> Here are some issues with the code:
> >>  - It thinks that syscalls have four arguments.
> >
> > Not true at all.  It records the registers that would hold the first 4
> > entries on syscall entry, for use later if needed, as getting those
> > later on some arches is not feasible (see ia64).  It makes no assumption
> > about how many arguments a syscall has.
> 
> What about a5 and a6?

On the couple of syscalls where a5 and a6 had any state that was
actually wanted by someone (mainly just the fd on mmap) audit collects
it later in the actual syscall.

> >>  - It assumes that syscall numbers are between 0 and 2048.
> >
> > There could well be a bug here.  Not questioning that.  Although that
> > would be patch 1/2
> 
> Even with patch 1, it still doesn't handle large syscall numbers -- it
> just assumes they're not audited.

That's because we haven't had large syscall numbers.  That's the whole
point of an arch doing select HAVE_ARCH_AUDITSYSCALL.  If they don't
meet the requirements, they shouldn't be selecting it.

> >>  - It's unclear whether it's supposed to be reliable.
> >
> > Unclear to whom?
> 
> To me.
> 
> If some inode access or selinux rule triggers an audit, is the auditsc
> code guaranteed to write an exit record?  And see below...

This is an honest question:  Do you want to discuss these things, or
would you be happier if I shut up, fix the bugs you found, and leave
things be?  I don't want to have an argument, I'm happy to have a
discussion if you think that will be beneficial...


Re: [PATCH v2 1/2] auditsc: audit_krule mask accesses need bounds checking

2014-05-28 Thread Eric Paris
On Wed, 2014-05-28 at 19:27 -0700, Andy Lutomirski wrote:
> On Wed, May 28, 2014 at 7:23 PM, Eric Paris  wrote:
> > On Wed, 2014-05-28 at 18:44 -0700, Andy Lutomirski wrote:
> >> Fixes an easy DoS and possible information disclosure.
> >>
> >> This does nothing about the broken state of x32 auditing.
> >>
> >> Cc: sta...@vger.kernel.org
> >> Signed-off-by: Andy Lutomirski 
> >> ---
> >>  kernel/auditsc.c | 27 ++-
> >>  1 file changed, 18 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> >> index f251a5e..7ccd9db 100644
> >> --- a/kernel/auditsc.c
> >> +++ b/kernel/auditsc.c
> >> @@ -728,6 +728,22 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
> >>   return AUDIT_BUILD_CONTEXT;
> >>  }
> >>
> >> +static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
> >> +{
> >> + int word, bit;
> >> +
> >> + if (val > 0x)
> >> + return false;
> >
> > Why is this necessary?
> 
> To avoid an integer overflow.  Admittedly, this particular overflow
> won't cause a crash, but it will cause incorrect results.

You know this code pre-dates git?  I admit, I'm shocked no one ever
noticed it before!  This is ANCIENT.  And clearly broken.

I'll likely ask Richard to add a WARN_ONCE() both in this place and
below at word >= AUDIT_BITMASK_SIZE, so we would know if we ever need a
larger bitmask to store syscall numbers.

It'd be nice if lib had an efficient bitmask implementation...
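
A userspace sketch of the WARN_ONCE() idea, assuming a hypothetical checked_in_mask() and using a static flag in place of the kernel macro: out-of-range numbers still return 0, but the first one is reported, so a too-small bitmask would show up in the logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define AUDIT_BITMASK_SIZE 64   /* 64 x 32 bits = 2048 syscall numbers */

static int out_of_range_reports;   /* how many warnings were actually printed */

/* Stand-in for WARN_ONCE(): report only the first out-of-range number. */
static void warn_once_out_of_range(unsigned long long val)
{
    static bool warned;

    if (!warned) {
        warned = true;
        out_of_range_reports++;
        fprintf(stderr, "audit: syscall %llu outside %d-bit mask\n",
                val, AUDIT_BITMASK_SIZE * 32);
    }
}

/* Hypothetical checked lookup: one range test covers both original checks. */
static int checked_in_mask(const uint32_t *mask, unsigned long long val)
{
    if (val >= (unsigned long long)AUDIT_BITMASK_SIZE * 32) {
        warn_once_out_of_range(val);
        return 0;
    }
    return (mask[val / 32] >> (val % 32)) & 1;
}
```

Later out-of-range lookups stay silent, so a workload that keeps hitting the limit produces a single line rather than a flood.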

> >
> >> +
> >> + word = AUDIT_WORD(val);
> >> + if (word >= AUDIT_BITMASK_SIZE)
> >> + return false;
> >
> > Since this covers it and is extensible...
> >
> >> +
> >> + bit = AUDIT_BIT(val);
> >> +
> >> + return rule->mask[word] & bit;
> >
> > Returning an int as a bool creates worse code than just returning the
> > int.  (although in this case if the compiler chooses to inline it might
> > be smart enough not to actually convert this int to a bool and make
> > worse assembly...)   I'd suggest the function here return an int.  bools
> > usually make the assembly worse...
> 
> I'm ambivalent.  The right assembly would use flags on x86, not an int
> or a bool, and I don't know what the compiler will do.

Also, clearly X86_X32 was implemented without audit support, so we
shouldn't config it in.  What do you think of this?

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 25d2c6f..fa852e2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -129,7 +129,7 @@ config X86
select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
select HAVE_CC_STACKPROTECTOR
select GENERIC_CPU_AUTOPROBE
-   select HAVE_ARCH_AUDITSYSCALL
+   select HAVE_ARCH_AUDITSYSCALL if !X86_X32
 
 config INSTRUCTION_DECODER
def_bool y


Re: [PATCH v2 1/2] auditsc: audit_krule mask accesses need bounds checking

2014-05-28 Thread Eric Paris
On Wed, 2014-05-28 at 18:44 -0700, Andy Lutomirski wrote:
> Fixes an easy DoS and possible information disclosure.
> 
> This does nothing about the broken state of x32 auditing.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Andy Lutomirski 
> ---
>  kernel/auditsc.c | 27 ++-
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f251a5e..7ccd9db 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -728,6 +728,22 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
>   return AUDIT_BUILD_CONTEXT;
>  }
>  
> +static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
> +{
> + int word, bit;
> +
> + if (val > 0x)
> + return false;

Why is this necessary?  

> +
> + word = AUDIT_WORD(val);
> + if (word >= AUDIT_BITMASK_SIZE)
> + return false;

Since this covers it and is extensible...

> +
> + bit = AUDIT_BIT(val);
> +
> + return rule->mask[word] & bit;

Returning an int as a bool creates worse code than just returning the
int.  (although in this case if the compiler chooses to inline it might
be smart enough not to actually convert this int to a bool and make
worse assembly...)   I'd suggest the function here return an int.  bools
usually make the assembly worse...

Otherwise I'd give it an ACK...
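
Whichever type is returned, both callers only use the result as a condition in an && chain, so the two forms are interchangeable there. A small sketch with hypothetical names showing the one visible difference: the int form hands back the raw masked bit, while the bool form is normalized to 1. Whether that normalization costs an extra instruction depends on the compiler and on inlining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* int-returning variant: nonzero means "set"; the value is the bit itself.
 * (Bit 31 does not fit a positive int, so the checks below stay under it.) */
static int in_mask_int(uint32_t word, uint32_t bit)
{
    return (int)(word & bit);
}

/* bool-returning variant: any nonzero result is converted to exactly 1. */
static bool in_mask_bool(uint32_t word, uint32_t bit)
{
    return word & bit;
}
```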

> +}
> +
>  /* At syscall entry and exit time, this filter is called if the
>   * audit_state is not low enough that auditing cannot take place, but is
>   * also not high enough that we already know we have to write an audit
> @@ -745,11 +761,8 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
>  
>   rcu_read_lock();
>   if (!list_empty(list)) {
> - int word = AUDIT_WORD(ctx->major);
> - int bit  = AUDIT_BIT(ctx->major);
> -
>   list_for_each_entry_rcu(e, list, list) {
> - if ((e->rule.mask[word] & bit) == bit &&
> + if (audit_in_mask(&e->rule, ctx->major) &&
>   audit_filter_rules(tsk, &e->rule, ctx, NULL,
>  &state, false)) {
>   rcu_read_unlock();
> @@ -769,20 +782,16 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
>  static int audit_filter_inode_name(struct task_struct *tsk,
>  struct audit_names *n,
>  struct audit_context *ctx) {
> - int word, bit;
>   int h = audit_hash_ino((u32)n->ino);
>   struct list_head *list = &audit_inode_hash[h];
>   struct audit_entry *e;
>   enum audit_state state;
>  
> - word = AUDIT_WORD(ctx->major);
> - bit  = AUDIT_BIT(ctx->major);
> -
>   if (list_empty(list))
>   return 0;
>  
>   list_for_each_entry_rcu(e, list, list) {
> - if ((e->rule.mask[word] & bit) == bit &&
> + if (audit_in_mask(&e->rule, ctx->major) &&
>   audit_filter_rules(tsk, &e->rule, ctx, n, &state, false)) {
>   ctx->current_state = state;
>   return 1;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] audit: Mark CONFIG_AUDITSYSCALL BROKEN and update help text

2014-05-28 Thread Eric Paris
NAK

On Wed, 2014-05-28 at 18:44 -0700, Andy Lutomirski wrote:
> Here are some issues with the code:
>  - It thinks that syscalls have four arguments.

Not true at all.  It records the registers that would hold the first 4
arguments on syscall entry, for use later if needed, as getting those
later on some arches is not feasible (see ia64).  It makes no assumption
about how many arguments a syscall has.

>  - It's a performance disaster.

Only if you enable it.  If you don't use audit it is a single branch.
Hardly a disaster.
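To make the two points above concrete, a minimal userspace sketch, assuming a per-task context pointer that stays NULL when audit is not in use. All names are hypothetical, not the kernel's. The fast path is one pointer test (the kernel would typically wrap it in `unlikely()`), and the slow path copies the four argument registers regardless of how many arguments the syscall actually takes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-in for an audit context. */
struct audit_ctx_sketch {
	int major;             /* syscall number */
	unsigned long argv[4]; /* raw values of the first four argument registers */
};

/* Hypothetical per-task state: NULL when auditing is not in use. */
struct task_sketch {
	struct audit_ctx_sketch *audit_context;
};

/* Slow path: snapshot the registers now, since some architectures
 * cannot recover them reliably at syscall exit. */
static void audit_entry_slow(struct audit_ctx_sketch *ctx, int major,
			     unsigned long a0, unsigned long a1,
			     unsigned long a2, unsigned long a3)
{
	ctx->major = major;
	ctx->argv[0] = a0;
	ctx->argv[1] = a1;
	ctx->argv[2] = a2;
	ctx->argv[3] = a3;
}

/* Fast path: with no audit context the only cost is this one test. */
static inline void audit_entry(struct task_sketch *tsk, int major,
			       unsigned long a0, unsigned long a1,
			       unsigned long a2, unsigned long a3)
{
	if (tsk->audit_context)
		audit_entry_slow(tsk->audit_context, major, a0, a1, a2, a3);
}
```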

>  - It assumes that syscall numbers are between 0 and 2048.

There could well be a bug here.  Not questioning that.  Although that
would be patch 1/2

>  - It's unclear whether it's supposed to be reliable.

Unclear to whom?

>  - It's broken on things like x32.
>  - It can't support ARM OABI.

Some arches aren't supported?  And that makes it BROKEN?  

>  - Its approach to freeing memory is terrifying.

What?

None of your reasons hold water.  Bugs need to be fixed.  Try reporting
them...  This is just stupid.

> Signed-off-by: Andy Lutomirski 
> ---
>  init/Kconfig | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d3585b..24d4b53 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -296,13 +296,16 @@ config HAVE_ARCH_AUDITSYSCALL
>   bool
>  
>  config AUDITSYSCALL
> - bool "Enable system-call auditing support"
> - depends on AUDIT && HAVE_ARCH_AUDITSYSCALL
> + bool "Enable system-call auditing support (not recommended)"
> + depends on AUDIT && HAVE_ARCH_AUDITSYSCALL && BROKEN
>   default y if SECURITY_SELINUX
>   help
> -   Enable low-overhead system-call auditing infrastructure that
> -   can be used independently or with another kernel subsystem,
> -   such as SELinux.
> +   Enable system-call auditing infrastructure that can be used
> +   independently or with another kernel subsystem, such as
> +   SELinux.
> +
> +   AUDITSYSCALL has serious performance and correctness issues.
> +   Use it with extreme caution.
>  
>  config AUDIT_WATCH
>   def_bool y




Re: [PATCH V3 0/6] namespaces: log namespaces per task

2014-05-20 Thread Eric Paris
On Tue, 2014-05-20 at 09:12 -0400, Richard Guy Briggs wrote:
> The purpose is to track namespaces in use by logged processes from the
> perspective of init_*_ns.
> 
> 1/6 defines a function to generate them and assigns them.
> 
> Use a serial number per namespace (unique across one boot of one kernel)
> instead of the inode number (which is claimed to have had the right to change
> reserved and is not necessarily unique if there is more than one proc fs).  It
> could be argued that the inode numbers have now become a defacto interface and
> can't change now, but I'm proposing this approach to see if this helps address
> some of the objections to the earlier patchset.
> 
> 2/6 adds access functions to get to the serial numbers in a similar way to
> inode access for namespace proc operations.
> 
> 3/6 implements, as suggested by Serge Hallyn, making these serial numbers
> available in /proc/self/ns/{ipc,mnt,net,pid,user,uts}_snum.  I chose "snum"
> instead of "seq" for consistency with inum and there are a number of other uses
> of "seq" in the namespace code.
> 
> 4/6 exposes proc's ns entries structure which lists a number of useful
> operations per namespace type for other subsystems to use.
> 
> 5/6 provides an example of usage for audit_log_task_info() which is used by
> syscall audits, among others.  audit_log_task() and audit_common_recv_message()
> would be other potential use cases.
> 
> Proposed output format:
> This differs slightly from Aristeu's patch because of the label conflict with
> "pid=" due to including it in existing records rather than it being a separate
> record.  The serial numbers are printed in hex.
>   type=SYSCALL msg=audit(1399651071.433:72): arch=c03e syscall=272 
> success=yes exit=0 a0=4000 a1= a2=0 a3=22 items=0 ppid=1 
> pid=483 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 
> fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" 
> exe="/usr/lib/systemd/systemd" netns=97 utsns=2 ipcns=1 pidns=4 userns=3 
> mntns=5 subj=system_u:system_r:init_t:s0 key=(null)

I'm undecided if I'd rather see this as a separate NS_INFO record type.
It would mean we could filter them out of the logs...

Do we print out lots of pidns=0 for tasks not in a newly created NS?  Do
we want to?

> 6/6 tracks the creation and deletion of namespaces, listing the type of
> namespace instance, related namespace id if there is one and the newly minted
> serial number.
> 
> Proposed output format:
>   type=NS_INIT msg=audit(1400217435.706:94): pid=524 uid=0 
> auid=4294967295 ses=4294967295 subj=system_u:system_r:mount_t:s0 type=2 
> old_snum=0 snum=a1 res=1

I'd love to be able to grep for netns=20 and find both the NS_INIT and
the SYSCALL/NS_INFO records, instead of having them named different
things.  So basically I think you want to translate the type= into a
string for the old_X= and X=...
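One possible shape for that, as a sketch: map the namespace type to the same field name the SYSCALL record uses and emit `old_<name>=` / `<name>=`. The type codes and helper names below are hypothetical; the thread does not say which numeric `type=` corresponds to which namespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical namespace type codes, for illustration only. */
enum ns_type_sketch { NS_MNT, NS_NET, NS_UTS, NS_IPC, NS_PID, NS_USER, NS_NR };

/* Field names matching those in the proposed SYSCALL record. */
static const char *const ns_field[NS_NR] = {
	[NS_MNT] = "mntns", [NS_NET] = "netns", [NS_UTS] = "utsns",
	[NS_IPC] = "ipcns", [NS_PID] = "pidns", [NS_USER] = "userns",
};

/*
 * Format the serial-number part of an NS_INIT-style record as
 * "old_<field>=<hex> <field>=<hex>". Returns snprintf()'s result,
 * or -1 for an unknown type.
 */
static int format_ns_init(char *buf, size_t len, enum ns_type_sketch type,
			  unsigned long long old_snum, unsigned long long snum)
{
	if ((unsigned int)type >= NS_NR)
		return -1;
	return snprintf(buf, len, "old_%s=%llx %s=%llx",
			ns_field[type], old_snum, ns_field[type], snum);
}

/* Crude stand-in for grep over a single record. */
static int record_mentions(const char *record, const char *needle)
{
	return strstr(record, needle) != NULL;
}
```

Searching for ` netns=a1` (with the leading space) then matches both record types while skipping `old_netns=` fields.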




Re: [PATCH V2 1/6] namespaces: assign each namespace instance a serial number

2014-05-13 Thread Eric Paris
On Tue, 2014-05-13 at 11:30 -0400, Eric Paris wrote:
> On Tue, 2014-05-13 at 11:13 -0400, Richard Guy Briggs wrote:
> > On 14/05/13, Richard Guy Briggs wrote:
> > > On 14/05/10, Eric Paris wrote:
> > > > On Fri, 2014-05-09 at 20:27 -0400, Richard Guy Briggs wrote:
> > > > > Generate and assign a serial number per namespace instance since boot.
> > > > > 
> > > > > Use a serial number per namespace (unique across one boot of one kernel)
> > > > > instead of the inode number (which is claimed to have had the right to change
> > > > > reserved and is not necessarily unique if there is more than one proc fs) to
> > > > > uniquely identify it per kernel boot.
> > > > > 
> > > > > Signed-off-by: Richard Guy Briggs 
> > > > > ---
> > > > 
> > > > > +/**
> > > > > + * ns_serial - compute a serial number for the namespace
> > > > > + *
> > > > > + * Compute a serial number for the namespace to uniquely identify it in
> > > > > + * audit records.
> > > > > + */
> > > > > +unsigned long long ns_serial(void)
> > > > > +{
> > > > > + static DEFINE_SPINLOCK(serial_lock);
> > > > > + static unsigned long long serial = 4; /* reserved for IPC, UTS, user, PID */
> > > > > + unsigned long flags;
> > > > > +
> > > > > + spin_lock_irqsave(&serial_lock, flags);
> > > > > + ++serial;
> > > > > + spin_unlock_irqrestore(&serial_lock, flags);
> > > > > + BUG_ON(!serial);
> > > > > +
> > > > > + return serial;
> > > > > +}
> > > > > +
> > > > >  static inline struct nsproxy *create_nsproxy(void)
> > > > >  {
> > > > >   struct nsproxy *nsproxy;
> > > > 
> > > > atomic64_t instead of doing it yourself?
> > > 
> > > I'm willing to switch to atomic64_*.  Thanks for pointing out its
> > > existence.
> > 
> > Same would then go for using atomic_t in audit_serial().
> 
> Yup, moving to an atomic in audit_serial() looks like a good idea to me.

Talking with steve on irc, neither of us see a need for the double
increment quirk if the serial wraps back to 0.  Nothing wrong with 0, it
is a number.
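A userspace sketch of the atomic version, with C11 `atomic_uint` standing in for the kernel's `atomic_t` (in-kernel the equivalent would presumably be `atomic_add_return(1, &serial)`). The counter name is hypothetical. Because unsigned arithmetic wraps, the value after `UINT_MAX` is simply 0, with no loop to skip it.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical counter standing in for audit_serial()'s static state. */
static atomic_uint audit_serial_counter;

/*
 * Each call returns a distinct value (until the counter wraps) without
 * taking a lock. After UINT_MAX the next serial is simply 0: no extra
 * increment to skip it.
 */
static unsigned int audit_serial_sketch(void)
{
	return atomic_fetch_add(&audit_serial_counter, 1u) + 1u;
}
```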

-Eric



Re: [PATCH V2 1/6] namespaces: assign each namespace instance a serial number

2014-05-13 Thread Eric Paris
On Tue, 2014-05-13 at 11:13 -0400, Richard Guy Briggs wrote:
> On 14/05/13, Richard Guy Briggs wrote:
> > On 14/05/10, Eric Paris wrote:
> > > On Fri, 2014-05-09 at 20:27 -0400, Richard Guy Briggs wrote:
> > > > Generate and assign a serial number per namespace instance since boot.
> > > > 
> > > > Use a serial number per namespace (unique across one boot of one kernel)
> > > > instead of the inode number (which is claimed to have had the right to change
> > > > reserved and is not necessarily unique if there is more than one proc fs) to
> > > > uniquely identify it per kernel boot.
> > > > 
> > > > Signed-off-by: Richard Guy Briggs 
> > > > ---
> > > 
> > > > +/**
> > > > + * ns_serial - compute a serial number for the namespace
> > > > + *
> > > > + * Compute a serial number for the namespace to uniquely identify it in
> > > > + * audit records.
> > > > + */
> > > > +unsigned long long ns_serial(void)
> > > > +{
> > > > +   static DEFINE_SPINLOCK(serial_lock);
> > > > +   static unsigned long long serial = 4; /* reserved for IPC, UTS, user, PID */
> > > > +   unsigned long flags;
> > > > +
> > > > +   spin_lock_irqsave(&serial_lock, flags);
> > > > +   ++serial;
> > > > +   spin_unlock_irqrestore(&serial_lock, flags);
> > > > +   BUG_ON(!serial);
> > > > +
> > > > +   return serial;
> > > > +}
> > > > +
> > > >  static inline struct nsproxy *create_nsproxy(void)
> > > >  {
> > > > struct nsproxy *nsproxy;
> > > 
> > > atomic64_t instead of doing it yourself?
> > 
> > I'm willing to switch to atomic64_*.  Thanks for pointing out its
> > existence.
> 
> Same would then go for using atomic_t in audit_serial().

Yup, moving to an atomic in audit_serial() looks like a good idea to me.



Re: [PATCH 3.15] MIPS: Add new AUDIT_ARCH token for the N32 ABI on MIPS64

2014-05-12 Thread Eric Paris
On Mon, 2014-05-12 at 14:53 -0400, Paul Moore wrote:
> On Tuesday, April 22, 2014 03:40:36 PM Markos Chandras wrote:
> > A MIPS64 kernel may support ELF files for all 3 MIPS ABIs
> > (O32, N32, N64). Furthermore, the AUDIT_ARCH_MIPS{,EL}64 token
> > does not provide enough information about the ABI for the 64-bit
> > process. As a result of which, userland needs to use complex
> > seccomp filters to decide whether a syscall belongs to the o32 or n32
> > or n64 ABI. Therefore, a new arch token for MIPS64/n32 is added so it
> > can be used by seccomp to explicitly set syscall filters for this ABI.
> > 
> > Link: http://sourceforge.net/p/libseccomp/mailman/message/32239040/
> > Cc: Andy Lutomirski 
> > Cc: Eric Paris 
> > Cc: Paul Moore 
> > Cc: Ralf Baechle 
> > Signed-off-by: Markos Chandras 
> > ---
> > Ralf, can we please have this in 3.15 (Assuming it's ACK'd)?
> > 
> > Thanks a lot!
> > ---
> >  arch/mips/include/asm/syscall.h |  2 ++
> >  include/uapi/linux/audit.h  | 12 
> >  2 files changed, 14 insertions(+)
> 
> [NOTE: Adding lkml to the To line to hopefully spur discussion/acceptance as 
> this *really* should be in 3.15]
> 
> I'm re-replying to this patch and adding lkml to the To line because I believe
> it is very important we get this patch into 3.15.  For those who don't follow
> the MIPS architecture very closely, the upcoming 3.15 is the first release to
> include support for seccomp filters, the latest generation of syscall
> filtering which uses a BPF based filter language.  For reasons that are easy to
> understand, the syscall filters are ABI specific (e.g. syscall tables, word
> length, endianness) and those generating syscall filters in userspace (e.g.
> libseccomp) need to take great care to ensure that the generated filters take
> the ABI into account and fail safely in the case where a different ABI is used
> (e.g. x86, x86_64, x32).
> 
> The patch below corrects, what is IMHO, an omission in the original MIPS 
> seccomp filter patch, allowing userspace to easily separate MIPS and MIPS64.  
> Without this patch we will be forced to handle MIPS/MIPS64 like we handle 
> x86_64/x32 which is a royal pain and not something I want to have deal with 
> again.
> 
> Further, while I don't want to speak for the audit folks, it is my 
> understanding that they want this patch for similar reasons.

Audit would also like to see this patch.  We can survive without it, but
having this patch lets us write a better/easier userspace.

Acked-by: Eric Paris 
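A sketch of what the new token buys userspace, assuming the bit values below are those of include/uapi/linux/audit.h (verify against the header, since the values in the quoted hunk are truncated). The `regs32`/`addr32` booleans are hypothetical stand-ins for `TIF_32BIT_REGS`/`TIF_32BIT_ADDR`, and a 64-bit kernel is assumed. Note that the sketch nests the N32 check inside the 64-bit branch, whereas the quoted hunk tests `TIF_32BIT_ADDR` unconditionally; this assumes O32 tasks on a 64-bit kernel also have `TIF_32BIT_ADDR` set, which would otherwise tag them as N32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check include/uapi/linux/audit.h and elf.h. */
#define EM_MIPS_SKETCH             8u
#define ARCH_64BIT                 0x80000000u
#define ARCH_LE                    0x40000000u
#define ARCH_CONVENTION_MASK       0x30000000u
#define ARCH_CONVENTION_MIPS64_N32 0x20000000u

enum mips_abi { MIPS_ABI_O32, MIPS_ABI_N32, MIPS_ABI_N64, MIPS_ABI_OTHER };

/* Kernel side (sketch): build the token from the thread flags. */
static uint32_t mips_arch_token(bool regs32, bool addr32, bool little_endian)
{
	uint32_t arch = EM_MIPS_SKETCH;

	if (!regs32) {
		arch |= ARCH_64BIT;
		/* Only a 64-bit-register task with 32-bit addresses is N32. */
		if (addr32)
			arch |= ARCH_CONVENTION_MIPS64_N32;
	}
	if (little_endian)
		arch |= ARCH_LE;
	return arch;
}

/* Userspace side (sketch): decide the ABI from the token alone. */
static enum mips_abi mips_abi_from_token(uint32_t arch)
{
	if ((arch & 0xffffu) != EM_MIPS_SKETCH)
		return MIPS_ABI_OTHER;
	if (!(arch & ARCH_64BIT))
		return MIPS_ABI_O32;
	if ((arch & ARCH_CONVENTION_MASK) == ARCH_CONVENTION_MIPS64_N32)
		return MIPS_ABI_N32;
	return MIPS_ABI_N64;
}
```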

> 
> Please merge this patch for 3.15 or at least provide some feedback as to why 
> this isn't a viable solution for upstream.  Once 3.15 ships, fixing this will 
> require breaking the MIPS ABI which isn't something any of us want.
> 
> Thanks,
> -Paul
> 
> > diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
> > index c6e9cd2..17960fe 100644
> > --- a/arch/mips/include/asm/syscall.h
> > +++ b/arch/mips/include/asm/syscall.h
> > @@ -133,6 +133,8 @@ static inline int syscall_get_arch(void)
> >  #ifdef CONFIG_64BIT
> > if (!test_thread_flag(TIF_32BIT_REGS))
> > arch |= __AUDIT_ARCH_64BIT;
> > +   if (test_thread_flag(TIF_32BIT_ADDR))
> > +   arch |= __AUDIT_ARCH_CONVENTION_MIPS64_N32;
> >  #endif
> >  #if defined(__LITTLE_ENDIAN)
> > arch |=  __AUDIT_ARCH_LE;
> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index 11917f7..1b1efdd 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -331,9 +331,17 @@ enum {
> >  #define AUDIT_FAIL_PRINTK  1
> >  #define AUDIT_FAIL_PANIC   2
> > 
> > +/*
> > + * These bits disambiguate different calling conventions that share an
> > + * ELF machine type, bitness, and endianness
> > + */
> > +#define __AUDIT_ARCH_CONVENTION_MASK 0x3000
> > +#define __AUDIT_ARCH_CONVENTION_MIPS64_N32 0x2000
> > +
> >  /* distinguish syscall tables */
> >  #define __AUDIT_ARCH_64BIT 0x8000
> >  #define __AUDIT_ARCH_LE   0x4000
> > +
> >  #define AUDIT_ARCH_ALPHA   (EM_ALPHA|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
> >  #define AUDIT_ARCH_ARM (EM_ARM|__AUDIT_ARCH_LE)
> >  #define AUDIT_ARCH_ARMEB   (EM_ARM)
> > @@ -346,7 +354,11 @@ enum {
> >  #define AUDIT_ARCH_MIPS    (EM_MIPS)
> >  #define AUDIT_ARCH_MIPSEL  (EM_MIPS|__AUDIT_ARCH_LE)
> >  #define AUDIT_ARCH_MIPS64  (EM_MIPS|__AUDIT_ARCH_64BIT)
> > +#define AUDIT_ARCH_MIPS64N32   (EM_MIPS|__AU

Re: [PATCH V2 2/6] audit: log namespace serial numbers

2014-05-10 Thread Eric Paris
On Fri, 2014-05-09 at 20:27 -0400, Richard Guy Briggs wrote:

Not so relevant because you delete all of this code later...  But
still...

> +#ifdef CONFIG_NAMESPACES
> +void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
> +{
> + struct nsproxy *nsproxy;
> +
> + rcu_read_lock();

ok, so we are under rcu_read_lock() and cannot sleep

> + nsproxy = task_nsproxy(tsk);
> + if (nsproxy != NULL) {
> + audit_log_format(ab, " mntns=%llx", nsproxy->mnt_ns->serial_num);

But this could do an allocation.  Are we sure that everything used
GFP_ATOMIC when creating the audit buffer?  [hint: it doesn't]



Re: [PATCH V2 2/6] audit: log namespace serial numbers

2014-05-10 Thread Eric Paris
On Fri, 2014-05-09 at 20:27 -0400, Richard Guy Briggs wrote:
> Log the namespace serial numbers of a task in audit_log_task_info() which
> is used by syscall audits, among others..
> 
> Idea first presented:
>   https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> 
> Typical output format would look something like:
>   type=SYSCALL msg=audit(1399651071.433:72): arch=c03e 
> syscall=272 success=yes exit=0 a0=4000 a1= a2=0 a3=22 
> items=0 ppid=1 pid=483 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 
> egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" 
> exe="/usr/lib/systemd/systemd" netns=97 utsns=2 ipcns=1 pidns=4 userns=3 
> mntns=5 subj=system_u:system_r:init_t:s0 key=(null)
> 
> The serial numbers are printed in hex.
> 
> Suggested-by: Aristeu Rozanski 
> Signed-off-by: Richard Guy Briggs 
> Acked-by: Serge Hallyn 
> ---
>  include/linux/audit.h |7 +++
>  kernel/audit.c|   38 ++
>  2 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 22cfddb..0ef404a 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -101,6 +101,13 @@ extern int __weak audit_classify_compat_syscall(int abi, unsigned syscall);
>  struct filename;
>  
>  extern void audit_log_session_info(struct audit_buffer *ab);
> +#ifdef CONFIG_NAMESPACES
> +extern void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk);
> +#else
> +void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
> +{
> +}
> +#endif
>  
>  #ifdef CONFIG_AUDIT_COMPAT_GENERIC
>  #define audit_is_compat(arch)  (!((arch) & __AUDIT_ARCH_64BIT))
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 59c0bbe..fe783ad 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -64,7 +64,15 @@
>  #endif
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include "../fs/mount.h"

I don't think such an include is ever a good idea, and it is likely to get us
SHOT by Viro...

Why do we need this include?

> +#include 
> +#include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
>  #include "audit.h"
> @@ -1617,6 +1625,35 @@ void audit_log_session_info(struct audit_buffer *ab)
>   audit_log_format(ab, " auid=%u ses=%u", auid, sessionid);
>  }
>  
> +#ifdef CONFIG_NAMESPACES
> +void audit_log_namespace_info(struct audit_buffer *ab, struct task_struct *tsk)
> +{
> + struct nsproxy *nsproxy;
> +
> + rcu_read_lock();
> + nsproxy = task_nsproxy(tsk);
> + if (nsproxy != NULL) {
> + audit_log_format(ab, " mntns=%llx", nsproxy->mnt_ns->serial_num);
> +#ifdef CONFIG_NET_NS
> + audit_log_format(ab, " netns=%llx", nsproxy->net_ns->serial_num);
> +#endif
> +#ifdef CONFIG_UTS_NS
> + audit_log_format(ab, " utsns=%llx", nsproxy->uts_ns->serial_num);
> +#endif
> +#ifdef CONFIG_IPC_NS
> + audit_log_format(ab, " ipcns=%llx", nsproxy->ipc_ns->serial_num);
> +#endif
> + }
> +#ifdef CONFIG_PID_NS
> + audit_log_format(ab, " pidns=%llx", task_active_pid_ns(tsk)->serial_num);
> +#endif
> +#ifdef CONFIG_USER_NS
> + audit_log_format(ab, " userns=%llx", task_cred_xxx(tsk, user_ns)->serial_num);
> +#endif
> + rcu_read_unlock();
> +}
> +#endif /* CONFIG_NAMESPACES */
> +
>  void audit_log_key(struct audit_buffer *ab, char *key)
>  {
>   audit_log_format(ab, " key=");
> @@ -1861,6 +1898,7 @@ void audit_log_task_info(struct audit_buffer *ab, struct task_struct *tsk)
>   up_read(&mm->mmap_sem);
>   } else
>   audit_log_format(ab, " exe=(null)");
> + audit_log_namespace_info(ab, tsk);
>   audit_log_task_context(ab);
>  }
>  EXPORT_SYMBOL(audit_log_task_info);




Re: [PATCH V2 1/6] namespaces: assign each namespace instance a serial number

2014-05-10 Thread Eric Paris
On Fri, 2014-05-09 at 20:27 -0400, Richard Guy Briggs wrote:
> Generate and assign a serial number per namespace instance since boot.
> 
> Use a serial number per namespace (unique across one boot of one kernel)
> instead of the inode number (which is claimed to have had the right to change
> reserved and is not necessarily unique if there is more than one proc fs) to
> uniquely identify it per kernel boot.
> 
> Signed-off-by: Richard Guy Briggs 
> ---

> +/**
> + * ns_serial - compute a serial number for the namespace
> + *
> + * Compute a serial number for the namespace to uniquely identify it in
> + * audit records.
> + */
> +unsigned long long ns_serial(void)
> +{
> + static DEFINE_SPINLOCK(serial_lock);
> + static unsigned long long serial = 4; /* reserved for IPC, UTS, user, PID */
> + unsigned long flags;
> +
> + spin_lock_irqsave(&serial_lock, flags);
> + ++serial;
> + spin_unlock_irqrestore(&serial_lock, flags);
> + BUG_ON(!serial);
> +
> + return serial;
> +}
> +
>  static inline struct nsproxy *create_nsproxy(void)
>  {
>   struct nsproxy *nsproxy;

atomic64_t instead of doing it yourself?

And why _irqsave()?  Can we seriously create new namespaces in irq
context?  If you use the atomic though, you don't have to worry about
it...
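A userspace sketch of the atomic64_t suggestion, with C11 `atomic_ullong` standing in for `atomic64_t` (in-kernel this would presumably be `atomic64_inc_return()`); the counter name is hypothetical. With no lock there is no irq state to save. It also closes a window in the quoted version, which re-reads `serial` after dropping the lock, so two concurrent callers could return the same number.

```c
#include <assert.h>
#include <stdatomic.h>

/* Starts at 4, keeping 1..4 for the initial namespaces as in the patch. */
static atomic_ullong ns_serial_counter = 4;

/*
 * The returned value comes from the same atomic operation that bumps the
 * counter, so concurrent callers always get different serials.
 */
static unsigned long long ns_serial_sketch(void)
{
	return atomic_fetch_add(&ns_serial_counter, 1ull) + 1ull;
}
```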



[PATCH] watchdog: print all locks on a softlock

2014-05-01 Thread Eric Paris
If the CPU hits a softlockup this patch will also have it print the
information about all locks being held on the system.  This might help
determine if a lock is being held too long leading to this problem.

Signed-off-by: Eric Paris 
Cc: Frederic Weisbecker 
Cc: Andrew Morton 
Cc: Don Zickus 
Cc: Michal Hocko 
Cc: Ben Zhang 
---
 kernel/watchdog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 516203e..a027240 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -322,6 +322,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
current->comm, task_pid_nr(current));
print_modules();
print_irqtrace_events(current);
+   debug_show_all_locks();
if (regs)
show_regs(regs);
else
-- 
1.9.0



Re: fanotify API: FMODE_NONOTIFY, FMODE_EXEC, FMODE_NOCMTIME

2014-04-29 Thread Eric Paris
On Tue, 2014-04-29 at 22:10 +0200, Jan Kara wrote:
>   Hello,
> 
> On Tue 29-04-14 15:29:12, Michael Kerrisk (man-pages) wrote:
> > Can you offer any insight on Heinrich's question, below?
> > 
> > On Sun, Apr 13, 2014 at 4:05 PM, Heinrich Schuchardt  
> > wrote:
> > > On 06.04.2014 14:18, Michael Kerrisk (man-pages) wrote:
> > >>>
> > >>> ==
> > >>> >
> > >>> >  >> I notice that the FDs returned by read()s from the FAN FD have the
> > >>> >  >> FMODE_NONOTIFY flag (fcntl(F_GETFL)) flag set. If you know what that's
> > >>> >  >> about, it would be good to say something about. But, if not, do not
> > >>> >  >> worry--just place a FIXME in the page source of fanotify(7)
> > >>> >
> > >>> >Fixed in fanotify.7
> > >>> >If the listener accesses the file through the file descriptor provided
> > >>> >no additional events are created.
> > >>
> > >> Ahh -- thanks for filling in that piece. I see that you refer to
> > >> fcntl(2) when discussing that flag. But fcntl(2) does not
> > >> mention that flag. I would rather see an explanation of this flag
> > >> in the fanotify pages.
> > >>
> > >
> > > I wrote a small test program and found:
> > >
> > > The flag FMODE_NONOTIFY can be read by function fcntl from userspace.
> > > int flag = fcntl(fd, F_GETFL)
> > >
> > > In include/uapi/asm-generic/fcntl.h I found the following comment:
> > >
> > > /*
> > >  * FMODE_EXEC is 0x20
> > >  * FMODE_NONOTIFY is 0x100
> > >  * These cannot be used by userspace O_* until internal and external open
> > >  * flags are split.
> > >  * -Eric Paris
> > >  */
> > >
> > > The definition of FMODE_NONOTIFY is in include/linux/fs.h but this
> > > include is only used to compile the Kernel and not supposed to be used by
> > > userspace.
> > >
> > > I think it is quite annoying that fcntl can return a flag that is not
> > > described in the manpage of fcntl and that is not defined in fcntl.h.
> > >
> > > But FMODE_NONOTIFY is not the only flag:
> > >
> > > I was able to pass
> > > 0x20 (FMODE_EXEC), and
> > > 0x800 (FMODE_NOCMTIME)
> > > to fanotify_init and received them as flag in the file descriptors for the
> > > fanotify events.
> > > I wonder why fanotify_init does not check input parameter event_f_flags and
> > > return an error if any inappropriate value is set.
>   It seems to me fanotify_init() should really check event_f_flags have
> only valid flags set. In particular exclude FMODE_EXEC, FMODE_NOCMTIME, or
> FMODE_NONOTIFY.

Agreed.  Clearly a bug on my part.

> > > Should I put this into the BUGS section?
> > >
> > > Should the name of the flag FMODE_NONOTIFY be mentioned at all in the man
> > > pages?
> > >
> > > Or should we write:
> > >
> > > .I fd
> > > This is an open file descriptor for the object being accessed or
> > > .B FAN_NOFD
> > > if a queue overflow occurred.
> > > The file descriptor can be used to access the contents of the monitored file
> > > or directory.
> > > It has an internal flag set, that suppresses fanotify event generation.
> > > Hence when the receiver of the fanotify event accesses the notified file
> > > or directory using this file descriptor no additional events will be created.
> > > The reading application is responsible for closing the file descriptor.
>   So this is what I would prefer. Just mention the file descriptor does not
> generate new events. I would even go as far as masking kernel internal
> flags like FMODE_EXEC or FMODE_NONOTIFY from the result of F_GETFL. What do
> you think Al?

I agree on this point too...



Re: [PATCH 0/6][v2] audit: implement multicast socket for journald

2014-04-24 Thread Eric Paris
On Thu, 2014-04-24 at 10:59 -0400, Daniel J Walsh wrote:
> I don't disagree.  I would think the real solution to this would be to
> not allow sysadm_t to get to SystemHigh, where all of the logging data
> will be stored.

Make journalctl a userspace object manager and do SELinux checks on whether
it can see individual records?  So secadm_t running journalctl would see
them and sysadm running journalctl wouldn't see them?

Sounds elegant.  Who is going to code it?  *NOT IT!*

> 
> On 04/24/2014 09:22 AM, Eric Paris wrote:
> > They would be equivalent if and only if journald had CAP_AUDIT_READ.
> >
> > I suggest you take CAP_AUDIT_READ away from journald on systems which
> > need the secadm/sysadmin split (which is a ridiculously stupid split
> > anyway, but who am I to complain?)
> >
> > On Wed, Apr 23, 2014 at 11:52 AM, Daniel J Walsh  wrote:
> >> Meaning looking at the journal would be equivalent to looking at
> >> /var/log/audit/audit.log.
> >>
> >>
> >> On 04/23/2014 11:37 AM, Eric Paris wrote:
> >>> On Wed, 2014-04-23 at 11:36 -0400, Daniel J Walsh wrote:
> >>>> I guess the problem would be that the sysadm_t would be able to look at
> >>>> the journal which would now contain the audit content.
> >>> right.  so include it in the sysadm_secadm bool
> >>>
> >>>> On 04/23/2014 10:42 AM, Eric Paris wrote:
> >>>>> On Wed, 2014-04-23 at 09:40 -0400, Daniel J Walsh wrote:
> >>>>>> Here are the capabilities we currently give to sysadm_t with
> >>>>>> sysadm_secadm1.0.0Disabled
> >>>>>>
> >>>>>>allow sysadm_t sysadm_t : capability { chown dac_override
> >>>>>> dac_read_search fowner fsetid kill setgid setuid setpcap linux_immutable
> >>>>>> net_bind_service net_broadcast net_admin net_raw ipc_lock ipc_owner
> >>>>>> sys_rawio sys_chroot sys_ptrace sys_pacct sys_admin sys_boot sys_nice
> >>>>>> sys_resource sys_time sys_tty_config mknod lease audit_write setfcap } ;
> >>>>>>allow sysadm_t sysadm_t : capability { setgid setuid sys_chroot }
> >>>>>>
> >>>>>>allow sysadm_t sysadm_t : capability2 { syslog block_suspend } ;
> >>>>>>
> >>>>>> cap_audit_write might be a problem?
> >>>>> cap_audit_write is fine.
> >>>>>
> >>>>> syslogd_t (aka journal) is going to need the new permission
> >>>>> cap_audit_read.  Also, as steve pointed out, someone may be likely to
> >>>>> want to be able to disable that permission easily.
> >>>>>
> >>>>> -Eric
> >>>>>
> 




Re: [PATCH 0/6][v2] audit: implement multicast socket for journald

2014-04-24 Thread Eric Paris
They would be equivalent if and only if journald had CAP_AUDIT_READ.

I suggest you take CAP_AUDIT_READ away from journald on systems which
need the secadm/sysadmin split (which is a ridiculously stupid split
anyway, but who am I to complain?)

On Wed, Apr 23, 2014 at 11:52 AM, Daniel J Walsh  wrote:
> Meaning looking at the journal would be equivalent to looking at
> /var/log/audit/audit.log.
>
>
> On 04/23/2014 11:37 AM, Eric Paris wrote:
>> On Wed, 2014-04-23 at 11:36 -0400, Daniel J Walsh wrote:
>>> I guess the problem would be that the sysadm_t would be able to look at
>>> the journal which would now contain the audit content.
>> right.  so include it in the sysadm_secadm bool
>>
>>> On 04/23/2014 10:42 AM, Eric Paris wrote:
>>>> On Wed, 2014-04-23 at 09:40 -0400, Daniel J Walsh wrote:
>>>>> Here are the capabilities we currently give to sysadm_t with
>>>>> sysadm_secadm1.0.0Disabled
>>>>>
>>>>>allow sysadm_t sysadm_t : capability { chown dac_override
>>>>> dac_read_search fowner fsetid kill setgid setuid setpcap linux_immutable
>>>>> net_bind_service net_broadcast net_admin net_raw ipc_lock ipc_owner
>>>>> sys_rawio sys_chroot sys_ptrace sys_pacct sys_admin sys_boot sys_nice
>>>>> sys_resource sys_time sys_tty_config mknod lease audit_write setfcap } ;
>>>>>allow sysadm_t sysadm_t : capability { setgid setuid sys_chroot }
>>>>>
>>>>>allow sysadm_t sysadm_t : capability2 { syslog block_suspend } ;
>>>>>
>>>>> cap_audit_write might be a problem?
>>>> cap_audit_write is fine.
>>>>
>>>> syslogd_t (aka journal) is going to need the new permission
>>>> cap_audit_read.  Also, as steve pointed out, someone may be likely to
>>>> want to be able to disable that permission easily.
>>>>
>>>> -Eric
>>>>
>>
>> ___
>> Selinux mailing list
>> seli...@tycho.nsa.gov
>> To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
>> To get help, send an email containing "help" to 
>> selinux-requ...@tycho.nsa.gov.
>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] audit: x86: drop arch from __audit_syscall_entry() interface

2014-04-23 Thread Eric Paris
From: Richard Guy Briggs 

Since the arch is found locally in __audit_syscall_entry(), there is no need to
pass it in as a parameter.  Delete it from the parameter list.

x86* was the only arch to call __audit_syscall_entry() directly and did so from
assembly code.

Signed-off-by: Richard Guy Briggs 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-au...@redhat.com
Signed-off-by: Eric Paris 

---

As this patch relies on changes in the audit tree, I think it
appropriate to send it through my tree rather than the x86 tree.
---
 arch/x86/ia32/ia32entry.S  | 12 ++--
 arch/x86/kernel/entry_32.S | 11 +--
 arch/x86/kernel/entry_64.S | 11 +--
 include/linux/audit.h  |  5 ++---
 kernel/auditsc.c   |  6 ++
 5 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..f5bdd28 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -186,12 +186,12 @@ sysexit_from_sys_call:
 
 #ifdef CONFIG_AUDITSYSCALL
.macro auditsys_entry_common
-   movl %esi,%r9d  /* 6th arg: 4th syscall arg */
-   movl %edx,%r8d  /* 5th arg: 3rd syscall arg */
-   /* (already in %ecx)   4th arg: 2nd syscall arg */
-   movl %ebx,%edx  /* 3rd arg: 1st syscall arg */
-   movl %eax,%esi  /* 2nd arg: syscall number */
-   movl $AUDIT_ARCH_I386,%edi  /* 1st arg: audit arch */
+   movl %esi,%r8d  /* 5th arg: 4th syscall arg */
+   movl %ecx,%r9d  /*swap with edx*/
+   movl %edx,%ecx  /* 4th arg: 3rd syscall arg */
+   movl %r9d,%edx  /* 3rd arg: 2nd syscall arg */
+   movl %ebx,%esi  /* 2nd arg: 1st syscall arg */
+   movl %eax,%edi  /* 1st arg: syscall number */
call __audit_syscall_entry
movl RAX-ARGOFFSET(%rsp),%eax   /* reload syscall number */
cmpq $(IA32_NR_syscalls-1),%rax
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index a2a4f46..078053e 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -456,12 +456,11 @@ sysenter_audit:
jnz syscall_trace_entry
addl $4,%esp
CFI_ADJUST_CFA_OFFSET -4
-   /* %esi already in 8(%esp) 6th arg: 4th syscall arg */
-   /* %edx already in 4(%esp) 5th arg: 3rd syscall arg */
-   /* %ecx already in 0(%esp) 4th arg: 2nd syscall arg */
-   movl %ebx,%ecx  /* 3rd arg: 1st syscall arg */
-   movl %eax,%edx  /* 2nd arg: syscall number */
-   movl $AUDIT_ARCH_I386,%eax  /* 1st arg: audit arch */
+   movl %esi,4(%esp)   /* 5th arg: 4th syscall arg */
+   movl %edx,(%esp)    /* 4th arg: 3rd syscall arg */
+   /* %ecx already in %ecx    3rd arg: 2nd syscall arg */
+   movl %ebx,%edx  /* 2nd arg: 1st syscall arg */
+   /* %eax already in %eax    1st arg: syscall number */
call __audit_syscall_entry
pushl_cfi %ebx
movl PT_EAX(%esp),%eax  /* reload syscall number */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1e96c36..8292ff7 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -694,12 +694,11 @@ badsys:
 * jump back to the normal fast path.
 */
 auditsys:
-   movq %r10,%r9   /* 6th arg: 4th syscall arg */
-   movq %rdx,%r8   /* 5th arg: 3rd syscall arg */
-   movq %rsi,%rcx  /* 4th arg: 2nd syscall arg */
-   movq %rdi,%rdx  /* 3rd arg: 1st syscall arg */
-   movq %rax,%rsi  /* 2nd arg: syscall number */
-   movl $AUDIT_ARCH_X86_64,%edi    /* 1st arg: audit arch */
+   movq %r10,%r8   /* 5th arg: 4th syscall arg */
+   movq %rdx,%rcx  /* 4th arg: 3rd syscall arg */
+   movq %rsi,%rdx  /* 3rd arg: 2nd syscall arg */
+   movq %rdi,%rsi  /* 2nd arg: 1st syscall arg */
+   movq %rax,%rdi  /* 1st arg: syscall number */
call __audit_syscall_entry
LOAD_ARGS 0 /* reload call-clobbered registers */
jmp system_call_fastpath
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 783157b..1ae0089 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -115,8 +115,7 @@ extern void audit_log_session_info(struct audit_buffer *ab);
/* Public API */
 extern int  audit_alloc(struct task_struct *task);
 extern void __audit_free(struct task_struct *task);
-extern void __audit_syscall_entry(int arch,
- int major, unsigned long a0, unsign
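
The register shuffles in this patch follow the x86-64 System V calling convention, which passes the first six integer arguments in %rdi, %rsi, %rdx, %rcx, %r8 and %r9. Dropping the leading arch parameter therefore moves every remaining argument one register to the left. (The 4th syscall argument arrives in %r10 because the syscall instruction uses %rcx for the return address. The 32-bit kernel is built with regparm(3), so only the arguments after the first three go on the stack in entry_32.S.) The small userspace sketch below only tabulates that mapping. The parameter names a1..a3 are illustrative labels, since the prototype above is cut off.

```c
#include <stddef.h>
#include <string.h>

/* x86-64 System V: registers for the first six integer arguments, in order. */
static const char *const sysv_int_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* __audit_syscall_entry() parameters before and after the patch, in the
 * order the diff's comments give them (a1..a3 are illustrative names). */
static const char *const params_before[] = { "arch", "major", "a0", "a1", "a2", "a3" };
static const char *const params_after[]  = { "major", "a0", "a1", "a2", "a3" };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return the register that carries parameter `name`, or NULL if absent. */
static const char *reg_for(const char *const *params, size_t n, const char *name)
{
	for (size_t i = 0; i < n && i < ARRAY_LEN(sysv_int_regs); i++)
		if (strcmp(params[i], name) == 0)
			return sysv_int_regs[i];
	return NULL;
}
```

For example, the syscall number moves from %rsi to %rdi, and the 4th syscall argument moves from %r9 to %r8, matching the new movq lines in entry_64.S.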

Re: [PATCH 0/6][v2] audit: implement multicast socket for journald

2014-04-23 Thread Eric Paris
On Wed, 2014-04-23 at 11:36 -0400, Daniel J Walsh wrote:
> I guess the problem would be that the sysadm_t would be able to look at
> the journal which would now contain the audit content.

right.  so include it in the sysadm_secadm bool

> 
> On 04/23/2014 10:42 AM, Eric Paris wrote:
> > On Wed, 2014-04-23 at 09:40 -0400, Daniel J Walsh wrote:
> >> Here are the capabilities we currently give to sysadm_t with
> >> sysadm_secadm   1.0.0   Disabled
> >>
> >>allow sysadm_t sysadm_t : capability { chown dac_override
> >> dac_read_search fowner fsetid kill setgid setuid setpcap linux_immutable
> >> net_bind_service net_broadcast net_admin net_raw ipc_lock ipc_owner
> >> sys_rawio sys_chroot sys_ptrace sys_pacct sys_admin sys_boot sys_nice
> >> sys_resource sys_time sys_tty_config mknod lease audit_write setfcap } ;
> >>allow sysadm_t sysadm_t : capability { setgid setuid sys_chroot }
> >>
> >>allow sysadm_t sysadm_t : capability2 { syslog block_suspend } ;
> >>
> >> cap_audit_write might be a problem?
> > cap_audit_write is fine.
> >
> > syslogd_t (aka journal) is going to need the new permission
> > cap_audit_read.  Also, as Steve pointed out, someone may be likely to
> > want to be able to disable that permission easily.
> >
> > -Eric
> >
> 




Re: [PATCH 0/6][v2] audit: implement multicast socket for journald

2014-04-23 Thread Eric Paris
On Wed, 2014-04-23 at 09:40 -0400, Daniel J Walsh wrote:
> Here are the capabilities we currently give to sysadm_t with
> sysadm_secadm   1.0.0   Disabled
> 
>allow sysadm_t sysadm_t : capability { chown dac_override
> dac_read_search fowner fsetid kill setgid setuid setpcap linux_immutable
> net_bind_service net_broadcast net_admin net_raw ipc_lock ipc_owner
> sys_rawio sys_chroot sys_ptrace sys_pacct sys_admin sys_boot sys_nice
> sys_resource sys_time sys_tty_config mknod lease audit_write setfcap } ;
>allow sysadm_t sysadm_t : capability { setgid setuid sys_chroot }
> 
>allow sysadm_t sysadm_t : capability2 { syslog block_suspend } ;
> 
> cap_audit_write might be a problem?

cap_audit_write is fine.

syslogd_t (aka journal) is going to need the new permission
cap_audit_read.  Also, as Steve pointed out, someone may be likely to
want to be able to disable that permission easily.

-Eric



Re: [PATCH 0/6][v2] audit: implement multicast socket for journald

2014-04-22 Thread Eric Paris
On Tue, 2014-04-22 at 22:25 -0400, Steve Grubb wrote:
> On Tuesday, April 22, 2014 09:31:52 PM Richard Guy Briggs wrote:
> > This is a patch set Eric Paris and I have been working on to add a
> > restricted capability read-only netlink multicast socket to kernel audit to
> > enable userspace clients such as systemd/journald to receive audit logs, in
> > addition to the bidirectional auditd userspace client.
> 
> Do we have the ability to separate secadm_r and sysadm_r? By allowing this,
> we will leak to a sysadmin that he is being audited by the security officer.
> In a lot of cases, they are one and the same person. But for others, they
> are not. I have a feeling this will cause problems for MLS systems.

Why?  This requires CAP_AUDIT_READ.  Just don't give CAP_AUDIT_READ to
places you don't want to have read permission.  Exactly the same as you
don't give CAP_AUDIT_CONTROL to sysadm_r.  (If we are giving
CAP_AUDIT_CONTROL to sysadm_r and you think that any file protections
on /var/log/audit/audit.log are adequate we are fooling ourselves!)

> Also, shouldn't we have an audit event for every attempt to connect to this 
> socket? We really need to know where this information is getting leaked to.

We certainly can.  What would you like to see in that event?

-Eric
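
A reader of the proposed multicast socket joins a netlink group instead of using the unicast channel that auditd talks to. The sketch below assumes group number 1, which mainline later named AUDIT_NLGRP_READLOG. SKETCH_AUDIT_NLGRP_READLOG is a local, hypothetical name. Without CAP_AUDIT_READ the bind() is expected to be refused, which is the gate Eric describes above.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical local name; mainline headers call group 1 AUDIT_NLGRP_READLOG. */
#define SKETCH_AUDIT_NLGRP_READLOG 1

/* Address for joining the read-only audit multicast group.
 * nl_groups is a bitmask in which group n is bit (n - 1). */
static struct sockaddr_nl audit_readlog_addr(void)
{
	struct sockaddr_nl addr;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0; /* let the kernel pick our port id */
	addr.nl_groups = 1u << (SKETCH_AUDIT_NLGRP_READLOG - 1);
	return addr;
}

/* Returns a bound socket, or -1 (for example when the caller lacks
 * CAP_AUDIT_READ or the kernel has no multicast audit support). */
static int open_audit_readlog_socket(void)
{
	struct sockaddr_nl addr = audit_readlog_addr();
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```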



Re: linux-next: build failure after merge of the audit tree

2014-04-22 Thread Eric Paris
On Tue, 2014-04-22 at 16:22 +1000, Stephen Rothwell wrote:
> Hi Eric,
> 
> After merging the audit tree, today's linux-next build (sparc defconfig)
> failed like this:
> 
> In file included from include/linux/audit.h:29:0,
>  from mm/mmap.c:33:
> arch/sparc/include/asm/syscall.h: In function 'syscall_get_arch':
> arch/sparc/include/asm/syscall.h:131:9: error: 'TIF_32BIT' undeclared (first use in this function)
> arch/sparc/include/asm/syscall.h:131:9: note: each undeclared identifier is reported only once for each function it appears in
> 
> And many more ...
> 
> Caused by commit 374c0c054122 ("ARCH: AUDIT: implement syscall_get_arch
> for all arches").
> 
> I applied this patch for today:
> 
> From: Stephen Rothwell 
> Date: Tue, 22 Apr 2014 16:18:53 +1000
> Subject: [PATCH] fix ARCH: AUDIT: implement syscall_get_arch for all arches
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  arch/sparc/include/asm/syscall.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
> index fed3d511b108..a5a8153766b3 100644
> --- a/arch/sparc/include/asm/syscall.h
> +++ b/arch/sparc/include/asm/syscall.h
> @@ -128,8 +128,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
>  
>  static inline int syscall_get_arch(void)
>  {
> +#if defined(__sparc__) && defined(__arch64__)
>   return test_thread_flag(TIF_32BIT) ? AUDIT_ARCH_SPARC
>  : AUDIT_ARCH_SPARC64;
> +#else
> + return AUDIT_ARCH_SPARC;
> +#endif
>  }
>  
>  #endif /* __ASM_SPARC_SYSCALL_H */
> -- 

I swear I saw this and I fixed it.  Drat.  Do we want to do it this way?
Above in syscall_get_arguments() they use

#ifdef CONFIG_SPARC64
if (test_tsk_thread_flag(task, TIF_32BIT))
zero_extend = 1;
#endif

Is CONFIG_SPARC64 a better choice than:
   defined(__sparc__) && defined(__arch64__)

Maybe even better would be to copy what you suggested in powerpc:

---

diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index fed3d51..49f71fd 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -128,8 +128,7 @@ static inline void syscall_set_arguments(struct task_struct *task,
 
 static inline int syscall_get_arch(void)
 {
-   return test_thread_flag(TIF_32BIT) ? AUDIT_ARCH_SPARC
-  : AUDIT_ARCH_SPARC64;
+   return is_32bit_task() ? AUDIT_ARCH_SPARC : AUDIT_ARCH_SPARC64;
 }
 
 #endif /* __ASM_SPARC_SYSCALL_H */
diff --git a/arch/sparc/include/asm/thread_info_32.h b/arch/sparc/include/asm/thread_info_32.h
index 96efa7a..acd2be0 100644
--- a/arch/sparc/include/asm/thread_info_32.h
+++ b/arch/sparc/include/asm/thread_info_32.h
@@ -130,6 +130,8 @@ register struct thread_info *current_thread_info_reg asm("g6");
 #define _TIF_DO_NOTIFY_RESUME_MASK (_TIF_NOTIFY_RESUME | \
 _TIF_SIGPENDING)
 
+#define is_32bit_task()	(1)
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_THREAD_INFO_H */
diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index a5f01ac..5a4f660 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -219,6 +219,8 @@ register struct thread_info *current_thread_info_reg asm("g6");
 _TIF_NEED_RESCHED)
 #define _TIF_DO_NOTIFY_RESUME_MASK (_TIF_NOTIFY_RESUME | _TIF_SIGPENDING)
 
+#define is_32bit_task()	(test_thread_flag(TIF_32BIT))
+
 /*
  * Thread-synchronous status.
  *
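
The is_32bit_task() approach keeps a single definition of syscall_get_arch() and lets each thread_info header decide what "32-bit task" means. The userspace sketch below models both configurations. SKETCH_SPARC64 stands in for CONFIG_SPARC64, and sketch_tif_32bit stands in for the TIF_32BIT thread flag; both are hypothetical. The stub used on 32-bit-only kernels has to evaluate to true so that syscall_get_arch() reports AUDIT_ARCH_SPARC there.

```c
#include <stdbool.h>
#include <linux/audit.h> /* AUDIT_ARCH_SPARC, AUDIT_ARCH_SPARC64 */

#ifdef SKETCH_SPARC64
/* Models thread_info_64.h: the answer depends on the task's TIF_32BIT flag. */
static bool sketch_tif_32bit;
#define is_32bit_task()	(sketch_tif_32bit)
#else
/* Models thread_info_32.h: every task on a 32-bit kernel is a 32-bit task. */
#define is_32bit_task()	(1)
#endif

/* One definition serves both configurations, as in the proposed syscall.h. */
static inline int sketch_syscall_get_arch(void)
{
	return is_32bit_task() ? AUDIT_ARCH_SPARC : AUDIT_ARCH_SPARC64;
}
```

Built as is, the sketch takes the 32-bit branch. Compiling with -DSKETCH_SPARC64 switches to the flag-driven variant, in which the result follows sketch_tif_32bit.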




Re: linux-next: manual merge of the audit tree with Linus' tree

2014-04-16 Thread Eric Paris
On Wed, 2014-04-16 at 14:02 +1000, Stephen Rothwell wrote:

> You could have avoided this by doing a fast forward merge of v3.15-rc1
> instead of the v3.14 merge (since everything in your tree before that
> merge was also in Linus' tree by v3.15-rc1).

This is a situation I've never really known the right way to handle.  I
certainly could/can fast forward to 3.15-rc1, but then I have a random
crap development base for the audit tree.  Which is especially bad since
-rc1 doesn't even boot on my main machine.

What I've always done is to merge the last release right after the pull
and go from there, but it clearly leaves conflict potential.

Which is preferred?  I've always enjoyed having my trees based on a
release.



Re: [PATCH] gpio: ich: set regs and reglen for i3100 and ich6 chipset

2014-04-15 Thread Eric Paris
On Tue, 2014-04-15 at 14:21 +0200, Vincent Donnefort wrote:
> From: Vincent Donnefort 
> 
> This patch fixes kernel NULL pointer BUG introduced by the following commit:
> b667cf488aa9476b0ab64acd91f2a96f188cfd21
> gpio: ich: Add support for multiple register addresses.
> 
> Signed-off-by: Vincent Donnefort 

Things seem much happier now!  Thank you sir!

Tested-by: Eric Paris 

> 
> diff --git a/drivers/gpio/gpio-ich.c b/drivers/gpio/gpio-ich.c
> index e73c675..7030422 100644
> --- a/drivers/gpio/gpio-ich.c
> +++ b/drivers/gpio/gpio-ich.c
> @@ -305,6 +305,8 @@ static struct ichx_desc ich6_desc = {
>  
>   .ngpio = 50,
>   .have_blink = true,
> + .regs = ichx_regs,
> + .reglen = ichx_reglen,
>  };
>  
>  /* Intel 3100 */
> @@ -324,6 +326,8 @@ static struct ichx_desc i3100_desc = {
>   .uses_gpe0 = true,
>  
>   .ngpio = 50,
> + .regs = ichx_regs,
> + .reglen = ichx_reglen,
>  };
>  
>  /* ICH7 and ICH8-based */
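
The oops fits C's initializer rules: any member a designated initializer does not name is zero-initialized, so ich6_desc and i3100_desc were left with NULL .regs and .reglen once the driver began looking register offsets up through those pointers. The sketch below models this with hypothetical, simplified types and placeholder offsets, not the driver's real tables. The fallback helpers show an alternative defensive pattern. They are not what the patch does; the patch simply fills in the two fields.

```c
#include <stddef.h>

typedef unsigned char sketch_reg_row[3];

/* Placeholder offsets for illustration only. */
static const sketch_reg_row sketch_default_regs[3] = {
	{ 0x00, 0x30, 0x40 },
	{ 0x04, 0x34, 0x44 },
	{ 0x0c, 0x38, 0x48 },
};
static const unsigned char sketch_default_reglen[3] = { 0x30, 0x10, 0x10 };

struct sketch_desc {
	unsigned int ngpio;
	const sketch_reg_row *regs;   /* rows of per-bank register offsets */
	const unsigned char *reglen;
};

/* Like ich6_desc before the fix: .regs and .reglen are not named,
 * so they are zero-initialized, i.e. NULL. */
static const struct sketch_desc sketch_unfixed_desc = { .ngpio = 50 };

/* Defensive alternative: fall back to the default tables when unset. */
static const sketch_reg_row *sketch_regs(const struct sketch_desc *d)
{
	return d->regs ? d->regs : sketch_default_regs;
}

static const unsigned char *sketch_reglen(const struct sketch_desc *d)
{
	return d->reglen ? d->reglen : sketch_default_reglen;
}
```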




Re: [git bisect regression 3.15-rc1] NULL ptr deref in ichx_gpio_probe

2014-04-14 Thread Eric Paris
A tad more information.  I did a build of -rc1 with the GPIO_ICH module built in 
so I could use addr2line to help you run it down.  No idea if this is actually 
useful for you...

$ addr2line --inline --exe=vmlinux 813fc4e0
/storage/kernel/ichx-rebase/drivers/gpio/gpio-ich.c:388
/storage/kernel/ichx-rebase/drivers/gpio/gpio-ich.c:461

-Eric

- Original Message -
> I cannot boot 3.15-rc1 kernels because I get a NULL ptr bug in
> ichx_gpio_probe.  The backtrace is at the end of the e-mail.  I did a
> bisect and found:
> 
> $ git bisect good
> b667cf488aa9476b0ab64acd91f2a96f188cfd21 is the first bad commit
> commit b667cf488aa9476b0ab64acd91f2a96f188cfd21
> Author: Vincent Donnefort 
> Date:   Fri Feb 7 14:21:05 2014 +0100
> 
> gpio: ich: Add support for multiple register addresses
> 
> This patch introduces regs and reglen pointers which allow a chipset to
> have register addresses differing from ICH ones.
> 
> Signed-off-by: Vincent Donnefort 
> Signed-off-by: Linus Walleij 
> 
> :04 04 f69690db4ff26eb01553bbc33679bf43d9054948
> 889bd1726d656d0a274edbc41c220e67e6151500 M  drivers
> 
> I am attaching the full dmesg from that boot as possibly other
> information will be helpful...
> 
> The Backtrace:
> 
> [   18.021255] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [   18.021617] IP: [] ichx_gpio_probe+0x2a0/0x41c
> [gpio_ich]
> [   18.021918] PGD 0
> [   18.022011] Oops:  [#1] SMP
> [   18.022011] Modules linked in: gpio_ich(+) snd_seq_device snd_pcm
> i5400_edac joydev edac_core parport_pc snd_timer lpc_ich shpchp parport
> tpm_tis snd soundcore microcode i2c_i801 serio_raw mfd_core i5k_amb tpm
> nouveau hid_logitech_dj video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm
> drm ata_generic tg3 pata_acpi ptp pps_core i2c_core
> [   18.022011] CPU: 5 PID: 553 Comm: systemd-udevd Not tainted 3.14.0-rc1+ #9
> [   18.022011] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203,
> BIOS A04 08/21/2008
> [   18.022011] task: 880033e2cc50 ti: 88044ba26000 task.ti:
> 88044ba26000
> [   18.022011] RIP: 0010:[]  []
> ichx_gpio_probe+0x2a0/0x41c [gpio_ich]
> [   18.022011] RSP: 0018:88044ba27ba0  EFLAGS: 00010246
> [   18.022011] RAX:  RBX:  RCX:
> 
> [   18.022011] RDX:  RSI: 0100 RDI:
> 81c3e180
> [   18.022011] RBP: 88044ba27bd0 R08:  R09:
> 880034fb04b0
> [   18.022011] R10: 0001 R11: 880033e2d7f0 R12:
> 880034fb
> [   18.022011] R13: 88044c28dcc0 R14: 0003 R15:
> 
> [   18.022011] FS:  7fbc8df5f880() GS:88045e00()
> knlGS:
> [   18.022011] CS:  0010 DS:  ES:  CR0: 80050033
> [   18.022011] CR2:  CR3: 00044b95e000 CR4:
> 07e0
> [   18.022011] Stack:
> [   18.022011]  880034fb04b0 880034fb0010 a02a6028
> 880034fb
> [   18.022011]   0001 88044ba27c00
> 814f4a55
> [   18.022011]  814f21a2 880034fb0010 
> a02a6028
> [   18.022011] Call Trace:
> [   18.022011]  [] platform_drv_probe+0x45/0xb0
> [   18.022011]  [] ? driver_sysfs_add+0x82/0xb0
> [   18.022011]  [] driver_probe_device+0x125/0x3a0
> [   18.022011]  [] __driver_attach+0x93/0xa0
> [   18.022011]  [] ? __device_attach+0x40/0x40
> [   18.022011]  [] bus_for_each_dev+0x73/0xc0
> [   18.022011]  [] driver_attach+0x1e/0x20
> [   18.022011]  [] bus_add_driver+0x188/0x260
> [   18.022011]  [] ? 0xa00c
> [   18.022011]  [] driver_register+0x64/0xf0
> [   18.022011]  [] ? 0xa00c
> [   18.022011]  [] __platform_driver_register+0x4a/0x50
> [   18.022011]  [] ichx_gpio_driver_init+0x17/0x1000
> [gpio_ich]
> [   18.022011]  [] do_one_initcall+0xfa/0x1b0
> [   18.022011]  [] ? set_memory_nx+0x43/0x50
> [   18.022011]  [] load_module+0x1c28/0x26d0
> [   18.022011]  [] ? store_uevent+0x70/0x70
> [   18.022011]  [] ? kernel_read+0x50/0x80
> [   18.022011]  [] SyS_finit_module+0xa6/0xd0
> [   18.022011]  [] system_call_fastpath+0x16/0x1b
> [   18.022011] Code: 00 00 40 61 2a a0 e9 f0 fd ff ff 48 8b 05 81 1f 00 00 45
> 31 c0 48 c7 c7 80 e1 c3 81 4c 89 4d d0 48 8b 48 08 48 8b 50 10 48 63 c3 <0f>
> b6 34 01 0f b6 14 1a 4c 89 c9 49 03 75 00 e8 8c a2 df e0 48
> [   18.022011] RIP  [] ichx_gpio_probe+0x2a0/0x41c
> [gpio_ich]
> [   18.022011]  RSP 
> [   18.022011] CR2: 
> [   18.047269] ---[ end trace 178b39b238232179 ]---
> 
> 


Re: [GIT PULL] Audit subsystem for v3.15

2014-04-10 Thread Eric Paris
My tree is fine, your tree is fine, but the merge (even if you solve the
conflicts) has a build failure on MIPS, just discovered when I published
my 'merge-test' branch, because of the syscall_get_arch(void) changes
(thanks to the kbuild test robot).

attached is my solution which I just sent to the MIPS people.  I hope
that you can apply it as part of the merge itself...

On Thu, 2014-04-10 at 19:53 -0400, Eric Paris wrote:
> Linus,
> 
> Please pull the audit tree for v3.15.  You will have merge conflicts.
> I'll publish my branch "merge-test" where I attempted to resolve them
> the way you will.
> 
> The main issue is a cross-tree change to syscall_get_arch().  I change
> it from taking a task_struct and pt_regs to take a void.  Not a single
> arch used or needed either of these arguments.  (For 3.16 we plan to
> implement the function on more arches)
> 
> There are a couple of conflicts where I made changes to #includes and
> your tree also has some additions.  Should be obvious.
> 
> Two conflict issues with Kconfig changes.  The first is just that your
> tree has some additional 'select' lines mine didn't.  Obvious to
> resolve.
> 
> The second is a conflict in init/Kconfig.  I don't completely understand
> it.  I believe it was the addition of ALPHA to the gigantic depends
> line.  I cherry-picked the patch from your tree that introduced ALPHA
> before I made the switch to HAVE_ARCH_AUDITSYSCALL.  I believed that
> would avoid the conflict, but I guess I was wrong.  In any case, I have
> the 'select HAVE_ARCH_AUDITSYSCALL' in alpha.
> 
> There is also a conflict given the last second EPERM->ECONNREFUSED
> switcheroo.  My 3.15 is less restrictive.  We return ECONNREFUSED only
> for non-init user namespaces.  Should be another easy one...
> 
> Please let me know if anything isn't easy/obvious for you!
> 
> Thank you!
> 
> -Eric
> 
> The following changes since commit b7d3622a39fde7658170b7f3cf6c6889bb8db30d:
> 
>   Merge tag 'v3.13' into for-3.15 (2014-03-07 11:41:32 -0500)
> 
> are available in the git repository at:
> 
> 
>   git://git.infradead.org/users/eparis/audit.git master
> 
> for you to fetch changes up to 312103d64d0fcadb332899a2c84b357ddb18f4e3:
> 
>   AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC (2014-04-10 17:51:29 -0400)
> 
> 
> AKASHI Takahiro (2):
>   audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
>   audit: Add generic compat syscall support
> 
> Chris Metcalf (1):
>   AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
> 
> Eric Paris (7):
>   audit: include subject in login records
>   syscall_get_arch: remove useless function arguments
>   audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
>   audit: define audit_is_compat in kernel internal header
>   AUDIT: Allow login in non-init namespaces
>   audit: do not cast audit_rule_data pointers pointlesly
>   audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
> 
> Eric W. Biederman (2):
>   audit: Use struct net not pid_t to remember the network namespce to reply in
>   audit: Send replies in the proper network namespace.
> 
> Joe Perches (1):
>   audit: remove stray newline from audit_log_execve_info() audit_panic() call
> 
> Josh Boyer (1):
>   audit: remove stray newlines from audit_log_lost messages
> 
> Monam Agarwal (1):
>   kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
> 
> Richard Guy Briggs (9):
>   audit: Use more current logging style again
>   capabilities: add descriptions for AUDIT_CONTROL and AUDIT_WRITE
>   audit: rename the misleading audit_get_context() to audit_take_context()
>   pid: get pid_t ppid of task in init_pid_ns
>   audit: convert PPIDs to the inital PID namespace.
>   audit: anchor all pid references in the initial pid namespace
>   audit: allow user processes to log from another PID namespace
>   audit: remove superfluous new- prefix in AUDIT_LOGIN messages
>   sched: declare pid_alive as inline
> 
> William Roberts (3):
>   mm: Create utility function for accessing a tasks commandline value
>   proc: Update get proc_pid_cmdline() to use mm.h helpers
>   audit: Audit proc//cmdline aka proctitle
> 
> 蔡正龙 (1):
>   alpha: Enable system-call auditing support.
> 
>  arch/alpha/Kconfig   |   4 
>  arch/alpha/include/asm/ptrace.h  |   5 +
>  arch/alpha/include/asm/thread_info.h |   2 ++
>  arch/alpha/kernel/Makefile   |   1 +
>  arch/alpha/kernel/audit.c|  

[GIT PULL] Audit subsystem for v3.15

2014-04-10 Thread Eric Paris
Linus,

Please pull the audit tree for v3.15.  You will have merge conflicts.
I'll publish my branch "merge-test" where I attempted to resolve them
the way you will.

The main issue is a cross-tree change to syscall_get_arch().  I change
it from taking a task_struct and pt_regs to take a void.  Not a single
arch used or needed either of these arguments.  (For 3.16 we plan to
implement the function on more arches)

There are a couple of conflicts where I made changes to #includes and
your tree also has some additions.  Should be obvious.

Two conflict issues with Kconfig changes.  The first is just that your
tree has some additional 'select' lines mine didn't.  Obvious to
resolve.

The second is a conflict in init/Kconfig.  I don't completely understand
it.  I believe it was the addition of ALPHA to the gigantic depends
line.  I cherry-picked the patch from your tree that introduced ALPHA
before I made the switch to HAVE_ARCH_AUDITSYSCALL.  I believed that
would avoid the conflict, but I guess I was wrong.  In any case, I have
the 'select HAVE_ARCH_AUDITSYSCALL' in alpha.

There is also a conflict given the last second EPERM->ECONNREFUSED
switcheroo.  My 3.15 is less restrictive.  We return ECONNREFUSED only
for non-init user namespaces.  Should be another easy one...

Please let me know if anything isn't easy/obvious for you!

Thank you!

-Eric

The following changes since commit b7d3622a39fde7658170b7f3cf6c6889bb8db30d:

  Merge tag 'v3.13' into for-3.15 (2014-03-07 11:41:32 -0500)

are available in the git repository at:


  git://git.infradead.org/users/eparis/audit.git master

for you to fetch changes up to 312103d64d0fcadb332899a2c84b357ddb18f4e3:

  AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC (2014-04-10 17:51:29 -0400)


AKASHI Takahiro (2):
  audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
  audit: Add generic compat syscall support

Chris Metcalf (1):
  AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC

Eric Paris (7):
  audit: include subject in login records
  syscall_get_arch: remove useless function arguments
  audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
  audit: define audit_is_compat in kernel internal header
  AUDIT: Allow login in non-init namespaces
  audit: do not cast audit_rule_data pointers pointlesly
  audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range

Eric W. Biederman (2):
  audit: Use struct net not pid_t to remember the network namespce to reply in
  audit: Send replies in the proper network namespace.

Joe Perches (1):
  audit: remove stray newline from audit_log_execve_info() audit_panic() call

Josh Boyer (1):
  audit: remove stray newlines from audit_log_lost messages

Monam Agarwal (1):
  kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c

Richard Guy Briggs (9):
  audit: Use more current logging style again
  capabilities: add descriptions for AUDIT_CONTROL and AUDIT_WRITE
  audit: rename the misleading audit_get_context() to audit_take_context()
  pid: get pid_t ppid of task in init_pid_ns
  audit: convert PPIDs to the inital PID namespace.
  audit: anchor all pid references in the initial pid namespace
  audit: allow user processes to log from another PID namespace
  audit: remove superfluous new- prefix in AUDIT_LOGIN messages
  sched: declare pid_alive as inline

William Roberts (3):
  mm: Create utility function for accessing a tasks commandline value
  proc: Update get proc_pid_cmdline() to use mm.h helpers
  audit: Audit proc//cmdline aka proctitle

蔡正龙 (1):
  alpha: Enable system-call auditing support.

 arch/alpha/Kconfig   |   4 
 arch/alpha/include/asm/ptrace.h  |   5 +
 arch/alpha/include/asm/thread_info.h |   2 ++
 arch/alpha/kernel/Makefile   |   1 +
 arch/alpha/kernel/audit.c|  60 +++
 arch/alpha/kernel/entry.S|   6 +-
 arch/alpha/kernel/ptrace.c   |   4 
 arch/arm/Kconfig |   1 +
 arch/arm/include/asm/syscall.h   |   5 ++---
 arch/ia64/Kconfig|   1 +
 arch/mips/include/asm/syscall.h  |   4 ++--
 arch/mips/kernel/ptrace.c|   2 +-
 arch/parisc/Kconfig  |   1 +
 arch/powerpc/Kconfig |   1 +
 arch/s390/Kconfig|   1 +
 arch/s390/include/asm/syscall.h  |   7 +++
 arch/sh/Kconfig  |   1 +
 arch/sparc/Kconfig   |   1 +
 arch/um/Kconfig.common   |   1 +
 arch/x86/Kconfig |   1 +
 arch/x86/include/asm/syscall.h   |  10 --
 drivers/tty/tty_audit.c  |   3 ++-
 fs/proc/base.c   |  36 ++--
 include

Re: Things I wish I'd known about Inotify

2014-04-04 Thread Eric Paris
On Fri, 2014-04-04 at 15:00 +0200, David Herrmann wrote:

> 1)
> IN_IGNORED is async and _immediate_ in case a file got deleted. So if
> you use watch-descriptors as keys for your objects, an _already_ used
> key might be returned by inotify_add_watch() if an IN_IGNORED is
> queued for the old watch (which implicitly destroys the watch). Once
> you read the IN_IGNORED from the queue, there is usually no way to
> know whether it's generated by the old watch or by the new. The
> man-page mentions this in:
>   "IN_IGNORED:  Watch was removed explicitly (inotify_rm_watch(2)) or
>automatically (file was deleted, or filesystem was unmounted)."
> I think we should add a note to BUGS that mentions this race (which is
> really not obvious from the description).
> 
> This race could be fixed by requiring an explicit inotify_rm_watch()
> if an IN_IGNORED was generated asynchronously.

For a brief while after the introduction of fsnotify this was a problem,
but not before then, or on anything remotely recent (like 4-5 years?).
We didn't re-use watch descriptors at all, so if you get a notification
after the IGNORED, it's still the old one.  Today it's possible to wrap
around at INT_MAX and reuse, but that is a teeny tiny issue...



> Note that both of these races rely on watch-descriptors being reused
> after they were freed. Turns out, that was "fixed" about exactly 1
> year ago in:
>
> commit a66c04b4534f9b25e1241dff9a9d94dff9fd66f8
> Author: Jeff Layton 
> Date:   Mon Apr 29 16:21:21 2013 -0700
>
>     inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()
>
> So in case that was never backported, only older kernels are affected.
> In newer kernels, wd reuse is quite unlikely. The races are still
> there, though.



Actually that has nothing to do with it.  If anything, it reintroduces
the reuse since now it wraps instead of fails...
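
The distinction here is between a lowest-free allocator, which hands a just-freed wd straight back, and a cyclic one in the spirit of idr_alloc_cyclic(), which keeps moving forward and only revisits old numbers after wrapping. The toy allocator below is a hypothetical userspace model of the cyclic behaviour, with a tiny ID space standing in for INT_MAX. It is not the kernel's idr code.

```c
#include <stdbool.h>

/* Tiny stand-in for INT_MAX so the wraparound is easy to see. */
#define SKETCH_MAX_WD 8

struct sketch_wd_alloc {
	bool used[SKETCH_MAX_WD + 1]; /* index 0 unused; wds run 1..SKETCH_MAX_WD */
	int next;                     /* where the next search starts (1-based) */
};

/* Cyclic allocation: take the first free wd at or after `next`,
 * wrapping back to 1 past the top. Returns -1 when every wd is in use. */
static int sketch_wd_get(struct sketch_wd_alloc *a)
{
	if (a->next < 1 || a->next > SKETCH_MAX_WD)
		a->next = 1;

	for (int i = 0; i < SKETCH_MAX_WD; i++) {
		int wd = 1 + (a->next - 1 + i) % SKETCH_MAX_WD;

		if (!a->used[wd]) {
			a->used[wd] = true;
			a->next = wd % SKETCH_MAX_WD + 1;
			return wd;
		}
	}
	return -1;
}

static void sketch_wd_put(struct sketch_wd_alloc *a, int wd)
{
	a->used[wd] = false;
}
```

Freeing wd 1 and allocating again yields 2, not 1. Only after the search wraps past the top does wd 1 come back, and events still queued for its previous owner can then be misattributed. This is the rarer reuse that wrapping reintroduces.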



Re: [PATCH] integrity: get comm using lock to avoid race in string printing

2014-04-02 Thread Eric Paris
On Wed, 2014-04-02 at 14:12 -0400, Mimi Zohar wrote:
> On Wed, 2014-04-02 at 14:00 -0400, Steve Grubb wrote: 
> > Hello Mimi,
> > 
> > On Wednesday, April 02, 2014 01:39:47 PM Mimi Zohar wrote:
> > > This change is already being upstreamed as commit 73a6b44 "Integrity:
> > > Pass commname via get_task_comm()".
> > 
> > While I was looking at Richard's patch, I noticed a few places where cause
> > and op are logged and the string isn't tied together with a _ or -. These
> > are in ima/ima_appraise.c line 383, and ima/ima_policy.c lines 333, 657,
> > and 683. Are these fixed upstream? Or should a patch be made?
> > these fixed upstream? Or should a patch be made?
> 
> Nothing has changed in terms of 'cause' and 'op'.  I would suggest
> making the changes in integrity_audit.c: integrity_audit_msg().

The question is actually, do you know of anyone who is expecting the
space, instead of a more 'audit standard' - or _ ?  If not, we'll change
it.  If so, we'll discuss more   :)

-Eric
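
If the change goes ahead, a multi-word value would be joined into a single token before it is logged, so that tools splitting audit fields on whitespace keep working. What follows is a minimal sketch that assumes underscores are the chosen separator. The helper name and the sample value are hypothetical, and this is not integrity_audit_msg() itself.

```c
/* Replace spaces with underscores in place so a value such as
 * "example cause" is logged as the single token "example_cause". */
static char *sketch_audit_token(char *s)
{
	for (char *p = s; *p; p++)
		if (*p == ' ')
			*p = '_';
	return s;
}
```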



[PATCH for v3.14] AUDIT: Allow login in non-init namespaces

2014-03-30 Thread Eric Paris
It is possible to configure your PAM stack to refuse login if
audit messages (about the login) could not be sent.  This is common
in many distros and thus the normal configuration of many containers. The
PAM modules determine whether audit is enabled or disabled in the kernel
based on the return value from sending an audit message on the netlink
socket.  If userspace gets back ECONNREFUSED it believes audit is disabled
in the kernel.  If it gets any other error it refuses to let the login
proceed.
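The userspace decision described above can be modelled in a few lines. This is a hypothetical sketch, not the actual PAM or libaudit source. `pam_audit_decision()` and the enum are made-up names, and the input is assumed to be 0 on success or the errno from the netlink send.

```c
#include <errno.h>

enum login_decision { LOGIN_ALLOW, LOGIN_REFUSE };

/* send_result: 0 if the audit message was sent, otherwise the errno
 * reported by the netlink send. */
static enum login_decision pam_audit_decision(int send_result)
{
    if (send_result == 0)
        return LOGIN_ALLOW;      /* audit accepted the message */
    if (send_result == ECONNREFUSED)
        return LOGIN_ALLOW;      /* taken to mean "no audit in this kernel" */
    return LOGIN_REFUSE;         /* any other error, e.g. EPERM */
}
```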

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace.  So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in init_net then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus,
by pure accident/dumb luck/bug, if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
utility in a non-init_net namespace, it would work!!  Clearly this
was a bug, but it is a bug some people came to rely on.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out.  Now, containers in a
non-init_net namespace refuse to let users log in (just as PAM was
configured to do!).  Obviously some people were not happy that what used
to let users log in now didn't!

This fix is kinda hacky.  We return ECONNREFUSED for all relevant
non-init namespaces.  That means that not only will the old broken
non-init_net setups continue to work, but the broken non-init_pid or
non-init_user setups will now 'work' too.  They don't really work, since
audit isn't logging things.  But it's what most users want.
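Here is a hypothetical userspace model of the namespace gate in audit_netlink_ok(), before and after this patch. The two booleans stand in for the `current_user_ns()` and `task_active_pid_ns(current)` comparisons. `ns_info` and `audit_ns_check()` are illustrative names, not kernel code.

```c
#include <errno.h>
#include <stdbool.h>

struct ns_info {
    bool init_user_ns;   /* current_user_ns() == &init_user_ns */
    bool init_pid_ns;    /* task_active_pid_ns(current) == &init_pid_ns */
};

/* Returns 0 if the request may go on to the normal permission checks,
 * or a negative errno, mirroring the kernel convention. */
static int audit_ns_check(const struct ns_info *ns, bool patched)
{
    if (!ns->init_user_ns || !ns->init_pid_ns)
        return patched ? -ECONNREFUSED : -EPERM;
    return 0;
}
```

Fed into the PAM rule described at the top of this message, -EPERM refuses the login while -ECONNREFUSED lets it through.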

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespaces.
So all will be right in the world.  This just throws the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...

Reported-by: Andre Tomt 
Reported-by: Adam Richter 
Signed-off-by: Eric Paris 
---
 kernel/audit.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 3392d3e..95a20f3 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -608,9 +608,19 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	int err = 0;
 
 	/* Only support the initial namespaces for now. */
+	/*
+	 * We return ECONNREFUSED because it tricks userspace into thinking
+	 * that audit was not configured into the kernel.  Lots of users
+	 * configure their PAM stack (because that's what the distro does)
+	 * to reject login if unable to send messages to audit.  If we return
+	 * ECONNREFUSED the PAM stack thinks the kernel does not have audit
+	 * configured in and will let login proceed.  If we return EPERM
+	 * userspace will reject all logins.  This should be removed when we
+	 * support non init namespaces!!
+	 */
 	if ((current_user_ns() != &init_user_ns) ||
 	    (task_active_pid_ns(current) != &init_pid_ns))
-		return -EPERM;
+		return -ECONNREFUSED;
 
 	switch (msg_type) {
 	case AUDIT_LIST:
-- 
1.8.5.3





Re: [PATCH 1/1] Avoid having to provide a fake/invalid fd and path

2014-03-26 Thread Eric Paris
On Wed, 2014-03-26 at 19:47 +0100, Jan Kara wrote:
> On Wed 26-03-14 16:30:05, xypron.g...@gmx.de wrote:
> > From: Heinrich Schuchardt 
> > 
> > https://lkml.org/lkml/2011/1/12/112
> > holds a patch by Tvrtko Ursulin
> > 
> >   Avoid having to provide a fake/invalid fd and path when flushing marks
> > 
> >   Currently for a group to flush marks it has set it needs to
> >   provide a fake or invalid (but resolvable) file descriptor
> >   and path when calling fanotify_mark. This patch pulls the
> >   flush handling a bit up so file descriptor and path are
> >   completely ignored when flushing.
> > 
> > Eric wrote it was applied.
> > https://lkml.org/lkml/2011/1/19/321
> > 
> > Unfortunately it is still not in the mainline code and the problem remains.
> > 
> > I reworked the patch to be applicable again (the signature of fanotify_mark
> > has changed since Tvrtko's work).
> > 
> > Signed-off-by: Heinrich Schuchardt 
>   The patch looks good to me. You can add:
> Reviewed-by: Jan Kara 
> 
>   Andrew, can you please add the patch to the fanotify patches you already
> carry? Thanks!

Acked-by: Eric Paris 

That would be great, Andrew!

> 
>   Honza
> 
> > ---
> >  fs/notify/fanotify/fanotify_user.c |   17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 287a22c..05bb38a 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -856,6 +856,15 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
> > group->priority == FS_PRIO_0)
> > goto fput_and_out;
> >  
> > +   if (flags & FAN_MARK_FLUSH) {
> > +   ret = 0;
> > +   if (flags & FAN_MARK_MOUNT)
> > +   fsnotify_clear_vfsmount_marks_by_group(group);
> > +   else
> > +   fsnotify_clear_inode_marks_by_group(group);
> > +   goto fput_and_out;
> > +   }
> > +
> > ret = fanotify_find_path(dfd, pathname, &path, flags);
> > if (ret)
> > goto fput_and_out;
> > @@ -867,7 +876,7 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
> > mnt = path.mnt;
> >  
> > /* create/update an inode mark */
> > -   switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) {
> > +   switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) {
> > case FAN_MARK_ADD:
> > if (flags & FAN_MARK_MOUNT)
> > ret = fanotify_add_vfsmount_mark(group, mnt, mask, flags);
> > @@ -880,12 +889,6 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
> > else
> > ret = fanotify_remove_inode_mark(group, inode, mask, flags);
> > break;
> > -   case FAN_MARK_FLUSH:
> > -   if (flags & FAN_MARK_MOUNT)
> > -   fsnotify_clear_vfsmount_marks_by_group(group);
> > -   else
> > -   fsnotify_clear_inode_marks_by_group(group);
> > -   break;
> > default:
> > ret = -EINVAL;
> > }
> > -- 
> > 1.7.10.4
> > 




Re: Linux 3.14-rc8 (LXC broken)

2014-03-25 Thread Eric Paris
On Tue, 2014-03-25 at 21:36 +0100, Andre Tomt wrote:
> *testing hat on*
> 
> PAM within namespaces (say, LXC) does not work anymore with 3.14-rc8,
> making login, ssh etc fail in containers unless you boot with audit=0.
> 
> This is due to a change in return value to user space; and is
> apparently a known issue as evident in this earlier post from February:
> https://www.redhat.com/archives/linux-audit/2014-February/msg00087.html
> 
> Judging from the post it seems they want to ship 3.14 with this IMO
> quite serious regression? What is the namespace/container folks take on
> this?

Fair question.

PAM only worked in a non-initial pid (or user) namespace if it was also
in a non-initial network namespace.  We added support for the network
namespace in 3.14.  So now PAM in a non-initial network namespace
functions the same as it would in the initial network namespace, aka it
fails.  This is actually what the audit userspace people think is the
right thing to happen.  You configured PAM to fail if it couldn't do the
right audit things, and it's failing.  Needing audit=0 is not new.

BUT, given that adding network namespace support broke (already broken
[remember you configured PAM to fail if audit didn't go well, and it let
you log in anyway?  aka broken?]) functionality, I'll send a request to
Linus tomorrow to rip out our network namespace support, and I'll re-add
it in 3.15 when we add pid (and partial user) namespace support.

-Eric



Re: [PATCH] compat_audit: allow it to work without asm/unistd32.h

2014-03-24 Thread Eric Paris
I don't know tilegx, but I have replaced 223b24d807610 with
4b58841149dcaa5.  I believe adding AUDIT_ARCH_COMPAT_GENERIC was
akashi-san's fix for this problem on mips.  Is this a better fix?

Thanks
-Eric

On Thu, 2014-03-20 at 11:31 -0400, Chris Metcalf wrote:
> For architectures that use the asm-generic syscall table for both
> 32- and 64-bit, there should be no need to provide a separate
> ; just using  is sufficient.
> Conditionalize use of  on the one platform that
> currently requires it (arm64).  If another platform ends up needing
> it we can create a suitable config flag at that point.
> 
> This change fixes the tilegx build failure seen in linux-next.
> 
> Signed-off-by: Chris Metcalf 
> ---
> By the way - I also note that commit 223b24d807610 that introduced
> this also put an "#ifdef COMPAT_xxx" in a UAPI header.  This seems
> like a pretty clear signal that the added code should be in
> linux/include/audit.h, not linux/uapi/include/audit.h.  But here
> I'm just focussing on getting tilegx to continue to build...
> 
>  lib/compat_audit.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/compat_audit.c b/lib/compat_audit.c
> index 873f75b640ab..e89a84b3fbe8 100644
> --- a/lib/compat_audit.c
> +++ b/lib/compat_audit.c
> @@ -1,6 +1,11 @@
>  #include 
>  #include 
> -#include 
> +#ifdef COMPAT_ARM64
> +/* 64-bit syscalls are generic, but 32-bit are not. */
> +# include 
> +#else
> +# include 
> +#endif
>  
>  unsigned compat_dir_class[] = {
>  #include 




Re: [PATCH 7/7] pid: get pid_t ppid of task in init_pid_ns

2014-03-17 Thread Eric Paris
On Mon, 2014-03-17 at 13:14 -0700, Tony Luck wrote:
> On Thu, Jan 23, 2014 at 11:32 AM, Richard Guy Briggs  wrote:
> > Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the
> > lookup of the PPID (real_parent's pid_t) of a process, including rcu
> > locking, in the arbitrary and init_pid_ns.
> > This provides an alternative to sys_getppid(), which is relative to the
> > child process' pid namespace.
> ...
> > +static int pid_alive(const struct task_struct *p);
> 
> This patch (or some successor version of it) showed up
> in next-20140317 and the above declaration caused a
> bunch of warnings on ia64:
> 
> include/linux/sched.h:1718: warning: 'pid_alive' declared inline after being called
> 
> [repeated 1675 times across files that include this]
> 
> The ia64 compiler is a lot happier if "inline" is added like this:
> 
> static inline int pid_alive(const struct task_struct *p);

Fixed for tomorrow.
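The fix Tony suggests is to make the forward declaration match the later `static inline` definition. Below is a self-contained sketch of that pattern. `struct task`, `task_alive()`, and `count_alive()` are hypothetical stand-ins for `task_struct`, `pid_alive()`, and its early callers.

```c
/* Hypothetical stand-in for struct task_struct. */
struct task { int alive; };

/* Forward declaration: keep "inline" here too, so it matches the
 * static inline definition that appears after the first caller. */
static inline int task_alive(const struct task *p);

static int count_alive(const struct task *tasks, int n)
{
    int alive = 0;

    for (int i = 0; i < n; i++)
        alive += task_alive(&tasks[i]);   /* called before the definition */
    return alive;
}

static inline int task_alive(const struct task *p)
{
    return p->alive != 0;
}
```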



[PATCH 1/2] syscall_get_arch: remove useless function arguments

2014-03-11 Thread Eric Paris
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args.  So just get rid of both of
those things.  Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@linux-mips.org
Cc: linux...@de.ibm.com
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
---
 arch/arm/include/asm/syscall.h  | 3 +--
 arch/mips/include/asm/syscall.h | 2 +-
 arch/mips/kernel/ptrace.c       | 2 +-
 arch/s390/include/asm/syscall.h | 5 ++---
 arch/x86/include/asm/syscall.h  | 8 +++-----
 include/asm-generic/syscall.h   | 4 +---
 kernel/seccomp.c                | 4 ++--
 7 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 73ddd72..ed805f1 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -103,8 +103,7 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->ARM_r0 + i, args, n * sizeof(args[0]));
 }
 
-static inline int syscall_get_arch(struct task_struct *task,
-  struct pt_regs *regs)
+static inline int syscall_get_arch(void)
 {
/* ARM tasks don't change audit architectures on the fly. */
return AUDIT_ARCH_ARM;
diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
index 81c8913..625e709 100644
--- a/arch/mips/include/asm/syscall.h
+++ b/arch/mips/include/asm/syscall.h
@@ -101,7 +101,7 @@ extern const unsigned long sys_call_table[];
 extern const unsigned long sys32_call_table[];
 extern const unsigned long sysn32_call_table[];
 
-static inline int __syscall_get_arch(void)
+static inline int syscall_get_arch(void)
 {
int arch = EM_MIPS;
 #ifdef CONFIG_64BIT
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index b52e1d2..65ba622 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -671,7 +671,7 @@ asmlinkage void syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[2]);
 
-   audit_syscall_entry(__syscall_get_arch(),
+   audit_syscall_entry(syscall_get_arch(),
regs->regs[2],
regs->regs[4], regs->regs[5],
regs->regs[6], regs->regs[7]);
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index cd29d2f..bebc0bd 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -89,11 +89,10 @@ static inline void syscall_set_arguments(struct task_struct *task,
regs->orig_gpr2 = args[0];
 }
 
-static inline int syscall_get_arch(struct task_struct *task,
-  struct pt_regs *regs)
+static inline int syscall_get_arch(void)
 {
 #ifdef CONFIG_COMPAT
-   if (test_tsk_thread_flag(task, TIF_31BIT))
+   if (test_tsk_thread_flag(current, TIF_31BIT))
return AUDIT_ARCH_S390;
 #endif
return sizeof(long) == 8 ? AUDIT_ARCH_S390X : AUDIT_ARCH_S390;
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index aea284b..7e6d0c4 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -91,8 +91,7 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->bx + i, args, n * sizeof(args[0]));
 }
 
-static inline int syscall_get_arch(struct task_struct *task,
-  struct pt_regs *regs)
+static inline int syscall_get_arch(void)
 {
return AUDIT_ARCH_I386;
 }
@@ -221,8 +220,7 @@ static inline void syscall_set_arguments(struct task_struct *task,
}
 }
 
-static inline int syscall_get_arch(struct task_struct *task,
-  struct pt_regs *regs)
+static inline int syscall_get_arch(void)
 {
 #ifdef CONFIG_IA32_EMULATION
/*
@@ -234,7 +232,7 @@ static inline int syscall_get_arch(struct task_struct *task,
 *
 * x32 tasks should be considered AUDIT_ARCH_X86_64.
 */
-   if (task_thread_info(task)->status & TS_COMPAT)
+   if (task_thread_info(current)->status & TS_COMPAT)
return AUDIT_ARCH_I386;
 #endif
/* Both x32 and x86_64 are considered "64-bit". */
diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 5b09392..d401e54 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -144,8 +144,6 @@ void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
 
 /**
  * syscall_get_arch - return the AUDIT_ARCH for the current system call
- * @task:  task of interest, must be in system call entry tr

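Every caller only ever asked about `current`, so the task and regs arguments carried no information. One of the callers touched in the diffstat is kernel/seccomp.c, which uses this value for the `arch` field that seccomp filters can read. Below is a minimal sketch of a classic-BPF filter that checks that field. It assumes the uapi headers `<linux/filter.h>`, `<linux/seccomp.h>`, and `<linux/audit.h>` are installed. `build_arch_filter()` and `EXPECTED_ARCH` are hypothetical names, and only a few architectures are listed.

```c
#include <stddef.h>          /* offsetof */
#include <linux/audit.h>     /* AUDIT_ARCH_* */
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_RET_* */

/* Hypothetical: the audit arch this binary was built for. */
#if defined(__x86_64__)
#define EXPECTED_ARCH AUDIT_ARCH_X86_64
#elif defined(__i386__)
#define EXPECTED_ARCH AUDIT_ARCH_I386
#elif defined(__aarch64__)
#define EXPECTED_ARCH AUDIT_ARCH_AARCH64
#elif defined(__arm__)
#define EXPECTED_ARCH AUDIT_ARCH_ARM
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Write a 4-instruction filter into out[] and return its length:
 * kill the task if seccomp_data.arch != EXPECTED_ARCH, else allow. */
static size_t build_arch_filter(struct sock_filter out[4])
{
    const struct sock_filter prog[] = {
        /* A = seccomp_data.arch */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        /* if (A == EXPECTED_ARCH) skip one instruction, else fall through */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    size_t n = sizeof(prog) / sizeof(prog[0]);

    for (size_t i = 0; i < n; i++)
        out[i] = prog[i];
    return n;
}
```

To install it, a program would first set `PR_SET_NO_NEW_PRIVS` with `prctl()`. It would then pass a `struct sock_fprog` pointing at the array to `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...)`. Checking `arch` first matters because the same syscall number means different things on different architectures.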
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough

2014-03-10 Thread Eric Paris
On Mon, 2014-03-10 at 15:30 -0400, David Miller wrote:
> From: Eric Paris 
> Date: Fri, 07 Mar 2014 17:52:02 -0500
> 
> > The second user Eric patched, audit_send_list(), can grow without bound.
> > The number of skb's is going to be the number of audit rules
> > that root loaded.  We run the list of rules, generate an skb per rule,
> > and add all of them to an sk_buff_head.  We then pass the sk_buff_head
> > to a kthread so that current will be able to read/drain the socket.
> > There really is no limit to how big the sk_buff_head could possibly
> > grow.  This doesn't necessarily absolutely have to be lossless but it
> > can actually quite reasonably be a whole lot of data that needs to get
> > sent.  I know of no way to deliver unbounded lengths of data to the
> > current task via netlink without blocking on more space in the socket.
> > Even if the socket rmem was MAX_INT, how can we deliver more?  The rule
> > size is unbounded.  How do I get an unbounded amount of data onto this
> > side of the socket when I have to generate it all during the request...
> 
> This is what netlink dumps are for.  It is how we are able to dump
> routing tables with millions of routes to userspace.
> 
> By using normal netlink requests and netlink_unicast() for this, you
> are ignoring an entire mechanism in netlink designed specifically to
> handle this kind of situation.
> 
> Netlink dumps track state and build one or more SKBs (as necessary),
> one by one, to form the reply.  It implements flow control, state
> tracking for iteration, optimized SKB sizing and allocation, etc.

Awesome.  I'll see what I can find!
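Roughly, a netlink dump lets the kernel produce the reply in pieces. netlink_dump() invokes the family's ->dump() callback whenever the reader has drained enough of its socket, and the callback records where it stopped in the struct netlink_callback state. The sketch below is only a userspace model of that iteration for a list of audit rules. `dump_state`, `dump_rules()`, and the fixed-size buffer standing in for an skb are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-dump state, playing the role of the cursor the
 * ->dump() callback keeps in its netlink_callback between calls. */
struct dump_state {
    size_t next;    /* index of the next rule to emit */
};

/* Fill buf (one "skb" worth, cap entries) with rules starting at
 * st->next.  Returns how many were copied; 0 means the dump is done. */
static size_t dump_rules(const int *rules, size_t nrules,
                         struct dump_state *st, int *buf, size_t cap)
{
    size_t n = 0;

    while (n < cap && st->next < nrules)
        buf[n++] = rules[st->next++];
    return n;
}
```

The caller keeps calling until it gets 0, so no single step ever needs room for the whole rule list.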




Re: [GIT PULL] namespaces fixes for 3.14-rcX

2014-03-10 Thread Eric Paris
On Sun, 2014-03-09 at 20:06 -0700, Eric W. Biederman wrote:
> Linus,
> 
> Please pull the for-linus branch from the git tree:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
> 
>    HEAD: d211f177b28ec070c25b3d0b960aa55f352f731f audit: Update kdoc for audit_send_reply and audit_list_rules_send
> 
> Starting with 3.14-rc1 the audit code is faulty (think oopses and races)
> with respect to how it computes the network namespace of which socket to
> reply to, and I happened to notice by chance when reading through the
> code.
> 
> My efforts to get these fixes noticed by people who care about audit
> seem to have landed on deaf ears, so since these are namespace related I
> have put them in my tree.

This commentary sounds like a pile of crap seeing as how there was a
whole discussion around how to handle this stuff, which you were a part
of.  And that the audit tree already picked up these 2 patches.

In any case, I haven't sent them to Linus yet and I'm glad that is
done, so feel free to consider this me Acking the pull request.



Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough

2014-03-07 Thread Eric Paris
On Fri, 2014-03-07 at 19:48 -0500, David Miller wrote:
> From: Eric Paris 
> Date: Fri, 07 Mar 2014 17:52:02 -0500
> 
> > Audit is non-tolerant to failure and loss.
> 
> Netlink is not a loss-less transport.
I'm happy to accept that (and know it to be true).  How can I better
architect things?  It seems Eric is complaining that when we get a
request for info, we queue that info up, and then use a kthread to make
it available when the task next calls recv.  By using blocking sockets
in the kthread we have no problem with the size of the socket read buf.
If we switch to non-blocking sockets how can we possibly queue up more
than rmem size of data?  (honestly, if userspace used INT_MAX it is
almost certainly overkill for even the largest rulesets, but
theoretically, it's not...)

Is our design somehow wrong?  Flawed?  Mind you it's pretty dumb that we
do basically the same thing in 3 different audit code paths, but the
architecture doesn't seem crazy to me.  Then again, I'm not brilliant by
any stretch!

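   +------------------------------------------------+
   |                                                |
   |       auditctl (audit tool run by root)        |
   |  netlink send                  netlink receive |
   +------------------------------------------------+
            +                                ^
            |                                |
            v                                +
  +----------------------------+  +------------------------+
  | kernel audit generate skbs |  | send skbs to userspace |
  +----------------------------+  +------------------------+
            +                                ^
            |   +------------------------+   |
            +-->| send skbs to a kthread |---+
                +------------------------+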
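The reason for the kthread shows up in a userspace analogue. Suppose the requesting task had to queue every reply with blocking sends and only then read them. It would stall as soon as its own receive buffer filled. In the hypothetical sketch below, a fork()ed child stands in for the kthread and an AF_UNIX SOCK_SEQPACKET socketpair stands in for the netlink socket. This is not how the kernel code is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical model: a helper process does blocking sends of nmsgs
 * replies while the caller drains them.  Returns the number of replies
 * the caller received, or -1 on error. */
static int replies_via_helper(int nmsgs)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* The "kthread": send() sleeps whenever the receiver's queue
         * is full and resumes as the caller reads. */
        char msg[64] = "audit rule";
        close(sv[0]);
        for (int i = 0; i < nmsgs; i++)
            if (send(sv[1], msg, sizeof(msg), 0) < 0)
                _exit(1);
        _exit(0);
    }

    /* The "current" task: free to read because it is not the sender. */
    close(sv[1]);
    int received = 0;
    char buf[64];
    while (recv(sv[0], buf, sizeof(buf), 0) > 0)
        received++;            /* recv() returns 0 once the helper exits */
    close(sv[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return received;
}
```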



Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough

2014-03-07 Thread Eric Paris
As usual Eric, your commentary is anything but useful.  However your
technical thoughts are not off the mark.  Can we stick to those?

On Wed, 2014-03-05 at 10:06 -0800, Eric W. Biederman wrote:
> Steve Grubb  writes:
> 
> > On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote:
> >> From: ebied...@xmission.com (Eric W. Biederman)
> >> Date: Tue, 04 Mar 2014 14:41:16 -0800
> >> 
> >> > If we really want the ability to always append to the queue of skb's
> >> > is to just have a version of netlink_send_skb that ignores the queued
> >> > limits.  Of course an evil program then could force the generation of
> >> > enough audit records to DOS the kernel, but we seem to be in that
> >> > situation now.  Shrug.
> >> 
> >> There is never a valid reason to bypass the socket limits.

Audit does have some pretty crazy/wacky/dumb ways of doing things.  And
they've worked really well.  I'm the first to agree that doesn't make
them right.  But also that I don't know how to do them better.  I'm
happy to try if I know how.  Audit is non-tolerant to failure and loss.
Many users of audit would prefer the system panic() than lose a message.
If someone shows me how to do it better I'll happily admit there are
likely places where what we do is just a 'little' too
strong/foolish/crazy.  Note that ALL users of these functions must have
at least 1 capability (CAP_AUDIT_CONTROL).  So if there is a malicious
app, it is a root malicious app...

The kernel audit has 3 different 'things' that send skbs to userspace.
All of them work a little crazy, but similar crazy.  The current task
calls into the kernel via netlink, and the kernel then builds one or
more skb(s) and passes those skb(s) (via differing mechanisms) to a
kthread which in turn calls netlink_unicast(,,,0) sending the response
back to the current task in 2 of the three cases.  In all cases, since
the timeout is infinite, we assume that the only possible reason this
call to netlink_unicast() will fail is because the other end of the
socket went away.  Simple drawing of 2 of the 3 cases.
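   +------------------------------------------------+
   |                                                |
   |       auditctl (audit tool run by root)        |
   |  netlink send                  netlink receive |
   +------------------------------------------------+
            +                                ^
            |                                |
            v                                +
  +----------------------------+  +------------------------+
  | kernel audit generate skbs |  | send skbs to userspace |
  +----------------------------+  +------------------------+
            +                                ^
            |   +------------------------+   |
            +-->| send skbs to a kthread |---+
                +------------------------+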
The most important of the 3 cases and the one that people care the
absolute most about 'things cannot be lost' is the actual audit
messages.  Messages like 'process A just did action B to object C'.
These are handled by means of the current process generating an skb and
passing those to an audit specific queue.  This audit internal queue
depth is controllable by userspace.  If we overflow this queue we may
call panic() (admin choice, obviously non-default).  Again, the kthread
on the other end of that queue assumes that all calls to
netlink_unicast(,,,0) will eventually succeed (unless the receiving task
died).  It is actually imperative that the current process be blocked
until the message is on track to userspace.  Even Eric isn't trying to
change this one case in his patch.  This is the one case where the task
receiving the skb is not (likely) the current task (but could be)

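Here is a simplified model of the backlog behaviour described above. The three failure modes mirror AUDIT_FAIL_SILENT, AUDIT_FAIL_PRINTK, and AUDIT_FAIL_PANIC from `<linux/audit.h>` (0, 1, and 2). The struct and `backlog_enqueue()` are hypothetical, and a limit of 0 is treated as unlimited in this model. The sketch also ignores that the real kernel may first wait for backlog space before counting a message as lost.

```c
#include <stddef.h>

/* Same numeric values as AUDIT_FAIL_SILENT/PRINTK/PANIC in <linux/audit.h>. */
enum fail_mode { FAIL_SILENT = 0, FAIL_PRINTK = 1, FAIL_PANIC = 2 };

enum enqueue_result { QUEUED, DROPPED_SILENT, DROPPED_LOGGED, WOULD_PANIC };

/* Hypothetical model of the audit backlog queue. */
struct backlog {
    size_t len;             /* messages waiting for the kthread */
    size_t limit;           /* userspace-tunable depth; 0 = unlimited here */
    size_t lost;            /* messages dropped so far */
    enum fail_mode failure; /* admin-selected failure policy */
};

static enum enqueue_result backlog_enqueue(struct backlog *q)
{
    if (q->limit == 0 || q->len < q->limit) {
        q->len++;
        return QUEUED;
    }
    q->lost++;
    switch (q->failure) {
    case FAIL_SILENT:
        return DROPPED_SILENT;
    case FAIL_PRINTK:
        return DROPPED_LOGGED;   /* the kernel would log the loss */
    default:
        return WOULD_PANIC;      /* the kernel would call panic() here */
    }
}
```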
The other two, the ones Eric patched, are much more flexible.  In both
cases a userspace task asks the kernel for a specific piece of
information (by sending a netlink message).  current is going to be the
task draining the netlink queue.  This is the reason the send is being
punted to a kthread: so current can read from the netlink socket.  In
one case, audit_send_reply_thread(), the response is small and can't
really grow without bound.  Converting to a nonblocking socket might
well make sense here.

The second user Eric patched, audit_send_list(), can grow without bound.
The number of skb's is going to be the number of audit rules
that root loaded.  We run the list of rules, generate an skb per rule,
and add all of them to an sk_buff_head.  We then pass the sk_buff_head
to a kthread so that current will be able to read/drain the socket.
There really is no limit to how big the sk_buff_head could possibly
grow.  This doesn't necessarily absolutely have to be lossless but it
can actually quite r

Re: [libseccomp-discuss] Making a universal list of syscalls?

2014-02-27 Thread Eric Paris
On Thu, 2014-02-27 at 12:40 -0800, Andy Lutomirski wrote:
> Currently, dealing with Linux syscalls in an architecture-independent
> way is a mess.  Here are some issues:
> 
>  1. There's no clean way to map between syscall names and numbers on
> different architectures.  The kernel contains a number of tables (that
> work differently for different architectures).  strace has some arcane
> mechanism.  libseccomp has another.

Userspace audit has a 3rd.

> I'd like to see a master list in the kernel that lists, for every
> syscall, the name, the number for each architecture that implements it
> (using the AUDIT_ARCH semantics, probably), and the signature.  The
> build process could parse this table to replace the current per-arch
> mess.

I know for audit it would be huge if userspace didn't try to organically
grow this knowledge on their own!  So +1 from me!

> 
> Issues here: some syscalls have different signatures on different
> architectures.  Maybe we could require that a canonical syscall name
> would have the same signature everywhere, but architectures could
> specify alternate names.  So, for things like clone (?), there could
> actually be a few syscalls that all have alternate names of "clone".
> 
> More importantly, we could add a library in tools that exposes this
> information to userspace.  Useful operations:
> 
>  - For a given (arch, nr), indicate, for each logical argument, which
> physical argument slot is used or, if the argument is split into a
> high and low part, which pair of slots is used.
> 
>  - For a given (nr, logical args), issue the syscall for the
> architecture that built the library.
> 
>  - For a given (arch, nr, logical args), issue the syscall if
> possible.  An x86_32 build could issue x86_64 syscalls with some
> effort, and an x86_64 build could easily issue 32-bit syscalls.
> 
>  - For a given arch, map between name and nr, and give access to the signature.
> 
> If this happened, presumably all architectures that supported it would
> have to have valid AUDIT_ARCH support.  That means that someone would
> have to fix ARM OABI (sigh).
> 
> Thoughts?
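Here is a tiny fragment of what such a master list and its name/number lookups might look like. The table layout and the `syscall_nr()`/`syscall_name()` helpers are hypothetical. The AUDIT_ARCH values and the x86_64/i386 numbers for read, write, and open are copied from the kernel headers so the sketch stays self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Values as defined in <linux/audit.h>, copied to stay self-contained. */
#define ARCH_X86_64 0xC000003Eu   /* AUDIT_ARCH_X86_64 */
#define ARCH_I386   0x40000003u   /* AUDIT_ARCH_I386 */

/* Hypothetical fragment of a "master list": one row per (name, arch). */
struct syscall_entry {
    const char *name;
    unsigned int arch;   /* AUDIT_ARCH_* value */
    int nr;
};

static const struct syscall_entry demo_syscalls[] = {
    { "read",  ARCH_X86_64, 0 }, { "read",  ARCH_I386, 3 },
    { "write", ARCH_X86_64, 1 }, { "write", ARCH_I386, 4 },
    { "open",  ARCH_X86_64, 2 }, { "open",  ARCH_I386, 5 },
};

#define DEMO_LEN (sizeof(demo_syscalls) / sizeof(demo_syscalls[0]))

/* name -> number for one architecture; -1 if unknown. */
static int syscall_nr(const char *name, unsigned int arch)
{
    for (size_t i = 0; i < DEMO_LEN; i++)
        if (demo_syscalls[i].arch == arch &&
            strcmp(demo_syscalls[i].name, name) == 0)
            return demo_syscalls[i].nr;
    return -1;
}

/* number -> name for one architecture; NULL if unknown. */
static const char *syscall_name(unsigned int arch, int nr)
{
    for (size_t i = 0; i < DEMO_LEN; i++)
        if (demo_syscalls[i].arch == arch && demo_syscalls[i].nr == nr)
            return demo_syscalls[i].name;
    return NULL;
}
```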


