On 2017-08-11 02:36, Richard Guy Briggs wrote: > On 2017-06-28 15:03, Paul Moore wrote: > > On Tue, Jun 27, 2017 at 5:11 PM, Richard Guy Briggs <r...@redhat.com> wrote: > > > On 2017-05-30 17:21, Paul Moore wrote: > > >> On Tue, Apr 4, 2017 at 5:21 AM, Richard Guy Briggs <r...@redhat.com> > > >> wrote: > > > > ... > > > > >> > diff --git a/kernel/audit.c b/kernel/audit.c > > >> > index 25dd70a..7d83c5a 100644 > > >> > --- a/kernel/audit.c > > >> > +++ b/kernel/audit.c > > >> > @@ -66,6 +66,7 @@ > > >> > #include <linux/freezer.h> > > >> > #include <linux/pid_namespace.h> > > >> > #include <net/netns/generic.h> > > >> > +#include <linux/dcache.h> > > >> > > > >> > #include "audit.h" > > >> > > > >> > @@ -1884,6 +1885,10 @@ void audit_copy_inode(struct audit_names *name, > > >> > const struct dentry *dentry, > > >> > name->gid = inode->i_gid; > > >> > name->rdev = inode->i_rdev; > > >> > security_inode_getsecid(inode, &name->osid); > > >> > + if (name->dentry) { > > >> > + dput(name->dentry); > > >> > + name->dentry = NULL; > > >> > + } > > >> > > >> Out of curiosity, what terrible things happen if we take a reference > > >> to a non-NULL dentry passed to audit_copy_inode() and store it in > > >> name->dentry? Does performance tank? > > > > > > Interesting idea. Right now it is optimized to only take a reference to > > > the dentry's parent dentry in the case we're handed an anonymous entry. > > > Most of the time it will never be used even though we invest in the > > > overhead of taking a reference count. Besides, __audit_inode_child() > > > hands in a NULL for the dentry parameter to audit_copy_inode(). > > > > [NOTE: audit_copy_inode() hands a NULL dentry only in the anonymous parent > > case] > > > > I believe I was just thinking of less conditional handling, especially > > when reference counts are concerned. I'm just trying to limit future > > headaches, but I suspect the perf cost would be problematic, and as > > you point out, there is no *need* for the majority of cases. > > > > Looking at this again today, why would we want to clear name->dentry > > in audit_copy_inode() if it is already set? Does that ever happen? > > I'm not sure it does ... > > Ok, I just tried re-writing that part to dput that dentry only for the > child of an anonymous parent rather than whenever audit_copy_inode() was > called and I ended up with a: > > BUG: Dentry ffff8800338f1dc0{i=369a,n=stderr} still in use (1) > [unmount of tmpfs tmpfs] > WARNING: CPU: 0 PID: 387 at fs/dcache.c:1445 umount_check+0x99/0xc0 > > This was after rebasing on a recent audit-next based on 4.11 rather than > the previous based on a 4.8 audit-next. Something else changed in the > kernel or kernel config between these two points. I was getting some > "INFO: suspicious RCU usage." warnings in idmap for NFS on the earlier > kernel and that is no longer happenning, and I'm no longer getting any > tracefs audit PATH entries in the more recent kernel which suggests that > whatever was causing the anonymous parent PATH records from tracefs has > been changed/fixed/configured-out. This tmpfs Dentry issue happens on > both 4.8 and 4.11 kernels.
Minor correction here: I'm still getting tracefs PATH records on 4.11, so whatever happenned in that test appears to be a test procedure error. > So the tmpfs is being unmounted within the time of a task that has taken > an anonymous parent audit_name and it appears that the dput in > audit_copy_inode() called from elsewhere had reset it to avoid > needlessly extending the life of this dentry. > > I see two obvious ways to solve this: > - Return to freeing this dentry from audit_copy_inode() > - Add tmpfs to the list of filesystems to not create PATH records > > I'm really not crazy aobut this second one unless I know why tmpfs is > generating calls with anonymous parents. > > For reference, the rest of the call trace is: > dump_stack+0x85/0xc9 > __warn+0xd1/0xf0 > ? d_genocide_kill+0x40/0x40 > warn_slowpath_null+0x1d/0x20 > umount_check+0x99/0xc0 > d_walk+0x10b/0x580 > ? do_one_tree+0x26/0x40 > do_one_tree+0x26/0x40 > shrink_dcache_for_umount+0x5d/0xd0 > generic_shutdown_super+0x1f/0xf0 > kill_litter_super+0x29/0x40 > deactivate_locked_super+0x43/0x70 > deactivate_super+0x88/0xa0 > cleanup_mnt+0x8e/0xe0 > __cleanup_mnt+0x12/0x20 > task_work_run+0x83/0xc0 > do_exit+0x45c/0xfe0 > ? syscall_trace_enter+0x2e4/0x400 > do_group_exit+0x68/0xe0 > SyS_exit_group+0x14/0x20 > do_syscall_64+0x82/0x270 > entry_SYSCALL64_slow_path+0x25/0x25 > > > > I'm assuming you are hinting at also using that dentry to compare the > > > audit_names entry, which I think it a bad idea since there could be > > > multiple paths to access a dentry. I did orignially have another patch > > > that would have tried to use that as well, which didn't seem to hurt, > > > but I didn't think was worth upstreaming. > > > > No, I wasn't thinking that, the dev/inode numbers should be sufficient > > in those cases I believe; I'm not sure the dentry would help us here. > > > > >> Also out of curiosity, why do we want to drop a dentry reference here > > >> if one already exists? > > > > > > I think we want to drop a dentry reference here because this inode child > > > could be a subsequent access to the same dentry with a full path, > > > removing the need to cache this dentry information in the first place. > > > > Related to my comment above from today ... what code path please? > > So it appears that there is a code path that does free this dentry in a > timely fashion to avoid needlessly extending its life and simply leaving > it there until the audit_context is freed is too long. > > > >> > @@ -1925,6 +1930,17 @@ void audit_log_name(struct audit_context > > >> > *context, struct audit_names *n, > > >> > audit_log_n_untrustedstring(ab, n->name->name, > > >> > n->name_len); > > >> > } > > >> > + } else if (n->dentry) { > > >> > + char *fullpath; > > >> > + const char *fullpathp; > > >> > + > > >> > + fullpath = kmalloc(PATH_MAX, GFP_KERNEL); > > >> > + if (!fullpath) > > >> > + return; > > >> > > >> I'm wondering if there is some value in still emitting the record if > > >> the kmalloc() fails, just with the name field set as the unset "?" > > >> value, e.g. "name=?". Thoughts? > > > > > > Possibly. We've got much bigger problems if that happens, but this > > > sounds like a good defensive coding approach. I'm even tempted to call > > > audit_panic(). > > > > No audit_panic(). We've still got good information that we can > > record, e.g. dev/inode numbers; let's just print "name=?" and go on > > our way recording the rest of the information. This is in keeping > > with the current audit_log_name() error handling. > > > > At the very least you need to clean up here instead of just returning. > > As the patch currently stands I believe this will end up leaking an > > audit_buffer. > > This has been fixed along with a fullpath kmalloc leak. > > > paul moore > > - RGB > > -- > Richard Guy Briggs <r...@redhat.com> > Sr. S/W Engineer, Kernel Security, Base Operating Systems > Remote, Ottawa, Red Hat Canada > IRC: rgb, SunRaycer > Voice: +1.647.777.2635, Internal: (81) 32635 > > -- > Linux-audit mailing list > linux-au...@redhat.com > https://www.redhat.com/mailman/listinfo/linux-audit - RGB -- Richard Guy Briggs <r...@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635