Re: fs_context-related oops in mainline

2019-03-15 Thread Al Viro
On Fri, Mar 15, 2019 at 02:24:30PM +, David Howells wrote:
> Al Viro  wrote:
> 
> > -   if (fc->user_ns)
> > -   put_user_ns(fc->user_ns);
> > -   fc->user_ns = get_user_ns(netns->user_ns);
> > +   if (netns) {
> > +   if (fc->user_ns)
> > +   put_user_ns(fc->user_ns);
> > +   fc->user_ns = get_user_ns(netns->user_ns);
> > +   }
> 
> This begs the question why is sysfs using the current network namespace's idea
> of the user namespace?  Why not just use the one directly from current->cred?

Because it gives access to that netns guts, presumably.  In a saner world sysfs
wouldn't _have_ netns-dependent bits; a separate per-netns filesystem would
contain those, and be mounted separately.  And yes, we do have way too many
kinds of namespaces, along with filesystems that try to mix unrelated bits and
lead to something that looks like Cthulhu's arse after an unfortunate
accident with capsaicin suppository...


Re: fs_context-related oops in mainline

2019-03-15 Thread Greg Kroah-Hartman
On Fri, Mar 15, 2019 at 02:24:30PM +, David Howells wrote:
> Al Viro  wrote:
> 
> > -   if (fc->user_ns)
> > -   put_user_ns(fc->user_ns);
> > -   fc->user_ns = get_user_ns(netns->user_ns);
> > +   if (netns) {
> > +   if (fc->user_ns)
> > +   put_user_ns(fc->user_ns);
> > +   fc->user_ns = get_user_ns(netns->user_ns);
> > +   }
> 
> This begs the question why is sysfs using the current network namespace's idea
> of the user namespace?  Why not just use the one directly from current->cred?

Ask the networking people that question, I have no idea :)


Re: fs_context-related oops in mainline

2019-03-15 Thread David Howells
Al Viro  wrote:

> - if (fc->user_ns)
> - put_user_ns(fc->user_ns);
> - fc->user_ns = get_user_ns(netns->user_ns);
> + if (netns) {
> + if (fc->user_ns)
> + put_user_ns(fc->user_ns);
> + fc->user_ns = get_user_ns(netns->user_ns);
> + }

This begs the question why is sysfs using the current network namespace's idea
of the user namespace?  Why not just use the one directly from current->cred?

David


Re: fs_context-related oops in mainline

2019-03-15 Thread Dominik Brodowski
On Fri, Mar 15, 2019 at 12:18:13PM +, Al Viro wrote:
> On Fri, Mar 15, 2019 at 12:50:02PM +0100, Dominik Brodowski wrote:
> > On Fri, Mar 15, 2019 at 11:44:45AM +, David Howells wrote:
> > > Dominik Brodowski  wrote:
> > > 
> > > > [0.839322] RIP: 0010:sysfs_init_fs_context+0x82/0xd0
> > > 
> > > Could you load your kernel into gdb and then do:
> > > 
> > >   i li *sysfs_init_fs_context+0x82
> > 
> > Doesn't seem necessary as per my mail to Al a few seconds ago:
> > kobj_ns_grab_current(KOBJ_NS_TYPE_NET) returns NULL, so
> > 
> > fc->user_ns = get_user_ns(netns->user_ns);
> > 
> > will try to dereference a null pointer.
> > 
> > Thanks,
> > Dominik
> > 
> 
> Charming...  It's netns being turned off, and the thing we'd missed
> is that kobj_ns_current_may_mount() becomes constant true in that
> setup.  IOW, ns_capable(..., CAP_SYS_ADMIN) is suppressed entirely
> in that case (well, aside of what may_mount() has done in
> do_mount/sys_fsmount).
> 
> So what we need is making that "change fc->user_ns" conditional
> on netns != NULL, as in
> 
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 4cb21b558a85..1b56686ab178 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -71,9 +71,11 @@ static int sysfs_init_fs_context(struct fs_context *fc)
>   kfc->magic = SYSFS_MAGIC;
>   fc->fs_private = kfc;
>   fc->ops = _fs_context_ops;
> - if (fc->user_ns)
> - put_user_ns(fc->user_ns);
> - fc->user_ns = get_user_ns(netns->user_ns);
> + if (netns) {
> + if (fc->user_ns)
> + put_user_ns(fc->user_ns);
> + fc->user_ns = get_user_ns(netns->user_ns);
> + }
>   fc->global = true;
>   return 0;
>  }

That fixes the issue at hand, thanks (verified both on top of the
problematic commit and on top of mainline).

Thanks,
Dominik


Re: fs_context-related oops in mainline

2019-03-15 Thread Al Viro
On Fri, Mar 15, 2019 at 12:50:02PM +0100, Dominik Brodowski wrote:
> On Fri, Mar 15, 2019 at 11:44:45AM +, David Howells wrote:
> > Dominik Brodowski  wrote:
> > 
> > > [0.839322] RIP: 0010:sysfs_init_fs_context+0x82/0xd0
> > 
> > Could you load your kernel into gdb and then do:
> > 
> > i li *sysfs_init_fs_context+0x82
> 
> Doesn't seem necessary as per my mail to Al a few seconds ago:
> kobj_ns_grab_current(KOBJ_NS_TYPE_NET) returns NULL, so
> 
> fc->user_ns = get_user_ns(netns->user_ns);
> 
> will try to dereference a null pointer.
> 
> Thanks,
>   Dominik
> 

Charming...  It's netns being turned off, and the thing we'd missed
is that kobj_ns_current_may_mount() becomes constant true in that
setup.  IOW, ns_capable(..., CAP_SYS_ADMIN) is suppressed entirely
in that case (well, aside of what may_mount() has done in
do_mount/sys_fsmount).

So what we need is making that "change fc->user_ns" conditional
on netns != NULL, as in

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 4cb21b558a85..1b56686ab178 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -71,9 +71,11 @@ static int sysfs_init_fs_context(struct fs_context *fc)
kfc->magic = SYSFS_MAGIC;
fc->fs_private = kfc;
fc->ops = _fs_context_ops;
-   if (fc->user_ns)
-   put_user_ns(fc->user_ns);
-   fc->user_ns = get_user_ns(netns->user_ns);
+   if (netns) {
+   if (fc->user_ns)
+   put_user_ns(fc->user_ns);
+   fc->user_ns = get_user_ns(netns->user_ns);
+   }
fc->global = true;
return 0;
 }


Re: fs_context-related oops in mainline

2019-03-15 Thread Dominik Brodowski
On Fri, Mar 15, 2019 at 11:44:45AM +, David Howells wrote:
> Dominik Brodowski  wrote:
> 
> > [0.839322] RIP: 0010:sysfs_init_fs_context+0x82/0xd0
> 
> Could you load your kernel into gdb and then do:
> 
>   i li *sysfs_init_fs_context+0x82

Doesn't seem necessary as per my mail to Al a few seconds ago:
kobj_ns_grab_current(KOBJ_NS_TYPE_NET) returns NULL, so

fc->user_ns = get_user_ns(netns->user_ns);

will try to dereference a null pointer.

Thanks,
Dominik



Re: fs_context-related oops in mainline

2019-03-15 Thread Dominik Brodowski
On Fri, Mar 15, 2019 at 11:34:47AM +, Al Viro wrote:
> On Fri, Mar 15, 2019 at 08:43:07AM +0100, Dominik Brodowski wrote:
> > David, Al,
> > 
> > commit 23bf1b6be9c2 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
> > seems to have introduced a bug; at least that's the commit I bisected the
> > following oops down to:

As additional information: In sysfs_init_fs_context, we have

kfc->ns_tag = netns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
...
fc->user_ns = get_user_ns(netns->user_ns);

With my config, kobj_ns_grab_current() returns NULL here, which obviously
causes a null pointer dereference when trying to set fs->user_ns.

Probably the most important line of the .config (attached):

# CONFIG_NET is not set

Thanks,
Dominik
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.0.0-rc2 Kernel Configuration
#

#
# Compiler: gcc (GCC) 8.2.1 20181127
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80201
CONFIG_CLANG_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_PSI is not set
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=13
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_NUMA_BALANCING is not set
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y

Re: fs_context-related oops in mainline

2019-03-15 Thread David Howells
Dominik Brodowski  wrote:

> [0.839322] RIP: 0010:sysfs_init_fs_context+0x82/0xd0

Could you load your kernel into gdb and then do:

i li *sysfs_init_fs_context+0x82

David


Re: fs_context-related oops in mainline

2019-03-15 Thread Al Viro
On Fri, Mar 15, 2019 at 08:43:07AM +0100, Dominik Brodowski wrote:
> David, Al,
> 
> commit 23bf1b6be9c2 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
> seems to have introduced a bug; at least that's the commit I bisected the
> following oops down to:

> This occurs while trying to mount sysfs in initramfs
> 
>   mount -n -t sysfs sysfs /sys
> 
> All this obviously runs in qemu; config and further information are available 
> upon request.

.config, please?