Am 25.03.21 um 22:20 schrieb Stefan Metzmacher: > > Am 25.03.21 um 21:55 schrieb Eric W. Biederman: >> Oleg Nesterov <o...@redhat.com> writes: >> >>> On 03/25, Linus Torvalds wrote: >>>> >>>> The whole "signals are very special for IO threads" thing has caused >>>> so many problems, that maybe the solution is simply to _not_ make them >>>> special? >>> >>> Or may be IO threads should not abuse CLONE_THREAD? >>> >>> Why does create_io_thread() abuse CLONE_THREAD ? >>> >>> One reason (I think) is that this implies SIGKILL when the process >>> exits/execs, >>> anything else? >> >> A lot. >> >> The io workers perform work on behave of the ordinary userspace threads. >> Some of that work is opening files. For things like rlimits to work >> properly you need to share the signal_struct. But odds are if you find >> anything in signal_struct (not counting signals) there will be an >> io_uring code path that can exercise it as io_uring can traverse the >> filesystem, open files and read/write files. So io_uring can exercise >> all of proc. >> >> Using create_io_thread with CLONE_THREAD is the least problematic way >> (including all of the signal and ptrace problems we are looking at right >> now) to implement the io worker threads. >> >> They _really_ are threads of the process that just never execute any >> code in userspace. > > So they should look like a userspace thread sitting in something like > epoll_pwait() with all signals blocked, which will never return to userspace > again?
Would gdb work with that? The question is what backtrace gdb would show for that thread. Is it possible to block SIGSTOP/SIGCONT? I also think that all signals to an iothread should not be delivered to other threads and it may only react on a direct SIGSTOP/SIGCONT. I guess even SIGKILL should be ignored as the shutdown should happen via the exit path of the iothread parent only. > I think that would be useful, but I also think that userspace should see: > - /proc/$tidofiothread/cmdline as empty (in order to let ps and top use > [iou-wrk-$tidofuserspacethread]) > - /proc/$tidofiothread/exe as symlink to that not exists > - all of /proc/$tidofiothread/ shows root.root as owner and group > and things which still allow write access to /proc/$tidofiothread/comm > similar things > with rw permissions should still disallow modifications: > > For the other kernel threads e.g. "[cryptd]" I see the following: > > LANG=C ls -l /proc/653 | grep rw > ls: cannot read symbolic link '/proc/653/exe': No such file or directory > -rw-r--r-- 1 root root 0 Mar 25 22:09 autogroup > -rw-r--r-- 1 root root 0 Mar 25 22:09 comm > -rw-r--r-- 1 root root 0 Mar 25 22:09 coredump_filter > lrwxrwxrwx 1 root root 0 Mar 25 22:09 cwd -> / > lrwxrwxrwx 1 root root 0 Mar 25 22:09 exe > -rw-r--r-- 1 root root 0 Mar 25 22:09 gid_map > -rw-r--r-- 1 root root 0 Mar 25 22:09 loginuid > -rw------- 1 root root 0 Mar 25 22:09 mem > -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_adj > -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_score_adj > -rw-r--r-- 1 root root 0 Mar 25 22:09 projid_map > lrwxrwxrwx 1 root root 0 Mar 25 22:09 root -> / > -rw-r--r-- 1 root root 0 Mar 25 22:09 sched > -rw-r--r-- 1 root root 0 Mar 25 22:09 setgroups > -rw-r--r-- 1 root root 0 Mar 25 22:09 timens_offsets > -rw-rw-rw- 1 root root 0 Mar 25 22:09 timerslack_ns > -rw-r--r-- 1 root root 0 Mar 25 22:09 uid_map > > And this: > > LANG=C echo "bla" > /proc/653/comm > -bash: echo: write error: Invalid argument > > LANG=C echo "bla" > /proc/653/gid_map > -bash: echo: write error: Operation not permitted > > Can't we do the same for iothreads regarding /proc? > Just make things read only there and empty "cmdline"/"exe"? > > Maybe I'm too naive, but that what I'd assume as a userspace developer/admin. > > Does at least parts of it make any sense? I think the strange glibc setuid() behavior should also be tests here, I guess we don't want that to reset the credentials of an iothread! Another idea would be to have the iothreads as a child process with it's threads, but again I'm only looking as an admin to what I'd except to see under /proc via ps and top. metze