On 25.03.21 at 22:20, Stefan Metzmacher wrote:
> 
> On 25.03.21 at 21:55, Eric W. Biederman wrote:
>> Oleg Nesterov <o...@redhat.com> writes:
>>
>>> On 03/25, Linus Torvalds wrote:
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Or maybe IO threads should not abuse CLONE_THREAD?
>>>
>>> Why does create_io_thread() abuse CLONE_THREAD?
>>>
>>> One reason (I think) is that this implies SIGKILL when the process
>>> exits/execs; anything else?
>>
>> A lot.
>>
>> The io workers perform work on behalf of the ordinary userspace threads.
>> Some of that work is opening files.  For things like rlimits to work
>> properly you need to share the signal_struct.  But odds are if you find
>> anything in signal_struct (not counting signals) there will be an
>> io_uring code path that can exercise it as io_uring can traverse the
>> filesystem, open files and read/write files.  So io_uring can exercise
>> all of proc.
>>
>> Using create_io_thread with CLONE_THREAD is the least problematic way
>> (including all of the signal and ptrace problems we are looking at right
>> now) to implement the io worker threads.
>>
>> They _really_ are threads of the process that just never execute any
>> code in userspace.
> 
> So they should look like a userspace thread sitting in something like
> epoll_pwait() with all signals blocked, which will never return to
> userspace again?

Would gdb work with that?
The question is what backtrace gdb would show for that thread.

Is it possible to block SIGSTOP/SIGCONT?

I also think that signals sent to an iothread should not be delivered to
other threads, and that the iothread should only react to a direct
SIGSTOP/SIGCONT. I guess even SIGKILL should be ignored, as the shutdown
should happen only via the exit path of the iothread's parent.
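
On the blocking question, at least as far as the userspace API goes:
SIGKILL and SIGSTOP can never be blocked (requests to block them are
silently ignored), while SIGCONT can be put in the mask like any other
signal. Below is a minimal Python sketch of what an ordinary thread ends
up with when it asks to block everything. The helper name
blockable_signals is made up for illustration, and this says nothing
about what the kernel could do internally for iothreads.

```python
import signal


def blockable_signals():
    """Ask to block every valid signal in the calling thread.

    Returns (blocked, refused): the mask that actually took effect and
    the signals the kernel declined to block. The original mask is
    restored before returning. Needs Python 3.8+ for valid_signals().
    """
    every = signal.valid_signals()
    old = signal.pthread_sigmask(signal.SIG_BLOCK, every)
    try:
        # Blocking an empty set changes nothing and returns the current mask.
        blocked = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old)
    return blocked, every - blocked


if __name__ == "__main__":
    blocked, refused = blockable_signals()
    print("refused:", sorted(int(s) for s in refused))
    print("SIGCONT blocked:", signal.SIGCONT in blocked)
```

On Linux I would expect SIGKILL and SIGSTOP to show up in "refused" and
SIGCONT to be blockable, so hiding an iothread from SIGSTOP would need
something beyond a plain signal mask.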

> I think that would be useful, but I also think that userspace should see:
> - /proc/$tidofiothread/cmdline as empty (in order to let ps and top use
>   [iou-wrk-$tidofuserspacethread])
> - /proc/$tidofiothread/exe as a symlink to a target that does not exist
> - all of /proc/$tidofiothread/ showing root.root as owner and group;
>   entries that still have rw permissions, such as
>   /proc/$tidofiothread/comm and similar, should still disallow
>   modifications.
> 
> For the other kernel threads e.g. "[cryptd]" I see the following:
> 
> LANG=C ls -l /proc/653 | grep rw
> ls: cannot read symbolic link '/proc/653/exe': No such file or directory
> -rw-r--r--  1 root root 0 Mar 25 22:09 autogroup
> -rw-r--r--  1 root root 0 Mar 25 22:09 comm
> -rw-r--r--  1 root root 0 Mar 25 22:09 coredump_filter
> lrwxrwxrwx  1 root root 0 Mar 25 22:09 cwd -> /
> lrwxrwxrwx  1 root root 0 Mar 25 22:09 exe
> -rw-r--r--  1 root root 0 Mar 25 22:09 gid_map
> -rw-r--r--  1 root root 0 Mar 25 22:09 loginuid
> -rw-------  1 root root 0 Mar 25 22:09 mem
> -rw-r--r--  1 root root 0 Mar 25 22:09 oom_adj
> -rw-r--r--  1 root root 0 Mar 25 22:09 oom_score_adj
> -rw-r--r--  1 root root 0 Mar 25 22:09 projid_map
> lrwxrwxrwx  1 root root 0 Mar 25 22:09 root -> /
> -rw-r--r--  1 root root 0 Mar 25 22:09 sched
> -rw-r--r--  1 root root 0 Mar 25 22:09 setgroups
> -rw-r--r--  1 root root 0 Mar 25 22:09 timens_offsets
> -rw-rw-rw-  1 root root 0 Mar 25 22:09 timerslack_ns
> -rw-r--r--  1 root root 0 Mar 25 22:09 uid_map
> 
> And this:
> 
> LANG=C echo "bla" > /proc/653/comm
> -bash: echo: write error: Invalid argument
> 
> LANG=C echo "bla" > /proc/653/gid_map
> -bash: echo: write error: Operation not permitted
> 
> Can't we do the same for iothreads regarding /proc?
> Just make things read only there and empty "cmdline"/"exe"?
> 
> Maybe I'm too naive, but that's what I'd assume as a userspace developer/admin.
> 
> Do at least parts of it make any sense?
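
To make the expectation above concrete, here is a rough Python sketch of
the check I have in mind: an empty cmdline plus an unreadable exe symlink
is what "[cryptd]" shows today, and as far as I know ps and top fall back
to the comm value in square brackets when cmdline is empty. The function
names and the proc_root parameter are hypothetical, added only so the
logic can be pointed at a test directory; this is not how procps is
actually implemented.

```python
import os


def read_bytes(path):
    """Return the raw contents of a file."""
    with open(path, "rb") as f:
        return f.read()


def looks_like_kernel_thread(tid, proc_root="/proc"):
    """True if cmdline is empty and the exe symlink cannot be read."""
    base = os.path.join(proc_root, str(tid))
    if read_bytes(os.path.join(base, "cmdline")):
        return False
    try:
        os.readlink(os.path.join(base, "exe"))
    except OSError:
        # Same situation as "cannot read symbolic link" in the ls output.
        return True
    return False


def display_name(tid, proc_root="/proc"):
    """Approximate the name a ps-like tool would print for this task."""
    base = os.path.join(proc_root, str(tid))
    cmdline = read_bytes(os.path.join(base, "cmdline"))
    if cmdline:
        # Arguments are NUL-separated in /proc/<tid>/cmdline.
        return cmdline.replace(b"\0", b" ").strip().decode(errors="replace")
    comm = read_bytes(os.path.join(base, "comm")).strip()
    return "[" + comm.decode(errors="replace") + "]"
```

With the proposal above, display_name() for an iothread would come out
as something like [iou-wrk-$tidofuserspacethread] instead of the command
line of the userspace process.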

I think the strange glibc setuid() behavior should also be tested here;
I guess we don't want that to reset the credentials of an iothread!

Another idea would be to have the iothreads as a child process with its
own threads, but again I'm only looking at this as an admin and at what
I'd expect to see under /proc via ps and top.

metze
