Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-20 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 07/17, Oleg Nesterov wrote:
>>
>> And, I didn't mention this yesterday, but probably the next 08/11 patch can
>> have the same problem. But this is a bit more complicated because 
>> send_sigio()
>> uses the same "type" both for do_each_pid_task() and as an argument passed to
>> do_send_sig_info().
>
> perhaps it can simply do
>
>   if (type <= PIDTYPE_TGID) {
>   rcu_read_lock();
>   p = pid_task(pid, PIDTYPE_PID);
>   send_sigio_to_task(p, fown, fd, band, type);
>   rcu_read_unlock();
>   } else {
>   read_lock(_lock);
>   do_each_pid_task(pid, type, p) {
>   send_sigio_to_task(p, fown, fd, band, type);
>   } while_each_pid_task(pid, type, p);
>   read_unlock(_lock);
>   }
>
> this way we also avoid tasklist_lock in F_OWNER_TID/F_OWNER_PID case.

I like that.  I updated that code in a different way but that looks
more elegant and I think I will incoporate it.

> To clarify, it is not that I think any sane application can do
> fcntl(F_OWNER_PID, thread_tid) but still this is a user-visible change
> we can easily avoid.

Agreed.

I do think 

Eric



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-20 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 07/17, Oleg Nesterov wrote:
>>
>> And, I didn't mention this yesterday, but probably the next 08/11 patch can
>> have the same problem. But this is a bit more complicated because 
>> send_sigio()
>> uses the same "type" both for do_each_pid_task() and as an argument passed to
>> do_send_sig_info().
>
> perhaps it can simply do
>
>   if (type <= PIDTYPE_TGID) {
>   rcu_read_lock();
>   p = pid_task(pid, PIDTYPE_PID);
>   send_sigio_to_task(p, fown, fd, band, type);
>   rcu_read_unlock();
>   } else {
>   read_lock(_lock);
>   do_each_pid_task(pid, type, p) {
>   send_sigio_to_task(p, fown, fd, band, type);
>   } while_each_pid_task(pid, type, p);
>   read_unlock(_lock);
>   }
>
> this way we also avoid tasklist_lock in F_OWNER_TID/F_OWNER_PID case.

I like that.  I updated that code in a different way but that looks
more elegant and I think I will incoporate it.

> To clarify, it is not that I think any sane application can do
> fcntl(F_OWNER_PID, thread_tid) but still this is a user-visible change
> we can easily avoid.

Agreed.

I do think 

Eric



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-20 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  
> wrote:
>>
>> In practice since glibc does not make thread id's available I don't
>> expect anyone relies on this behavior.  Since no one relies on it we
>> can change it without creating a regression.
>
> Actually, there's a really obvious case where this simply isn't true.
>
> Just imagine you're a MIS person or a developer, doing "ps -eLf" to
> see what's going on, and want to kill one thread. Either because you
> see that one thread using all CPU, or because you are the developer
> and you know what's up.
>
> Those thread ID's are exported trivially.

True.  Which makes all of this shell script visible.  So someone may
have done something with this functionality.

I have just gone through all of my patches and updated them to ensure
that everything has the same behavior when selecting processes as it does
today.  So this will not be an issue with the next version this patch series.



I am going to come back to this as there are some really nasty corner
cases in the current kernel.  Primarily that we can send signals through
a zombie thread group leader and it can have unchangable credentials
completely out of sync with the credentials on the other threads.

Eric












Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-20 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  
> wrote:
>>
>> In practice since glibc does not make thread id's available I don't
>> expect anyone relies on this behavior.  Since no one relies on it we
>> can change it without creating a regression.
>
> Actually, there's a really obvious case where this simply isn't true.
>
> Just imagine you're a MIS person or a developer, doing "ps -eLf" to
> see what's going on, and want to kill one thread. Either because you
> see that one thread using all CPU, or because you are the developer
> and you know what's up.
>
> Those thread ID's are exported trivially.

True.  Which makes all of this shell script visible.  So someone may
have done something with this functionality.

I have just gone through all of my patches and updated them to ensure
that everything has the same behavior when selecting processes as it does
today.  So this will not be an issue with the next version this patch series.



I am going to come back to this as there are some really nasty corner
cases in the current kernel.  Primarily that we can send signals through
a zombie thread group leader and it can have unchangable credentials
completely out of sync with the credentials on the other threads.

Eric












Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-17 Thread Linus Torvalds
On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  wrote:
>
> In practice since glibc does not make thread id's available I don't
> expect anyone relies on this behavior.  Since no one relies on it we
> can change it without creating a regression.

Actually, there's a really obvious case where this simply isn't true.

Just imagine you're a MIS person or a developer, doing "ps -eLf" to
see what's going on, and want to kill one thread. Either because you
see that one thread using all CPU, or because you are the developer
and you know what's up.

Those thread ID's are exported trivially.

   Linus


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-17 Thread Linus Torvalds
On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  wrote:
>
> In practice since glibc does not make thread id's available I don't
> expect anyone relies on this behavior.  Since no one relies on it we
> can change it without creating a regression.

Actually, there's a really obvious case where this simply isn't true.

Just imagine you're a MIS person or a developer, doing "ps -eLf" to
see what's going on, and want to kill one thread. Either because you
see that one thread using all CPU, or because you are the developer
and you know what's up.

Those thread ID's are exported trivially.

   Linus


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-17 Thread Oleg Nesterov
On 07/17, Oleg Nesterov wrote:
>
> And, I didn't mention this yesterday, but probably the next 08/11 patch can
> have the same problem. But this is a bit more complicated because send_sigio()
> uses the same "type" both for do_each_pid_task() and as an argument passed to
> do_send_sig_info().

perhaps it can simply do

if (type <= PIDTYPE_TGID) {
rcu_read_lock();
p = pid_task(pid, PIDTYPE_PID);
send_sigio_to_task(p, fown, fd, band, type);
rcu_read_unlock();
} else {
read_lock(_lock);
do_each_pid_task(pid, type, p) {
send_sigio_to_task(p, fown, fd, band, type);
} while_each_pid_task(pid, type, p);
read_unlock(_lock);
}

this way we also avoid tasklist_lock in F_OWNER_TID/F_OWNER_PID case.

To clarify, it is not that I think any sane application can do
fcntl(F_OWNER_PID, thread_tid) but still this is a user-visible change
we can easily avoid.

Oleg.



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-17 Thread Oleg Nesterov
On 07/17, Oleg Nesterov wrote:
>
> And, I didn't mention this yesterday, but probably the next 08/11 patch can
> have the same problem. But this is a bit more complicated because send_sigio()
> uses the same "type" both for do_each_pid_task() and as an argument passed to
> do_send_sig_info().

perhaps it can simply do

if (type <= PIDTYPE_TGID) {
rcu_read_lock();
p = pid_task(pid, PIDTYPE_PID);
send_sigio_to_task(p, fown, fd, band, type);
rcu_read_unlock();
} else {
read_lock(_lock);
do_each_pid_task(pid, type, p) {
send_sigio_to_task(p, fown, fd, band, type);
} while_each_pid_task(pid, type, p);
read_unlock(_lock);
}

this way we also avoid tasklist_lock in F_OWNER_TID/F_OWNER_PID case.

To clarify, it is not that I think any sane application can do
fcntl(F_OWNER_PID, thread_tid) but still this is a user-visible change
we can easily avoid.

Oleg.



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-17 Thread Oleg Nesterov
On 07/16, Eric W. Biederman wrote:
>
> There are two questions.
> a) Can we use the pid of a thread to find the thread group?
> b) Will the signal be queued in the thread group?

IMO "yes" to both questions, I simply see no reason to change the current
semantics. Even if glibc doesn't show the tread id's a user can see them
in /proc/$tgid/task/. So I think kill_pid_info() should just do

p = pid_task(pid, PIDTYPE_PID);
group_send_sig_info(p, PIDTYPE_TGID);

again, posix_timer_event() looks fine, but to me

pid_task(timr->it_pid, shared ? PIDTYPE_TGID : PIDTYPE_PID)

looks like unnecessary complication,

pid_task(timr->it_pid, PIDTYPE_PID);

should do the same thing.

And, I didn't mention this yesterday, but probably the next 08/11 patch can
have the same problem. But this is a bit more complicated because send_sigio()
uses the same "type" both for do_each_pid_task() and as an argument passed to
do_send_sig_info().

Oleg.



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-17 Thread Oleg Nesterov
On 07/16, Eric W. Biederman wrote:
>
> There are two questions.
> a) Can we use the pid of a thread to find the thread group?
> b) Will the signal be queued in the thread group?

IMO "yes" to both questions, I simply see no reason to change the current
semantics. Even if glibc doesn't show the tread id's a user can see them
in /proc/$tgid/task/. So I think kill_pid_info() should just do

p = pid_task(pid, PIDTYPE_PID);
group_send_sig_info(p, PIDTYPE_TGID);

again, posix_timer_event() looks fine, but to me

pid_task(timr->it_pid, shared ? PIDTYPE_TGID : PIDTYPE_PID)

looks like unnecessary complication,

pid_task(timr->it_pid, PIDTYPE_PID);

should do the same thing.

And, I didn't mention this yesterday, but probably the next 08/11 patch can
have the same problem. But this is a bit more complicated because send_sigio()
uses the same "type" both for do_each_pid_task() and as an argument passed to
do_send_sig_info().

Oleg.



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Linus Torvalds
On Mon, Jul 16, 2018 at 11:02 AM Eric W. Biederman
 wrote:
>
> There are two questions.
> a) Can we use the pid of a thread to find the thread group?

Yes. Just find the thread, and then use p->tgid.

However, that's not what the code used to do. It used to just find the
thread, and then do "do_send_sig_info()" on it.

And it's actually *slightly* different than "find the thread group
based on the thread". At least the permission checks are different.
The permission checks are done on the thread.

> b) Will the signal be queued in the thread group?

Yes.

pending = group ? >signal->shared_pending : >pending;

and "group" is true.

> > Now, it is possible that at none of the legacy uses use CLONE_THREAD
> > and thus aren't affected (because tgid will always be pid). So maybe
> > nobody notices.
>
> That is what I expect.  I don't know think legacy is a good description.
> Calling other uses of CLONE_THREAD non-glibc seems better.  The old
> LinuxThreads did not use CLONE_THREAD because it did not exist.

Again, don't get hung up about different libc implementations.

People have literally used clone() directly. And some of them use CLONE_THREAD.

Just google it. I guarantee you'll find examples of it, because I
found examples.

So stop the whole "libc" argument. That's not the point, and as long
as you make that argument, your argument is simply not valid.

People use clone() directly. Really. Really really.

Linus


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Linus Torvalds
On Mon, Jul 16, 2018 at 11:02 AM Eric W. Biederman
 wrote:
>
> There are two questions.
> a) Can we use the pid of a thread to find the thread group?

Yes. Just find the thread, and then use p->tgid.

However, that's not what the code used to do. It used to just find the
thread, and then do "do_send_sig_info()" on it.

And it's actually *slightly* different than "find the thread group
based on the thread". At least the permission checks are different.
The permission checks are done on the thread.

> b) Will the signal be queued in the thread group?

Yes.

pending = group ? >signal->shared_pending : >pending;

and "group" is true.

> > Now, it is possible that at none of the legacy uses use CLONE_THREAD
> > and thus aren't affected (because tgid will always be pid). So maybe
> > nobody notices.
>
> That is what I expect.  I don't know think legacy is a good description.
> Calling other uses of CLONE_THREAD non-glibc seems better.  The old
> LinuxThreads did not use CLONE_THREAD because it did not exist.

Again, don't get hung up about different libc implementations.

People have literally used clone() directly. And some of them use CLONE_THREAD.

Just google it. I guarantee you'll find examples of it, because I
found examples.

So stop the whole "libc" argument. That's not the point, and as long
as you make that argument, your argument is simply not valid.

People use clone() directly. Really. Really really.

Linus


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  
> wrote:
>>
>> In practice since glibc does not make thread id's available I don't
>> expect anyone relies on this behavior.  Since no one relies on it we
>> can change it without creating a regression.
>
> Maybe.
>
> However, possibly not.
>
> The thing is, glibc wasn't the original or only use of our threads. In
> fact, there are people out there that use clone() directly, without
> using it for posix threading. And Oleg was right to notice this,
> because the traditional model was literally to just use "kill()" on
> the pid returned from clone().

I completely agree that Oleg was right to notice this, and I was
definitely not right to overlook.  In my description and otherwise.

I also think the semantic change needs to happen in it's own separate
patch so things can be tracked down.

I really don't think anyone uses this but it is not smart to hold the
rest of the changes hostage to my belief.  So I am thinking about how
to rework this.

> So the semantics of Linux kill() really is to kill the thread, not the
> group leader. glibc's implementation of pthreads is not the only model
> out there.

There are two questions.
a) Can we use the pid of a thread to find the thread group?
b) Will the signal be queued in the thread group?

> Now, it is possible that at none of the legacy uses use CLONE_THREAD
> and thus aren't affected (because tgid will always be pid). So maybe
> nobody notices.

That is what I expect.  I don't know think legacy is a good description.
Calling other uses of CLONE_THREAD non-glibc seems better.  The old
LinuxThreads did not use CLONE_THREAD because it did not exist.
>
> But we really have three different 'kill' system calls:
>
>  - the original 'kill' system call (#37 on x86-32).
>
>This looks up the thread ID, but signals the *group*.
>
>  - tkill (#238)
>
>This looks up the thread, and signals the specific thread.
>
>  - tgkill (#270)
>
>This looks up the tgid, and signals the group.

No.  tgkill is a less racy version of tkill and verifies that the
thread it signals is in the proper thread group.

> Modern glibc will not even use the original 'kill()' at all, I think.
> But it's the legacy behavior.

No.  Modern glibc definitely still uses kill.  As kill is the only one
exporting the posix kill API.  

> So I do think Oleg is right. We should be careful. You'll not notice
> breakage on a modern distro, but you might easily break old code.

Yes.  We definitely need to be careful.   At the same time since this
isn't something the old LinuxThreads had to cope with we can probably
clean it up.  But as that is not my focus it should probably be pushed out.

Eric


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  
> wrote:
>>
>> In practice since glibc does not make thread id's available I don't
>> expect anyone relies on this behavior.  Since no one relies on it we
>> can change it without creating a regression.
>
> Maybe.
>
> However, possibly not.
>
> The thing is, glibc wasn't the original or only use of our threads. In
> fact, there are people out there that use clone() directly, without
> using it for posix threading. And Oleg was right to notice this,
> because the traditional model was literally to just use "kill()" on
> the pid returned from clone().

I completely agree that Oleg was right to notice this, and I was
definitely not right to overlook.  In my description and otherwise.

I also think the semantic change needs to happen in it's own separate
patch so things can be tracked down.

I really don't think anyone uses this but it is not smart to hold the
rest of the changes hostage to my belief.  So I am thinking about how
to rework this.

> So the semantics of Linux kill() really is to kill the thread, not the
> group leader. glibc's implementation of pthreads is not the only model
> out there.

There are two questions.
a) Can we use the pid of a thread to find the thread group?
b) Will the signal be queued in the thread group?

> Now, it is possible that at none of the legacy uses use CLONE_THREAD
> and thus aren't affected (because tgid will always be pid). So maybe
> nobody notices.

That is what I expect.  I don't know think legacy is a good description.
Calling other uses of CLONE_THREAD non-glibc seems better.  The old
LinuxThreads did not use CLONE_THREAD because it did not exist.
>
> But we really have three different 'kill' system calls:
>
>  - the original 'kill' system call (#37 on x86-32).
>
>This looks up the thread ID, but signals the *group*.
>
>  - tkill (#238)
>
>This looks up the thread, and signals the specific thread.
>
>  - tgkill (#270)
>
>This looks up the tgid, and signals the group.

No.  tgkill is a less racy version of tkill and verifies that the
thread it signals is in the proper thread group.

> Modern glibc will not even use the original 'kill()' at all, I think.
> But it's the legacy behavior.

No.  Modern glibc definitely still uses kill.  As kill is the only one
exporting the posix kill API.  

> So I do think Oleg is right. We should be careful. You'll not notice
> breakage on a modern distro, but you might easily break old code.

Yes.  We definitely need to be careful.   At the same time since this
isn't something the old LinuxThreads had to cope with we can probably
clean it up.  But as that is not my focus it should probably be pushed out.

Eric


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Linus Torvalds
On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  wrote:
>
> In practice since glibc does not make thread id's available I don't
> expect anyone relies on this behavior.  Since no one relies on it we
> can change it without creating a regression.

Maybe.

However, possibly not.

The thing is, glibc wasn't the original or only use of our threads. In
fact, there are people out there that use clone() directly, without
using it for posix threading. And Oleg was right to notice this,
because the traditional model was literally to just use "kill()" on
the pid returned from clone().

So the semantics of Linux kill() really is to kill the thread, not the
group leader. glibc's implementation of pthreads is not the only model
out there.

Now, it is possible that at none of the legacy uses use CLONE_THREAD
and thus aren't affected (because tgid will always be pid). So maybe
nobody notices.

But we really have three different 'kill' system calls:

 - the original 'kill' system call (#37 on x86-32).

   This looks up the thread ID, but signals the *group*.

 - tkill (#238)

   This looks up the thread, and signals the specific thread.

 - tgkill (#270)

   This looks up the tgid, and signals the group.

Modern glibc will not even use the original 'kill()' at all, I think.
But it's the legacy behavior.

So I do think Oleg is right. We should be careful. You'll not notice
breakage on a modern distro, but you might easily break old code.

 Linus


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Linus Torvalds
On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  wrote:
>
> In practice since glibc does not make thread id's available I don't
> expect anyone relies on this behavior.  Since no one relies on it we
> can change it without creating a regression.

Maybe.

However, possibly not.

The thing is, glibc wasn't the original or only use of our threads. In
fact, there are people out there that use clone() directly, without
using it for posix threading. And Oleg was right to notice this,
because the traditional model was literally to just use "kill()" on
the pid returned from clone().

So the semantics of Linux kill() really is to kill the thread, not the
group leader. glibc's implementation of pthreads is not the only model
out there.

Now, it is possible that at none of the legacy uses use CLONE_THREAD
and thus aren't affected (because tgid will always be pid). So maybe
nobody notices.

But we really have three different 'kill' system calls:

 - the original 'kill' system call (#37 on x86-32).

   This looks up the thread ID, but signals the *group*.

 - tkill (#238)

   This looks up the thread, and signals the specific thread.

 - tgkill (#270)

   This looks up the tgid, and signals the group.

Modern glibc will not even use the original 'kill()' at all, I think.
But it's the legacy behavior.

So I do think Oleg is right. We should be careful. You'll not notice
breakage on a modern distro, but you might easily break old code.

 Linus


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 07/10, Eric W. Biederman wrote:
>>
>> Now that we can make the distinction use PIDTYPE_TGID rather than
>> PIDTYPE_PID.
>
> Wai, wait, this doesn't look right...
>
>> There is no immediate effect as they point point at the
>> same task,
>
> How so? pid_task(pid, PIDTYPE_TGID) will return NULL unless this pid is 
> actually
> a group leader's pid,
>
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1315,7 +1315,7 @@ int kill_pid_info(int sig, struct siginfo *info, 
>> struct pid *pid)
>>
>>  for (;;) {
>>  rcu_read_lock();
>> -p = pid_task(pid, PIDTYPE_PID);
>> +p = pid_task(pid, PIDTYPE_TGID);
>>  if (p)
>>  error = group_send_sig_info(sig, info, p);
>
> So, currently kill(pid_nr) always works, even if pid_nr is a sub-thread's tid.
>
> After this change kill(2) will always fail with -ESRCH in this case.
>
> Or I am totally confused?

No you are not.

That does at least need to be documented in the description of the
patch.

In practice since glibc does not make thread id's available I don't
expect anyone relies on this behavior.  Since no one relies on it we
can change it without creating a regression.

I believe this can be described as fixing a bug that we were not able to
before the introduction of PIDTYPE_TGID.

I will update my change description.

Eric


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 07/10, Eric W. Biederman wrote:
>>
>> Now that we can make the distinction use PIDTYPE_TGID rather than
>> PIDTYPE_PID.
>
> Wai, wait, this doesn't look right...
>
>> There is no immediate effect as they point point at the
>> same task,
>
> How so? pid_task(pid, PIDTYPE_TGID) will return NULL unless this pid is 
> actually
> a group leader's pid,
>
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1315,7 +1315,7 @@ int kill_pid_info(int sig, struct siginfo *info, 
>> struct pid *pid)
>>
>>  for (;;) {
>>  rcu_read_lock();
>> -p = pid_task(pid, PIDTYPE_PID);
>> +p = pid_task(pid, PIDTYPE_TGID);
>>  if (p)
>>  error = group_send_sig_info(sig, info, p);
>
> So, currently kill(pid_nr) always works, even if pid_nr is a sub-thread's tid.
>
> After this change kill(2) will always fail with -ESRCH in this case.
>
> Or I am totally confused?

No you are not.

That does at least need to be documented in the description of the
patch.

In practice since glibc does not make thread id's available I don't
expect anyone relies on this behavior.  Since no one relies on it we
can change it without creating a regression.

I believe this can be described as fixing a bug that we were not able to
before the introduction of PIDTYPE_TGID.

I will update my change description.

Eric


Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Oleg Nesterov
On 07/10, Eric W. Biederman wrote:
>
> Now that we can make the distinction use PIDTYPE_TGID rather than
> PIDTYPE_PID.

Wai, wait, this doesn't look right...

> There is no immediate effect as they point point at the
> same task,

How so? pid_task(pid, PIDTYPE_TGID) will return NULL unless this pid is actually
a group leader's pid,

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1315,7 +1315,7 @@ int kill_pid_info(int sig, struct siginfo *info, struct 
> pid *pid)
>
>   for (;;) {
>   rcu_read_lock();
> - p = pid_task(pid, PIDTYPE_PID);
> + p = pid_task(pid, PIDTYPE_TGID);
>   if (p)
>   error = group_send_sig_info(sig, info, p);

So, currently kill(pid_nr) always works, even if pid_nr is a sub-thread's tid.

After this change kill(2) will always fail with -ESRCH in this case.

Or I am totally confused?

> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -347,12 +347,11 @@ int posix_timer_event(struct k_itimer *timr, int 
> si_private)
>*/
>   timr->sigq->info.si_sys_private = si_private;
>  
> + shared = !(timr->it_sigev_notify & SIGEV_THREAD_ID);
>   rcu_read_lock();
> - task = pid_task(timr->it_pid, PIDTYPE_PID);
> - if (task) {
> - shared = !(timr->it_sigev_notify & SIGEV_THREAD_ID);
> + task = pid_task(timr->it_pid, shared ? PIDTYPE_TGID : PIDTYPE_PID);

This looks fine, afaics without SIGEV_THREAD_ID ->it_pid is alwats task_tgid().

Oleg.



Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Oleg Nesterov
On 07/10, Eric W. Biederman wrote:
>
> Now that we can make the distinction use PIDTYPE_TGID rather than
> PIDTYPE_PID.

Wai, wait, this doesn't look right...

> There is no immediate effect as they point point at the
> same task,

How so? pid_task(pid, PIDTYPE_TGID) will return NULL unless this pid is actually
a group leader's pid,

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1315,7 +1315,7 @@ int kill_pid_info(int sig, struct siginfo *info, struct 
> pid *pid)
>
>   for (;;) {
>   rcu_read_lock();
> - p = pid_task(pid, PIDTYPE_PID);
> + p = pid_task(pid, PIDTYPE_TGID);
>   if (p)
>   error = group_send_sig_info(sig, info, p);

So, currently kill(pid_nr) always works, even if pid_nr is a sub-thread's tid.

After this change kill(2) will always fail with -ESRCH in this case.

Or I am totally confused?

> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -347,12 +347,11 @@ int posix_timer_event(struct k_itimer *timr, int 
> si_private)
>*/
>   timr->sigq->info.si_sys_private = si_private;
>  
> + shared = !(timr->it_sigev_notify & SIGEV_THREAD_ID);
>   rcu_read_lock();
> - task = pid_task(timr->it_pid, PIDTYPE_PID);
> - if (task) {
> - shared = !(timr->it_sigev_notify & SIGEV_THREAD_ID);
> + task = pid_task(timr->it_pid, shared ? PIDTYPE_TGID : PIDTYPE_PID);

This looks fine, afaics without SIGEV_THREAD_ID ->it_pid is alwats task_tgid().

Oleg.