On 6/18/22 23:54, adr wrote:
> On Sat, 18 Jun 2022, Jacob Moody wrote:
>> I've attempted to reproduce it, trying to remove the libthread/notify
>> factors. I've come up with this:
>>
>> #include <u.h>
>> #include <libc.h>
>>
>> static void
>> proc_udp(void*)
>> {
>>        char resp[512];
>>        char req[] = "request";
>>        int fd;
>>        int n;
>>        int pid;
>>
>>        fd = dial("udp!185.157.221.201!5678", nil, nil, nil);
>>        if(fd < 0)
>>                exits("can't dial");
>>
>>        if(write(fd, req, strlen(req)) != strlen(req))
>>                exits("can't write");
>>
>>        pid = getpid();
>>        fprint(1, "start %d\n", pid);
>>        n = read(fd, resp, sizeof(resp)-1);
>>        fprint(1, "end %d %d\n", pid, n);
>>        exits(nil);
>> }
>>
>> void
>> main(int, char**)
>> {
>>        int i;
>>        Waitmsg *wm;
>>
>>        for(i = 0; i < 10; i++){
>>                switch(fork()){
>>                case -1:
>>                        sysfatal("fork %r");
>>                case 0:
>>                        proc_udp(nil);
>>                        sysfatal("ret");
>>                default:
>>                        break;
>>                }
>>        }
>>        for(i = 0; i < 10; i++){
>>                wm = wait();
>>                if(wm == nil)
>>                        sysfatal("wait: %r");
>>                print("proc %d died with message %s\n", wm->pid, wm->msg);
>>                free(wm);
>>        }
>>        exits(nil);
>> }
>>
>> This code makes it pretty obvious that we are losing some children;
>> on my machine this program never exits. I see some of the reads
>> correctly returning -1, and the parent gets the Waitmsg for those
>> children, but not for all of them.
> 
> Moody, I think this old thread will interest you:
> 
> https://marc.info/?t=112730920400001&r=1&w=2
> 
> Russ Cox explained there:
>   It appears that your program, at its core, is doing this:
> 
>   void
>   readproc(void *v)
>   {
>       int fd;
>       char buf[100];
>       fd = (int)v;
>       read(fd, buf, sizeof buf);
>   }
> 
>   void
>   threadmain(int argc, char **argv)
>   {
>       int p[2];
>       pipe(p);
>       proccreate(readproc, (void*)p[0], 8192);
>       proccreate(readproc, (void*)p[1], 8192);
>       close(p[0]);
>       /* and here you expect the first readproc to be done */
>       close(p[1]);
>       /* and here the second */
>   }
> 
>   Each read call is holding up a reference to its channel
>   inside the kernel, so that even though you've closed the fd
>   and removed the ref from the fd table, there is still a reference
>   to each side of the pipe in the form of the process blocked
>   on the read.
> 
>   I've never been sure whether the implicit ref held during
>   the system call is good behavior, but it's hard to change.
> 
>   In your case, writing 0 (or anything) makes the read
>   finish, releasing the last ref to the underlying pipe when
>   the system call finishes, and then everything cleans up
>   as expected.  So you've found your workaround, and now
>   we understand why it works.
> 

I was simply misreading what I saw here.
I thought I had observed the child procs getting
murdered mid-read, with the parent never getting
the Waitmsg. Testing again, I see what Andrej had observed:
they are just blocking. I had thought I was seeing a bug
specific to udp, unrelated to notes/threads.

I apologize for the confusion; the thread you linked
is interesting regardless.
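
For anyone following along, the reference counting Russ
describes can be modelled in a few lines of plain C. This is
only a toy sketch, not kernel code: Chan here is a made-up
struct, and takeref/dropref are hypothetical helpers standing
in for whatever the kernel really does. It just illustrates
why close() alone does not tear the channel down while a read
is still blocked on it.

```c
#include <assert.h>

/* Toy model only: a made-up Chan, not the kernel's structure. */
typedef struct Chan Chan;
struct Chan {
	int ref;	/* fd table entries plus in-flight system calls */
	int open;	/* cleared once the last ref is dropped */
};

/* hypothetical helper: take one more reference */
static void
takeref(Chan *c)
{
	c->ref++;
}

/* hypothetical helper: drop a reference, tear down on the last one */
static void
dropref(Chan *c)
{
	assert(c->ref > 0);
	if(--c->ref == 0)
		c->open = 0;
}

/* close(fd) while a read is still blocked: is the chan still alive? */
int
closewhileblocked(void)
{
	Chan c = {1, 1};	/* one ref: the fd table entry */

	takeref(&c);	/* read enters the kernel and blocks */
	dropref(&c);	/* close(fd) drops only the fd table ref */
	return c.open;
}

/* same, but something is then written so the read returns */
int
closeafterwakeup(void)
{
	Chan c = {1, 1};

	takeref(&c);	/* read enters the kernel and blocks */
	dropref(&c);	/* close(fd) */
	dropref(&c);	/* read finishes and releases its implicit ref */
	return c.open;
}
```

In this model closewhileblocked() returns 1 (the channel
outlives the close) and closeafterwakeup() returns 0, which
matches the workaround of writing something to make the read
finish.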


------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/Tfa6823048ad90a21-Md81beb48e514ad6a776fa41d
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
