I checked in this change; thanks.
-- David

Michael Melanson wrote:
> Hi again,
> 
> I fixed the problem from my last email, and I want to share what the  
> root problem was and the solution I found, in case someone else runs  
> into something similar. I also have a suggestion for a change to the  
> BOINC client.
> 
> I didn't think this was relevant, so I didn't mention it in my last
> email, but my apps link against MPICH. I have it set up in a one-node
> configuration because the code I'm modifying depends on it and, for
> the time being, will still be run on a grid as well as via BOINC.
> 
> When you call MPI_Init, if the process is not attached to a terminal  
> (this is why it only showed up when the "--background" flag was set),  
> then MPICH calls setsid() to create a new session. This creates a new  
> process group as well for that session, thus removing the app process  
> from the BOINC process's group.
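
To illustrate the mechanism described above, here is a minimal C sketch (not MPICH's or BOINC's actual code) in which a forked child calls setsid() and then checks whether its process group still matches the parent's. The helper name child_leaves_group_after_setsid is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls setsid(), and report
 * whether the child ended up outside the parent's process group.
 * Returns 1 if it left the group, 0 if it did not, -1 on error. */
int child_leaves_group_after_setsid(void)
{
    pid_t parent_pgid = getpgrp();
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* A freshly forked child is never a group leader, so setsid()
         * is allowed; it becomes leader of a new session and group. */
        if (setsid() < 0)
            _exit(2);
        _exit(getpgrp() != parent_pgid ? 1 : 0);
    }

    /* Waiting on the explicit PID works whatever group the child is in. */
    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

On a POSIX system this helper is expected to return 1, since setsid() always places the caller in a new process group.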
> 
> In app_control.cpp:169 and app_control.cpp:510, when BOINC calls
> waitpid() to check the status of the processes, it passes pid=0, so
> it reaps only children in its own process group. Mine were no longer
> in that group, so they turned into zombies. I worked around the
> problem by setting the environment variable MPICH_PROCESS_GROUP=no,
> which prevents MPICH from creating a new process group.
> 
> However, I think BOINC's behaviour should be changed as well so that
> it waits for any child process, not just those in its own process
> group (i.e., pass pid=-1 when calling waitpid()). Is there a reason
> for the current behaviour, or has this just never bitten anyone before?
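
The difference between the two waitpid() arguments can be reproduced outside BOINC. The following is a rough C sketch under stated assumptions, not the code in app_control.cpp: the helper reaps_setsid_child is hypothetical, the child imitates MPICH only by calling setsid(), and a pipe is used so the parent does not call waitpid() until the child has already left the group.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls setsid() and exits, then
 * try to reap it with waitpid(wait_arg, ...).
 * Returns 1 if that call reaped the child, 0 if it did not, -1 on error. */
int reaps_setsid_child(pid_t wait_arg)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        close(pipefd[0]);
        setsid();            /* leave the parent's process group */
        close(pipefd[1]);    /* signal the parent that setsid() is done */
        _exit(0);
    }

    /* Parent: block until EOF, i.e. until the child has called setsid(). */
    close(pipefd[1]);
    char c;
    ssize_t n;
    do {
        n = read(pipefd[0], &c, 1);
    } while (n < 0 && errno == EINTR);
    close(pipefd[0]);

    /* pid=0 considers only children in the caller's process group;
     * pid=-1 considers every child. */
    int status;
    pid_t reaped = waitpid(wait_arg, &status, 0);
    if (reaped == pid)
        return 1;

    /* Not reaped by that call: collect it by PID so no zombie remains. */
    pid_t cleanup = waitpid(pid, &status, 0);
    assert(cleanup == pid);
    return 0;
}
```

Assuming the calling process has no other children, reaps_setsid_child(0) is expected to return 0 (waitpid() fails with ECHILD because no child is left in the group), while reaps_setsid_child(-1) is expected to return 1.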
> 
> 
> Thanks,
> 
> Michael Melanson
> 
> 
> On 23-Oct-09, at 16:46 , Michael Melanson wrote:
> 
>> Hi everyone,
>>
>> I'm having a problem with my work units. When I run them on my test
>> box (client version 6.9.0), they run correctly and exit, but then
>> remain in the "Running" state as zombies:
>>
>> r...@tomtest1:~/boinc_project# ps -Af | grep boinc
>> melanson 10519     1  0 Oct22 ?        00:05:25 /usr/bin/boincmgr
>> boinc    11646     1  0 16:32 ?        00:00:00 /usr/bin/boinc --check_all_logins --redirectio --dir /var/lib/boinc-client
>> boinc    11654 11646  1 16:33 ?        00:00:04 [evaluator_0.150] <defunct>
>> boinc    11655 11646  1 16:33 ?        00:00:04 [evaluator_0.150] <defunct>
>> root     11943  4667  0 16:38 pts/2    00:00:00 grep boinc
>>
>> Apparently the client is not calling waitpid() on them, but I have no
>> idea why not.
>>
>> The processes terminate correctly when run stand-alone. They also
>> terminate correctly when I start the client such that it remains
>> attached to the terminal. This problem only occurs when it is started
>> as a daemon process, such as by the init.d script. That is, if the
>> client is started by running the following command as root, it leaves
>> zombies behind:
>>
>> # start-stop-daemon --start --background \
>>     --pidfile /var/run/boinc.pid --make-pidfile --quiet \
>>     --user boinc --chuid boinc --chdir /var/lib/boinc-client \
>>     --exec /usr/bin/boinc -- --check_all_logins --redirectio \
>>     --dir /var/lib/boinc-client
>>
>> However, if you get rid of the '--background' flag from that line, the
>> app processes terminate correctly. I'm at a loss as to why this should
>> make any difference, and would appreciate any help and insight you can
>> provide.
>>
>> Thank you in advance.
>>
>>
>> Michael
> 
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
