Re: [OMPI users] ompi-clean on single executable
On Oct 26, 2012, at 4:14 AM, Nicolas Deladerriere wrote:

> Thanks all for your comments.
>
> Ralph,
>
> What I was initially looking for is a tool (or an option of orte-clean) that
> cleans up the mess you are talking about, but only the mess that has been
> created by a single mpirun command. As far as I have understood, orte-clean
> cleans up all the mess on a node associated with all Open MPI processes that
> have run (or are currently running).

That is correct. We could fairly easily modify it to clean up leftover files from a single mpirun without affecting others. Unfortunately, there really isn't any easy way to tell what processes belong to a specific mpirun, so selectively killing zombies would be very hard to do.

> According to Rolf's comment, the mpirun command usually does not leave any
> zombie processes, hence it seems that the effect of orte-clean is limited.
> But since it exists, I was wondering whether it is doing useful stuff?

It was created during the early years of our work, when zombies occurred frequently. The need for it has declined over the years, but we keep it around because we do still hit problems on occasion, especially during development.

> Cheers,
> Nicolas
Re: [OMPI users] ompi-clean on single executable
Thanks all for your comments.

Ralph,

What I was initially looking for is a tool (or an option of orte-clean) that cleans up the mess you are talking about, but only the mess that has been created by a single mpirun command. As far as I have understood, orte-clean cleans up all the mess on a node associated with all Open MPI processes that have run (or are currently running).

According to Rolf's comment, the mpirun command usually does not leave any zombie processes, hence it seems that the effect of orte-clean is limited. But since it exists, I was wondering whether it is doing useful stuff?

Cheers,
Nicolas

2012/10/25 Ralph Castain

> Okay, now I'm confused. If all you want to do is cleanly "kill" a running
> OMPI job, then why not just issue
>
> $ kill -TERM <PID of mpirun>
>
> This will cause mpirun to order the clean termination of all remote procs
> within that execution, and then cleanly terminate itself. No tool we create
> could do it any better.
>
> Is there an issue with doing so?
>
> orte-clean was intended to clean up the mess if/when the above method
> doesn't work, i.e., when you have to "kill -KILL" mpirun, which forcibly
> kills mpirun but might leave zombie orteds on the remote nodes.
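The difficulty of telling which processes belong to one mpirun can be illustrated with a minimal sketch. This is not something Open MPI provides: the function name `descendants` is made up for illustration, and it assumes a `pgrep` that supports `-P` (parent PID), as on Linux and the BSDs. It walks the parent/child chain starting from one PID.

```shell
#!/bin/sh
# Sketch only: list the local descendants of one launcher process,
# for example the PID of a specific mpirun.
# Assumption: pgrep supports -P (select by parent PID).

descendants() {
    # Print every descendant PID of $1, one per line, depth first.
    for child in $(pgrep -P "$1"); do
        echo "$child"
        descendants "$child"
    done
}
```

Calling `descendants 2399` would print the PIDs still hanging below that mpirun on the local host. It cannot see daemons started on remote nodes, and it finds nothing once mpirun has been killed and its children have been re-parented, which is exactly the case where cleanup is needed.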
Re: [OMPI users] ompi-clean on single executable
Okay, now I'm confused. If all you want to do is cleanly "kill" a running OMPI job, then why not just issue

$ kill -TERM <PID of mpirun>

This will cause mpirun to order the clean termination of all remote procs within that execution, and then cleanly terminate itself. No tool we create could do it any better.

Is there an issue with doing so?

orte-clean was intended to clean up the mess if/when the above method doesn't work, i.e., when you have to "kill -KILL" mpirun, which forcibly kills mpirun but might leave zombie orteds on the remote nodes.

On Oct 24, 2012, at 10:39 AM, Jeff Squyres wrote:

> Or perhaps cloned, renamed to orte-kill, and modified to kill a single (or
> multiple) specific job(s). That would be POSIX-like ("kill" vs. "clean").
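For a wrapper that launches mpirun itself, the advice above could be sketched roughly as follows. The function name `stop_job` and the 10-second default grace period are assumptions made for illustration. The sketch also assumes the wrapper recorded the PID of its own mpirun (for instance from `$!`), so that only that job is signalled.

```shell
#!/bin/sh
# Sketch only: stop one job by the PID of its launcher (e.g. one mpirun).
# Usage: stop_job PID [GRACE_SECONDS]

stop_job() {
    pid=$1
    grace=${2:-10}

    # Ask the launcher to shut down cleanly. If the signal cannot be
    # delivered, the process is most likely gone already.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until the process has exited or time runs out.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Last resort: forced kill. This is the case that may leave debris behind.
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

A wrapper might start the job with `mpirun -np 2 myexec1 &`, save `$!`, and later call `stop_job` with that PID. A return status of 1 signals that the forced kill was needed, which is the situation in which a cleanup tool such as orte-clean would come into play.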
Re: [OMPI users] ompi-clean on single executable
Or perhaps cloned, renamed to orte-kill, and modified to kill a single (or multiple) specific job(s). That would be POSIX-like ("kill" vs. "clean").

On Oct 24, 2012, at 1:32 PM, Rolf vandeVaart wrote:

> And just to give a little context, ompi-clean was created initially to
> "clean" up a node, not for cleaning up a specific job. It was for the case
> where MPI jobs would leave some files behind or leave some processes running.
> (I do not believe this happens much at all anymore.) But, as was said, no
> reason it could not be modified.
Re: [OMPI users] ompi-clean on single executable
And just to give a little context, ompi-clean was created initially to "clean" up a node, not for cleaning up a specific job. It was for the case where MPI jobs would leave some files behind or leave some processes running. (I do not believe this happens much at all anymore.) But, as was said, no reason it could not be modified.

>-----Original Message-----
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Jeff Squyres
>Sent: Wednesday, October 24, 2012 12:56 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] ompi-clean on single executable
>
>...but patches would be greatly appreciated. :-)
Re: [OMPI users] ompi-clean on single executable
...but patches would be greatly appreciated. :-)

On Oct 24, 2012, at 12:24 PM, Ralph Castain wrote:

> All things are possible, including what you describe. Not sure when we would
> get to it, though.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] ompi-clean on single executable
All things are possible, including what you describe. Not sure when we would get to it, though.

On Oct 24, 2012, at 4:01 AM, Nicolas Deladerriere wrote:

> Reuti,
>
> The problem I am facing is a very small part of our production
> system, and I cannot modify our mpirun submission system. This is why
> I am looking for a solution using only the ompi-clean or mpirun command
> specification.
>
> Thanks,
> Nicolas
Re: [OMPI users] ompi-clean on single executable
Reuti,

The problem I am facing is a very small part of our production system, and I cannot modify our mpirun submission system. This is why I am looking for a solution using only the ompi-clean or mpirun command specification.

Thanks,
Nicolas

2012/10/24, Reuti:

> On 24.10.2012 at 11:33, Nicolas Deladerriere wrote:
>
>> Using ompi-clean like this just kills all other MPI jobs running on the
>> same frontend. I cannot use a queuing system
>
> Why? Using it on a single machine was only one possible setup. Its purpose
> is to distribute jobs to slave hosts. If you already have one frontend as a
> login machine, it fits perfectly: the qmaster (in the case of SGE) can run
> there and the execd on the nodes.
>
> -- Reuti
Re: [OMPI users] ompi-clean on single executable
On 24.10.2012 at 11:33, Nicolas Deladerriere wrote:

> Reuti,
>
> Thanks for your comments,
>
> In our case, we are currently running different mpirun commands on
> clusters sharing the same frontend. Basically we use a wrapper to run
> the mpirun command and to run an ompi-clean command to clean up the
> MPI job if required.
> Using ompi-clean like this just kills all other MPI jobs running on the
> same frontend. I cannot use a queuing system

Why? Using it on a single machine was only one possible setup. Its purpose is to distribute jobs to slave hosts. If you already have one frontend as a login machine, it fits perfectly: the qmaster (in the case of SGE) can run there and the execd on the nodes.

-- Reuti

> as you have suggested, which is why I was wondering about an option or
> other solution associated with the ompi-clean command to avoid this general
> cleanup of MPI jobs.
>
> Cheers
> Nicolas
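As a rough sketch of the queuing-system route, a wrapper could generate one Grid Engine job script per run and submit it. The helper name `make_job_script` is made up for illustration, and the parallel environment name `orte` is a site-specific assumption. The sketch also assumes that Open MPI was built with Grid Engine support, so that mpirun picks up the allocation.

```shell
#!/bin/sh
# Sketch only: emit a Grid Engine job script for one MPI run on stdout.
# Usage: make_job_script EXECUTABLE SLOTS
# Assumption: a parallel environment named "orte" exists at the site.

make_job_script() {
    exe=$1
    slots=$2
    # "\$" keeps a literal dollar sign in the generated script, so the
    # "#$" directive lines and $NSLOTS are left for Grid Engine to interpret.
    cat <<EOF
#!/bin/sh
#\$ -N $(basename "$exe")
#\$ -cwd
#\$ -pe orte $slots
mpirun -np \$NSLOTS $exe
EOF
}
```

With something like `make_job_script ./myexec1 2 > myexec1.job` followed by `qsub myexec1.job`, each run gets its own job ID, and a single job can then be removed with `qdel` and that ID without touching the other runs.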
Re: [OMPI users] ompi-clean on single executable
Reuti,

Thanks for your comments,

In our case, we are currently running different mpirun commands on
clusters sharing the same frontend. Basically we use a wrapper to run
the mpirun command and to run an ompi-clean command to clean up the
mpi job if required.
Using ompi-clean like this just kills all other mpi jobs running on
the same frontend. I cannot use a queuing system as you have suggested,
which is why I was wondering about an option or other solution
associated with the ompi-clean command to avoid this general mpi job
cleaning.

Cheers
Nicolas

2012/10/24, Reuti:
> Hi,
>
> On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:
>
>> I am having an issue running ompi-clean, which cleans up (this is
>> normal) the session associated to a user, which means it kills all
>> running jobs associated to this session (this is also normal). But I
>> would like to be able to clean up the session associated to a job
>> (not a user).
>>
>> Here is my point:
>>
>> I am running two executables:
>>
>> % mpirun -np 2 myexec1
>>    --> run with PID 2399 ...
>> % mpirun -np 2 myexec2
>>    --> run with PID 2402 ...
>>
>> When I run orte-clean I get this result:
>> % orte-clean -v
>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>> orte-clean: killing any lingering procs
>> orte-clean: found potential rogue orterun process
>> (pid=2399,user=ndelader), sending SIGKILL...
>> orte-clean: found potential rogue orterun process
>> (pid=2402,user=ndelader), sending SIGKILL...
>>
>> Which means that both jobs have been killed :-(
>> Basically I would like to perform orte-clean using the executable name
>> or PID or whatever identifies which job I want to stop and clean. It
>> seems I would need to create an openmpi session per job. Does it make
>> sense? And I would like to be able to do something like the following
>> command and get the following result:
>>
>> % orte-clean -v myexec1
>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>> orte-clean: killing any lingering procs
>> orte-clean: found potential rogue orterun process
>> (pid=2399,user=ndelader), sending SIGKILL...
>>
>> Does it make sense? Is there a way to perform this kind of selection
>> in the cleaning process?
>
> How many jobs are you starting on how many nodes at one time? This
> requirement could be a reason to start using a queuing system, where
> you can remove jobs individually and also serialize your workflow. In
> fact, we also use GridEngine locally on workstations for this purpose.
>
> -- Reuti
Re: [OMPI users] ompi-clean on single executable
Hi,

On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:

> I am having an issue running ompi-clean, which cleans up (this is
> normal) the session associated to a user, which means it kills all
> running jobs associated to this session (this is also normal). But I
> would like to be able to clean up the session associated to a job
> (not a user).
>
> Here is my point:
>
> I am running two executables:
>
> % mpirun -np 2 myexec1
>    --> run with PID 2399 ...
> % mpirun -np 2 myexec2
>    --> run with PID 2402 ...
>
> When I run orte-clean I get this result:
> % orte-clean -v
> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
> orte-clean: killing any lingering procs
> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader),
> sending SIGKILL...
> orte-clean: found potential rogue orterun process (pid=2402,user=ndelader),
> sending SIGKILL...
>
> Which means that both jobs have been killed :-(
> Basically I would like to perform orte-clean using the executable name or
> PID or whatever identifies which job I want to stop and clean. It seems I
> would need to create an openmpi session per job. Does it make sense? And I
> would like to be able to do something like the following command and get
> the following result:
>
> % orte-clean -v myexec1
> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
> orte-clean: killing any lingering procs
> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader),
> sending SIGKILL...
>
> Does it make sense? Is there a way to perform this kind of selection in
> the cleaning process?

How many jobs are you starting on how many nodes at one time? This requirement could be a reason to start using a queuing system, where you can remove jobs individually and also serialize your workflow. In fact, we also use GridEngine locally on workstations for this purpose.

-- Reuti
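
For the wrapper setup described in this thread, one possible approach is for the wrapper to remember the mpirun process it started and signal only that one, rather than calling orte-clean, which acts on everything belonging to the user. The following is only a rough sketch under assumptions: the names `start_job`, `stop_job` and `grace_seconds` are hypothetical, and a Python sleep command stands in for the real `mpirun -np 2 myexec1` line so that it runs without Open MPI installed. It relies on the point made elsewhere in this thread that sending SIGTERM to a specific mpirun lets it terminate its own processes.

```python
import signal
import subprocess
import sys


def start_job(cmd):
    """Launch one job (e.g. an mpirun command line) and return its Popen handle."""
    return subprocess.Popen(cmd)


def stop_job(proc, grace_seconds=10.0):
    """Stop only the given job and return its exit status.

    SIGTERM is sent first so the launcher can shut down its own processes;
    SIGKILL is used only if it has not exited after grace_seconds.
    """
    if proc.poll() is None:
        proc.send_signal(signal.SIGTERM)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in commands (assumption); in the wrapper these would be
    # something like ["mpirun", "-np", "2", "myexec1"].
    standin = [sys.executable, "-c", "import time; time.sleep(30)"]
    job1 = start_job(standin)
    job2 = start_job(standin)

    stop_job(job1)  # only job1 is signalled
    print("job1 exit status:", job1.returncode)
    print("job2 still running:", job2.poll() is None)

    stop_job(job2)  # clean up the second stand-in job
```

This does not remove leftover session files and does not help once mpirun itself has been killed with SIGKILL; that situation is what orte-clean was written for.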