Re: [OMPI users] Multi-threading with OpenMPI ?
Ralph, thank you for your help. I set "-mca opal_set_max_sys_limits 1" and my "ulimit" is "unlimited", but I still get the errors. What is happening now is that for every user request (web service request) a new thread is created, and in that thread I spawn processes which do the calculation in parallel. I think I have to change the design so that I put the requests in a queue and execute one parallel job at a time, rather than running multiple parallel jobs at once (which might eventually run out of system resources).

Thank you,
umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
Are these threads running for long periods of time? I ask because there typically are system limits on the number of pipes any one process can open, which is what you appear to be hitting. You can check two things (as the error message tells you :-)):

1. Set -mca opal_set_max_sys_limits 1 on your command line (or in the environment). This tells OMPI to automatically raise the system limits to the maximum allowed values.

2. Check "ulimit" to see what you are allowed. You might need to talk to your sys admin about upping the limits.

On Oct 5, 2009, at 1:33 AM, Ashika Umanga Umagiliya wrote:
Re: [OMPI users] Multi-threading with OpenMPI ?
Greetings all,

First of all, thank you all for the help. I tried using locks and I still get the following problems:

1) When multiple threads call MPI_Comm_spawn (sequentially or in parallel), some spawned processes hang in their MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup) call. (I can see the spawned processes stacked up in 'top'.)

2) Randomly, the program (web service) crashes with the error:

[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 218
[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
--
Error: system limit exceeded on number of network connections that can be open
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1, increasing your limit descriptor setting (using limit or ulimit commands), or asking the system administrator to increase the system limit.
--

Any advice?

Thank you,
umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
MPI_COMM_SELF is one example. The only task it contains is the local task. The other case I had in mind is where there is a master doing all the spawns. The master is launched as an MPI "job" but has only one task. In that master, even MPI_COMM_WORLD is what I called a "single task communicator". Because the collective spawn call is "collective" across only one task in this case, it does not have the same sort of dependency on what other tasks do. I think it is common for a single-task master to have responsibility for all spawns in the kind of model yours sounds like. I did not study the conversation enough to know whether you are doing all spawn calls from a "single task communicator"; I was trying to give a broadly useful explanation.

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
Re: [OMPI users] Multi-threading with OpenMPI ?
Thank you Dick for your detailed reply.

I am sorry, could you explain more what you meant by "unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call"? I am confused by the term "single task communicator".

Best Regards,
umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
It is dangerous to hold a local lock (like a mutex) across a blocking MPI call unless you can be 100% sure that everything that must happen remotely will be completely independent of what is done with local locks and communication dependencies on other tasks.

It is likely that an MPI_Comm_spawn call in which the spawning communicator is MPI_COMM_SELF would be safe to serialize with a mutex. But be careful, and do not view this as an approach to making MPI applications thread safe in general. Also, unless you are calling MPI_Comm_spawn on a single task communicator, you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call. MPI requires that collective calls on a given communicator be made in the same order by all participating tasks.

If there are two or more tasks making the MPI_Comm_spawn call collectively from multiple threads (even with per-thread input communicators), then using a local lock this way is pretty sure to deadlock at some point. Say task 0 serializes spawning threads as A then B, and task 1 serializes them as B then A. The job will deadlock because task 0 cannot free its lock for thread A until task 1 makes the spawn call for thread A as well. That will never happen if task 1 is stuck in a lock that will not release until task 0 makes its call for thread B.

When you look at the code for a particular task and consider thread interactions within the task, the use of the lock looks safe. It is only when you consider the dependencies on what other tasks are doing that the danger becomes clear. This particular case is pretty easy to see, but sometimes when there is a temptation to hold a local mutex across a blocking MPI call, the chain of dependencies that can lead to deadlock becomes very hard to predict.

BTW - maybe this is obvious, but you also need to protect the logic which calls MPI_Init_thread to make sure you do not have a race in which two threads each test the flag for whether MPI_Init_thread has already been called. If two threads do:

1) if (MPI_Inited_flag == FALSE) {
2)     set MPI_Inited_flag
3)     MPI_Init_thread
4) }

you have a couple of race conditions:

1) Two threads may both try to call MPI_Init_thread if one thread tests "if (MPI_Inited_flag == FALSE)" while the other is between statements 1 and 2.

2) If some thread tests "if (MPI_Inited_flag == FALSE)" while another thread is between statements 2 and 3, that thread could assume MPI_Init_thread is done and make the MPI_Comm_spawn call before the thread that is trying to initialize MPI manages to do it.

Dick

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
Re: [OMPI users] Multi-threading with OpenMPI ?
Hi Ashika,

Yes, you can serialize the call using a pthread mutex if you have created the threads using pthreads. Basically, whatever thread library you are using for thread creation provides the synchronization APIs that you have to use here. The OPAL_THREAD_LOCK and OPAL_THREAD_UNLOCK code is also implemented using the supported thread library on your platform (selected by default during configure, or with --with-threads). You cannot use the OPAL library, as it is not exported to the outside MPI programming world.

Regards
Neeraj Chourasia (MTS)
Computational Research Laboratories Ltd.
(A wholly Owned Subsidiary of TATA SONS Ltd)
B-101, ICC Trade Towers, Senapati Bapat Road
Pune 411016 (Mah) INDIA
(O) +91-20-6620 9863 (Fax) +91-20-6620 9862
M: +91.9225520634
Re: [OMPI users] Multi-threading with OpenMPI ?
Thanks Ralph,

I do not have much experience in this area. Shall I use pthread_mutex_lock()/pthread_mutex_unlock() etc., or the following, which I saw in the Open MPI source:

static opal_mutex_t ompi_lock;
OPAL_THREAD_LOCK(&ompi_lock);
// ...
OPAL_THREAD_UNLOCK(&ompi_lock);

Thanks in advance,
umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
Only thing I can suggest is to place a thread lock around the call to comm_spawn so that only one thread at a time can execute that function. The call to mpi_init_thread is fine - you just need to explicitly protect the call to comm_spawn.

On Sep 17, 2009, at 7:44 PM, Ashika Umanga Umagiliya wrote:
Re: [OMPI users] Multi-threading with OpenMPI ?
Hi Jeff, Ralph,

Yes, I call MPI_COMM_SPAWN in multiple threads simultaneously. Because I need to expose my parallel algorithm as a web service, multiple clients connect and execute my logic at the same time (i.e., multiple threads). For each client, a new thread is created (by the web service framework), and inside the thread MPI_Init_thread() is called if MPI has not been initialized. The thread then calls MPI_COMM_SPAWN to create new processes. If this is the case, are there any workarounds?

Thanks in advance, umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
On Sep 16, 2009, at 9:53 PM, Ralph Castain wrote: Only the obvious, and not very helpful one: comm_spawn isn't thread safe at this time. You'll need to serialize your requests to that function.

This is likely the cause of your issues if you are calling MPI_COMM_SPAWN in multiple threads simultaneously. Can you verify? If not, we'll need to dig a little deeper to figure out what's going on. But Ralph is right -- read up on the THREAD_MULTIPLE constraints (check the OMPI README file) to see if that's what's biting you.

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] Multi-threading with OpenMPI ?
Only the obvious, and not very helpful one: comm_spawn isn't thread safe at this time. You'll need to serialize your requests to that function. I believe the thread safety constraints within OMPI are discussed to some extent on the FAQ site. At the least, they have been discussed in some depth on this mailing list several times. Might be some further nuggets of advice on workarounds in there.
Re: [OMPI users] Multi-threading with OpenMPI ?
Any tips ? Anyone ? :(
Re: [OMPI users] Multi-threading with OpenMPI ?
One more modification: I do not call MPI_Finalize() from the "libParallel.so" library.
Re: [OMPI users] Multi-threading with OpenMPI ?
Greetings all,

After some reading, I found out that I have to build Open MPI using "--enable-mpi-threads". After that, I changed the MPI_Init() code in my "libParallel.so" and in "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:

int sup;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);

Now when multiple requests come (multiple threads), MPI gives the following two errors:

"[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--
mpirun has exited due to process rank 0 with PID 6127 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here)."

or sometimes:

"[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
--
mpirun has exited due to process rank 0 with PID 5477 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--"

Any tips?

Thank you
[OMPI users] Multi-threading with OpenMPI ?
Greetings all,

Please refer to the image at: http://i27.tinypic.com/mtqurp.jpg

Here is the process illustrated in the image:

1) The C++ web service loads "libParallel.so" when it starts up (dlopen).
2) When a new request comes from a client, a *new thread* is created, the SOAP data is bound to C++ objects, and the calcRisk() method of the web service is invoked. Inside this method, calcRisk() of "libParallel" is invoked (using dlsym etc.).
3) Inside calcRisk() of "libParallel", it spawns the "parallel-svr" MPI application. (I am using Boost.MPI and Boost.Serialization to send custom data types across the spawned processes.)
4) "parallel-svr" (the MPI application in the image) executes the parallel logic and sends the result back to "libParallel.so" using Boost.MPI send etc.
5) "libParallel.so" sends the result to the web service, which binds it into SOAP and sends the result to the client, and the thread ends.

My problem is: everything works fine for the first request from the client, but for the second request it throws an error (I assume from "libParallel.so") saying:

"--
Calling any MPI-function after calling MPI_Finalize is erroneous. The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!"

Is this because of multithreading? Any idea how to fix this?

Thanks in advance, umanga