Re: [OMPI users] Multi-threading with OpenMPI ?
Ralph, thank you for your help. I set "-mca opal_set_max_sys_limits 1" and my "ulimit" is "unlimited", but I still get the errors.

What happens now is: for every user request (web service request) a new thread is created, and in that same thread I spawn processes; these newly spawned processes do the calculation in parallel. I think I have to change the design so that I put the requests in a queue and execute one "parallel job" at a time, rather than running multiple "parallel jobs" at once (which might eventually run out of system resources).

Thank you,
umanga

Ralph Castain wrote:

Are these threads running for long periods of time? I ask because there typically are system limits on the number of pipes any one process can open, which is what you appear to be hitting. You can check two things (as the error message tells you :-)):

1. Set "-mca opal_set_max_sys_limits 1" on your command line (or in the environment). This will tell OMPI to automatically set the system to the max allowed values.

2. Check "ulimit" to see what you are allowed. You might need to talk to your sys admin about upping the limits.

On Oct 5, 2009, at 1:33 AM, Ashika Umanga Umagiliya wrote:

Greetings all,

First of all, thank you all for the help. I tried using locks and I still get the following problems:

1) When multiple threads call MPI_Comm_spawn (sequentially or in parallel), some spawned processes hang in their "MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);" call. (I can see the list of all spawned processes stacked up in the 'top' command.)
2) Randomly, the program (web service) crashes with the error:

[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 218
[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
--
Error: system limit exceeded on number of network connections that can be open
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1, increasing your limit descriptor setting (using limit or ulimit commands), or asking the system administrator to increase the system limit.
--

Any advice?

Thank you,
umanga

Richard Treumann wrote:

MPI_COMM_SELF is one example. The only task it contains is the local task. The other case I had in mind is where there is a master doing all spawns. The master is launched as an MPI "job" but it has only one task. In that master, even MPI_COMM_WORLD is what I called a "single task communicator". Because the spawn call is "collective" across only one task in this case, it does not have the same sort of dependency on what other tasks do. I think it is common for a single-task master to have responsibility for all spawns in the kind of model yours sounds like. I did not study the conversation enough to know whether you are doing all spawn calls from a "single task communicator", so I was trying to give a broadly useful explanation.

Dick Treumann - MPI Team IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

users-boun...@open-mpi.org wrote on 09/25/2009 02:59:04 AM:

> Re: [OMPI users] Multi-threading with OpenMPI ?
> Ashika Umanga Umagiliya
> to: Open MPI Users
> 09/25/2009 03:00 AM
>
> Thank you Dick for your detailed reply,
>
> I am sorry, could you explain more what you meant by "unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call"? I am confused by the term "single task communicator".
>
> Best Regards,
> umanga
>
> Richard Treumann wrote:
>
> It is dangerous to hold a local lock (like a mutex) across a blocking MPI call unless you can be 100% sure everything that must happen remotely will be completely independent of what is done with local locks & communication dependencies on other tasks.
>
> It is likely that an MPI_Comm_spawn call in which the spawning communicator is MPI_COMM_SELF would be safe to serialize with a mutex. But be careful and do not view this as an approach to making MPI applications thread safe in general. Also, unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call.
Re: [OMPI users] Multi-threading with OpenMPI ?
Greetings all,

First of all, thank you all for the help. I tried using locks and I still get the following problems:

1) When multiple threads call MPI_Comm_spawn (sequentially or in parallel), some spawned processes hang in their "MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);" call. (I can see the list of all spawned processes stacked up in the 'top' command.)

2) Randomly, the program (web service) crashes with the error:

[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 218
[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
--
Error: system limit exceeded on number of network connections that can be open
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1, increasing your limit descriptor setting (using limit or ulimit commands), or asking the system administrator to increase the system limit.
--

Any advice?

Thank you,
umanga

Richard Treumann wrote:

MPI_COMM_SELF is one example. The only task it contains is the local task. The other case I had in mind is where there is a master doing all spawns. The master is launched as an MPI "job" but it has only one task. In that master, even MPI_COMM_WORLD is what I called a "single task communicator". Because the spawn call is "collective" across only one task in this case, it does not have the same sort of dependency on what other tasks do. I think it is common for a single-task master to have responsibility for all spawns in the kind of model yours sounds like. I did not study the conversation enough to know whether you are doing all spawn calls from a "single task communicator", so I was trying to give a broadly useful explanation.
Dick Treumann - MPI Team IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

users-boun...@open-mpi.org wrote on 09/25/2009 02:59:04 AM:

> Re: [OMPI users] Multi-threading with OpenMPI ?
> Ashika Umanga Umagiliya
> to: Open MPI Users
> 09/25/2009 03:00 AM
>
> Thank you Dick for your detailed reply,
>
> I am sorry, could you explain more what you meant by "unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call"? I am confused by the term "single task communicator".
>
> Best Regards,
> umanga
>
> Richard Treumann wrote:
>
> It is dangerous to hold a local lock (like a mutex) across a blocking MPI call unless you can be 100% sure everything that must happen remotely will be completely independent of what is done with local locks & communication dependencies on other tasks.
>
> It is likely that an MPI_Comm_spawn call in which the spawning communicator is MPI_COMM_SELF would be safe to serialize with a mutex. But be careful and do not view this as an approach to making MPI applications thread safe in general. Also, unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call. MPI requires that collective calls on a given communicator be made in the same order by all participating tasks.
>
> If there are two or more tasks making the MPI_Comm_spawn call collectively from multiple threads (even with per-thread input communicators) then using a local lock this way is pretty sure to deadlock at some point. Say task 0 serializes spawning threads as A then B, and task 1 serializes them as B then A.
> The job will deadlock because task 0 cannot free its lock for thread A until task 1 makes the spawn call for thread A as well. That will never happen if task 1 is stuck in a lock that will not release until task 0 makes its call for thread B.
>
> When you look at the code for a particular task and consider thread interactions within the task, the use of the lock looks safe. It is only when you consider the dependencies on what other tasks are doing that the danger becomes clear. This particular case is pretty easy to see, but sometimes when there is a temptation to hold a local mutex across a blocking MPI call, the chain of dependencies that can lead to deadlock becomes very hard to predict.
Re: [OMPI users] Multi-threading with OpenMPI ?
Thank you Dick for your detailed reply,

I am sorry, could you explain more what you meant by "unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call"? I am confused by the term "single task communicator".

Best Regards,
umanga

Richard Treumann wrote:

It is dangerous to hold a local lock (like a mutex) across a blocking MPI call unless you can be 100% sure everything that must happen remotely will be completely independent of what is done with local locks & communication dependencies on other tasks.

It is likely that an MPI_Comm_spawn call in which the spawning communicator is MPI_COMM_SELF would be safe to serialize with a mutex. But be careful and do not view this as an approach to making MPI applications thread safe in general. Also, unless you are calling MPI_Comm_spawn on a single task communicator you would need to have a different input communicator for each thread that will make an MPI_Comm_spawn call. MPI requires that collective calls on a given communicator be made in the same order by all participating tasks.

If there are two or more tasks making the MPI_Comm_spawn call collectively from multiple threads (even with per-thread input communicators) then using a local lock this way is pretty sure to deadlock at some point. Say task 0 serializes spawning threads as A then B, and task 1 serializes them as B then A. The job will deadlock because task 0 cannot free its lock for thread A until task 1 makes the spawn call for thread A as well. That will never happen if task 1 is stuck in a lock that will not release until task 0 makes its call for thread B.

When you look at the code for a particular task and consider thread interactions within the task, the use of the lock looks safe. It is only when you consider the dependencies on what other tasks are doing that the danger becomes clear.
This particular case is pretty easy to see, but sometimes when there is a temptation to hold a local mutex across a blocking MPI call, the chain of dependencies that can lead to deadlock becomes very hard to predict.

BTW - maybe this is obvious, but you also need to protect the logic which calls MPI_Init_thread to make sure you do not have a race in which two threads each race to test the flag for whether MPI_Init_thread has already been called. If two threads do:

1) if (MPI_Inited_flag == FALSE) {
2)     set MPI_Inited_flag
3)     MPI_Init_thread
4) }

you have a couple of race conditions:

1) Two threads may both try to call MPI_Init_thread if one thread tests "if (MPI_Inited_flag == FALSE)" while the other is between statements 1 & 2.

2) If some thread tests "if (MPI_Inited_flag == FALSE)" while another thread is between statements 2 and 3, that thread could assume MPI_Init_thread is done and make the MPI_Comm_spawn call before the thread that is trying to initialize MPI manages to do it.

Dick

Dick Treumann - MPI Team IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

users-boun...@open-mpi.org wrote on 09/17/2009 11:36:48 PM:

> Re: [OMPI users] Multi-threading with OpenMPI ?
> Ralph Castain
> to: Open MPI Users
> 09/17/2009 11:37 PM
>
> Only thing I can suggest is to place a thread lock around the call to comm_spawn so that only one thread at a time can execute that function. The call to mpi_init_thread is fine - you just need to explicitly protect the call to comm_spawn.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Multi-threading with OpenMPI ?
Thanks Ralph,

I have not much experience in this area. Shall I use pthread_mutex_lock()/pthread_mutex_unlock() etc., or the following, which I saw in the Open MPI source:

static opal_mutex_t ompi_lock;
OPAL_THREAD_LOCK(&ompi_lock);
// ...
OPAL_THREAD_UNLOCK(&ompi_lock);

Thanks in advance,
umanga

Ralph Castain wrote:

Only thing I can suggest is to place a thread lock around the call to comm_spawn so that only one thread at a time can execute that function. The call to mpi_init_thread is fine - you just need to explicitly protect the call to comm_spawn.

On Sep 17, 2009, at 7:44 PM, Ashika Umanga Umagiliya wrote:

Hi Jeff, Ralph,

Yes, I call MPI_COMM_SPAWN in multiple threads simultaneously. Because I need to expose my parallel algorithm as a web service, I need multiple clients to connect and execute my logic at the same time (i.e. multiple threads). For each client, a new thread is created (by the web service framework), and inside the thread MPI_Init_thread() is called if MPI hasn't been initialized. Then the thread calls MPI_COMM_SPAWN and creates new processes. So, if this is the case, isn't there any workaround?

Thanks in advance,
umanga

Jeff Squyres wrote:

On Sep 16, 2009, at 9:53 PM, Ralph Castain wrote:

Only the obvious, and not very helpful one: comm_spawn isn't thread safe at this time. You'll need to serialize your requests to that function.

This is likely the cause of your issues if you are calling MPI_COMM_SPAWN in multiple threads simultaneously. Can you verify? If not, we'll need to dig a little deeper to figure out what's going on. But Ralph is right -- read up on the THREAD_MULTIPLE constraints (check the OMPI README file) to see if that's what's biting you.
Re: [OMPI users] Multi-threading with OpenMPI ?
Hi Jeff, Ralph,

Yes, I call MPI_COMM_SPAWN in multiple threads simultaneously. Because I need to expose my parallel algorithm as a web service, I need multiple clients to connect and execute my logic at the same time (i.e. multiple threads). For each client, a new thread is created (by the web service framework), and inside the thread MPI_Init_thread() is called if MPI hasn't been initialized. Then the thread calls MPI_COMM_SPAWN and creates new processes. So, if this is the case, isn't there any workaround?

Thanks in advance,
umanga

Jeff Squyres wrote:

On Sep 16, 2009, at 9:53 PM, Ralph Castain wrote:

Only the obvious, and not very helpful one: comm_spawn isn't thread safe at this time. You'll need to serialize your requests to that function.

This is likely the cause of your issues if you are calling MPI_COMM_SPAWN in multiple threads simultaneously. Can you verify? If not, we'll need to dig a little deeper to figure out what's going on. But Ralph is right -- read up on the THREAD_MULTIPLE constraints (check the OMPI README file) to see if that's what's biting you.
Re: [OMPI users] Multi-threading with OpenMPI ?
Any tips? Anyone? :(

Ashika Umanga Umagiliya wrote:

One more modification: I do not call MPI_Finalize() from the "libParallel.so" library.

Ashika Umanga Umagiliya wrote:

Greetings all,

After some reading, I found out that I have to build Open MPI using "--enable-mpi-threads". After that, I changed the MPI_Init() code in my "libParallel.so" and in "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:

int sup;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);

Now when multiple requests come (multiple threads), MPI gives the following two errors:

[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--
mpirun has exited due to process rank 0 with PID 6127 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

or sometimes:

[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
--
mpirun has exited due to process rank 0 with PID 5477 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

Any tips?
Thank you

Ashika Umanga Umagiliya wrote:

Greetings all,

Please refer to the image at: http://i27.tinypic.com/mtqurp.jpg

Here is the process illustrated in the image:

1) The C++ web service loads "libParallel.so" when it starts up (dlopen).

2) When a new request comes from a client, a *new thread* is created, the SOAP data is bound to C++ objects, and the calcRisk() method of the web service is invoked. Inside this method, "calcRisk()" of "libParallel" is invoked (using dlsym etc.).

3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr" MPI application. (I am using Boost.MPI and Boost serialization to send custom data types across spawned processes.)

4) "parallel-svr" (the MPI application in the image) executes the parallel logic and sends the result back to "libParallel.so" using Boost.MPI send etc.

5) "libParallel.so" sends the result to the web service, it is bound into SOAP, the result is sent to the client, and the thread ends.

My problem is: everything works fine for the first request from the client. For the second request it throws an error (I assume from "libParallel.so") saying:

"--
Calling any MPI-function after calling MPI_Finalize is erroneous. The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!"

Is this because of multithreading? Any idea how to fix this?

Thanks in advance,
umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
One more modification: I do not call MPI_Finalize() from the "libParallel.so" library.

Ashika Umanga Umagiliya wrote:

Greetings all,

After some reading, I found out that I have to build Open MPI using "--enable-mpi-threads". After that, I changed the MPI_Init() code in my "libParallel.so" and in "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:

int sup;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);

Now when multiple requests come (multiple threads), MPI gives the following two errors:

[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--
mpirun has exited due to process rank 0 with PID 6127 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

or sometimes:

[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
--
mpirun has exited due to process rank 0 with PID 5477 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

Any tips?
Thank you

Ashika Umanga Umagiliya wrote:

Greetings all,

Please refer to the image at: http://i27.tinypic.com/mtqurp.jpg

Here is the process illustrated in the image:

1) The C++ web service loads "libParallel.so" when it starts up (dlopen).

2) When a new request comes from a client, a *new thread* is created, the SOAP data is bound to C++ objects, and the calcRisk() method of the web service is invoked. Inside this method, "calcRisk()" of "libParallel" is invoked (using dlsym etc.).

3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr" MPI application. (I am using Boost.MPI and Boost serialization to send custom data types across spawned processes.)

4) "parallel-svr" (the MPI application in the image) executes the parallel logic and sends the result back to "libParallel.so" using Boost.MPI send etc.

5) "libParallel.so" sends the result to the web service, it is bound into SOAP, the result is sent to the client, and the thread ends.

My problem is: everything works fine for the first request from the client. For the second request it throws an error (I assume from "libParallel.so") saying:

"--
Calling any MPI-function after calling MPI_Finalize is erroneous. The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!"

Is this because of multithreading? Any idea how to fix this?

Thanks in advance,
umanga
Re: [OMPI users] Multi-threading with OpenMPI ?
Greetings all,

After some reading, I found out that I have to build Open MPI using "--enable-mpi-threads". After that, I changed the MPI_Init() code in my "libParallel.so" and in "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:

int sup;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);

Now when multiple requests come (multiple threads), MPI gives the following two errors:

[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--
mpirun has exited due to process rank 0 with PID 6127 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

or sometimes:

[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299
--
mpirun has exited due to process rank 0 with PID 5477 on node umanga exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

Any tips?

Thank you

Ashika Umanga Umagiliya wrote:

Greetings all,

Please refer to the image at: http://i27.tinypic.com/mtqurp.jpg

Here is the process illustrated in the image:

1) The C++ web service loads "libParallel.so" when it starts up (dlopen).

2) When a new request comes from a client, a *new thread* is created, the SOAP data is bound to C++ objects, and the calcRisk() method of the web service is invoked. Inside this method, "calcRisk()" of "libParallel" is invoked (using dlsym etc.).

3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr" MPI application. (I am using Boost.MPI and Boost serialization to send custom data types across spawned processes.)

4) "parallel-svr" (the MPI application in the image) executes the parallel logic and sends the result back to "libParallel.so" using Boost.MPI send etc.

5) "libParallel.so" sends the result to the web service, it is bound into SOAP, the result is sent to the client, and the thread ends.

My problem is: everything works fine for the first request from the client. For the second request it throws an error (I assume from "libParallel.so") saying:

"--
Calling any MPI-function after calling MPI_Finalize is erroneous. The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!"

Is this because of multithreading? Any idea how to fix this?

Thanks in advance,
umanga
Re: [OMPI users] undefined symbol error when built as a sharedlibrary
Hi Jeff,

Thanks Jeff, that clears everything up. Now I remember, some time ago I ran into an issue like this when using dynamic loading (dlopen, dlsym, etc.) and later I had to use shared libraries. I think this is the same as that.

Thanks again,
umanga

Jeff Squyres wrote:

On Sep 10, 2009, at 9:42 PM, Ashika Umanga Umagiliya wrote:

That fixed the problem! You are indeed a voodoo master... could you explain the spell behind your magic :)

The problem has to do with how plugins (aka dynamic shared objects, DSOs) are loaded. When a DSO is loaded into a Linux process, it has the option of making all the public symbols in that DSO public to the rest of the process or private within its own scope.

Let's back up. Remember that Open MPI is based on plugins (DSOs). It loads lots and lots of plugins during execution (mostly during MPI_INIT). These plugins call functions in OMPI's public libraries (e.g., they call functions in libmpi.so). Hence, when the plugin DSOs are loaded, they need to be able to resolve these symbols into actual code that can be invoked. If the symbols cannot be resolved, the DSO load fails.

If libParallel.so is loaded into a private scope, then its linked libraries (e.g., libmpi.so) are also loaded into that same private scope. Hence, all of libmpi.so's public symbols are only public within that single, private scope. Then, when OMPI goes to load its own DSOs, since libmpi.so's public symbols are in a private scope, OMPI's DSOs can't find them -- and therefore they refuse to load. (Private scopes are not inherited -- a new DSO load cannot "see" libParallel.so/libmpi.so's private scope.) It's an educated guess from your description that this is what was happening.

OMPI's --disable-dlopen configure option has Open MPI build in a different way. Instead of building all of OMPI's plugins as DSOs, they are "slurped" up into libmpi.so (etc.). So there's no "loading" of DSOs at MPI_INIT time -- the plugin code actually resides *in* libmpi.so itself.
Hence, resolution of all symbols is done when libParallel.so loads libmpi.so. Additionally, there's no secondary private scope created when DSOs are loaded -- they're all self-contained within libmpi.so (etc.). And therefore all the libmpi.so symbols that are required for the plugins are able to be found/resolved at load time.

Does that make sense?

Regards,
umanga

Jeff Squyres wrote:

> I'm guessing that this has to do with deep, dark voodoo involved with the run time linker.
>
> Can you try configuring/building Open MPI with the --disable-dlopen configure option, and rebuilding your libParallel.so against the new libmpi.so?
>
> See if that fixes the problem for you. If it does, I can explain in more detail (if you care).
>
> On Sep 10, 2009, at 3:24 AM, Ashika Umanga Umagiliya wrote:
>
>> Greetings all,
>>
>> My parallel application is built as a shared library (libParallel.so). (I use Debian Lenny 64-bit.) A webservice is used to dynamically load libParallel.so and in turn execute the parallel process.
>>
>> But during runtime I get the error:
>>
>> webservicestub: symbol lookup error: /usr/local/lib/openmpi/mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int
>>
>> which I cannot figure out. I followed every 'ldd' and 'nm'; everything seems fine. So I compiled and tested my parallel code as an executable, and then it worked fine.
>>
>> What could be the reason for this?
>>
>> Thanks in advance,
>> umanga
[OMPI users] Multi-threading with OpenMPI ?
Greetings all,

Please refer to the image at: http://i27.tinypic.com/mtqurp.jpg

Here is the process illustrated in the image:

1) The C++ web service loads "libParallel.so" when it starts up (dlopen).

2) When a new request comes from a client, a *new thread* is created, the SOAP data is bound to C++ objects, and the calcRisk() method of the web service is invoked. Inside this method, "calcRisk()" of "libParallel" is invoked (using dlsym etc.).

3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr" MPI application. (I am using Boost.MPI and Boost serialization to send custom data types across spawned processes.)

4) "parallel-svr" (the MPI application in the image) executes the parallel logic and sends the result back to "libParallel.so" using Boost.MPI send etc.

5) "libParallel.so" sends the result to the web service, it is bound into SOAP, the result is sent to the client, and the thread ends.

My problem is: everything works fine for the first request from the client. For the second request it throws an error (I assume from "libParallel.so") saying:

"--
Calling any MPI-function after calling MPI_Finalize is erroneous. The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!"

Is this because of multithreading? Any idea how to fix this?

Thanks in advance,
umanga
Re: [OMPI users] undefined symbol error when built as a shared library
Hi Jeff,

Thanks a lot. That fixed the problem! You are indeed a voodoo master... could you explain the spell behind your magic :)

Regards,
umanga

Jeff Squyres wrote:

I'm guessing that this has to do with deep, dark voodoo involved with the run time linker.

Can you try configuring/building Open MPI with the --disable-dlopen configure option, and rebuilding your libParallel.so against the new libmpi.so?

See if that fixes the problem for you. If it does, I can explain in more detail (if you care).

On Sep 10, 2009, at 3:24 AM, Ashika Umanga Umagiliya wrote:

Greetings all,

My parallel application is built as a shared library (libParallel.so). (I use Debian Lenny 64-bit.) A webservice is used to dynamically load libParallel.so and in turn execute the parallel process.

But during runtime I get the error:

webservicestub: symbol lookup error: /usr/local/lib/openmpi/mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int

which I cannot figure out. I followed every 'ldd' and 'nm'; everything seems fine. So I compiled and tested my parallel code as an executable, and then it worked fine.

What could be the reason for this?

Thanks in advance,
umanga
[OMPI users] undefined symbol error when built as a shared library
Greetings all, My parallel application is built as a shared library (libParallel.so). (I use Debian Lenny 64-bit.) A webservice is used to dynamically load libParallel.so and in turn execute the parallel process. But during runtime I get the error: webservicestub: symbol lookup error: /usr/local/lib/openmpi/mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int which I cannot figure out. I followed every 'ldd' and 'nm' trail and everything seems fine. So I compiled and tested my parallel code as an executable, and then it worked fine. What could be the reason for this? Thanks in advance, umanga
[OMPI users] [OT] : Programming on PS3 Cell BE chip ?
Are all the commercial PS3 games developed in a "parallel way" (unlike the sequential style of Xbox development)? Do the developers have to *think* in a parallel way and use MPI_*-like commands to communicate with the SPEs?
Re: [OMPI users] OpenMPI 1.3.3 with Boost.MPI ?
Thanks Federico, it worked fine. But I have a small issue. The following code demonstrates how I use mpi::intercommunicator. In the spawned child processes, the intercommunicator size is the same as the number of spawned processes, but it should be 1, right? Because I execute the manager process (manager.cpp) as "mpirun -np 1 manager", there should be only one process. thanks in advance umanga

manager.cpp (manager process which spawns child processes) - rank 0

int main(int argc, char *argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;
    MPI_Comm everyone;
    // spawn 5 child processes
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 5, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
    mpi::intercommunicator intcomm(everyone, mpi::comm_duplicate);
    if (world.rank() == 0) {
        GPSPosition *obj = new GPSPosition(100, 200, 300);
        shared_ptr<Position> pos(new Position);
        pos->setVals();
        obj->addP(pos);
        intcomm.send(0, 100, obj);
    }
    return 0;
}

worker.cpp (child process) - rank 0-4

int main(int argc, char *argv[])
{
    mpi::environment env(argc, argv);
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    mpi::communicator world;
    if (parent == MPI_COMM_NULL) {
        cout << "Intercommunicator is Null !" << endl;
    } else {
        mpi::intercommunicator incomm(parent, mpi::comm_duplicate);
        int size = incomm.size();       // Size should be 1 but gives 5 ???
        int worldsize = world.size();   // Size 5
        int r = incomm.rank();
        cout << "Rank !" << r << endl;  // get 0-4
        if (r == 1) {
            // try receiving the data sent from the manager process
        }
    }
    return 0;
}

Federico Golfrè Andreasi wrote: Look at http://www.boost.org/doc/libs/1_40_0/doc/html/boost/mpi/intercommunicator.html for a Boost wrapper for an intercommunicator. Federico 2009/8/28 Ashika Umanga Umagiliya <auma...@biggjapan.com <mailto:auma...@biggjapan.com>> Greetings all, I wanted to send some complex user-defined types between MPI processes and found that Boost.MPI is quite easy to use for my requirement. So far it has worked well and I received my object model in every process without problems.
Now I am going to spawn processes (using MPI_Comm_spawn, because Boost.MPI doesn't have such a function) and then use Boost.MPI to send the objects across the newly created child processes. Are there any issues with this procedure? Also, Boost.MPI says it only supports Open MPI 1.0.x (http://www.boost.org/doc/libs/1_40_0/doc/html/mpi/getting_started.html#mpi.mpi_impl) - will there be any version incompatibilities? thanks in advance, umanga
[OMPI users] OpenMPI 1.3.3 with Boost.MPI ?
Greetings all, I wanted to send some complex user-defined types between MPI processes and found that Boost.MPI is quite easy to use for my requirement. So far it has worked well and I received my object model in every process without problems. Now I am going to spawn processes (using MPI_Comm_spawn, because Boost.MPI doesn't have such a function) and then use Boost.MPI to send the objects across the newly created child processes. Are there any issues with this procedure? Also, Boost.MPI says it only supports Open MPI 1.0.x (http://www.boost.org/doc/libs/1_40_0/doc/html/mpi/getting_started.html#mpi.mpi_impl) - will there be any version incompatibilities? thanks in advance, umanga
Re: [OMPI users] Embedding MPI program into a webservice ?
Greetings all, Please refer to the image at: http://i25.tinypic.com/v65ite.png As mentioned in Lisandro's reply, my webservice is acting as a proxy to the MPI application. In the webservice, the SOAP parameters are bound into a C++ object model. But I have several questions: (1) It seems the MPI_Comm_spawn() command just executes the MPI program like an external application. So the C++ object model created in the webservice is not accessible in my MPI application (illustrated by the blue line). If that's the case, to pass the input parameters I have to marshal my object model into an XML file, then call MPI_Comm_spawn() with the filename as an argument, so that the MPI program can read the values from the XML file (illustrated by the red lines). Is there any other way to do this? (2) Before calling MPI_Comm_spawn() in my webservice, I have to initialize MPI by calling MPI_Init(), MPI_Comm_get_parent(), etc. So I have to initialize MPI in my webservice logic. If that's the case, I can't start my webservice in the standard way like: #./svmWebservice but in the MPI way: #mpirun -np 100 -hostfile ~./hosts svmWebservice ??? which is confusing. Any tips? Thanks in advance, umanga Lisandro Dalcin wrote: I do not know anything about implementing webservices, but you should take a look at MPI-2 dynamic process management. This way, your webservice can MPI_Comm_spawn() a brand-new set of parallel processes doing the heavy work. This way, your webservice will act as a kind of proxy application between the request coming from the outside world and your parallel computing resources... On Fri, Jul 17, 2009 at 12:44 AM, Ashika Umanga Umagiliya<auma...@biggjapan.com> wrote: Greetings all, I am at the design level of parallelizing an SVM algorithm. We need to expose this as a webservice. I have decided to go with the Axis2/C implementation. Please refer to: http://i30.tinypic.com/i707qq.png As can be seen in Figure1, can I embed my MPI logic inside my Webservice?
I guess that it's not possible because the webservice is packaged as a static library (myService.so) and cannot execute "mpirun". In Figure2, I have illustrated another alternative. In my Webservice, I invoke my parallel program (myParallelProg) using "mpirun" and other parameters. Is there any good design to accomplish what I am trying to do? I think the second is not a good design? Thanks in advance, umanga
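The marshal-to-file approach from question (1) above can be sketched as follows; the XML file path, child program name, and process count are illustrative placeholders, not values from this thread:

```c
#include <mpi.h>

/* Sketch: pass the path of the marshalled XML input to the spawned
 * MPI program through its argv. Each child reads the path from its
 * own argv[1] and unmarshals the object model from that file. */
int main(int argc, char *argv[])
{
    MPI_Comm children;
    /* argv for the children; terminated by NULL per MPI_Comm_spawn */
    char *spawn_argv[] = { "/tmp/request-1234.xml", NULL };

    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("./parallel-svr", spawn_argv, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
    /* ... later: receive the result back over the 'children'
       intercommunicator ... */
    MPI_Finalize();
    return 0;
}
```

This avoids needing to start the webservice itself under mpirun with many ranks: the service process initializes MPI once and acts only as the single spawning parent.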
Re: [OMPI users] Embedding MPI program into a webservice ?
Hi Prasad, thank you for your reply. After some googling, I realized that you are a key member behind Axis2/C SAML, so saying "a bit of experience" is being far too humble :) Nice to meet a smart Sri Lankan on the forum - I really appreciate your great work, guys. Cheers, umanga Prasadcse Perera wrote: Hi, with the bit of experience I have with Axis2/C, I think your second model with MPI_Comm_spawn might solve your problem. One crude restriction of the Axis2/C architecture is the run-time service loading using static libs. This sometimes prevents much user-needed handling where your logic has to be started from Invoke calls. But in this scenario I think the service acting as an agent to parallelize your task (second diagram) is the evident option that will suit your requirement. On Fri, Jul 17, 2009 at 10:55 AM, Lisandro Dalcin <dalc...@gmail.com <mailto:dalc...@gmail.com>> wrote: I do not know anything about implementing webservices, but you should take a look at MPI-2 dynamic process management. This way, your webservice can MPI_Comm_spawn() a brand-new set of parallel processes doing the heavy work. This way, your webservice will act as a kind of proxy application between the request coming from the outside world and your parallel computing resources... On Fri, Jul 17, 2009 at 12:44 AM, Ashika Umanga Umagiliya<auma...@biggjapan.com <mailto:auma...@biggjapan.com>> wrote: > Greetings all, > > I am at the design level of parallelizing an SVM algorithm. We need to expose > this as a webservice. I have decided to go with the Axis2/C implementation. > > Please refer to: http://i30.tinypic.com/i707qq.png > > As can be seen in Figure1, can I embed my MPI logic inside my > Webservice? I guess that it's not possible because the webservice is > packaged as a static library (myService.so) and cannot execute > "mpirun".
> > In Figure2, I have illustrated another alternative. In my Webservice, I > invoke my parallel program (myParallelProg) using "mpirun" and other > parameters. > > Is there any good design to accomplish what I am trying to do? I think the > second is not a good design? > > Thanks in advance, > umanga -- Lisandro Dalcín --- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 -- http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=3489381
[OMPI users] Embedding MPI program into a webservice ?
Greetings all, I am at the design level of parallelizing an SVM algorithm. We need to expose this as a webservice. I have decided to go with the Axis2/C implementation. Please refer to: http://i30.tinypic.com/i707qq.png As can be seen in Figure1, can I embed my MPI logic inside my Webservice? I guess that it's not possible because the webservice is packaged as a static library (myService.so) and cannot execute "mpirun". In Figure2, I have illustrated another alternative. In my Webservice, I invoke my parallel program (myParallelProg) using "mpirun" and other parameters. Is there any good design to accomplish what I am trying to do? I think the second is not a good design? Thanks in advance, umanga
Re: [OMPI users] Error connecting to nodes ?
Hi Raymond, Thanks for the tips. I figured out the problem: it's with the .bashrc on the nodes. When logged in to Bash in 'non-interactive' mode, I found that the "$MPI_HOME/bin" folder is missing from the PATH. I edited .bashrc on every node so that "$MPI_HOME/bin" is added to the PATH. Setting the PATH in /etc/profile won't help to run MPI (well... at least on Debian). Thanks and best regards, umanga Hi Ashika, Ashika Umanga Umagiliya wrote: In my MPI environment I have 3 Debian machines, all with Open MPI set up in /usr/local/openMPI and PATH and LD_LIBRARY_PATH configured correctly. And I have also configured passwordless SSH login on each node. But when I execute my application, it gives the following error - what seems to be the problem? Have you checked whether or not mpirun works on a single machine (i.e., mpirun -np 4 -host localhost mandel)? Did you install openmpi from source or via the apt-get package manager? I used the pkg mgr and orted is located at /usr/bin/orted - do you have this file on all 3 systems? And is this Debian stable? Ray
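For reference, the fix described above amounts to something like this in each node's ~/.bashrc (the install prefix is an assumption - use your actual $MPI_HOME):

```shell
# ~/.bashrc on every node - put these lines BEFORE any early-return
# guard such as "[ -z "$PS1" ] && return", so that non-interactive ssh
# shells (which is how Open MPI launches orted) still pick them up.
export MPI_HOME=/usr/local/openMPI
export PATH="$MPI_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$MPI_HOME/lib:$LD_LIBRARY_PATH"
```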
[OMPI users] Error connecting to nodes ?
Greetings all, In my MPI environment I have 3 Debian machines, all with Open MPI set up in /usr/local/openMPI and PATH and LD_LIBRARY_PATH configured correctly. And I have also configured passwordless SSH login on each node. But when I execute my application, it gives the following error - what seems to be the problem? Thanks in advance.

vito@umanga:~/Mandelbrot20321811212006$ mpirun -np 4 -host brother1,brother2 mandel
bash: orted: command not found
--
A daemon (pid 7210) died unexpectedly with status 127 while attempting to launch so we are aborting. There may be more information reported by the environment (see above). This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--
mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
--
mpirun was unable to cleanly terminate the daemons on the nodes shown below. Additional manual cleanup may be required - please refer to the "orte-clean" tool for assistance.
--
brother1 - daemon did not report back when launched
brother2 - daemon did not report back when launched
vito@umanga:~/Mandelbrot20321811212006$ bash: orted: command not found
[OMPI users] Cannot create X11 window in Rank 0 node(master) ?
Hi all, I have written a Mandelbrot renderer using X11. I have two nodes with Open MPI. In my code, the rank 0 node (master) does the data collection from slaves and the window creation using X11, and renders the graph (using XCreateSimpleWindow()). Slave nodes calculate and send data to the master. But when I execute my application with: #mpirun -np 4 -host biggserver ./mandel it gives the error: "cannot connect to X server '(null)'" Isn't rank 0 given to the machine on which we initially run the MPI application? If so, why does the application say it cannot connect to the X server, since rank 0 is my workstation? When I run this using a single node, it works well. Any tips? thanks in advance, umanga
Re: [OMPI users] Some Newbie questions
Thanks all for the answers. I am parallelizing a tomography algorithm which takes about 5 hours using a single processor. I want to gain full performance and should reduce the computation time as much as possible. I was wondering whether the SSH/RSH launcher could be a performance issue (I am just guessing). What kind of software/middleware should I use to reduce the network/communication overhead? (different launchers, maybe Sun Grid Engine, Xgrid?) I am running Debian/Lenny, and since the project is academic I want to use OSS. Best Regards, Umanga Jeff Squyres wrote: On Jun 29, 2009, at 2:19 AM, vipin kumar wrote: Q. Since I am using TCP/Ethernet I guess that MPI uses SSH/RSH to communicate between peers. Ans. Maybe you are right. I don't know exactly how peers communicate in an MPI environment, but I can say for sure that Open MPI uses rsh/ssh as one of the available launchers. Open MPI uses two different mechanisms for launching individual MPI processes vs. MPI communications. rsh/ssh is one of the options that Open MPI can use for launching MPI processes, but we don't keep those channels open and don't use them for MPI communications. Individual, new TCP sockets are opened for MPI_SEND / MPI_RECV (etc.) traffic. These sockets are not encrypted (like ssh connections would be). Q. And for that, each peer should have a copy of the application, right? Ans. Absolutely correct. But if you don't want to copy binaries manually you should use the "--preload-binary" option. Open MPI will copy the executables to the remote nodes before launching processes, and will delete them when the job gets done. It is almost always good to use the latest version; the "--preload-binary" option may be absent in old versions. It is new in the 1.3 series; it did not exist in the 1.2 series.
Re: [OMPI users] Some Newbie questions
Hi Vipin, Thanks a lot for the reply. I went through the FAQ and it also answered some of my questions. But my problem is, since I am using TCP/Ethernet I guess that MPI uses SSH/RSH to communicate between peers. And for that, each peer should have a copy of the application, right? I use 1.2.7rc2 (from the Debian/Lenny repo), and I didn't see the option "--preload-binary" - is it because of the lower version? Best regards, umanga. vipin kumar wrote: Hi, I am not an expert, I am a user like you, but I think I can help you. Q. After installing OpenMPI on each machine, do I need to run a service/daemon on each machine? Ans. No, not at all - Open MPI takes care of that for you. Q. How do peers in an MPI environment communicate? Ans. Using a communicator (the name of the group of processes to which the process belongs - technically a handle) and the rank of the process in that communicator. Q. After implementing a parallel program, do I have to install the application on every machine? Ans. Not necessary. Use the "--preload-binary" option while launching the application through mpirun or mpiexec. Useful links: 1. http://www.open-mpi.org/faq/?category=running#simple-spmd-run 2. http://www.open-mpi.org/faq/ 3. man page for mpirun Regards, Vipin K. On Mon, Jun 29, 2009 at 8:21 AM, Ashika Umanga Umagiliya <auma...@biggjapan.com <mailto:auma...@biggjapan.com>> wrote: Greetings all, I am new to Open MPI and I have some newbie questions. I have been given 4 machines at our laboratory to set up Open MPI. Our network is simply TCP/Ethernet running Debian Linux. 1) After installing OpenMPI on each machine, do I need to run a service/daemon on each machine? (How do peers in an MPI environment communicate?) 2) After implementing a parallel program, do I have to install the application on every machine? (I thought the program automatically propagated to other peers, like in RMI applications?
) Thanks in advance, Umanga -- Vipin K. Research Engineer, C-Dot, Bangalore, India
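The "--preload-binary" option discussed in this thread (available from the 1.3 series onward) would be used roughly like this - the hostnames and program name are placeholders:

```shell
# Copy ./myprog to each remote node before launch and remove it when
# the job finishes (requires Open MPI >= 1.3; hosts are placeholders).
mpirun -np 4 --host brother1,brother2 --preload-binary ./myprog
```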
[OMPI users] Some Newbie questions
Greetings all, I am new to Open MPI and I have some newbie questions. I have been given 4 machines at our laboratory to set up Open MPI. Our network is simply TCP/Ethernet running Debian Linux. 1) After installing OpenMPI on each machine, do I need to run a service/daemon on each machine? (How do peers in an MPI environment communicate?) 2) After implementing a parallel program, do I have to install the application on every machine? (I thought the program automatically propagated to other peers, like in RMI applications?) Thanks in advance, Umanga