Re: [OMPI users] speed up this problem by MPI
On Fri, 29 Jan 2010 11:25:09 -0500, Richard Treumann wrote:
> Any support for automatic serialization of C++ objects would need to be in
> some sophisticated utility that is not part of MPI. There may be such
> utilities but I do not think anyone who has been involved in the discussion
> knows of one you can use. I certainly do not.

C++ really doesn't offer sufficient type introspection to implement something like this. Boost.MPI offers serialization for a few types (e.g. some STL containers), but the general solution that you would like just doesn't exist: you'd have to write special code for every type you want to be able to operate on.

Python can do things like this. mpi4py can operate transparently on any (pickleable) object, and it also offers complete bindings to the low-level MPI interface. CL-MPI (Common Lisp) can do these things too, but it's much less mature than mpi4py.

Jed
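To make the Boost.MPI point concrete, here is a minimal sketch of sending a std::vector between two ranks (assuming Boost.MPI and Boost.Serialization are installed; this example is not from the original posts):

    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp>  // teaches Boost to serialize std::vector
    #include <vector>
    namespace mpi = boost::mpi;

    int main(int argc, char **argv)
    {
        mpi::environment env(argc, argv);   // wraps MPI_Init/MPI_Finalize
        mpi::communicator world;
        if (world.rank() == 0) {
            std::vector<double> v(100, 3.14);
            world.send(1, 0, v);            // the vector is serialized automatically
        } else if (world.rank() == 1) {
            std::vector<double> v;
            world.recv(0, 0, v);            // resized and filled on receipt
        }
        return 0;                           // run with at least two processes
    }

For your own classes you still have to write a serialize() member function by hand, which is exactly the "special code for every type" caveat above.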
Re: [OMPI users] speed up this problem by MPI
Tim wrote:
> By serialization, I mean in the context of data storage and transmission. See
> http://en.wikipedia.org/wiki/Serialization
> E.g., in a structure or class, if there is a pointer pointing to some memory
> outside the structure or class, one has to send the content of that memory
> besides the structure or class itself, right?

Okay, yes. There are also MPI_Pack/MPI_Unpack functions that take general data types and pack them into contiguous (serialized) buffers. But you first have to describe to MPI what those data structures look like, and that can certainly get complicated.

I don't think I have much else to contribute here. There are lots of options, and decisions to make based on the particulars of your data structures. The general problem can certainly be complicated, as you've already indicated. Good luck.
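To sketch what the MPI_Pack/MPI_Unpack route looks like for the pointer-member case Tim describes: the struct below and the helper names are hypothetical, invented for illustration; the MPI calls are standard.

    #include <mpi.h>

    struct Thing { int n; double *data; };   // 'data' points outside the struct

    // Pack n, then the n doubles it points to, into one buffer; send one message.
    void send_thing(Thing &t, int dest, MPI_Comm comm)
    {
        int size_n, size_d, pos = 0;
        MPI_Pack_size(1, MPI_INT, comm, &size_n);
        MPI_Pack_size(t.n, MPI_DOUBLE, comm, &size_d);
        int bufsize = size_n + size_d;
        char *buf = new char[bufsize];
        MPI_Pack(&t.n, 1, MPI_INT, buf, bufsize, &pos, comm);
        MPI_Pack(t.data, t.n, MPI_DOUBLE, buf, bufsize, &pos, comm);
        MPI_Send(buf, pos, MPI_PACKED, dest, 0, comm);
        delete [] buf;
    }

    // Receive: probe for the packed size, then unpack in the same order.
    void recv_thing(Thing &t, int src, MPI_Comm comm)
    {
        MPI_Status st;
        MPI_Probe(src, 0, comm, &st);
        int bufsize, pos = 0;
        MPI_Get_count(&st, MPI_PACKED, &bufsize);
        char *buf = new char[bufsize];
        MPI_Recv(buf, bufsize, MPI_PACKED, src, 0, comm, &st);
        MPI_Unpack(buf, bufsize, &pos, &t.n, 1, MPI_INT, comm);
        t.data = new double[t.n];
        MPI_Unpack(buf, bufsize, &pos, t.data, t.n, MPI_DOUBLE, comm);
        delete [] buf;
    }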
Re: [OMPI users] speed up this problem by MPI
Tim,

MPI is a library providing support for passing messages among several distinct processes. It offers datatype constructors that let an application describe complex layouts of data in the local memory of a process, so a message can be sent from a complex data layout or received into one.

MPI does not have access to decisions made by the C++ compiler or the C++ runtime, so the MPI library cannot deduce the layout for you. To use MPI you must either organize the data in some way that is easy to describe with MPI datatypes, or you must do rather complex datatype constructions for every message sent or received.

Any support for automatic serialization of C++ objects would need to be in some sophisticated utility that is not part of MPI. There may be such utilities, but I do not think anyone who has been involved in this discussion knows of one you can use. I certainly do not.

Dick

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846   Fax (845) 433-8363
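As an illustration of the datatype constructors Dick describes, here is a minimal sketch for a hypothetical struct whose members are all fixed-size (the type is invented for illustration):

    #include <mpi.h>
    #include <cstddef>   // offsetof

    struct Sample { int id; double x[3]; };  // fixed-size members only

    // Build an MPI datatype matching the compiler's actual layout of Sample.
    MPI_Datatype make_sample_type()
    {
        int          blocklens[2] = { 1, 3 };
        MPI_Aint     displs[2]    = { offsetof(Sample, id), offsetof(Sample, x) };
        MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
        MPI_Datatype tmp, sample_type;
        MPI_Type_create_struct(2, blocklens, displs, types, &tmp);
        // Resize to sizeof(Sample) so trailing padding is honored in arrays.
        MPI_Type_create_resized(tmp, 0, sizeof(Sample), &sample_type);
        MPI_Type_free(&tmp);
        MPI_Type_commit(&sample_type);
        return sample_type;   // usable in MPI_Send/MPI_Recv; free when done
    }

A pointer member cannot be described this way, because the pointed-to memory lives elsewhere; that data has to be packed or sent separately, which is Dick's point about the constructions getting complex.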
Re: [OMPI users] speed up this problem by MPI
By serialization, I mean in the context of data storage and transmission; see http://en.wikipedia.org/wiki/Serialization . E.g., in a structure or class, if there is a pointer pointing to some memory outside the structure or class, one has to send the content of that memory besides the structure or class itself, right?

--- On Fri, 1/29/10, Eugene Loh wrote:
> Which serialization problems? You seem to have a split/join problem. The
> master starts, at some point there is parallel computation, then the master
> does more work at the end. [...]
Re: [OMPI users] speed up this problem by MPI
Tim wrote:
> Sorry, my typo. I meant to say Open MPI documentation.

Okay. "Open (space) MPI" is simply an implementation of the MPI standard -- e.g., http://www.mpi-forum.org/docs/mpi21-report.pdf . I imagine an on-line search will turn up a variety of tutorials and explanations of that standard. But the standard itself is somewhat readable.

> How to send/receive and broadcast objects of a self-defined class and of
> std::vector? If using MPI_Type_struct, the setup becomes complicated if the
> class has various types of data members, and a data member of another class.

I don't really know any C++, but I guess you're looking at it the right way. That is, use derived MPI datatypes, and "it's complicated".

> How to deal with serialization problems?

Which serialization problems? You seem to have a split/join problem. The master starts, at some point there is parallel computation, then the master does more work at the end.
Re: [OMPI users] speed up this problem by MPI
Sorry, my typo. I meant to say Open MPI documentation.

How to send/receive and broadcast objects of a self-defined class and of std::vector? If using MPI_Type_struct, the setup becomes complicated if the class has various types of data members, and a data member of another class. How to deal with serialization problems? Are there some good references for these problems?

--- On Fri, 1/29/10, Eugene Loh wrote:
> OpenMP (a multithreading specification) has "nothing" to do with Open MPI
> (an implementation of MPI, a message-passing specification). Assuming you
> meant OpenMP, try their web site: http://openmp.org [...]
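For the std::vector part of the question, one common idiom (a sketch, assuming MPI is already initialized; the helper name is invented) is to broadcast the length first and the contiguous payload second:

    #include <mpi.h>
    #include <vector>

    // Broadcast a vector<double> from 'root' to every rank in 'comm'.
    void bcast_vector(std::vector<double> &v, int root, MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);
        int n = (int)v.size();
        MPI_Bcast(&n, 1, MPI_INT, root, comm);            // everyone learns the length
        if (rank != root)
            v.resize(n);                                  // receivers make room
        if (n > 0)
            MPI_Bcast(&v[0], n, MPI_DOUBLE, root, comm);  // then the payload
    }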
Re: [OMPI users] speed up this problem by MPI
Tim wrote:
> BTW: I would like to find some official documentation of OpenMP, but there
> seems to be none?

OpenMP (a multithreading specification) has "nothing" to do with Open MPI (an implementation of MPI, a message-passing specification). Assuming you meant OpenMP, try their web site: http://openmp.org
Re: [OMPI users] speed up this problem by MPI
Thanks! How to send/receive and broadcast objects of a self-defined class and of std::vector? How to deal with serialization problems?

BTW: I would like to find some official documentation of OpenMP, but there seems to be none?

--- On Fri, 1/29/10, Eugene Loh wrote:
> Once the slaves have finished their computations and sent their results to
> the master, they may exit. The slaves will be launched at the same time as
> the master, but presumably have less to do than the master does before the
> "parallel loop" starts. If you don't want slaves consuming excessive CPU
> time while they wait for the master, fix that problem later, once you have
> the basic code working. [...]
Re: [OMPI users] speed up this problem by MPI
Tim wrote:
> Sorry, complicated_computation() and f() are simplified too much. They do
> take more inputs. Among the inputs to complicated_computation(), some are
> passed from main() to f() by address since it is a big array, some are
> passed by value, and some are created inside f() before the call to
> complicated_computation(). So actually (although not exactly) the code is
> like:

I think I'm agreeing with Terry. But, to add more detail:

> int main(int argc, char ** argv)
> {
>     int size;
>     double *feature = new double[1000];
>     // compute values of elements of "feature"
>     // some operations

The array "feature" can be computed by the master and then broadcast, or it could be computed redundantly by each process.

>     f(size, feature);
>     // some operations
>     delete [] feature;
>     return 0;
> }
>
> void f(int size, double *feature)
> {
>     vector<double> coeff;
>     // read from a file into elements of coeff

Similarly, coeff can be read in by the master and then broadcast, or it could be read redundantly by each process, or each process could read only the portion that it will need.

>     MyClass myobj;
>     double * array = new double [coeff.size()];
>     for (int i = 0; i < coeff.size(); i++)  // need to speed up by MPI
>     {
>         array[i] = myobj.complicated_computation(size, coeff[i], feature);  // time consuming
>     }

Each process loops only over the iterations that correspond to its rank. Then, the master gathers all results.

>     // some operations using all elements in array
>     delete [] array;
> }

Once the slaves have finished their computations and sent their results to the master, they may exit. The slaves will be launched at the same time as the master, but presumably have less to do than the master does before the "parallel loop" starts. If you don't want slaves consuming excessive CPU time while they wait for the master, fix that problem later, once you have the basic code working.
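Putting Eugene's suggestions together, a minimal sketch of what f() could become (it assumes, for brevity, that coeff.size() divides evenly by the number of processes; MPI_Gatherv handles the uneven case). MyClass and complicated_computation() are as in Tim's post; everything else is illustrative:

    #include <mpi.h>
    #include <vector>

    void f(int size, double *feature)
    {
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        std::vector<double> coeff;
        // rank 0 reads coeff from the file and broadcasts it (see the
        // bcast_vector sketch above), or every rank reads it redundantly

        int n     = (int)coeff.size();
        int chunk = n / nprocs;              // assumes n % nprocs == 0
        int lo    = rank * chunk;

        MyClass myobj;
        std::vector<double> local(chunk);
        for (int i = 0; i < chunk; i++)      // each rank computes only its slice
            local[i] = myobj.complicated_computation(size, coeff[lo + i], feature);

        std::vector<double> array;
        if (rank == 0)
            array.resize(n);                 // only the master holds the full result
        MPI_Gather(&local[0], chunk, MPI_DOUBLE,
                   rank == 0 ? &array[0] : NULL, chunk, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);

        if (rank == 0) {
            // some operations using all elements in array (master only)
        }
    }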
Re: [OMPI users] speed up this problem by MPI
In rank 0's main(), broadcast "feature" to all processes. In f(), calculate a slice of the array based on rank, then either send/recv the results back to rank 0 or, perhaps better, use a gather. Only rank 0 does everything else. (The other ranks must call f() after receiving "feature".)

On Thu, 2010-01-28 at 21:23 -0800, Tim wrote:
> Sorry, complicated_computation() and f() are simplified too much. They do
> take more inputs. Among the inputs to complicated_computation(), some are
> passed from main() to f() by address since it is a big array, some are
> passed by value, and some are created inside f() before the call to
> complicated_computation(). [...]
Re: [OMPI users] speed up this problem by MPI
Sorry, complicated_computation() and f() are simplified too much. They do take more inputs.

Among the inputs to complicated_computation(), some are passed from main() to f() by address since it is a big array, some are passed by value, and some are created inside f() before the call to complicated_computation(). So actually (although not exactly) the code is like:

int main(int argc, char ** argv)
{
    int size;
    double *feature = new double[1000];
    // compute values of elements of "feature"
    // some operations
    f(size, feature);
    // some operations
    delete [] feature;
    return 0;
}

void f(int size, double *feature)
{
    vector<double> coeff;
    // read from a file into elements of coeff
    MyClass myobj;
    double * array = new double [coeff.size()];
    for (int i = 0; i < coeff.size(); i++)  // need to speed up by MPI
    {
        array[i] = myobj.complicated_computation(size, coeff[i], feature);  // time consuming
    }
    // some operations using all elements in array
    delete [] array;
}

--- On Thu, 1/28/10, Eugene Loh wrote:
> What are the inputs to complicated_computation()? Does each process know
> what the inputs are? Or, do they need to come from the master process? Are
> there many inputs?
Re: [OMPI users] speed up this problem by MPI
Tim wrote:
> Thanks Eugene!
>
> My case, after being simplified, is to speed up the time-consuming
> computation in the loop below by assigning iterations to several nodes in a
> cluster by MPI. Each iteration of the loop computes an element of an array,
> and the computation of each element is independent of the others.
>
> int main(int argc, char ** argv)
> {
>     // some operations
>     f(size);
>     // some operations
>     return 0;
> }
>
> void f(int size)
> {
>     // some operations
>     int i;
>     double * array = new double [size];
>     for (i = 0; i < size; i++)  // need to speed up by MPI
>     {
>         array[i] = complicated_computation();  // time consuming

What are the inputs to complicated_computation()? Does each process know what the inputs are? Or, do they need to come from the master process? Are there many inputs?

>     }
>     // some operations using all elements in array
>     delete [] array;
> }
Re: [OMPI users] speed up this problem by MPI
Hi Tim

Your OpenMP layout suggests that there are no data dependencies in your complicated_computation() and that the operations therein are local. I will assume this is true in what I suggest.

In MPI you could use MPI_Scatter to distribute the (initial) array values before the computational loop, and MPI_Gather to collect the results after the loop. This approach would stay relatively close to your current program logic/structure.

The process that distributes and collects the array, typically rank 0, takes responsibility for reading/initializing the data and writing/reporting the results. Normally it also takes part in the computation, as there is no reason for it to be just the "master" and sit idle while the "slave" processes do the work. On this ("master", rank 0) process the array would be allocated with the "global" size. On the remaining processes ("slaves"), the allocated array could be smaller: just big enough to hold the array segment that is computed/manipulated there. How much memory you need to allocate depends on how many processes you launch, and can be controlled dynamically at run time (see below).

At the very beginning of the program you need to 1) initialize MPI (MPI_Init), 2) get each process's rank (MPI_Comm_rank), and 3) get the number of processes (MPI_Comm_size). Memory allocation would probably come after that, once you know how many processes are at work. At the end of the program you need to 4) shut MPI down (MPI_Finalize).

In OpenMP you can use OMP_NUM_THREADS to decide at run time how many threads to use. In MPI the counterpart decision is made when you launch the executable with the mpirun command: "mpirun -n $NPROC my_mpi_executable", where $NPROC is the number of processes you want to launch.

If you have access to a library, check Peter S. Pacheco's book "Parallel Programming with MPI"; it has examples similar to your problem and will get you going with MPI in no time. You will also need to check the syntactic details of the MPI functions.

I hope this helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Tim wrote:
> Hi,
>
> (1). I am wondering how I can speed up the time-consuming computation in
> the loop of my code below using MPI? [...]
>
> (2) My current code uses OpenMP to speed up the computation:
>
> void f(int size)
> {
>     // some operations
>     int i;
>     double * array = new double [size];
>     omp_set_num_threads(_nb_threads);
>     #pragma omp parallel shared(array) private(i)
>     {
>         #pragma omp for schedule(dynamic) nowait
>         for (i = 0; i < size; i++)
>         {
>             array[i] = complicated_computation();  // time-consuming computation
>         }
>     }
>     // some operations using all elements in array
> }
>
> I wonder, if I change to MPI, is it possible to have the code written for
> both OpenMP and MPI? If it is possible, how do I write the code, and how do
> I compile and run it?
> Thanks and regards!
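A minimal skeleton of the structure Gus describes (a sketch only: it assumes "size" divides evenly by the number of processes, and complicated_computation() is defined elsewhere, as in Tim's post). The initial MPI_Scatter matters only if the array elements carry input values; if they are computed from scratch, the final MPI_Gather alone would do:

    #include <mpi.h>

    double complicated_computation();   // defined elsewhere, as in Tim's post

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);                       // 1) initialize MPI
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);         // 2) this process's rank
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);       // 3) number of processes

        const int size = 1000;                        // assume size % nprocs == 0
        int local_n = size / nprocs;
        double *global = NULL;
        if (rank == 0)
            global = new double[size];                // full array on rank 0 only
        double *local = new double[local_n];          // others hold just a segment

        MPI_Scatter(global, local_n, MPI_DOUBLE,
                    local,  local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        for (int i = 0; i < local_n; i++)             // every rank computes,
            local[i] = complicated_computation();     // including rank 0
        MPI_Gather(local,  local_n, MPI_DOUBLE,
                   global, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            // write/report results using all elements of global
            delete [] global;
        }
        delete [] local;
        MPI_Finalize();                               // 4) shut MPI down
        return 0;
    }

Compile with the Open MPI wrapper and launch with mpirun, e.g. "mpic++ prog.cpp -o prog" and then "mpirun -n 4 ./prog".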
Re: [OMPI users] speed up this problem by MPI
Thanks Eugene!

My case, after being simplified, is to speed up the time-consuming computation in the loop below by assigning iterations to several nodes in a cluster by MPI. Each iteration of the loop computes an element of an array, and the computation of each element is independent of the others.

int main(int argc, char ** argv)
{
    // some operations
    f(size);
    // some operations
    return 0;
}

void f(int size)
{
    // some operations
    int i;
    double * array = new double [size];
    for (i = 0; i < size; i++)  // need to speed up by MPI
    {
        array[i] = complicated_computation();  // time consuming
    }
    // some operations using all elements in array
    delete [] array;
}

--- On Thu, 1/28/10, Eugene Loh wrote:
> I don't understand your case very clearly. I will take a guess. You could
> have all processes start and call MPI_Init. Then, slave processes can go to
> sleep, waking occasionally to check if the master has sent a signal to
> begin computation. The master does what it has to do and then sends wake
> signals. Each slave computes its portion and sends that portion back to the
> master. Each slave exits. The master gathers all the pieces and resumes its
> computation. Does that sound right? [...]
Re: [OMPI users] speed up this problem by MPI
On Thu, 2010-01-28 at 17:05 -0800, Tim wrote:
> Also I only need the loop that computes every element of the array to be
> parallelized. Someone said that the parallel part begins with MPI_Init and
> ends with MPI_Finalize, and one can do any serial computations before
> and/or after these calls. But I have written some MPI programs, and found
> that the parallel part is not restricted to between MPI_Init and
> MPI_Finalize, but is instead the whole program. If the rest of the code has
> to be wrapped for the process with ID 0, I have little idea how to apply
> that to my case, since the rest would be the parts before and after the
> loop in the function, and the whole of main().

I think you're being polluted by your OpenMP experience! ;-)

Unlike in OpenMP, there is no concept of a "parallel region" when using MPI. MPI allows you to pass data between processes. That's all. It's up to you to write your code in such a way that the data distribution allows parallel computation.

Often MPI_Init and MPI_Finalize are amongst the first and last things done in a parallel code, respectively. They effectively say "set up stuff so I can pass messages" and "clean that up". Each process runs from start to finish "independently".

As an aside, using MPI is much more invasive than OpenMP. Parallelising an existing serial code can be hard with MPI. But if you start from scratch, you usually end up with a better code with MPI than with OpenMP (e.g., MPI makes you think about data locality, whereas you can ignore all the bad things bad locality does and still have a working code with OpenMP).
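A schematic of that point: every process executes all of main(), and the "serial" sections are simply guarded by rank (a sketch, not from the original posts):

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);      // every process runs all of main()
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            printf("serial setup: executed by rank 0 only\n");

        // ... work shared across ranks goes here, coordinated by messages ...

        if (rank == 0)
            printf("serial wrap-up: again guarded by rank\n");

        MPI_Finalize();
        return 0;
    }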
Re: [OMPI users] speed up this problem by MPI
Hi Tim, sorry to add something in the same vein as Eugene's reply. I think this is an excellent resource: http://ci-tutor.ncsa.illinois.edu/login.php . It's a great online course, and detailed! Before I took proper classes, this helped me a lot!!

On Thu, Jan 28, 2010 at 7:05 PM, Tim wrote:
> If someone could give a sample of how to apply MPI in my case, it will
> clarify a lot of my questions. Usually I can learn a lot from good
> examples. [...]
Re: [OMPI users] speed up this problem by MPI
Tim wrote:
> Thanks, Eugene.
>
> I admit I am not that smart to understand well how to use MPI, but I did
> read some basic materials about it and understand how some simple problems
> are solved by MPI. But dealing with an array in my case, I am not certain
> about how to apply MPI to it. Are you saying to use send and receive to
> transfer the value computed for each element from child process to parent
> process?

You can, but typically that would entail too much communication overhead for each element.

> Do you allocate a copy of the array for each process?

You can, but typically that would entail excessive memory consumption. Typically, one allocates only a portion of the array on each process. E.g., if the array has 10,000 elements and you have four processes, the first gets the first 2,500 elements, the second the next 2,500, and so on.

> Also I only need the loop that computes every element of the array to be
> parallelized.

If you only need the initial computation of array elements to be parallelized, perhaps any of the above strategies could work. It depends on how expensive the computation of each element is.

> Someone said that the parallel part begins with MPI_Init and ends with
> MPI_Finalize,

Well, usually all processes are launched in parallel, so the parallelism begins "immediately." Inter-process communications using MPI, however, must take place between the MPI_Init and MPI_Finalize calls.

> and one can do any serial computations before and/or after these calls. But
> I have written some MPI programs, and found that the parallel part is not
> restricted to between MPI_Init and MPI_Finalize, but is instead the whole
> program. If the rest of the code has to be wrapped for the process with ID
> 0, I have little idea how to apply that to my case, since the rest would be
> the parts before and after the loop in the function, and the whole of
> main().

I don't understand your case very clearly. I will take a guess. You could have all processes start and call MPI_Init. Then, slave processes can go to sleep, waking occasionally to check if the master has sent a signal to begin computation. The master does what it has to do and then sends wake signals. Each slave computes its portion and sends that portion back to the master. Each slave exits. The master gathers all the pieces and resumes its computation. Does that sound right?
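When the array size does not divide evenly, one common convention (a sketch, one choice among several; the helper is invented) gives the first size % nprocs ranks one extra element:

    // Half-open range [lo, hi) owned by 'rank' out of 'nprocs' over 'size' elements.
    void block_range(int size, int rank, int nprocs, int &lo, int &hi)
    {
        int base = size / nprocs;
        int rem  = size % nprocs;    // first 'rem' ranks get one extra element
        lo = rank * base + (rank < rem ? rank : rem);
        hi = lo + base + (rank < rem ? 1 : 0);
    }
    // e.g. size=10000, nprocs=4: ranks own [0,2500), [2500,5000), [5000,7500), [7500,10000)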
Re: [OMPI users] speed up this problem by MPI
Thanks, Eugene.

I admit I am not that smart to understand well how to use MPI, but I did read some basic materials about it and understand how some simple problems are solved by MPI.

But dealing with an array in my case, I am not certain about how to apply MPI to it. Are you saying to use send and receive to transfer the value computed for each element from child process to parent process? Do you allocate a copy of the array for each process?

Also, I only need the loop that computes every element of the array to be parallelized. Someone said that the parallel part begins with MPI_Init and ends with MPI_Finalize, and one can do any serial computations before and/or after these calls. But I have written some MPI programs, and found that the parallel part is not restricted to between MPI_Init and MPI_Finalize, but is instead the whole program. If the rest of the code has to be wrapped for the process with ID 0, I have little idea how to apply that to my case, since the rest would be the parts before and after the loop in the function, and the whole of main().

If someone could give a sample of how to apply MPI in my case, it would clarify a lot of my questions. Usually I can learn a lot from good examples.

Thanks!

--- On Thu, 1/28/10, Eugene Loh wrote:
> Take a look at some introductory MPI materials to learn how to use MPI and
> what it's about. There should be resources on-line... take a look around.
> [...]
Re: [OMPI users] speed up this problem by MPI
Take a look at some introductory MPI materials to learn how to use MPI and what it's about. There should be resources on-line... take a look around.

The main idea is that you would have many processes, each process having part of the array. Thereafter, if a process needs data or results from any other process, such data would have to be exchanged between the processes explicitly.

Many codes have both OpenMP and MPI parallelization, but you should first familiarize yourself with the basics of MPI before dealing with "hybrid" codes.

Tim wrote:
> Hi,
>
> (1). I am wondering how I can speed up the time-consuming computation in
> the loop of my code below using MPI?
>
> int main(int argc, char ** argv)
> {
>     // some operations
>     f(size);
>     // some operations
>     return 0;
> }
>
> void f(int size)
> {
>     // some operations
>     int i;
>     double * array = new double [size];
>     for (i = 0; i < size; i++)  // how can I use MPI to speed up this loop?
>     {
>         array[i] = complicated_computation();  // time-consuming computation
>     }
>     // some operations using all elements in array
>     delete [] array;
> }
>
> As shown in the code, I want to do some operations before and after the
> part to be parallelized with MPI, but I don't know how to specify where the
> parallel part begins and ends.
>
> (2) My current code uses OpenMP to speed up the computation:
>
> void f(int size)
> {
>     // some operations
>     int i;
>     double * array = new double [size];
>     omp_set_num_threads(_nb_threads);
>     #pragma omp parallel shared(array) private(i)
>     {
>         #pragma omp for schedule(dynamic) nowait
>         for (i = 0; i < size; i++)
>         {
>             array[i] = complicated_computation();  // time-consuming computation
>         }
>     }
>     // some operations using all elements in array
> }
>
> I wonder, if I change to MPI, is it possible to have the code written for
> both OpenMP and MPI? If it is possible, how do I write the code, and how do
> I compile and run it?