Re: [OMPI devel] mpif.h on Intel build when run with OMPI_FC=gfortran
Hi Dave,
.mod files are compiler dependent, so if a library is built with, say, gfortran, you will get a compiler error in code compiled with a different compiler if you use a "use" statement. However, you may still be able to specify a different compiler to Open MPI with OMPI_FC if you use the F77-style "include 'mpif.h'" instead of the F90-style "use mpi", and compile, build, and run without errors.

kindest regards
Mike

On 04/03/2016, at 1:42 PM, Christopher Samuel wrote:

> Hi Gilles,
>
> On 04/03/16 13:33, Gilles Gouaillardet wrote:
>
>> there is clearly no hope when you use mpi.mod and mpi_f08.mod
>> my point was, it is not even possible to expect "legacy" mpif.h to work
>> with different compilers.
>
> Sorry, my knowledge of FORTRAN is limited to trying to debug why their
> code wouldn't compile. :-)
>
> Apologies for the noise.
>
> All the best,
> Chris
> --
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
Re: [OMPI devel] OMPIO vs ROMIO
Hi Sreenidhi,
you need to specify --collective as an input parameter to mpi_tile_io.

kindest regards
Mike

On 11 May 2016 at 12:01, Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ram...@broadcom.com> wrote:
>
> Thank you so much for the details.
>
> 1. while running the "Tile I/O" benchmark, I see the following message:
>
> $ mpirun -np 28 ./mpi-tile-io --nr_tiles_x 7 --nr_tiles_y 4 --sz_tile_x 100 --sz_tile_y 100 --sz_element 32 --filename file1g
> ...
> # collective I/O off
>
> How do I enable collective I/O ?
>
> 2. I switched to using Open MPI v2.0.0rc2. How do I know which IO is being used ? How do I switch between OMPIO and ROMIO ?
>
> Please let me know.
>
> Thanks,
> - Sreenidhi.
>
> On Tue, May 10, 2016 at 7:14 PM, Edgar Gabriel wrote:
>
>> in the 1.7, 1.8 and 1.10 series ROMIO remains the default. In the
>> upcoming 2.x series, OMPIO will be the default, except for Lustre file
>> systems, where we will stick with ROMIO as the primary resource.
>>
>> Regarding performance comparison, we ran numerous tests late last year
>> and early this year. It really depends on the application scenario and the
>> platform that you are using. If you want to know which one you should use,
>> I would definitely suggest sticking with ROMIO in the 1.10 series, since
>> many of the bug fixes of OMPIO that we did in the last two years could not
>> be back-ported to the 1.10 series for technical reasons. If you plan to
>> switch to the 2.x series, it might be easiest to just run a couple of tests
>> and compare the performance for your application on your platform, and base
>> your decision on that.
>>
>> Edgar
>>
>> On 5/10/2016 6:32 AM, Sreenidhi Bharathkar Ramesh wrote:
>>
>> Hi,
>>
>> 1. During a default build of OpenMPI, it looks like both ompio.la and
>> romio.la are built. Which I/O MCA library is used, and based on what is
>> the decision taken ?
>>
>> 2. Are there any statistics available to compare these two - OMPIO vs
>> ROMIO ?
>>
>> I am using OpenMPI v1.10.1.
>>
>> Thanks,
>> - Sreenidhi.
[OMPI devel] parameters for OMPIO
Hi,
I am looking at the online FAQ for ompio, which seems to show that the following parameters are defined:

io_ompio_num_aggregators
io_ompio_call_timing

But on OMPI version 1.10.1 or 1.8.3:

1: setting "mpirun -mca io ompio -mca io_ompio_coll_timing_info" appears not to produce a summary.
2: io_ompio_num_aggregators is not listed as a parameter by "ompi_info -a | grep ompio".

Am I doing something wrong, or are these options not supported in these versions?

kindest regards
Mike
[OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi,
I am running Ubuntu 14.04 LTS with OpenMPI 1.6.5 and gcc 4.8.4.

On a single-rank program which just packs and unpacks two ints using MPI_Pack_external and MPI_Unpack_external, the unpacked ints are in the wrong endian order.

However, on an HPC system (not Ubuntu), using OpenMPI 1.6.5 and gcc 4.8.4, the unpacked ints are correct.

Is it possible to get some assistance to track down what is going on?

Here is the output from the program:

~/tests/mpi/Pack test1
send data 04d2 162e
MPI_Pack_external: 0
buffer size: 8
MPI_unpack_external: 0
recv data d204 2e16

And here is the source code:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numRanks, myRank, error;

    int send_data[2] = {1234, 5678};
    int recv_data[2];

    MPI_Aint buffer_size = 1000;
    char buffer[buffer_size];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    printf("send data %08x %08x \n", send_data[0], send_data[1]);

    MPI_Aint position = 0;
    error = MPI_Pack_external("external32", (void*) send_data, 2, MPI_INT,
                              buffer, buffer_size, &position);
    printf("MPI_Pack_external: %d\n", error);

    printf("buffer size: %d\n", (int) position);

    position = 0;
    error = MPI_Unpack_external("external32", buffer, buffer_size, &position,
                                recv_data, 2, MPI_INT);
    printf("MPI_unpack_external: %d\n", error);

    printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);

    MPI_Finalize();

    return 0;
}
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Gilles,
thanks for the prompt response and assistance.

Both systems use Intel CPUs.

The problem originally comes from a coupler, yac, used in climate science. There are several reported instances where the coupling tests fail. The problem occurs often enough that a workaround was incorporated: a compiler switch to use MPI_Pack and MPI_Unpack instead of MPI_Pack_external and MPI_Unpack_external.

How do I determine how OpenMPI was configured for the package installed on Ubuntu 14.04? Is there some way to determine from the Open MPI headers or other files whether --enable-heterogeneous was enabled or disabled on either system when I do not have access to the ./configure logs?

So, since I have one installation that works and a similar installation that fails, I would like to determine what is causing the problem.

I will try:
1: Tonight, later versions of gcc and Open MPI supplied with Ubuntu 15.10.
2: Tomorrow, download and install Open MPI 1.10.2 on my Ubuntu 14.04 workstation.
and send back the details.

kindest regards
Mike

On 11 February 2016 at 14:48, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Michael,
>
> do your two systems have the same endianness ?
>
> do you know how openmpi was configure'd on both systems ?
> (is --enable-heterogeneous enabled or disabled on both systems ?)
>
> fwiw, openmpi 1.6.5 is old now and no longer maintained.
> I strongly encourage you to use openmpi 1.10.2
>
> Cheers,
>
> Gilles
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Gilles,
I can confirm that with a fresh download and build from source of OpenMPI 1.10.2 with --enable-heterogeneous, the unpacked ints have the wrong endianness.

However, without --enable-heterogeneous, the unpacked ints are correct.

So, this problem still exists in heterogeneous builds with OpenMPI version 1.10.2.

kindest regards
Mike

On 11 February 2016 at 14:48, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Michael,
>
> do your two systems have the same endianness ?
>
> do you know how openmpi was configure'd on both systems ?
> (is --enable-heterogeneous enabled or disabled on both systems ?)
>
> fwiw, openmpi 1.6.5 is old now and no longer maintained.
> I strongly encourage you to use openmpi 1.10.2
>
> Cheers,
>
> Gilles
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Ralph,
you are indeed correct. However, many of our users have workstations like mine, with OpenMPI provided by installing a package. So we don't know what has been configured.

Then we have failures, since, for instance, Ubuntu 14.04 by default appears to have been built with heterogeneous support! The other (working) machine is a large HPC, and it seems OpenMPI was built without heterogeneous support.

Currently we work around the problem for packing and unpacking by having a compiler switch that selects between calls to pack/unpack_external and pack/unpack.

It is only now that we have started to track down what the problem actually is.

kindest regards
Mike

On 11 February 2016 at 15:54, Ralph Castain wrote:

> Out of curiosity: if both systems are Intel, then why are you enabling
> hetero? You don’t need it in that scenario.
>
> Admittedly, we do need to fix the bug - just trying to understand why you
> are configuring that way.
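For illustration, a minimal sketch of the compile-time switch described above. The wrapper name and the preprocessor macro (USE_MPI_PACK_EXTERNAL) are hypothetical, not part of yac or Open MPI; the point is only to show MPI_Pack_external and MPI_Pack being selected behind one interface:

/* Hypothetical wrapper sketch: compile-time switch between the external32
 * pack routine and plain MPI_Pack, as described in the message above.
 * The macro name USE_MPI_PACK_EXTERNAL is an assumption for illustration. */
#include <mpi.h>

int pack_ints(int *data, int count, char *buf, MPI_Aint bufsize,
              MPI_Aint *position)
{
#ifdef USE_MPI_PACK_EXTERNAL
    /* Portable, big-endian external32 representation. */
    return MPI_Pack_external("external32", data, count, MPI_INT,
                             buf, bufsize, position);
#else
    /* Native representation; sufficient on a homogeneous cluster. */
    int pos = (int) *position;
    int ret = MPI_Pack(data, count, MPI_INT, buf, (int) bufsize,
                       &pos, MPI_COMM_WORLD);
    *position = pos;
    return ret;
#endif
}

MPI_Pack writes the communicator's native representation, so it round-trips on a homogeneous cluster, while MPI_Pack_external is only needed when the packed buffer has to be portable across architectures or MPI implementations.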
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Gilles,
thanks for thinking about this in more detail.

I understand what you are saying, but your comments raise some questions in my mind:

If one is in a homogeneous cluster, is it important that, in the case of little endian, the data be converted to external32 format (big endian), only to be always converted at the receiving rank back to little endian?

This would seem to be inefficient, especially if the site has no need for external MPI access.

So, does --enable-heterogeneous do more than put MPI routines using "external32" into straight pass-through?

Back in the old days of PVM, all messages were converted into network order. This had severe performance impacts on little-endian clusters.

So much so that a clever way of getting around this was an implementation of "receiver makes right", in which all data was sent in the native format of the sending rank. The receiving rank analysed the message to determine if a conversion was necessary. In those days, with Cray format data, it could be more complicated than just byte swapping.

So in essence, how is a balance struck between supporting heterogeneous architectures and maximum performance with codes where message passing performance is critical?

As a follow-up, since I am now at home: this same problem also exists with the Ubuntu 15.10 Open MPI packages, which surprisingly are still at 1.6.5, same as 14.04. Again, downloading, building, and using the latest stable version of Open MPI solved the problem.

kindest regards
Mike

On 11/02/2016, at 7:31 PM, Gilles Gouaillardet wrote:

> Michael,
>
> I think it is worse than that ...
>
> without --enable-heterogeneous, it seems the data is not correctly packed
> (e.g. it is not converted to big endian), at least on a x86_64 arch.
> unpack looks broken too, but pack followed by unpack does work.
> that means if you are reading data correctly written in external32 format,
> it will not be correctly unpacked.
>
> with --enable-heterogeneous, it is only half broken
> (I do not know yet whether pack or unpack is broken ...)
> and pack followed by unpack does not work.
>
> I will double check that tomorrow
>
> Cheers,
>
> Gilles
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Gilles,
I enhanced my simple test program to dump the contents of the buffer.

Good:

send data 04d2 162e
MPI_Pack_external: 0
buffer size: 8
Buffer contents
d2, 04, 00, 00, 2e, 16, 00, 00,
MPI_unpack_external: 0
recv data 04d2 162e

Bad: --enable-heterogeneous

send data 04d2 162e
MPI_Pack_external: 0
buffer size: 8
Buffer contents
d2, 04, 00, 00, 2e, 16, 00, 00,
MPI_unpack_external: 0
recv data d204 2e16

If I am not mistaken, it appears that the unpack is not doing the endian conversion.

kindest regards
Mike

On 11 February 2016 at 19:31, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Michael,
>
> I think it is worse than that ...
>
> without --enable-heterogeneous, it seems the data is not correctly packed
> (e.g. it is not converted to big endian), at least on a x86_64 arch.
> unpack looks broken too, but pack followed by unpack does work.
> that means if you are reading data correctly written in external32 format,
> it will not be correctly unpacked.
>
> with --enable-heterogeneous, it is only half broken
> (I do not know yet whether pack or unpack is broken ...)
> and pack followed by unpack does not work.
>
> I will double check that tomorrow
>
> Cheers,
>
> Gilles
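For reference, a worked illustration of what the dump should contain: external32 is defined as big endian, so packing 1234 (0x000004d2) and 5678 (0x0000162e) should leave

  expected (external32, big endian):  00, 00, 04, d2, 00, 00, 16, 2e,
  observed (native little endian):    d2, 04, 00, 00, 2e, 16, 00, 00,

i.e. the buffer above holds the native little-endian layout in both builds, which matches Gilles' diagnosis that the pack side is not byte swapping in either build.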
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi,
oh, that is good news! The process is meant to be implementing "receiver makes right", which is good news for efficiency.

But, in the second case, without --enable-heterogeneous, are you saying that on little-endian machines byte swapping is meant to always occur? That seems most odd. I would have thought that if one only wants to work in this mode, and configures OpenMPI accordingly, then there is no need to check at the receiving end whether byte swapping is needed or not. It will be assumed that both sender and receiver are agreed on the format, whatever it is. On a homogeneous little-endian HPC cluster one would not want the extra overhead of two conversions for every packed message.

Is it possible that the assert has been implemented incorrectly in this case?

There is absolutely no urgency with regard to a fix. Thanks to your quick response, we now understand what is causing the problem and are in the process of implementing a test in ./configure to determine if the bug is present, and if so, adding a compiler flag to switch to using MPI_Pack and MPI_Unpack.

It would be good if you would be kind enough to let me know when a fix is available, and I will download, build, and test it on our application. Then this version can be installed as the default.

Once again, many thanks for your prompt and most helpful responses.

warmest regards
Mike

On 12/02/2016, at 7:03 PM, Gilles Gouaillardet wrote:

> Michael,
>
> i'd like to correct what i wrote earlier
>
> in heterogeneous clusters, data is sent "as is" (e.g. no byte swapping) and
> it is byte swapped when received, and only if needed.
>
> with --enable-heterogeneous, MPI_Unpack_external is working, but
> MPI_Pack_external is broken
> (e.g. no byte swapping occurs on little endian arch) since we internally use
> a mechanism similar to the one used to send data. that is a bug and i will
> work on that.
>
> without --enable-heterogeneous, neither MPI_Pack_external nor
> MPI_Unpack_external does any byte swapping, and they are both broken.
> fwiw, if you configure'd with --enable-debug, you would have run into an
> assert error (e.g. crash).
>
> i will work on a fix, but it might take some time before it is ready
>
> Cheers,
>
> Gilles
>
> On 2/11/2016 6:16 PM, Gilles Gouaillardet wrote:
>> Michael,
>>
>> MPI_Pack_external must convert data to big endian, so it can be dumped into
>> a file, and be read correctly on big and little endianness archs, and with
>> any MPI flavor.
>>
>> if you use only one MPI library on one arch, or if data is never
>> read/written from/to a file, then it is more efficient to use MPI_Pack.
>>
>> openmpi is optimized and the data is swapped only when needed.
>> so if your cluster is little endian only, MPI_Send and MPI_Recv will never
>> byte swap data internally.
>> if both ends have different endianness, data is sent in big endian format
>> and byte swapped when received only if needed.
>> generally speaking, a send/recv requires zero or one byte swap.
>>
>> fwiw, we previously had a claim that neither Debian nor Ubuntu has a
>> maintainer for openmpi, which would explain why an obsolete version is
>> shipped. I did some research and could not find any evidence that openmpi
>> is no longer maintained.
>>
>> Cheers,
>>
>> Gilles
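A minimal sketch of the kind of configure-time probe described above (the file name, and the use of the exit status, are illustrative assumptions): configure would compile and run a tiny MPI program that packs one int and checks whether the buffer really is big endian:

/* conftest_pack_external.c (hypothetical name): probe for the broken
 * MPI_Pack_external discussed in this thread. Exits 0 if the packed int
 * is in big-endian (external32) layout, 1 otherwise. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int value = 1;            /* external32 layout must be 00 00 00 01 */
    char buffer[16];
    MPI_Aint position = 0;
    int broken;

    MPI_Init(&argc, &argv);
    MPI_Pack_external("external32", &value, 1, MPI_INT,
                      buffer, (MPI_Aint) sizeof(buffer), &position);
    /* In external32 the most significant byte comes first, so the first
     * byte of a packed int 1 must be zero. */
    broken = (position != 4) || (buffer[0] != 0);
    MPI_Finalize();
    return broken ? 1 : 0;
}

Note that such a probe can only detect the bug on a little-endian host; on a big-endian machine the native and external32 layouts coincide, so the broken and correct implementations produce the same bytes.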
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Gilles,
I am misunderstanding something here. What you are now saying seems, to me, to be at odds with what you said previously.

Assume the situation where both sender and receiver are little endian, and consider only MPI_Pack_external and MPI_Unpack_external.

Consider case 1, --enable-heterogeneous:
In your previous email I understood that "receiver makes right" was being implemented. So, the sender does not byte swap, and the message is sent in (native) little-endian format. The receiver recognises that the received message is in little-endian format, and since this is also its native format, no byte swap is needed.

Consider case 2, --disable-heterogeneous:
It seems strange that, in this case, any byte swapping would ever need to occur. One is assuming a homogeneous system, and sender and receiver will always be using their native format, i.e. exactly the same as MPI_Pack and MPI_Unpack.

kindest regards
Mike

On 12/02/2016, at 9:25 PM, Gilles Gouaillardet wrote:

> Michael,
>
> byte swapping only occurs if you invoke MPI_Pack_external and
> MPI_Unpack_external on little endianness systems.
>
> MPI_Pack and MPI_Unpack use the same engine as MPI_Send and MPI_Recv, and
> this does not involve any byte swapping if both ends have the same endianness.
>
> Cheers,
>
> Gilles
Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external
Hi Gilles,
thanks for the detailed explanation.

Have a nice weekend
Mike

On 12/02/2016, at 11:23 PM, Gilles Gouaillardet wrote:

> Michael,
>
> Per the specifications, MPI_Pack_external and MPI_Unpack_external must
> pack/unpack to/from big endian, regardless of the endianness of the host.
> On a little endian system, byte swapping must occur because this is what you
> are explicitly requesting.
> These functions are really meant to be used in order to write a buffer to a
> file, so it can be read on another arch, and potentially with another MPI
> library (see the man page)
>
> Today, this is not the case and these are two bugs.
> 1. with --enable-heterogeneous, MPI_Pack_external does not do any byte
> swapping on little endian arch, so your test fails.
> 2. without --enable-heterogeneous, neither MPI_Pack_external nor
> MPI_Unpack_external does any byte swapping. Even if your test is working
> fine, keep in mind the buffer is not in big endian format, and should not be
> dumped into a file if you plan to read it later with a bug-free
> MPI_Unpack_external.
>
> Once the bugs are fixed,
> if you want to run on a heterogeneous cluster, you have to
> - configure with --enable-heterogeneous
> - use MPI_Pack_external and MPI_Unpack_external if you want to pack a
> message, send it to another host with type MPI_PACKED, and unpack it there.
> - not use MPI_Pack/MPI_Unpack to send/recv messages between hosts with
> different endianness.
>
> If you are only transferring predefined and derived datatypes, you have
> nothing to do; Openmpi will automatically swap bytes on the receiver side
> if needed.
>
> If you want to run on a homogeneous system, you do not need
> --enable-heterogeneous, and you can use MPI_Pack/MPI_Unpack, which is more
> efficient than MPI_Pack_external/MPI_Unpack_external for sending and
> receiving messages.
>
> For the time being, you are not able to write portable data with
> MPI_Pack_external.
> The easiest way is to run on a homogeneous cluster, and configure openmpi
> without --enable-heterogeneous and without --enable-debug, so pack/unpack
> will work regardless of whether you use the external or the non-external
> subroutines.
> Generally speaking, I recommend you use derived datatypes instead of
> manually packing/unpacking data to/from buffers.
>
> Cheers,
>
> Gilles
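To make the intended use case concrete, here is a hedged sketch of the writer side of the file-oriented pattern described above (the file name and the minimal error handling are illustrative): pack into external32 and dump the raw bytes to a file, which a bug-free MPI_Unpack_external could later read back on any architecture or MPI implementation:

/* Sketch of the portable-file use of external32 described above. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int data[2] = {1234, 5678};
    char buffer[64];
    MPI_Aint position = 0;

    MPI_Init(&argc, &argv);

    /* Pack into the standard big-endian external32 representation. */
    MPI_Pack_external("external32", data, 2, MPI_INT,
                      buffer, (MPI_Aint) sizeof(buffer), &position);

    /* Dump the packed bytes; "ints.external32" is an illustrative name. */
    FILE *f = fopen("ints.external32", "wb");
    if (f != NULL) {
        fwrite(buffer, 1, (size_t) position, f);
        fclose(f);
    }

    /* A reader elsewhere would fread() these bytes back and call
     * MPI_Unpack_external("external32", ...) to recover the two ints,
     * whatever the endianness of the reading machine. */

    MPI_Finalize();
    return 0;
}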