Michael, byte swapping only occurs if you invoke MPI_Pack_external or MPI_Unpack_external on a little-endian system. MPI_Pack and MPI_Unpack use the same engine as MPI_Send and MPI_Recv, and that engine does not do any byte swapping when both ends have the same endianness.
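
fwiw, here is a minimal sketch (not something posted in the original exchange) of the kind of runtime probe the ./configure test mentioned below could use: it packs a single int with MPI_Pack_external and checks whether the buffer really came out in external32 (big-endian) order, where the value 1 must appear as the bytes 00 00 00 01. The function name external32_pack_is_broken is only an illustrative choice.

#include <mpi.h>

/* Hypothetical probe: returns 1 if MPI_Pack_external did NOT produce
 * big-endian (external32) bytes, i.e. the bug discussed in this thread
 * is present in the installed MPI library. */
static int external32_pack_is_broken(void)
{
    int value = 1;                 /* packs as 00 00 00 01 in external32 */
    unsigned char buffer[16];
    MPI_Aint position = 0;

    MPI_Pack_external("external32", &value, 1, MPI_INT,
                      buffer, (MPI_Aint) sizeof(buffer), &position);

    /* In a correct external32 encoding the most significant byte comes
     * first, so buffer[0] must be 0 and buffer[3] must hold the 1. */
    return !(buffer[0] == 0 && buffer[3] == 1);
}

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int broken = external32_pack_is_broken();
    MPI_Finalize();
    return broken;   /* non-zero exit status => buggy MPI_Pack_external */
}

A configure script could compile and run this and key a compiler flag off the exit status.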

Cheers,

Gilles

On Friday, February 12, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
> Hi,
> oh, that is good news! The process is meant to be implementing "receiver makes right", which is good news for efficiency.
>
> But, in the second case, without --enable-heterogeneous, are you saying that on little-endian machines byte swapping is meant to always occur? That seems most odd. I would have thought that if one only wants homogeneous operation, and configures OpenMPI for this mode, then there is no need to check at the receiving end whether byte swapping is needed or not: it will be assumed that both sender and receiver agree on the format, whatever it is. On a homogeneous little-endian HPC cluster one would not want the extra overhead of two conversions for every packed message.
>
> Is it possible that the assert has been implemented incorrectly in this case?
>
> There is absolutely no urgency with regard to a fix. Thanks to your quick response, we now understand what is causing the problem and are in the process of implementing a test in ./configure to determine if the bug is present, and if so, add a compiler flag to switch to using MPI_Pack and MPI_Unpack.
>
> It would be good if you would be kind enough to let me know when a fix is available, and I will download, build, and test it on our application. Then this version can be installed as the default.
>
> Once again, many thanks for your prompt and most helpful responses.
>
> warmest regards
> Mike
>
> On 12/02/2016, at 7:03 PM, Gilles Gouaillardet wrote:
>
> Michael,
>
> i'd like to correct what i wrote earlier.
>
> in heterogeneous clusters, data is sent "as is" (i.e. no byte swapping) and it is byte swapped when received, and only if needed.
>
> with --enable-heterogeneous, MPI_Unpack_external is working, but MPI_Pack_external is broken (i.e. no byte swapping occurs on little-endian arch), since we internally use a mechanism similar to the one used to send data. that is a bug and i will work on it.
>
> without --enable-heterogeneous, neither MPI_Pack_external nor MPI_Unpack_external does any byte swapping, so they are both broken. fwiw, if you had configure'd with --enable-debug, you would have run into an assert error (i.e. a crash).
>
> i will work on a fix, but it might take some time before it is ready.
>
> Cheers,
>
> Gilles
>
> On 2/11/2016 6:16 PM, Gilles Gouaillardet wrote:
>
> Michael,
>
> MPI_Pack_external must convert data to big endian, so it can be dumped into a file and read back correctly on both big- and little-endian arch, and with any MPI flavor.
>
> if you use only one MPI library on one arch, or if data is never read/written from/to a file, then it is more efficient to use MPI_Pack.
>
> openmpi is optimized, and the data is swapped only when needed. so if your cluster is little-endian only, MPI_Send and MPI_Recv will never byte swap data internally. if both ends have different endianness, data is sent in big-endian format and byte swapped when received, only if needed. generally speaking, a send/recv requires zero or one byte swap.
>
> fwiw, we previously had a claim that neither debian nor Ubuntu has a maintainer for openmpi, which would explain why an obsolete version is shipped. I did some research and could not find any evidence that openmpi is no longer maintained.
>
> Cheers,
>
> Gilles
>
> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>
>> Hi Gilles,
>> thanks for thinking about this in more detail.
>>
>> I understand what you are saying, but your comments raise some questions in my mind:
>>
>> On a homogeneous cluster, is it important that, in the little-endian case, the data be converted to external32 format (big-endian), only to always be converted back to little-endian at the receiving rank?
>>
>> This would seem to be inefficient, especially if the site has no need for external MPI access.
>>
>> So, does --enable-heterogeneous do more than put the MPI routines that use "external32" into straight pass-through?
>>
>> Back in the old days of PVM, all messages were converted into network order. This had severe performance impacts on little-endian clusters.
>>
>> So much so that a clever way of getting around this was an implementation of "receiver makes right", in which all data was sent in the native format of the sending rank. The receiving rank analysed the message to determine whether a conversion was necessary. In those days, with Cray format data, it could be more complicated than just byte swapping.
>>
>> So, in essence, how is a balance struck between supporting heterogeneous architectures and maximum performance with codes where message-passing performance is critical?
>>
>> As a follow-up, since I am now at home: this same problem also exists with the Ubuntu 15.10 OpenMPI packages, which surprisingly are still at 1.6.5, the same as 14.04.
>>
>> Again, downloading, building, and using the latest stable version of OpenMPI solved the problem.
>>
>> kindest regards
>> Mike
>>
>> On 11/02/2016, at 7:31 PM, Gilles Gouaillardet wrote:
>>
>> Michael,
>>
>> I think it is worse than that ...
>>
>> without --enable-heterogeneous, it seems the data is not correctly packed (i.e. it is not converted to big endian), at least on an x86_64 arch. unpack looks broken too, but pack followed by unpack does work. that means that if you are reading data correctly written in external32 format, it will not be correctly unpacked.
>>
>> with --enable-heterogeneous, it is only half broken (I do not know yet whether pack or unpack is broken ...) and pack followed by unpack does not work.
>>
>> I will double check that tomorrow.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>>
>>> Hi Ralph,
>>> you are indeed correct. However, many of our users have workstations such as mine, with OpenMPI provided by installing a package. So we don't know how it has been configured.
>>>
>>> Then we have failures, since, for instance, Ubuntu 14.04 by default appears to have been built with heterogeneous support! The other (working) machine is a large HPC system, and it seems OpenMPI there was built without heterogeneous support.
>>>
>>> Currently we work around the problem for packing and unpacking by having a compiler switch that selects between calls to pack/unpack_external and pack/unpack.
>>>
>>> It is only now that we have started to track down what the problem actually is.
>>>
>>> kindest regards
>>> Mike
>>>
>>> On 11 February 2016 at 15:54, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Out of curiosity: if both systems are Intel, then why are you enabling hetero? You don't need it in that scenario.
>>>>
>>>> Admittedly, we do need to fix the bug - just trying to understand why you are configuring that way.
>>>>
>>>> On Feb 10, 2016, at 8:46 PM, Michael Rezny <michael.re...@monash.edu> wrote:
>>>>
>>>> Hi Gilles,
>>>> I can confirm that, with a fresh download and build from source of OpenMPI 1.10.2 with --enable-heterogeneous, the unpacked ints have the wrong endianness.
>>>>
>>>> However, without --enable-heterogeneous, the unpacked ints are correct.
>>>>
>>>> So this problem still exists in heterogeneous builds with OpenMPI version 1.10.2.
>>>>
>>>> kindest regards
>>>> Mike
>>>>
>>>> On 11 February 2016 at 14:48, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> Michael,
>>>>>
>>>>> do your two systems have the same endianness ?
>>>>>
>>>>> do you know how openmpi was configure'd on both systems ? (is --enable-heterogeneous enabled or disabled on both systems ?)
>>>>>
>>>>> fwiw, openmpi 1.6.5 is old now and no longer maintained. I strongly encourage you to use openmpi 1.10.2.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am running Ubuntu 14.04 LTS with OpenMPI 1.6.5 and gcc 4.8.4.
>>>>>>
>>>>>> In a single-rank program which just packs and unpacks two ints using MPI_Pack_external and MPI_Unpack_external, the unpacked ints come out in the wrong endian order.
>>>>>>
>>>>>> However, on an HPC system (not Ubuntu), using OpenMPI 1.6.5 and gcc 4.8.4, the unpacked ints are correct.
>>>>>>
>>>>>> Is it possible to get some assistance to track down what is going on?
>>>>>>
>>>>>> Here is the output from the program:
>>>>>>
>>>>>> ~/tests/mpi/Pack test1
>>>>>> send data 000004d2 0000162e
>>>>>> MPI_Pack_external: 0
>>>>>> buffer size: 8
>>>>>> MPI_unpack_external: 0
>>>>>> recv data d2040000 2e160000
>>>>>>
>>>>>> And here is the source code:
>>>>>>
>>>>>> #include <stdio.h>
>>>>>> #include <mpi.h>
>>>>>>
>>>>>> int main(int argc, char *argv[]) {
>>>>>>     int numRanks, myRank, error;
>>>>>>
>>>>>>     int send_data[2] = {1234, 5678};
>>>>>>     int recv_data[2];
>>>>>>
>>>>>>     MPI_Aint buffer_size = 1000;
>>>>>>     char buffer[buffer_size];
>>>>>>
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
>>>>>>
>>>>>>     printf("send data %08x %08x \n", send_data[0], send_data[1]);
>>>>>>
>>>>>>     MPI_Aint position = 0;
>>>>>>     error = MPI_Pack_external("external32", (void*) send_data, 2, MPI_INT,
>>>>>>                               buffer, buffer_size, &position);
>>>>>>     printf("MPI_Pack_external: %d\n", error);
>>>>>>
>>>>>>     printf("buffer size: %d\n", (int) position);
>>>>>>
>>>>>>     position = 0;
>>>>>>     error = MPI_Unpack_external("external32", buffer, buffer_size, &position,
>>>>>>                                 recv_data, 2, MPI_INT);
>>>>>>     printf("MPI_unpack_external: %d\n", error);
>>>>>>
>>>>>>     printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);
>>>>>>
>>>>>>     MPI_Finalize();
>>>>>>
>>>>>>     return 0;
>>>>>> }
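
For reference (an illustration, not part of the original post): 1234 is 0x000004d2 and 5678 is 0x0000162e, so with a working external32 pack the 8-byte buffer should hold the big-endian bytes

    00 00 04 d2  00 00 16 2e

and the program should print "recv data 000004d2 0000162e". The observed "recv data d2040000 2e160000" is exactly what you get when the pack step leaves the values in native little-endian order and the unpack step then byte swaps them, which matches the diagnosis above that MPI_Pack_external is the broken half in --enable-heterogeneous builds.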
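Finally, a minimal sketch of the compile-time switch Michael describes upthread for selecting between the external32 and native pack routines. The macro USE_PACK_EXTERNAL and the wrapper names pack_ints/unpack_ints are hypothetical, chosen only for illustration; the fallback path packs in the communicator's native format, so it only round-trips between matching architectures and is not suitable for writing portable files.

#include <stdio.h>
#include <mpi.h>

/* Hypothetical switch: compile with -DUSE_PACK_EXTERNAL where
 * MPI_Pack_external is known to work; otherwise fall back to
 * MPI_Pack/MPI_Unpack, which is fine on a homogeneous cluster. */
#ifdef USE_PACK_EXTERNAL

static int pack_ints(int *data, int count, char *buf, MPI_Aint size, MPI_Aint *pos) {
    return MPI_Pack_external("external32", data, count, MPI_INT, buf, size, pos);
}

static int unpack_ints(char *buf, MPI_Aint size, MPI_Aint *pos, int *data, int count) {
    return MPI_Unpack_external("external32", buf, size, pos, data, count, MPI_INT);
}

#else

static int pack_ints(int *data, int count, char *buf, MPI_Aint size, MPI_Aint *pos) {
    /* MPI_Pack uses int positions and sizes, so convert from MPI_Aint. */
    int ipos = (int) *pos;
    int err = MPI_Pack(data, count, MPI_INT, buf, (int) size, &ipos, MPI_COMM_WORLD);
    *pos = ipos;
    return err;
}

static int unpack_ints(char *buf, MPI_Aint size, MPI_Aint *pos, int *data, int count) {
    int ipos = (int) *pos;
    int err = MPI_Unpack(buf, (int) size, &ipos, data, count, MPI_INT, MPI_COMM_WORLD);
    *pos = ipos;
    return err;
}

#endif

int main(int argc, char *argv[]) {
    int send_data[2] = {1234, 5678}, recv_data[2];
    char buffer[1000];
    MPI_Aint position = 0;

    MPI_Init(&argc, &argv);
    pack_ints(send_data, 2, buffer, sizeof(buffer), &position);
    position = 0;
    unpack_ints(buffer, sizeof(buffer), &position, recv_data, 2);
    printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);
    MPI_Finalize();
    return 0;
}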