Michael,

byte swapping only occurs if you invoke MPI_Pack_external and
MPI_Unpack_external on little-endian systems.

MPI_Pack and MPI_Unpack use the same engine as MPI_Send and MPI_Recv, and
this does not involve any byte swapping if both ends have the same
endianness.
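
For illustration, here is a minimal sketch (untested, assuming a
little-endian x86_64 host) that packs the same int both ways:
MPI_Pack_external always produces the big-endian external32
representation, while MPI_Pack keeps the native byte order:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int value = 0x01020304;   /* easy to spot in a hex dump */
    unsigned char ext_buf[16], nat_buf[16];
    MPI_Aint ext_pos = 0;
    int nat_pos = 0;

    MPI_Init(&argc, &argv);

    /* external32: the packed bytes are big-endian regardless of the host */
    MPI_Pack_external("external32", &value, 1, MPI_INT,
                      ext_buf, (MPI_Aint) sizeof(ext_buf), &ext_pos);

    /* native representation: no conversion, same engine as MPI_Send/Recv */
    MPI_Pack(&value, 1, MPI_INT, nat_buf, (int) sizeof(nat_buf), &nat_pos,
             MPI_COMM_WORLD);

    /* on x86_64, with a correct MPI_Pack_external:
     *   external32: 01 02 03 04      native: 04 03 02 01 */
    printf("external32: %02x %02x %02x %02x\n",
           ext_buf[0], ext_buf[1], ext_buf[2], ext_buf[3]);
    printf("native    : %02x %02x %02x %02x\n",
           nat_buf[0], nat_buf[1], nat_buf[2], nat_buf[3]);

    MPI_Finalize();
    return 0;
}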

Cheers,

Gilles

On Friday, February 12, 2016, Michael Rezny <michael.re...@monash.edu>
wrote:

> Hi,
> oh, that is good news! The process is meant to be implementing "receiver
> makes right", which bodes well for efficiency.
>
> But in the second case, without --enable-heterogeneous, are you saying
> that on little-endian machines byte swapping is meant to always occur?
> That seems most odd. I would have thought that if one only wants
> homogeneous operation, and configures OpenMPI for that mode, then there
> is no need to check at the receiving end whether byte swapping is
> required: it can be assumed that sender and receiver agree on the
> format, whatever it is. On a homogeneous little-endian HPC cluster one
> would not want the extra overhead of two conversions for every packed
> message.
>
> Is it possible that the assert has been implemented incorrectly in this
> case?
>
> There is absolutely no urgency with regard to a fix. Thanks to your
> quick response, we now understand what is causing the problem and are
> in the process of implementing a test in ./configure to determine
> whether the bug is present and, if so, to add a compiler flag that
> switches to using MPI_Pack and MPI_Unpack.
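>
> To give an idea, here is a rough sketch of the kind of wrapper we have
> in mind (HAVE_BROKEN_PACK_EXTERNAL is just a placeholder macro that the
> configure test would define; the unpack side would mirror it):
>
> #include <mpi.h>
>
> static int pack_ints(int *data, int count, char *buffer,
>                      MPI_Aint buffer_size, MPI_Aint *position)
> {
> #ifdef HAVE_BROKEN_PACK_EXTERNAL
>     /* fall back to native-representation packing; fine as long as the
>      * cluster is homogeneous */
>     int pos = (int) *position;
>     int err = MPI_Pack(data, count, MPI_INT, buffer,
>                        (int) buffer_size, &pos, MPI_COMM_WORLD);
>     *position = pos;
>     return err;
> #else
>     /* portable external32 path */
>     return MPI_Pack_external("external32", data, count, MPI_INT,
>                              buffer, buffer_size, position);
> #endif
> }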
>
> It would be good if you would be kind enough to let me know when a fix is
> available and I will download, build,
> and test it on our application. Then this version can be installed as the
> default.
>
> Once again, many thanks for your prompt and most helpful responses.
>
> warmest regards
> Mike
>
> On 12/02/2016, at 7:03 PM, Gilles Gouaillardet wrote:
>
> Michael,
>
> I'd like to correct what I wrote earlier.
>
> In heterogeneous clusters, data is sent "as is" (i.e. no byte swapping)
> and is byte swapped at the receiving end, and only if needed.
>
> With --enable-heterogeneous, MPI_Unpack_external works, but
> MPI_Pack_external is broken (i.e. no byte swapping occurs on
> little-endian architectures), since internally we use a mechanism
> similar to the one used to send data. That is a bug and I will work
> on it.
>
> Without --enable-heterogeneous, neither MPI_Pack_external nor
> MPI_Unpack_external does any byte swapping, so they are both broken.
> FWIW, if you had configured with --enable-debug, you would have run
> into an assert failure (i.e. a crash).
>
> I will work on a fix, but it might take some time before it is ready.
>
> Cheers,
>
> Gilles
> On 2/11/2016 6:16 PM, Gilles Gouaillardet wrote:
>
> Michael,
>
> MPI_Pack_external must convert data to big-endian so it can be dumped
> to a file and read back correctly on both big- and little-endian
> architectures, and with any MPI flavor.
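>
> As a minimal sketch of that use case (untested; the file name
> data.ext32 is just an example): size the buffer with
> MPI_Pack_external_size, pack in external32, and dump the portable
> bytes to a file that any MPI flavor on any arch can later unpack:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[]) {
>     int values[4] = {1, 2, 3, 4};
>     MPI_Aint size = 0, position = 0;
>     char *buffer;
>     FILE *f;
>
>     MPI_Init(&argc, &argv);
>
>     /* ask how many bytes the external32 representation needs */
>     MPI_Pack_external_size("external32", 4, MPI_INT, &size);
>     buffer = malloc(size);
>
>     /* pack in external32, then write the portable bytes to a file */
>     MPI_Pack_external("external32", values, 4, MPI_INT,
>                       buffer, size, &position);
>     f = fopen("data.ext32", "wb");
>     if (f != NULL) {
>         fwrite(buffer, 1, (size_t) position, f);
>         fclose(f);
>     }
>
>     free(buffer);
>     MPI_Finalize();
>     return 0;
> }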
>
> If you use only one MPI library on one architecture, or if the data is
> never read from or written to a file, then it is more efficient to use
> MPI_Pack.
>
> OpenMPI is optimized and the data is swapped only when needed.
> So if your cluster is little-endian only, MPI_Send and MPI_Recv will
> never byte swap data internally.
> If the two ends have different endianness, data is sent in big-endian
> format and byte swapped on receipt only if needed.
> Generally speaking, a send/recv requires zero or one byte swap.
>
> FWIW, there was previously a claim that neither Debian nor Ubuntu has
> a maintainer for openmpi, which would explain why an obsolete version
> is shipped. I did some research and could not find any evidence that
> the openmpi package is no longer maintained.
>
> Cheers,
>
> Gilles
>
>
>
> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu>
> wrote:
>
>> Hi Gilles,
>> thanks for thinking about this in more detail.
>>
>> I understand what you are saying, but your comments raise some questions
>> in my mind:
>>
>> If one is in a homogeneous cluster, is it important that, in the
>> little-endian case, the data be converted to external32 format
>> (big-endian), only to always be converted back to little-endian at the
>> receiving rank?
>>
>> This would seem to be inefficient, especially if the site has no need for
>> external MPI access.
>>
>> So, does --enable-heterogeneous do more than put MPI routines using
>> "external32" into straight pass-through?
>>
>> Back in the old days of PVM, all messages were converted into network
>> order. This had severe performance impacts
>> on little-endian clusters.
>>
>> So much so that a clever way of getting around this was an implementation
>> of "receiver makes right" in which
>> all data was sent in the native format of the sending rank. The receiving
>> rank analysed the message to determine if
>> a conversion was necessary. In those days with Cray format data, it could
>> be more complicated than just byte swapping.
>>
>> So, in essence, how is a balance struck between supporting
>> heterogeneous architectures and maximum performance in codes where
>> message passing is critical?
>>
>> As a follow-up, since I am now at home: this same problem also exists
>> with the Ubuntu 15.10 OpenMPI packages, which surprisingly are still
>> at 1.6.5, the same as 14.04.
>>
>> Again, downloading, building, and using the latest stable version of
>> OpenMPI solved the problem.
>>
>> kindest regards
>> Mike
>>
>>
>> On 11/02/2016, at 7:31 PM, Gilles Gouaillardet wrote:
>>
>> Michael,
>>
>> I think it is worse than that ...
>>
>> Without --enable-heterogeneous, it seems the data is not correctly
>> packed (i.e. it is not converted to big-endian), at least on an
>> x86_64 arch. Unpack looks broken too, but pack followed by unpack
>> does work. That means that if you are reading data correctly written
>> in external32 format, it will not be correctly unpacked.
>>
>> With --enable-heterogeneous, it is only half broken
>> (I do not know yet whether it is pack or unpack that is broken ...)
>> and pack followed by unpack does not work.
>>
>> I will double check that tomorrow
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu>
>> wrote:
>>
>>> Hi Ralph,
>>> you are indeed correct. However, many of our users have workstations
>>> like mine, with OpenMPI provided by an installed package, so we don't
>>> know how it has been configured.
>>>
>>> Then we have failures since, for instance, the default Ubuntu 14.04
>>> package appears to have been built with heterogeneous support! The
>>> other (working) machine is a large HPC system, and it seems its
>>> OpenMPI was built without heterogeneous support.
>>>
>>> Currently we work around the problem for packing and unpacking with a
>>> compiler flag that selects between calls to pack/unpack_external and
>>> pack/unpack.
>>>
>>> It is only now that we have started to track down what the problem
>>> actually is.
>>>
>>> kindest regards
>>> Mike
>>>
>>> On 11 February 2016 at 15:54, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Out of curiosity: if both systems are Intel, then why are you
>>>> enabling hetero? You don’t need it in that scenario.
>>>>
>>>> Admittedly, we do need to fix the bug - just trying to understand why
>>>> you are configuring that way.
>>>>
>>>>
>>>> On Feb 10, 2016, at 8:46 PM, Michael Rezny <michael.re...@monash.edu>
>>>> wrote:
>>>>
>>>> Hi Gilles,
>>>> I can confirm that with a fresh download and build from source of
>>>> OpenMPI 1.10.2 with --enable-heterogeneous, the unpacked ints have
>>>> the wrong endianness.
>>>>
>>>> However, without --enable-heterogeneous, the unpacked ints are correct.
>>>>
>>>> So, this problem still exists in heterogeneous builds with OpenMPI
>>>> version 1.10.2.
>>>>
>>>> kindest regards
>>>> Mike
>>>>
>>>> On 11 February 2016 at 14:48, Gilles Gouaillardet
>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> Michael,
>>>>>
>>>>> Do your two systems have the same endianness?
>>>>>
>>>>> Do you know how openmpi was configured on both systems?
>>>>> (Is --enable-heterogeneous enabled or disabled on both systems?)
>>>>>
>>>>> FWIW, openmpi 1.6.5 is old now and no longer maintained.
>>>>> I strongly encourage you to use openmpi 1.10.2.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Thursday, February 11, 2016, Michael Rezny
>>>>> <michael.re...@monash.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am running Ubuntu 14.04 LTS with OpenMPI 1.6.5 and gcc 4.8.4
>>>>>>
>>>>>> In a single-rank program which just packs and unpacks two ints
>>>>>> using MPI_Pack_external and MPI_Unpack_external, the unpacked
>>>>>> ints come out in the wrong byte order.
>>>>>>
>>>>>> However, on an HPC system (not Ubuntu), using OpenMPI 1.6.5 and
>>>>>> gcc 4.8.4, the unpacked ints are correct.
>>>>>>
>>>>>> Is it possible to get some assistance to track down what is going on?
>>>>>>
>>>>>> Here is the output from the program:
>>>>>>
>>>>>>  ~/tests/mpi/Pack test1
>>>>>> send data 000004d2 0000162e
>>>>>> MPI_Pack_external: 0
>>>>>> buffer size: 8
>>>>>> MPI_unpack_external: 0
>>>>>> recv data d2040000 2e160000
>>>>>>
>>>>>> And here is the source code:
>>>>>>
>>>>>> #include <stdio.h>
>>>>>> #include <mpi.h>
>>>>>>
>>>>>> int main(int argc, char *argv[]) {
>>>>>>   int numRanks, myRank, error;
>>>>>>
>>>>>>   int send_data[2] = {1234, 5678};
>>>>>>   int recv_data[2];
>>>>>>
>>>>>>   MPI_Aint buffer_size = 1000;
>>>>>>   char buffer[buffer_size];
>>>>>>
>>>>>>   MPI_Init(&argc, &argv);
>>>>>>   MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
>>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
>>>>>>
>>>>>>   printf("send data %08x %08x \n", send_data[0], send_data[1]);
>>>>>>
>>>>>>   MPI_Aint position = 0;
>>>>>>   error = MPI_Pack_external("external32", (void*) send_data, 2,
>>>>>> MPI_INT,
>>>>>>           buffer, buffer_size, &position);
>>>>>>   printf("MPI_Pack_external: %d\n", error);
>>>>>>
>>>>>>   printf("buffer size: %d\n", (int) position);
>>>>>>
>>>>>>   position = 0;
>>>>>>   error = MPI_Unpack_external("external32", buffer, buffer_size,
>>>>>> &position,
>>>>>>           recv_data, 2, MPI_INT);
>>>>>>   printf("MPI_unpack_external: %d\n", error);
>>>>>>
>>>>>>   printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);
>>>>>>
>>>>>>   MPI_Finalize();
>>>>>>
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>>
>>>>>>