Re: [OMPI users] Scheduling dynamically spawned processes

2011-05-17 Thread Rodrigo Silva Oliveira
Hi Thatyene and Ralph.

Now I have the solution and it works fine. I had not tried spawn_multiple
because I had read the same documentation passage quoted below.

Thanks so much!

On Tue, May 17, 2011 at 5:13 PM, Ralph Castain  wrote:

>  Thanks for pointing this out - it's an error in our man page. I've
> fixed it on our devel trunk and will get it push'd to the release.
>
>
> On May 16, 2011, at 1:14 PM, Thatyene Louise Alves de Souza Ramos wrote:
>
> Ralph, thank you for the reply.
>
> I just tried what you said and it works! I didn't think to try the array of
> info arguments because in the spawn_multiple documentation I read the
> following:
>
> "... *array_of_info*, is an array of *info* arguments; however, *only the
> first argument in that array is used. Any subsequent arguments in the array
> are ignored* because an *info* argument applies to the entire job that is
> spawned, and cannot be different for each executable in the job. See the
> INFO ARGUMENTS section for more information."
>
> Anyway, I'm glad it works!
>
> Thank you very much!
>
> Regards.
>
> Thatyene Ramos
>
> On Mon, May 16, 2011 at 3:47 PM, Ralph Castain  wrote:
>
>> You need to use MPI_Comm_spawn_multiple. Despite the name, it results in a
>> single communicator being created by a single launch - it just allows you to
>> specify multiple applications to run.
>>
>> In this case, we use the same app, but give each element a different
>> "host" info key to get the behavior we want. Looks something like this:
>>
>> MPI_Comm child;
>> char *cmds[3] = {"myapp", "myapp", "myapp"};
>> MPI_Info info[3];
>> int maxprocs[] = { 1, 3, 1 };
>>
>>   MPI_Info_create(&info[0]);
>>   MPI_Info_set(info[0], "host", "m1");
>>
>>   MPI_Info_create(&info[1]);
>>   MPI_Info_set(info[1], "host", "m2");
>>
>>   MPI_Info_create(&info[2]);
>>   MPI_Info_set(info[2], "host", "m1");
>>
>> MPI_Comm_spawn_multiple(3, cmds, NULL, maxprocs,
>>     info, 0, MPI_COMM_WORLD,
>>     &child, MPI_ERRCODES_IGNORE);
>>
>> I won't claim the above is correct - but it gives the gist of the idea.
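For reference, a fuller, self-contained version of the sketch above (still only a sketch, untested; it assumes the spawned binary is named myapp and that hosts m1 and m2 are known to the runtime, e.g. listed in the hostfile given to mpirun) could look like:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm child;
    /* the same application given three times, so each instance can carry its own info */
    char *cmds[3] = { "myapp", "myapp", "myapp" };
    int maxprocs[3] = { 1, 3, 1 };   /* m1 -> 1, m2 -> 3, m1 -> 1 more */
    MPI_Info info[3];
    int i;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info[0]);
    MPI_Info_set(info[0], "host", "m1");

    MPI_Info_create(&info[1]);
    MPI_Info_set(info[1], "host", "m2");

    MPI_Info_create(&info[2]);
    MPI_Info_set(info[2], "host", "m1");

    MPI_Comm_spawn_multiple(3, cmds, MPI_ARGVS_NULL, maxprocs, info,
                            0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);

    for (i = 0; i < 3; i++) {
        MPI_Info_free(&info[i]);
    }

    MPI_Finalize();
    return 0;
}

All of the spawned processes end up in the single intercommunicator returned in child, and they can reach the parent job on their side via MPI_Comm_get_parent.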
>>
>>
>> On May 16, 2011, at 12:19 PM, Thatyene Louise Alves de Souza Ramos wrote:
>>
>> Ralph,
>>
>> I have the same issue and I've been searching for how to do this, but I
>> couldn't find an answer.
>>
>> What exactly must the string in the host info key be to do what Rodrigo
>> described?
>>
>> <<< Inside your master, you would create an MPI_Info key "host" that has
>> a value
>> <<< consisting of a string "host1,host2,host3" identifying the hosts you
>> want
>> <<< your slave to execute upon. Those hosts must have been included in
>> <<< my_hostfile. Include that key in the MPI_Info array passed to your
>> Spawn.
>>
>> I tried to do what you said above but ompi ignores the repetition of
>> hosts. Using Rodrigo's example I did:
>>
>> host info key = "m1,m2,m2,m2,m3" and number of processes = 5 and the
>> result was
>>
>> m1 -> 2
>> m2 -> 2
>> m3 -> 1
>>
>> and not
>>
>> m1 -> 1
>> m2 -> 3
>> m3 -> 1
>>
>> as I wanted.
>>
>> Thanks in advance.
>>
>> Thatyene Ramos
>>
>> On Fri, May 13, 2011 at 9:16 PM, Ralph Castain  wrote:
>>
>>> I believe I answered that question. You can use the hostfile info key, or
>>> you can use the host info key - either one will do what you require.
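As an aside, a minimal sketch of the hostfile variant mentioned above (untested; it assumes a file named my_hostfile with the slot counts shown further down in this thread, and a child binary named myapp; whether the slot counts are honoured exactly was the open question here):

MPI_Comm child;
MPI_Info info;

MPI_Info_create(&info);
/* my_hostfile would contain, e.g.:
 *   m1 slots=1
 *   m2 slots=3
 *   m3 slots=1
 */
MPI_Info_set(info, "hostfile", "my_hostfile");

MPI_Comm_spawn("myapp", MPI_ARGV_NULL, 5, info, 0,
               MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);

MPI_Info_free(&info);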
>>>
>>> On May 13, 2011, at 4:11 PM, Rodrigo Silva Oliveira wrote:
>>>
>>> Hi,
>>>
>>> I think I was not specific enough. I need to spawn the copies of a
>>> process in a single mpi_spawn call. That is, I have to specify a list of
>>> machines and how many copies of the process will be spawned on each one. Is
>>> it possible?
>>>
>>> It would be something like this:
>>>
>>> machines   #copies
>>> m1         1
>>> m2         3
>>> m3         1
>>>
>>> After a single call to spawn, I want the copies running in this fashion.
>>> I tried using a hostfile with the slots option, but I'm not sure if it is
>>> the best way.
>>>
>>> hostfile:
>>>
>>> m1 slots=1
>>> m2 slots=3
>>> m3 slots=1
>>>
>>> Thanks
>>>
>>> --
>>> Rodrigo Silva Oliveira
>>> M.Sc. Student - Computer Science
>>> Universidade Federal de Minas Gerais
>>> www.dcc.ufmg.br/~rsilva 

Re: [OMPI users] Scheduling dynamically spawned processes

2011-05-17 Thread Ralph Castain
 Thanks for pointing this out - it's an error in our man page. I've 
fixed it on our devel trunk and will get it push'd to the release.


On May 16, 2011, at 1:14 PM, Thatyene Louise Alves de Souza Ramos wrote:

> Ralph, thank you for the reply.
> 
> I just tried what you said and it works! I didn't think to try the array of 
> info arguments because in the spawn_multiple documentation I read the following:
> 
> "... array_of_info, is an array of info arguments; however, only the first 
> argument in that array is used. Any subsequent arguments in the array are 
> ignored because an info argument applies to the entire job that is spawned, 
> and cannot be different for each executable in the job. See the INFO 
> ARGUMENTS section for more information."
> 
> Anyway, I'm glad it works!
> 
> Thank you very much!
> 
> Regards.
> 
> Thatyene Ramos
> 
> On Mon, May 16, 2011 at 3:47 PM, Ralph Castain  wrote:
> You need to use MPI_Comm_spawn_multiple. Despite the name, it results in a 
> single communicator being created by a single launch - it just allows you to 
> specify multiple applications to run.
> 
> In this case, we use the same app, but give each element a different "host" 
> info key to get the behavior we want. Looks something like this:
> 
> MPI_Comm child;
> char *cmds[3] = {"myapp", "myapp", "myapp"};
> MPI_Info info[3];
> int maxprocs[] = { 1, 3, 1 };
> 
>   MPI_Info_create(&info[0]);
>   MPI_Info_set(info[0], "host", "m1");
> 
>   MPI_Info_create(&info[1]);
>   MPI_Info_set(info[1], "host", "m2");
> 
>   MPI_Info_create(&info[2]);
>   MPI_Info_set(info[2], "host", "m1");
> 
> MPI_Comm_spawn_multiple(3, cmds, NULL, maxprocs, 
>     info, 0, MPI_COMM_WORLD,
>     &child, MPI_ERRCODES_IGNORE);
> 
> I won't claim the above is correct - but it gives the gist of the idea.
> 
> 
> On May 16, 2011, at 12:19 PM, Thatyene Louise Alves de Souza Ramos wrote:
> 
>> Ralph,
>> 
>> I have the same issue and I've been searching for how to do this, but I 
>> couldn't find an answer. 
>> 
>> What exactly must the string in the host info key be to do what Rodrigo 
>> described?
>> 
>> <<< Inside your master, you would create an MPI_Info key "host" that has a 
>> value 
>> <<< consisting of a string "host1,host2,host3" identifying the hosts you 
>> want 
>> <<< your slave to execute upon. Those hosts must have been included in 
>> <<< my_hostfile. Include that key in the MPI_Info array passed to your Spawn.
>> 
>> I tried to do what you said above but ompi ignores the repetition of hosts. 
>> Using Rodrigo's example I did:
>> 
>> host info key = "m1,m2,m2,m2,m3" and number of processes = 5 and the result 
>> was
>> 
>> m1 -> 2
>> m2 -> 2
>> m3 -> 1
>> 
>> and not
>> 
>> m1 -> 1
>> m2 -> 3
>> m3 -> 1
>> 
>> as I wanted.
>> 
>> Thanks in advance.
>> 
>> Thatyene Ramos
>> 
>> On Fri, May 13, 2011 at 9:16 PM, Ralph Castain  wrote:
>> I believe I answered that question. You can use the hostfile info key, or 
>> you can use the host info key - either one will do what you require.
>> 
>> On May 13, 2011, at 4:11 PM, Rodrigo Silva Oliveira wrote:
>> 
>>> Hi,
>>> 
>>> I think I was not specific enough. I need to spawn the copies of a process 
>>> in a single mpi_spawn call. That is, I have to specify a list of machines and 
>>> how many copies of the process will be spawned on each one. Is it possible?
>>> 
>>> It would be something like this:
>>> 
>>> machines   #copies
>>> m1         1
>>> m2         3
>>> m3         1
>>> 
>>> After a single call to spawn, I want the copies running in this fashion. I 
>>> tried using a hostfile with the slots option, but I'm not sure if it is the 
>>> best way.
>>> 
>>> hostfile:
>>> 
>>> m1 slots=1
>>> m2 slots=3
>>> m3 slots=1
>>> 
>>> Thanks
>>> 
>>> -- 
>>> Rodrigo Silva Oliveira
>>> M.Sc. Student - Computer Science
>>> Universidade Federal de Minas Gerais
>>> www.dcc.ufmg.br/~rsilva



Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
Sorry, typo: 314, not 313.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 17, 2011, at 2:02 PM, Brock Palen wrote:

> Thanks, I thought of looking at ompi_info after I sent that note, sigh.
> 
> SEND_INPLACE appears to improve performance for larger messages in my synthetic 
> benchmarks, compared to regular SEND.  Also it appears that SEND_INPLACE still allows 
> our code to run.
> 
> We are working on getting devs access to our system and code. 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On May 16, 2011, at 11:49 AM, George Bosilca wrote:
> 
>> Here is the output of the "ompi_info --param btl openib":
>> 
>>    MCA btl: parameter "btl_openib_flags" (current value: <306>, data source: default value)
>>             BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>>             SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags
>>             only used by the "dr" PML (ignored by others): ACK=16,
>>             CHECKSUM=32, RDMA_COMPLETION=128; flags only used by the "bfo"
>>             PML (ignored by others): FAILOVER_SUPPORT=512)
>> 
>> So the 305 flag value means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of 
>> these flags are totally useless in the current version of Open MPI (DR is 
>> not supported), so the only value that really matters is SEND | 
>> HETEROGENEOUS_RDMA.
>> 
>> If you want to enable the send protocol, try first with SEND | SEND_INPLACE 
>> (9); if that does not work, downgrade to SEND (1).
>> 
>> george.
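For anyone double-checking the arithmetic, the bit values quoted from ompi_info above combine as follows (the identifiers below are just local labels for this illustration, not Open MPI's internal constant names):

#include <stdio.h>

/* values as listed by "ompi_info --param btl openib" above */
enum { SEND = 1, PUT = 2, GET = 4, SEND_INPLACE = 8, ACK = 16,
       CHECKSUM = 32, RDMA_MATCHED = 64, RDMA_COMPLETION = 128,
       HETEROGENEOUS_RDMA = 256, FAILOVER_SUPPORT = 512 };

int main(void)
{
    /* the value passed as "-mca btl_openib_flags 305" earlier in the thread */
    printf("%d\n", HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND);   /* prints 305 */
    /* George's suggested SEND | SEND_INPLACE */
    printf("%d\n", SEND | SEND_INPLACE);                          /* prints 9 */
    return 0;
}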
>> 
>> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>> 
>>> 
>>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>>> 
 
 
 
 On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
 
> Hi,
> 
> Just out of curiosity - what happens when you add the following MCA 
> option to your openib runs?
> 
> -mca btl_openib_flags 305
 
 You Sir found the magic combination.
>>> 
>>> :-)  - cool.
>>> 
>>> Developers - does this smell like a registered memory availability hang?
>>> 
 I verified this lets IMB and CRASH progress past their lockup points;
 I will have a user test this.
>>> 
>>> Please let us know what you find.
>>> 
 Is this an ok option to put in our environment?  What does 305 mean?
>>> 
>>> There may be a performance hit associated with this configuration, but if 
>>> it lets your users run, then I don't see a problem with adding it to your 
>>> environment.
>>> 
>>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on 
>>> SEND.
>>> 
>>> OpenFabrics gurus - please correct me if I'm wrong :-).
>>> 
>>> Samuel Gutierrez
>>> Los Alamos National Laboratory
>>> 
>>> 
 
 
 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 bro...@umich.edu
 (734)936-1985
 
> 
> Thanks,
> 
> Samuel Gutierrez
> Los Alamos National Laboratory
> 
> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
> 
>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>> 
>>> Jeff Squyres  writes:
>>> 
 On May 11, 2011, at 3:21 PM, Dave Love wrote:
 
> We can reproduce it with IMB.  We could provide access, but we'd have 
> to
> negotiate with the owners of the relevant nodes to give you 
> interactive
> access to them.  Maybe Brock's would be more accessible?  (If you
> contact me, I may not be able to respond for a few days.)
 
 Brock has replied off-list that he, too, is able to reliably reproduce 
 the issue with IMB, and is working to get access for us.  Many thanks 
 for your offer; let's see where Brock's access takes us.
>>> 
>>> Good.  Let me know if we could be useful
>>> 
>> -- we have not closed this issue,
> 
> Which issue?   I couldn't find a relevant-looking one.
 
 https://svn.open-mpi.org/trac/ompi/ticket/2714
>>> 
>>> Thanks.  In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>>> connectx with more than one collective (I can't recall which).
>> 
>> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1, 
>> but that doesn't help here; both my production code (crash) and IMB 
>> still hang.
>> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>>> 
>>> -- 
>>> Excuse the typping -- I have a broken wrist
>>> 

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
Thanks, I thought of looking at ompi_info after I sent that note, sigh.

SEND_INPLACE appears to improve performance for larger messages in my synthetic 
benchmarks, compared to regular SEND.  Also it appears that SEND_INPLACE still allows 
our code to run.

We are working on getting devs access to our system and code.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 16, 2011, at 11:49 AM, George Bosilca wrote:

> Here is the output of the "ompi_info --param btl openib":
> 
> MCA btl: parameter "btl_openib_flags" (current value: <306>, data source: default value)
>          BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>          SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags
>          only used by the "dr" PML (ignored by others): ACK=16,
>          CHECKSUM=32, RDMA_COMPLETION=128; flags only used by the "bfo"
>          PML (ignored by others): FAILOVER_SUPPORT=512)
> 
> So the 305 flag value means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of 
> these flags are totally useless in the current version of Open MPI (DR is not 
> supported), so the only value that really matters is SEND | HETEROGENEOUS_RDMA.
> 
> If you want to enable the send protocol, try first with SEND | SEND_INPLACE 
> (9); if that does not work, downgrade to SEND (1).
> 
>  george.
> 
> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
> 
>> 
>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>> 
>>> 
>>> 
>>> 
>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>> 
 Hi,
 
 Just out of curiosity - what happens when you add the following MCA option 
 to your openib runs?
 
 -mca btl_openib_flags 305
>>> 
>>> You Sir found the magic combination.
>> 
>> :-)  - cool.
>> 
>> Developers - does this smell like a registered memory availability hang?
>> 
>>> I verified this lets IMB and CRASH progress past their lockup points;
>>> I will have a user test this.
>> 
>> Please let us know what you find.
>> 
>>> Is this an ok option to put in our environment?  What does 305 mean?
>> 
>> There may be a performance hit associated with this configuration, but if it 
>> lets your users run, then I don't see a problem with adding it to your 
>> environment.
>> 
>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on 
>> SEND.
>> 
>> OpenFabrics gurus - please correct me if I'm wrong :-).
>> 
>> Samuel Gutierrez
>> Los Alamos National Laboratory
>> 
>> 
>>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
 
 Thanks,
 
 Samuel Gutierrez
 Los Alamos National Laboratory
 
 On May 13, 2011, at 2:38 PM, Brock Palen wrote:
 
> On May 13, 2011, at 4:09 PM, Dave Love wrote:
> 
>> Jeff Squyres  writes:
>> 
>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>> 
 We can reproduce it with IMB.  We could provide access, but we'd have 
 to
 negotiate with the owners of the relevant nodes to give you interactive
 access to them.  Maybe Brock's would be more accessible?  (If you
 contact me, I may not be able to respond for a few days.)
>>> 
>>> Brock has replied off-list that he, too, is able to reliably reproduce 
>>> the issue with IMB, and is working to get access for us.  Many thanks 
>>> for your offer; let's see where Brock's access takes us.
>> 
>> Good.  Let me know if we could be useful
>> 
> -- we have not closed this issue,
 
 Which issue?   I couldn't find a relevant-looking one.
>>> 
>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>> 
>> Thanks.  In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>> connectx with more than one collective (I can't recall which).
> 
> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1, 
> but that doesn't help here; both my production code (crash) and IMB 
> still hang.
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
>> 
>> -- 
>> Excuse the typping -- I have a broken wrist
>> 

Re: [OMPI users] TotalView Memory debugging and OpenMPI

2011-05-17 Thread Jeff Squyres
Can you send your diff in unified form?

On May 11, 2011, at 4:05 PM, Peter Thompson wrote:

> We've gotten a few reports of problems with memory debugging when using 
> OpenMPI under TotalView.  Usually, TotalView will attach to the processes 
> started after an MPI_Init.  However in the case where memory debugging is 
> enabled, things seemed to run away or fail.   My analysis showed that we had 
> a number of core files left over from the attempt, and all were mpirun (or 
> orterun) cores.   It seemed to be a regression on our part, since testing 
> seemed to indicate this worked okay before TotalView 8.9.0-0, so I filed an 
> internal bug and passed it to engineering.   After giving our engineer a 
> brief tutorial on how to build a debug version of OpenMPI, he found what 
> appears to be a problem in the code for orterun.c.   He's made a slight 
> change that fixes the issue in 1.4.2, 1.4.3, 1.4.4rc2 and 1.5.3, those being 
> the versions he's tested with so far.He doesn't subscribe to this list 
> that I know of, so I offered to pass this by the group.   Of course, I'm not 
> sure if this is exactly the right place to submit patches, but I'm sure you'd 
> tell me where to put it if I'm in the wrong here.   It's a short patch, so 
> I'll cut and paste it, and attach as well, since cut and paste can do weird 
> things to formatting.
> 
> Credit goes to Ariel Burton for this patch.  Of course he used TotalView to 
> find this ;-)  It shows up if you do 'mpirun -tv -np 4 ./foo'   or 'totalview 
> mpirun -a -np 4 ./foo'
> 
> Cheers,
> PeterT
> 
> 
> more ~/patches/anbs-patch
> *** orte/tools/orterun/orterun.c        2010-04-13 13:30:34.0 -0400
> --- /home/anb/packages/openmpi-1.4.2/linux-x8664-iwashi/installation/bin/../../../src/openmpi-1.4.2/orte/tools/orterun/orterun.c        2011-05-09 20:28:16.588183000 -0400
> ***
> *** 1578,1588 
>      }
> 
>      if (NULL != env) {
>          size1 = opal_argv_count(env);
>          for (j = 0; j < size1; ++j) {
> !            putenv(env[j]);
>          }
>      }
> 
>      /* All done */
> 
> --- 1578,1600 
>      }
> 
>      if (NULL != env) {
>          size1 = opal_argv_count(env);
>          for (j = 0; j < size1; ++j) {
> !            /* Use-after-Free error possible here.  putenv does not copy
> !               the string passed to it, and instead stores only the pointer.
> !               env[j] may be freed later, in which case the pointer
> !               in environ will now be left dangling into a deallocated
> !               region.
> !               So we make a copy of the variable.
> !             */
> !            char *s = strdup(env[j]);
> !
> !            if (NULL == s) {
> !                return OPAL_ERR_OUT_OF_RESOURCE;
> !            }
> !            putenv(s);
>          }
>      }
> 
>      /* All done */
> 
> *** orte/tools/orterun/orterun.c  2010-04-13 13:30:34.0 -0400
> --- 
> /home/anb/packages/openmpi-1.4.2/linux-x8664-iwashi/installation/bin/../../../src/openmpi-1.4.2/orte/tools/orterun/orterun.c
>   2011-05-09 20:28:16.588183000 -0400
> ***
> *** 1578,1588 
>  }
> 
>  if (NULL != env) {
>  size1 = opal_argv_count(env);
>  for (j = 0; j < size1; ++j) {
> ! putenv(env[j]);
>  }
>  }
> 
>  /* All done */
> 
> --- 1578,1600 
>  }
> 
>  if (NULL != env) {
>  size1 = opal_argv_count(env);
>  for (j = 0; j < size1; ++j) {
> ! /* Use-after-Free error possible here.  putenv does not copy
> !the string passed to it, and instead stores only the pointer.
> !env[j] may be freed later, in which case the pointer
> !in environ will now be left dangling into a deallocated
> !region.
> !So we make a copy of the variable.
> ! */
> ! char *s = strdup(env[j]);
> ! 
> ! if (NULL == s) {
> ! return OPAL_ERR_OUT_OF_RESOURCE;
> ! }
> ! putenv(s);
>  }
>  }
> 
>  /* All done */
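For anyone skimming the patch, the pitfall it guards against is generic putenv(3) behaviour rather than anything Open MPI specific; a minimal stand-alone sketch of the safe pattern (not Open MPI code) would be:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* putenv() keeps the pointer it is given instead of copying the string, so
 * passing it a buffer that is later freed leaves environ pointing at
 * deallocated memory.  Handing putenv() its own strdup()'d copy, as the
 * patch does, avoids that. */
int main(void)
{
    const char *env_entry = "MY_VAR=hello";   /* stands in for env[j] */

    char *s = strdup(env_entry);   /* private copy that environ may keep */
    if (NULL == s) {
        return 1;                  /* out of memory */
    }
    putenv(s);                     /* s must stay allocated; never free it */

    printf("MY_VAR=%s\n", getenv("MY_VAR"));
    return 0;
}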
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION

2011-05-17 Thread hi
Did you try these test programs?
Or do you have any suggestion for overcoming this bug?

Thank you.
-Hiral

On Fri, May 13, 2011 at 11:20 AM, hi  wrote:
> Hi Rainer,
>
>> Does REAL work for You?
> No.
> I am observing the same errors (see below) even with INTEGER; please find
> the attached test programs with INTEGER and REAL.
>
> C:\test> mpirun mar_f_i.exe
>  size=           1 , rank=           0
>  start --, rcvbuf=           0           0           0           0           0
>  end --, rcvbuf=           2           2           2           2           2
>
> C:\test> mpirun -np 2 mar_f_i.exe
>  size=           2 , rank=           0
>  start --, rcvbuf=           0           0           0           0           0
>  size=           2 , rank=           1
>  start --, rcvbuf=           0           0           0           0           0
> forrtl: severe (157): Program Exception - access violation
> Image              PC                Routine            Line        Source
> [vibgyor:12628] [[31763,0],0]-[[31763,1],0] mca_oob_tcp_msg_recv:
> readv failed: Unknown error (108)
> --
> WARNING: A process refused to die!
>
> Host: vibgyor
> PID:  488
>
> This process may still be running and/or consuming resources.
>
> --
> --
> mpirun has exited due to process rank 0 with PID 452 on node vibgyor
> exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in the
> job did. This can cause a job to hang indefinitely while it waits for
> all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --
>
>
> Thank you.
> -Hiral
>
>
> On Thu, May 12, 2011 at 9:03 PM, Rainer Keller  wrote:
>> Hello Hiral,
>> in the ompi_info You attached, the fortran size detection did not work
>> correctly (on viscluster -- which shows that you used the std.-installation
>> package):
>> ...
>>      Fort dbl prec size: 4
>> ...
>>
>> This most probably does not match Your compiler's setting for DOUBLE
>> PRECISION, which probably considers this to be 8.
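A quick way to see the mismatch Rainer describes from a user program (a sketch only, shown in C for illustration; it compares the size Open MPI recorded for Fortran DOUBLE PRECISION with the size of a C double as a reference point):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size = 0;

    MPI_Init(&argc, &argv);
    /* MPI_DOUBLE_PRECISION is the Fortran DOUBLE PRECISION datatype; its size
       here reflects what the Open MPI build detected (4 in the ompi_info
       output above), while the Fortran compiler itself probably uses 8. */
    MPI_Type_size(MPI_DOUBLE_PRECISION, &size);
    printf("MPI_DOUBLE_PRECISION: %d bytes, C double: %d bytes\n",
           size, (int)sizeof(double));
    MPI_Finalize();
    return 0;
}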
>>
>> Does REAL work for You?
>>
>> Shiqing is currently away, will ask when he returns.
>>
>> With best regards,
>> Rainer
>>
>>
>> On Wednesday 11 May 2011 09:29:03 hi wrote:
>>> Hi Jeff,
>>>
>>> > Can you send the info listed on the help page?
>>> >
>>> From the HELP page...
>>>
>>> ***For run-time problems:
>>> 1) Check the FAQ first. Really. This can save you a lot of time; many
>>> common problems and solutions are listed there.
>>> I couldn't find a relevant reference in the FAQ.
>>>
>>> 2) The version of Open MPI that you're using.
>>> I am using pre-built openmpi-1.5.3 64-bit and 32-bit binaries on Windows 7.
>>> I also tried with locally built openmpi-1.5.2 using Visual Studio 2008
>>> 32-bit compilers.
>>> I tried various compilers: VS-9 32-bit and VS-10 64-bit, and the
>>> corresponding Intel ifort compilers.
>>>
>>> 3) The config.log file from the top-level Open MPI directory, if
>>> available (please compress!).
>>> Don't have.
>>>
>>> 4) The output of the "ompi_info --all" command from the node where
>>> you're invoking mpirun.
>>> see output of pre-built openmpi-1.5.3_x64/bin/ompi_info --all" in
>>> attachments.
>>>
>>> 5) If running on more than one node --
>>> I am running the test program on a single node.
>>>
>>> 6) A detailed description of what is failing.
>>> Already described in this post.
>>>
>>> 7) Please include information about your network:
>>> As I am running the test program on a single local machine, this might
>>> not be required.
>>>
>>> > You forgot ierr in the call to MPI_Finalize.  You also paired
>>> > DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce.  And
>>> > you mixed sndbuf and rcvbuf in the call to allreduce, meaning that when
>>> > you print rcvbuf afterwards, it'll always still be 0.
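The same pairing rules, shown as a hedged C sketch for illustration (not the original Fortran test; MPI_SUM is assumed here since the reduction operator is not visible in the quoted program): the send buffer comes first, the receive buffer second, and the MPI datatype must match the buffers' element type.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    double sndbuf[5], rcvbuf[5];
    int i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 5; i++) {
        sndbuf[i] = 2.0;
        rcvbuf[i] = 0.0;
    }

    /* send buffer first, receive buffer second; MPI_DOUBLE matches the
       double buffers (MPI_DOUBLE_PRECISION would be the Fortran-side type) */
    MPI_Allreduce(sndbuf, rcvbuf, 5, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("rcvbuf[0] = %f\n", rcvbuf[0]);
    }

    MPI_Finalize();
    return 0;
}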
>>>
>>> As I am not a Fortran programmer, this is my mistake !!!
>>>
>>> >        program Test_MPI
>>> >            use mpi
>>> >            implicit none
>>> >
>>> >            DOUBLE PRECISION rcvbuf(5), sndbuf(5)
>>> >            INTEGER nproc, rank, ierr, n, i, ret
>>> >
>>> >            n = 5
>>> >            do i = 1, n
>>> >                sndbuf(i) = 2.0
>>> >                rcvbuf(i) = 0.0
>>> >            end do
>>> >
>>> >            call MPI_INIT(ierr)
>>> >            call