Re: [OMPI users] Running on crashing nodes
In a word, no. If a node crashes, OMPI will abort the currently-running job if it had processes on that node. There is no current ability to "ride-thru" such an event. That said, there is work being done to support "ride-thru". Most of that is in the current developer's code trunk, and more is coming, but I wouldn't consider it production-quality just yet. Specifically, the code that does what you specify below is done and works. It is recovery of the MPI job itself (collectives, lost messages, etc.) that remains to be completed.

On Thu, Sep 23, 2010 at 7:22 AM, Andrei Fokau wrote:
> Dear users,
>
> Our cluster has a number of nodes with a high probability of crashing, so it happens quite often that calculations stop due to one node going down. Maybe you know whether it is possible to block the crashed nodes at run time when running with OpenMPI? I am asking whether it is possible in principle to program such behavior. Does OpenMPI allow such dynamic checking? The scheme I am curious about is the following:
>
> 1. A code starts its tasks via mpirun on several nodes
> 2. At some moment one node goes down
> 3. The code realizes that the node is down (the results are lost) and excludes it from the list of nodes to run its tasks on
> 4. At a later moment the user restarts the crashed node
> 5. The code notices that the node is up again and puts it back on the list of active nodes
>
> Regards,
> Andrei
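For reference, the one piece an application can already control at the MPI level is how errors are reported: installing MPI_ERRORS_RETURN on the communicator makes a failed call return an error code instead of aborting through the default MPI_ERRORS_ARE_FATAL handler. A minimal sketch follows; note that with Open MPI of this vintage the runtime will still normally terminate the whole job when a node disappears, so this only changes error reporting and is not a substitute for the ride-thru work described above.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, rc, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Report failures as return codes instead of aborting immediately. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    rc = MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "rank %d: broadcast failed: %s\n", rank, msg);
        /* What, if anything, can be salvaged here is up to the application;
         * the standard gives no recovery guarantees after a failure. */
    }

    MPI_Finalize();
    return 0;
}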
Re: [OMPI users] function fgets hangs a mpi program when it is used ompi-ps command
ompi-ps talks to mpirun to get the info, and then pretty-prints it to stderr. Best guess is that it is having problems contacting mpirun. Are you running it on the same node as mpirun (a requirement, unless you pass it the full contact info)? Check the ompi-ps man page and also "ompi-ps -h" to ensure you are running it correctly. There may be options that would help to figure out what is wrong (I forget what they all are). On Thu, Sep 23, 2010 at 12:21 PM, Matheus Bersot Siqueira Barros < matheusberso...@gmail.com> wrote: > Jeff and Ralph, > > Thank you for your reply. > > 1) I'm not running on machines with OpenFabrics. > > 2) In my example, ompi-ps prints a maximum 82 bytes per line. Even so, I > augment to 300 bytes per line to be sure that it is not the problem. > > char mystring [300]; > ... > fgets (mystring , 300 , pFile); > > 2) When I run ps, it shows just two process: ps and bash. > PID TTY TIME CMD > 1961 pts/500:00:00 bash > 2154 pts/500:00:00 ps > > But when I run ps -a -l, it appears my program(test.run) and other > processes. I put below just the information related to my program. > > F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD > 0 S 1000 1841 1840 0 80 0 - 18054 pipe_w pts/000:00:00 test.run > 0 S 1000 1842 1840 0 80 0 - 18053 poll_s pts/000:00:00 test.run > 0 S 1000 1843 1840 0 80 0 - 18053 poll_s pts/000:00:00 test.run > 0 S 1000 1844 1840 0 80 0 - 18053 poll_s pts/000:00:00 test.run > > pipe_s = wait state on read/write against a pipe. > > So, with that command I concluded that one mpi process is waiting for the > read of a pipe. > > The problem still persists. > > Thanks, > Matheus. > > > On Wed, Sep 22, 2010 at 11:24 AM, Ralph Castainwrote: > >> Printouts of less than 100 bytes would be unusual...but possible >> >> >> On Wed, Sep 22, 2010 at 8:15 AM, Jeff Squyres wrote: >> >>> Are you running on machines with OpenFabrics devices (that Open MPI is >>> using)? >>> >>> Is ompi-ps printing 100 bytes or more? >>> >>> What does ps show when your program is hung? >>> >>> >>> >>> On Sep 17, 2010, at 3:13 PM, Matheus Bersot Siqueira Barros wrote: >>> >>> > Open MPI Version = 1.4.2 >>> > OS = Ubuntu 10.04 LTS and CentOS 5.3 >>> > >>> > When I run the mpi program below in the terminal, the function fgets >>> hangs. >>> > How do I know it? I do a printf before and later the call of fgets and >>> only the message "before fgets()" is showed. >>> > >>> > However, when I run the same program at Eclipse 3.6 with CDT >>> 7.0.0.201006141710 or using gdb it runs normally. >>> > If you change the command in the function popen to another one(for >>> instance: "ls -l"), it will run correctly. >>> > >>> > I use the following commands to compile and run the program: >>> > >>> > compile : mpicc teste.c -o teste.run >>> > >>> > run : mpirun -np 4 ./teste.run >>> > >>> > >>> > Does anyone know why the program behaves like that? >>> > >>> > Thanks in advance, >>> > >>> > Matheus Bersot. 
>>> > >>> > MPI_PROGRAM: >>> > >>> > #include >>> > #include "mpi.h" >>> > >>> > int main(int argc, char *argv[]) >>> > { >>> >int rank, nprocs; >>> >FILE * pFile = NULL; >>> >char mystring [100]; >>> > >>> > MPI_Init(,); >>> > MPI_Comm_size(MPI_COMM_WORLD,); >>> > MPI_Comm_rank(MPI_COMM_WORLD,); >>> > >>> >if(rank == 0) >>> >{ >>> >pFile = popen ("ompi-ps" , "r"); >>> >if (pFile == NULL) perror ("Error opening file"); >>> >else { >>> > while(!feof(pFile)) >>> > { >>> >printf("before fgets()\n"); >>> >fgets (mystring , 100 , pFile); >>> >printf("after fgets()\n"); >>> >puts (mystring); >>> > } >>> > pclose (pFile); >>> >} >>> > } >>> > >>> > MPI_Finalize(); >>> >return 0; >>> > } >>> > ___ >>> > users mailing list >>> > us...@open-mpi.org >>> > http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > > -- > - > "In moments of crisis, only the inspiration is more important than > knowledge." > (Albert Einstein) > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
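For reference, a compilable version of the test program quoted above (the archived listing dropped the header names and the address-of arguments in transit). It also loops on the return value of fgets() rather than on feof(), which is the usual idiom and avoids processing the last line twice; the reported hang on popen("ompi-ps") can still occur, since nothing here changes how the pipe is read.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, nprocs;
    FILE *pFile = NULL;
    char mystring[300];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Run ompi-ps and read its standard output through a pipe. */
        pFile = popen("ompi-ps", "r");
        if (pFile == NULL) {
            perror("Error opening pipe");
        } else {
            /* fgets() returns NULL at end-of-file or on error, so looping on
             * it directly avoids the feof() pitfall of handling the last
             * line twice. */
            while (fgets(mystring, sizeof(mystring), pFile) != NULL) {
                fputs(mystring, stdout);
            }
            pclose(pFile);
        }
    }

    MPI_Finalize();
    return 0;
}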
Re: [OMPI users] [openib] segfault when using openib btl
Eloi, I am curious about your problem. Can you tell me what size of job it is? Does it always fail on the same bcast, or same process? Eloi Gaudry wrote: Hi Nysal, Thanks for your suggestions. I'm now able to get the checksum computed and redirected to stdout, thanks (I forgot the "-mca pml_base_verbose 5" option, you were right). I haven't been able to observe the segmentation fault (with hdr->tag=0) so far (when using pml csum) but I 'll let you know when I am. I've got two others question, which may be related to the error observed: 1/ does the maximum number of MPI_Comm that can be handled by OpenMPI somehow depends on the btl being used (i.e. if I'm using openib, may I use the same number of MPI_Comm object as with tcp) ? Is there something as MPI_COMM_MAX in OpenMPI ? 2/ the segfaults only appears during a mpi collective call, with very small message (one int is being broadcast, for instance) ; i followed the guidelines given at http://icl.cs.utk.edu/open- mpi/faq/?category=openfabrics#ib-small-message-rdma but the debug-build of OpenMPI asserts if I use a different min-size that 255. Anyway, if I deactivate eager_rdma, the segfaults remains. Does the openib btl handle very small message differently (even with eager_rdma deactivated) than tcp ? Others on the list does coalescing happen with non-eager_rdma? If so then that would possibly be one difference between the openib btl and tcp aside from the actual protocol used. is there a way to make sure that large messages and small messages are handled the same way ? Do you mean so they all look like eager messages? How large of messages are we talking about here 1K, 1M or 10M? --td Regards, Eloi On Friday 17 September 2010 17:57:17 Nysal Jan wrote: Hi Eloi, Create a debug build of OpenMPI (--enable-debug) and while running with the csum PML add "-mca pml_base_verbose 5" to the command line. This will print the checksum details for each fragment sent over the wire. I'm guessing it didnt catch anything because the BTL failed. The checksum verification is done in the PML, which the BTL calls via a callback function. In your case the PML callback is never called because the hdr->tag is invalid. So enabling checksum tracing also might not be of much use. Is it the first Bcast that fails or the nth Bcast and what is the message size? I'm not sure what could be the problem at this moment. I'm afraid you will have to debug the BTL to find out more. --Nysal On Fri, Sep 17, 2010 at 4:39 PM, Eloi Gaudrywrote: Hi Nysal, thanks for your response. I've been unable so far to write a test case that could illustrate the hdr->tag=0 error. Actually, I'm only observing this issue when running an internode computation involving infiniband hardware from Mellanox (MT25418, ConnectX IB DDR, PCIe 2.0 2.5GT/s, rev a0) with our time-domain software. I checked, double-checked, and rechecked again every MPI use performed during a parallel computation and I couldn't find any error so far. The fact that the very same parallel computation run flawlessly when using tcp (and disabling openib support) might seem to indicate that the issue is somewhere located inside the openib btl or at the hardware/driver level. I've just used the "-mca pml csum" option and I haven't seen any related messages (when hdr->tag=0 and the segfaults occurs). Any suggestion ? Regards, Eloi On Friday 17 September 2010 16:03:34 Nysal Jan wrote: Hi Eloi, Sorry for the delay in response. I haven't read the entire email thread, but do you have a test case which can reproduce this error? 
Without that it will be difficult to nail down the cause. Just to clarify, I do not work for an iwarp vendor. I can certainly try to reproduce it on an IB system. There is also a PML called csum, you can use it via "-mca pml csum", which will checksum the MPI messages and verify it at the receiver side for any data corruption. You can try using it to see if it is able to catch anything. Regards --Nysal On Thu, Sep 16, 2010 at 3:48 PM, Eloi Gaudry wrote: Hi Nysal, I'm sorry to intrrupt, but I was wondering if you had a chance to look at this error. Regards, Eloi -- Eloi Gaudry Free Field Technologies Company Website: http://www.fft.be Company Phone: +32 10 487 959 -- Forwarded message -- From: Eloi Gaudry To: Open MPI Users Date: Wed, 15 Sep 2010 16:27:43 +0200 Subject: Re: [OMPI users] [openib] segfault when using openib btl Hi, I was wondering if anybody got a chance to have a look at this issue. Regards, Eloi On Wednesday 18 August 2010 09:16:26 Eloi Gaudry wrote: Hi Jeff, Please find enclosed the output (valgrind.out.gz) from /opt/openmpi-debug-1.4.2/bin/orterun -np 2 --host pbn11,pbn10 --mca btl openib,self --display-map --verbose --mca mpi_warn_on_fork 0 --mca
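In the absence of a standalone test case, a skeleton of the pattern being described (a very small, one-int broadcast repeated many times) might serve as a starting point. The iteration count and binary name are arbitrary, and there is no guarantee this triggers the hdr->tag=0 failure; running it once over openib (e.g. mpirun -np 2 --host pbn10,pbn11 --mca btl openib,self ./bcast_loop) and once with --mca btl tcp,self, optionally adding -mca pml csum on a debug build as Nysal suggests, would at least separate the transport from the application.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i, value;
    const int iterations = 100000;   /* arbitrary choice for a sketch */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < iterations; i++) {
        value = (rank == 0) ? i : -1;
        /* One int per broadcast, i.e. the very small payload described above. */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("completed %d broadcasts\n", iterations);

    MPI_Finalize();
    return 0;
}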
Re: [OMPI users] "self scheduled" work & mpi receive???
Hi Ambrose, I'm interested in you work, i have a app to convert for myself and i don't know enough the MPI structure and syntaxe to make it... So if you wanna share your app i'm interested in taking a look at it!! Thanks and have a nice day!! Mikael Lavoie 2010/9/23 Lewis, Ambrose J.> Hi All: > > I’ve written an openmpi program that “self schedules” the work. > > The master task is in a loop chunking up an input stream and handing off > jobs to worker tasks. At first the master gives the next job to the next > highest rank. After all ranks have their first job, the master waits via an > MPI receive call for the next free worker. The master parses out the rank > from the MPI receive and sends the next job to this node. The jobs aren’t > all identical, so they run for slightly different durations based on the > input data. > > > > When I plot a histogram of the number of jobs each worker performed, the > lower mpi ranks are doing much more work than the higher ranks. For > example, in a 120 process run, rank 1 did 32 jobs while rank 119 only did 2. > My guess is that openmpi returns the lowest rank from the MPI Recv when > I’ve got MPI_ANY_SOURCE set and multiple sends have happened since the last > call. > > > > Is there a different Recv call to make that will spread out the data > better? > > > > THANXS! > > amb > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] "self scheduled" work & mpi receive???
That's a great suggestion...Thanks! amb -Original Message- From: users-boun...@open-mpi.org on behalf of Bowen Zhou Sent: Thu 9/23/2010 1:18 PM To: Open MPI Users Subject: Re: [OMPI users] "self scheduled" work & mpi receive??? > Hi All: > > I've written an openmpi program that "self schedules" the work. > > The master task is in a loop chunking up an input stream and handing off > jobs to worker tasks. At first the master gives the next job to the > next highest rank. After all ranks have their first job, the master > waits via an MPI receive call for the next free worker. The master > parses out the rank from the MPI receive and sends the next job to this > node. The jobs aren't all identical, so they run for slightly different > durations based on the input data. > > > > When I plot a histogram of the number of jobs each worker performed, the > lower mpi ranks are doing much more work than the higher ranks. For > example, in a 120 process run, rank 1 did 32 jobs while rank 119 only > did 2. My guess is that openmpi returns the lowest rank from the MPI > Recv when I've got MPI_ANY_SOURCE set and multiple sends have happened > since the last call. > > > > Is there a different Recv call to make that will spread out the data better? > > How about using MPI_Irecv? Let the master issue an MPI_Irecv for each worker and call MPI_Test to get the list of idle workers, then choose one from the idle list by some randomization? > > THANXS! > > amb > > > > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
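A rough sketch of the suggested approach on the master side: pre-post one MPI_Irecv per worker, let MPI_Waitsome report which workers have finished (a loop over MPI_Test, as suggested above, would work the same way), and hand each of them a new job before re-arming its receive. The job payload, the initial round of sends, and the shutdown protocol are placeholders and are not shown.

#include <stdlib.h>
#include <mpi.h>

static void master_loop(int nprocs, int jobs_left)
{
    int nworkers = nprocs - 1;              /* workers are ranks 1..nprocs-1 */
    MPI_Request *reqs = malloc(nworkers * sizeof(MPI_Request));
    int *result = malloc(nworkers * sizeof(int));
    int *idx    = malloc(nworkers * sizeof(int));
    int i, ndone;

    for (i = 0; i < nworkers; i++)          /* worker i has rank i + 1 */
        MPI_Irecv(&result[i], 1, MPI_INT, i + 1, 0, MPI_COMM_WORLD, &reqs[i]);

    while (jobs_left > 0) {
        /* Blocks until at least one worker has reported back; idx holds the
         * indices of all receives that completed in this call. */
        MPI_Waitsome(nworkers, reqs, &ndone, idx, MPI_STATUSES_IGNORE);
        for (i = 0; i < ndone && jobs_left > 0; i++) {
            int w = idx[i];
            int job = jobs_left--;          /* placeholder "job" payload */
            MPI_Send(&job, 1, MPI_INT, w + 1, 0, MPI_COMM_WORLD);
            /* Re-arm the receive for this worker's next completion. */
            MPI_Irecv(&result[w], 1, MPI_INT, w + 1, 0, MPI_COMM_WORLD, &reqs[w]);
        }
    }

    free(reqs); free(result); free(idx);
}

Serving every worker that reports in, rather than always the lowest matching rank, already spreads the jobs more evenly across workers.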
Re: [OMPI users] function fgets hangs a mpi program when it is used ompi-ps command
Jeff and Ralph, Thank you for your reply. 1) I'm not running on machines with OpenFabrics. 2) In my example, ompi-ps prints a maximum 82 bytes per line. Even so, I augment to 300 bytes per line to be sure that it is not the problem. char mystring [300]; ... fgets (mystring , 300 , pFile); 2) When I run ps, it shows just two process: ps and bash. PID TTY TIME CMD 1961 pts/500:00:00 bash 2154 pts/500:00:00 ps But when I run ps -a -l, it appears my program(test.run) and other processes. I put below just the information related to my program. F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD 0 S 1000 1841 1840 0 80 0 - 18054 pipe_w pts/000:00:00 test.run 0 S 1000 1842 1840 0 80 0 - 18053 poll_s pts/000:00:00 test.run 0 S 1000 1843 1840 0 80 0 - 18053 poll_s pts/000:00:00 test.run 0 S 1000 1844 1840 0 80 0 - 18053 poll_s pts/000:00:00 test.run pipe_s = wait state on read/write against a pipe. So, with that command I concluded that one mpi process is waiting for the read of a pipe. The problem still persists. Thanks, Matheus. On Wed, Sep 22, 2010 at 11:24 AM, Ralph Castainwrote: > Printouts of less than 100 bytes would be unusual...but possible > > > On Wed, Sep 22, 2010 at 8:15 AM, Jeff Squyres wrote: > >> Are you running on machines with OpenFabrics devices (that Open MPI is >> using)? >> >> Is ompi-ps printing 100 bytes or more? >> >> What does ps show when your program is hung? >> >> >> >> On Sep 17, 2010, at 3:13 PM, Matheus Bersot Siqueira Barros wrote: >> >> > Open MPI Version = 1.4.2 >> > OS = Ubuntu 10.04 LTS and CentOS 5.3 >> > >> > When I run the mpi program below in the terminal, the function fgets >> hangs. >> > How do I know it? I do a printf before and later the call of fgets and >> only the message "before fgets()" is showed. >> > >> > However, when I run the same program at Eclipse 3.6 with CDT >> 7.0.0.201006141710 or using gdb it runs normally. >> > If you change the command in the function popen to another one(for >> instance: "ls -l"), it will run correctly. >> > >> > I use the following commands to compile and run the program: >> > >> > compile : mpicc teste.c -o teste.run >> > >> > run : mpirun -np 4 ./teste.run >> > >> > >> > Does anyone know why the program behaves like that? >> > >> > Thanks in advance, >> > >> > Matheus Bersot. >> > >> > MPI_PROGRAM: >> > >> > #include >> > #include "mpi.h" >> > >> > int main(int argc, char *argv[]) >> > { >> >int rank, nprocs; >> >FILE * pFile = NULL; >> >char mystring [100]; >> > >> > MPI_Init(,); >> > MPI_Comm_size(MPI_COMM_WORLD,); >> > MPI_Comm_rank(MPI_COMM_WORLD,); >> > >> >if(rank == 0) >> >{ >> >pFile = popen ("ompi-ps" , "r"); >> >if (pFile == NULL) perror ("Error opening file"); >> >else { >> > while(!feof(pFile)) >> > { >> >printf("before fgets()\n"); >> >fgets (mystring , 100 , pFile); >> >printf("after fgets()\n"); >> >puts (mystring); >> > } >> > pclose (pFile); >> >} >> > } >> > >> > MPI_Finalize(); >> >return 0; >> > } >> > ___ >> > users mailing list >> > us...@open-mpi.org >> > http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- - "In moments of crisis, only the inspiration is more important than knowledge." (Albert Einstein)
Re: [OMPI users] Question about Asynchronous collectives
CC stands for any Collective Communication operation. Every CC occurs on some communicator. Every CC is issued (basically, the thread the call is on enters the call) at some point in time. If two threads are issuing CC calls on the same communicator, the issue order can become ambiguous, so making CC calls from different threads but on the same communicator is generally unsafe. There is debate about whether it can be made safe by forcing some kind of thread serialization, but since the MPI standard does not discuss thread serialization, the best advice is to use a different communicator for each thread and be sure you have control of issue order.

When CC calls appear in some static order in a block of code that has no branches, issue order is simple to recognize. An example like this can cause problems unless you are sure every process has the same condition:

if (condition) {
    MPI_Ibcast; MPI_Ireduce
} else {
    MPI_Ireduce; MPI_Ibcast
}

If some ranks take the if and some ranks take the else, there is an "issue order" problem. (I do not have any idea why someone would do this.)

Dick

Dick Treumann - MPI Team IBM Systems & Technology Group Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363

From: Gabriele Fatigati To: Open MPI Users Date: 09/23/2010 01:02 PM Subject: Re: [OMPI users] Question about Asynchronous collectives Sent by: users-boun...@open-mpi.org

Sorry Richard, what is CC issue order on the communicator? In particular, what does "CC" mean?

2010/9/23 Richard Treumann:

request_1 and request_2 are just local variable names. The only thing that determines matching order is CC issue order on the communicator. At each process, some CC is issued first and some CC is issued second. The first issued CC at each process will try to match the first issued CC at the other processes. By this rule,

rank 0: MPI_Ibcast; MPI_Ibcast
Rank 1: MPI_Ibcast; MPI_Ibcast

is well defined, and

rank 0: MPI_Ibcast; MPI_Ireduce
Rank 1: MPI_Ireduce; MPI_Ibcast

is incorrect.

I do not agree with Jeff on this below. The Proc 1 case where the MPI_Waits are reversed simply requires the MPI implementation to make progress on both MPI_Ibcast operations in the first MPI_Wait. The second MPI_Wait call will simply find that the first MPI_Ibcast is already done. The second MPI_Wait call becomes, effectively, a query function.

proc 0:
MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

proc 1:
MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

That may/will deadlock.
Dick Treumann - MPI Team IBM Systems & Technology Group Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 From: Jeff Squyres To: Open MPI Users List-Post: users@lists.open-mpi.org Date: 09/23/2010 10:13 AM Subject: Re: [OMPI users] Question about Asynchronous collectives Sent by: users-boun...@open-mpi.org On Sep 23, 2010, at 10:00 AM, Gabriele Fatigati wrote: > to be sure, if i have one processor who does: > > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast > > it means that i can't have another process who does the follow: > > MPI_IBcast(MPI_COMM_WORLD, request_2) // firt Bcast for another process > MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast for another process > > Because first Bcast of second process matches with first Bcast of first process, and it's wrong. If you did a "waitall" on both requests, it would probably work because MPI would just "figure it out". But if you did something like: proc 0: MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast MPI_Wait(_1, ...); MPI_Wait(_2, ...); proc 1: MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast MPI_Wait(_1, ...); MPI_Wait(_2, ...); That may/will deadlock. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Ing. Gabriele Fatigati Parallel programmer CINECA Systems & Tecnologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.itTel: +39
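To make the issue-order rule concrete, here is a small sketch using the proposed MPI-3 nonblocking collectives discussed in this thread (they are not provided by Open MPI 1.4.x, so this is illustrative only). Every rank issues the broadcast first and the reduce second, so the two outstanding operations match unambiguously; once the issue order is the same everywhere, completing them with MPI_Waitall, or with two MPI_Wait calls in either order, is fine.

#include <mpi.h>

/* Sketch only: MPI_Ibcast/MPI_Ireduce follow the draft MPI-3 interface. */
void two_collectives(int rank)
{
    int a = 0, b = 1, sum = 0;
    MPI_Request req[2];

    if (rank == 0) { a = 42; }

    /* Same issue order on every rank: broadcast first, reduce second. */
    MPI_Ibcast(&a, 1, MPI_INT, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Ireduce(&b, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD, &req[1]);

    /* Completion order is free once the issue order matches everywhere. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    /* Reversing the *issue* order (reduce before bcast) on only some ranks
     * would be the mismatched case described above. */
}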
[OMPI users] How to know which process is running on which core?
Hi all, I'm new to the list. I don't know if this topic has been treated before. My question is: is there a way in the OMPI library to report which process is running on which core in an SMP system? I need to know processor affinity for optimization issues. Regards, Fernando Saez
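One way to get this on Linux, assuming glibc 2.6 or later: each rank can ask the kernel with sched_getcpu() and print the answer next to its rank and hostname. This reports the core the process happens to be executing on at that instant; unless the processes are actually bound (for example with mpirun --mca mpi_paffinity_alone 1), the scheduler is free to move them afterwards.

#define _GNU_SOURCE
#include <sched.h>     /* sched_getcpu(), Linux/glibc specific */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    /* Core the calling process is executing on right now. */
    printf("rank %d on %s is running on core %d\n", rank, host, sched_getcpu());

    MPI_Finalize();
    return 0;
}

From outside the process, "ps -o pid,psr,comm" on Linux reports the processor each process last ran on.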
[OMPI users] Porting Open MPI to ARM: How essential is the opal_sys_timer_get_cycles() function?
Dear Open MPI, How essential is Open MPI's opal_sys_timer_get_cycles() function? It apparently needs to access a timestamp register directly. That is a trivial operation in PPC (mftb) or x86 (tsc), but the ARM processor apparently doesn't have a similar function in its instruction set. Is it critical that opal_sys_timer_get_cycles() be written in assembly? Would a hack written in C suffice? Sincerely yours, Ken Mighell Kenneth Mighell, Scientist National Optical Astronomy Observatory 950 North Cherry Avenue Tucson, AZ 85719 U.S.A. email: mighell_at_[hidden] voice: (520) 318-8391
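A C stand-in is certainly writable as a first step for a port; whether its overhead and resolution are acceptable for the places OPAL reads the cycle counter is a question for the developers. The sketch below (the function name is made up for illustration and is not the actual OPAL symbol) returns monotonic nanoseconds from clock_gettime() in place of a cycle count; gettimeofday() would be the coarser, more portable fallback.

#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Illustrative fallback "timer" for a target with no cheap cycle-count
 * instruction: return monotonic nanoseconds instead of CPU cycles.
 * Link with -lrt on older glibc. */
static inline uint64_t sys_timer_get_cycles_fallback(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

The trade-off is that every read is a function or system call rather than a single register read, which matters if the timer is consulted in hot paths.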
Re: [OMPI users] "self scheduled" work & mpi receive???
Hi All: I’ve written an openmpi program that “self schedules” the work. The master task is in a loop chunking up an input stream and handing off jobs to worker tasks. At first the master gives the next job to the next highest rank. After all ranks have their first job, the master waits via an MPI receive call for the next free worker. The master parses out the rank from the MPI receive and sends the next job to this node. The jobs aren’t all identical, so they run for slightly different durations based on the input data. When I plot a histogram of the number of jobs each worker performed, the lower mpi ranks are doing much more work than the higher ranks. For example, in a 120 process run, rank 1 did 32 jobs while rank 119 only did 2. My guess is that openmpi returns the lowest rank from the MPI Recv when I’ve got MPI_ANY_SOURCE set and multiple sends have happened since the last call. Is there a different Recv call to make that will spread out the data better? How about using MPI_Irecv? Let the master issue an MPI_Irecv for each worker and call MPI_Test to get the list of idle workers, then choose one from the idle list by some randomization? THANXS! amb ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Question about Asynchronous collectives
Sorry Richard, what is CC issue order on the communicator?, in particular, "CC", what does it mean? 2010/9/23 Richard Treumann> > request_1 and request_2 are just local variable names. > > The only thing that determines matching order is CC issue order on the > communicator. At each process, some CC is issued first and some CC is > issued second. The first issued CC at each process will try to match the > first issued CC at the other processes. By this rule, > rank 0: > MPI_Ibcast; MPI_Ibcast > Rank 1; > MPI_Ibcast; MPI_Ibcast > is well defined and > > rank 0: > MPI_Ibcast; MPI_Ireduce > Rank 1; > MPI_Ireducet; MPI_Ibcast > is incorrect. > > I do not agree with Jeff on this below. The Proc 1 case where the > MPI_Waits are reversed simply requires the MPI implementation to make > progress on both MPI_Ibcast operations in the first MPI_Wait. The second > MPI_Wait call will simply find that the first MPI_Ibcast is already done. > The second MPI_Wait call becomes, effectively, a query function. > > proc 0: > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast > MPI_Wait(_1, ...); > MPI_Wait(_2, ...); > > proc 1: > MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast > MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast > MPI_Wait(_1, ...); > MPI_Wait(_2, ...); > > That may/will deadlock. > > > > > > Dick Treumann - MPI Team > IBM Systems & Technology Group > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 > Tele (845) 433-7846 Fax (845) 433-8363 > > > > From: > Jeff Squyres > To: Open MPI Users Date: 09/23/2010 10:13 AM Subject: Re: > [OMPI users] Question about Asynchronous collectives Sent by: > users-boun...@open-mpi.org > -- > > > > On Sep 23, 2010, at 10:00 AM, Gabriele Fatigati wrote: > > > to be sure, if i have one processor who does: > > > > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > > MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast > > > > it means that i can't have another process who does the follow: > > > > MPI_IBcast(MPI_COMM_WORLD, request_2) // firt Bcast for another process > > MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast for another process > > > > Because first Bcast of second process matches with first Bcast of first > process, and it's wrong. > > If you did a "waitall" on both requests, it would probably work because MPI > would just "figure it out". But if you did something like: > > proc 0: > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast > MPI_Wait(_1, ...); > MPI_Wait(_2, ...); > > proc 1: > MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast > MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast > MPI_Wait(_1, ...); > MPI_Wait(_2, ...); > > That may/will deadlock. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Ing. Gabriele Fatigati Parallel programmer CINECA Systems & Tecnologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.itTel: +39 051 6171722 g.fatigati [AT] cineca.it
Re: [OMPI users] "self scheduled" work & mpi receive???
Hi Lewis, On Thu, Sep 23, 2010 at 9:38 AM, Lewis, Ambrose J.wrote: > Hi All: > > I’ve written an openmpi program that “self schedules” the work. > > The master task is in a loop chunking up an input stream and handing off > jobs to worker tasks. At first the master gives the next job to the next > highest rank. After all ranks have their first job, the master waits via an > MPI receive call for the next free worker. The master parses out the rank > from the MPI receive and sends the next job to this node. The jobs aren’t > all identical, so they run for slightly different durations based on the > input data. > > > > When I plot a histogram of the number of jobs each worker performed, the > lower mpi ranks are doing much more work than the higher ranks. For > example, in a 120 process run, rank 1 did 32 jobs while rank 119 only did 2. > My guess is that openmpi returns the lowest rank from the MPI Recv when > I’ve got MPI_ANY_SOURCE set and multiple sends have happened since the last > call. > What is the time taken by each computation ? It is possible that computation time for longer tasks is much greater than computation time for shorter tasks ? > > > Is there a different Recv call to make that will spread out the data better? > > > > THANXS! > > amb > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Question about Asynchronous collectives
request_1 and request_2 are just local variable names. The only thing that determines matching order is CC issue order on the communicator. At each process, some CC is issued first and some CC is issued second. The first issued CC at each process will try to match the first issued CC at the other processes. By this rule,

rank 0: MPI_Ibcast; MPI_Ibcast
Rank 1: MPI_Ibcast; MPI_Ibcast

is well defined, and

rank 0: MPI_Ibcast; MPI_Ireduce
Rank 1: MPI_Ireduce; MPI_Ibcast

is incorrect.

I do not agree with Jeff on this below. The Proc 1 case where the MPI_Waits are reversed simply requires the MPI implementation to make progress on both MPI_Ibcast operations in the first MPI_Wait. The second MPI_Wait call will simply find that the first MPI_Ibcast is already done. The second MPI_Wait call becomes, effectively, a query function.

proc 0:
MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

proc 1:
MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

That may/will deadlock.

Dick Treumann - MPI Team IBM Systems & Technology Group Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363

From: Jeff Squyres To: Open MPI Users Date: 09/23/2010 10:13 AM Subject: Re: [OMPI users] Question about Asynchronous collectives Sent by: users-boun...@open-mpi.org

On Sep 23, 2010, at 10:00 AM, Gabriele Fatigati wrote:
> to be sure, if i have one processor who does:
>
> MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
> MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast
>
> it means that i can't have another process who does the following:
>
> MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast for another process
> MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast for another process
>
> Because first Bcast of second process matches with first Bcast of first process, and it's wrong.

If you did a "waitall" on both requests, it would probably work because MPI would just "figure it out". But if you did something like:

proc 0:
MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

proc 1:
MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

That may/will deadlock.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Question about Asynchronous collectives
On Sep 23, 2010, at 10:00 AM, Gabriele Fatigati wrote:
> to be sure, if i have one processor who does:
>
> MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
> MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast
>
> it means that i can't have another process who does the following:
>
> MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast for another process
> MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast for another process
>
> Because first Bcast of second process matches with first Bcast of first process, and it's wrong.

If you did a "waitall" on both requests, it would probably work because MPI would just "figure it out". But if you did something like:

proc 0:
MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

proc 1:
MPI_IBcast(MPI_COMM_WORLD, request_2) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast
MPI_Wait(&request_1, ...);
MPI_Wait(&request_2, ...);

That may/will deadlock.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Question about Asynchronous collectives
Mm, to be sure, if i have one processor who does: MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast it means that i can't have another process who does the follow: MPI_IBcast(MPI_COMM_WORLD, request_2) // firt Bcast for another process MPI_IBcast(MPI_COMM_WORLD, request_1) // second Bcast for another process Because first Bcast of second process matches with first Bcast of first process, and it's wrong. Is it right? 2010/9/23 Jeff Squyres> On Sep 23, 2010, at 6:28 AM, Gabriele Fatigati wrote: > > > i'm studing the interfaces of new collective routines in next MPI-3, and > i've read that new collectives haven't any tag. > > Correct. > > > So all collective operations must follow the ordering rules for > collective calls. > > Also correct. > > > From what i understand, this means that i can't use: > > > > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > > MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast > > No, not quite right. You can have multiple outstanding ibcast's -- they'll > just be satisfied in the same order in all participating MPI processes. > > > but is it possible to do this: > > > > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > > MPI_IReducet(MPI_COMM_WORLD, request_2) // othwer collective > > Correct -- this is also possible. > > More generally, you can have multiple outstanding non-blocking collectives > on a single communicator -- it doesn't matter if they are the same or > different collective operations. They will each be unique instances and will > be satisfied in order. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > -- Ing. Gabriele Fatigati Parallel programmer CINECA Systems & Tecnologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.itTel: +39 051 6171722 g.fatigati [AT] cineca.it
Re: [OMPI users] Question about Asynchronous collectives
On Sep 23, 2010, at 6:28 AM, Gabriele Fatigati wrote: > i'm studing the interfaces of new collective routines in next MPI-3, and i've > read that new collectives haven't any tag. Correct. > So all collective operations must follow the ordering rules for collective > calls. Also correct. > From what i understand, this means that i can't use: > > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast No, not quite right. You can have multiple outstanding ibcast's -- they'll just be satisfied in the same order in all participating MPI processes. > but is it possible to do this: > > MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast > MPI_IReducet(MPI_COMM_WORLD, request_2) // othwer collective Correct -- this is also possible. More generally, you can have multiple outstanding non-blocking collectives on a single communicator -- it doesn't matter if they are the same or different collective operations. They will each be unique instances and will be satisfied in order. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
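When keeping the issue order identical on every rank is awkward (for example, when the collectives come from logically independent parts of the code or from different threads), the option suggested elsewhere in this thread is to give each stream its own communicator via MPI_Comm_dup, since matching is decided per communicator. A sketch, again using the draft MPI-3 MPI_Ibcast interface and therefore illustrative only:

#include <mpi.h>

/* One duplicated communicator per independent stream of collectives, so two
 * concurrent broadcasts can never be matched against each other. */
void independent_bcasts(int rank)
{
    MPI_Comm comm_a, comm_b;
    int x = 0, y = 0;
    MPI_Request req[2];

    MPI_Comm_dup(MPI_COMM_WORLD, &comm_a);
    MPI_Comm_dup(MPI_COMM_WORLD, &comm_b);

    if (rank == 0) { x = 1; y = 2; }

    /* Matching is per communicator, so the relative order of these two
     * calls no longer has to be the same on every rank. */
    MPI_Ibcast(&x, 1, MPI_INT, 0, comm_a, &req[0]);
    MPI_Ibcast(&y, 1, MPI_INT, 0, comm_b, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    MPI_Comm_free(&comm_a);
    MPI_Comm_free(&comm_b);
}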
[OMPI users] "self scheduled" work & mpi receive???
Hi All: I've written an openmpi program that "self schedules" the work. The master task is in a loop chunking up an input stream and handing off jobs to worker tasks. At first the master gives the next job to the next highest rank. After all ranks have their first job, the master waits via an MPI receive call for the next free worker. The master parses out the rank from the MPI receive and sends the next job to this node. The jobs aren't all identical, so they run for slightly different durations based on the input data. When I plot a histogram of the number of jobs each worker performed, the lower mpi ranks are doing much more work than the higher ranks. For example, in a 120 process run, rank 1 did 32 jobs while rank 119 only did 2. My guess is that openmpi returns the lowest rank from the MPI Recv when I've got MPI_ANY_SOURCE set and multiple sends have happened since the last call. Is there a different Recv call to make that will spread out the data better? THANXS! amb
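For reference, a bare-bones sketch of the dispatch loop being described: the master blocks in MPI_Recv with MPI_ANY_SOURCE, reads the sender's rank out of the status, and sends that worker the next job. Job contents, the initial hand-out, and termination are placeholders. Note that the MPI standard does not guarantee fairness when several sends are simultaneously matchable by a wildcard receive, which is consistent with the imbalance reported above.

#include <mpi.h>

static void dispatch(int jobs_left)
{
    int result, worker, job;
    MPI_Status status;

    while (jobs_left > 0) {
        /* Wait for whichever worker reports back first. */
        MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                 MPI_COMM_WORLD, &status);
        worker = status.MPI_SOURCE;   /* the rank that just became idle */
        job = jobs_left--;            /* placeholder job payload */
        MPI_Send(&job, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
    }
}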
[OMPI users] Running on crashing nodes
Dear users, Our cluster has a number of nodes with a high probability of crashing, so it happens quite often that calculations stop due to one node going down. Maybe you know whether it is possible to block the crashed nodes at run time when running with OpenMPI? I am asking whether it is possible in principle to program such behavior. Does OpenMPI allow such dynamic checking? The scheme I am curious about is the following:

1. A code starts its tasks via mpirun on several nodes
2. At some moment one node goes down
3. The code realizes that the node is down (the results are lost) and excludes it from the list of nodes to run its tasks on
4. At a later moment the user restarts the crashed node
5. The code notices that the node is up again and puts it back on the list of active nodes

Regards, Andrei
[OMPI users] Question about Asynchronous collectives
Dear all, I'm studying the interfaces of the new collective routines in the upcoming MPI-3, and I've read that the new collectives don't have any tag. So all collective operations must follow the ordering rules for collective calls. From what I understand, this means that I can't use:

MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
MPI_IBcast(MPI_COMM_WORLD, request_2) // second Bcast

but is it possible to do this:

MPI_IBcast(MPI_COMM_WORLD, request_1) // first Bcast
MPI_IReduce(MPI_COMM_WORLD, request_2) // other collective

In other words, I can't overlap the same collective more than once on one communicator, but is it possible with different collectives? Thanks a lot.

--
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Tecnologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it Tel: +39 051 6171722
g.fatigati [AT] cineca.it
Re: [OMPI users] PathScale problems persist
You should probably take this up with Pathscale's support team. On Sep 23, 2010, at 3:56 AM, Rafael Arco Arredondo wrote: > I am using GCC 4.x: > > $ pathCC -v > PathScale(TM) Compiler Suite: Version 3.2 > Built on: 2008-06-16 16:41:38 -0700 > Thread model: posix > GNU gcc version 4.2.0 (PathScale 3.2 driver) > > $ pathCC -show-defaults > Optimization level and compilation target: > -O2 -mcpu=opteron -m64 -msse -msse2 -mno-sse3 -mno-3dnow -mno-sse4a > -gnu4 > > And I also tried with mpiCC -gnu4 to be totally sure. It's rather weird > that I get this error and Ake does not... > > I configured Open MPI with PathScale with the following line, by the > way: > > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 > --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90 > FC=pathf90 > > And with GCC: > > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 > --with-sge --enable-static > > It's not an Infiniband or SGE issue. I also tried with all processes > running on the same node and without SGE. > > Best regards, > > Rafa > > On Wed, 2010-09-22 at 14:54 +0200, Ake Sandgren wrote: >> On Wed, 2010-09-22 at 14:16 +0200, Ake Sandgren wrote: >>> On Wed, 2010-09-22 at 07:42 -0400, Jeff Squyres wrote: This is a problem with the Pathscale compiler and old versions of >> GCC. See: >> http://www.open-mpi.org/faq/?category=building#pathscale-broken-with-mpi-c%2B%2B-api I note that you said you're already using GCC 4.x, but it's not >> clear from your text whether pathscale is using that compiler or a >> different GCC on the back-end. If you can confirm that pathscale *is* >> using GCC 4.x on the back-end, then this is worth reporting to the >> pathscale support people. >>> >>> I have no problem running the code below compiled with openmpi 1.4.2 >> and >>> pathscale 3.2. >> >> And i should of course have specified that this is with a GCC4.x >> backend. > -- > Rafael Arco Arredondo > Centro de Servicios de Informática y Redes de Comunicaciones > Campus de Fuentenueva - Edificio Mecenas > Universidad de Granada > E-18071 Granada Spain > Tel: +34 958 241010 Ext:31114 E-mail: rafaa...@ugr.es > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] PathScale problems persist
I am using GCC 4.x: $ pathCC -v PathScale(TM) Compiler Suite: Version 3.2 Built on: 2008-06-16 16:41:38 -0700 Thread model: posix GNU gcc version 4.2.0 (PathScale 3.2 driver) $ pathCC -show-defaults Optimization level and compilation target: -O2 -mcpu=opteron -m64 -msse -msse2 -mno-sse3 -mno-3dnow -mno-sse4a -gnu4 And I also tried with mpiCC -gnu4 to be totally sure. It's rather weird that I get this error and Ake does not... I configured Open MPI with PathScale with the following line, by the way: ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90 FC=pathf90 And with GCC: ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 --with-sge --enable-static It's not an Infiniband or SGE issue. I also tried with all processes running on the same node and without SGE. Best regards, Rafa On Wed, 2010-09-22 at 14:54 +0200, Ake Sandgren wrote: > On Wed, 2010-09-22 at 14:16 +0200, Ake Sandgren wrote: > > On Wed, 2010-09-22 at 07:42 -0400, Jeff Squyres wrote: > > > This is a problem with the Pathscale compiler and old versions of > GCC. See: > > > > > > > http://www.open-mpi.org/faq/?category=building#pathscale-broken-with-mpi-c%2B%2B-api > > > > > > I note that you said you're already using GCC 4.x, but it's not > clear from your text whether pathscale is using that compiler or a > different GCC on the back-end. If you can confirm that pathscale *is* > using GCC 4.x on the back-end, then this is worth reporting to the > pathscale support people. > > > > I have no problem running the code below compiled with openmpi 1.4.2 > and > > pathscale 3.2. > > And i should of course have specified that this is with a GCC4.x > backend. -- Rafael Arco Arredondo Centro de Servicios de Informática y Redes de Comunicaciones Campus de Fuentenueva - Edificio Mecenas Universidad de Granada E-18071 Granada Spain Tel: +34 958 241010 Ext:31114 E-mail: rafaa...@ugr.es