[OMPI devel] 1.4.5rc2 testing: Open64 and PathScale

2012-01-27 Thread Paul H. Hargrove

No bad news this time.
I grabbed the latest free versions of Open64 and PathScale and gave them 
a try:


PASS:
   linux/x86-64 w/ Open64-4.5.1 compilers from AMD
   linux/x86-64 w/ ekopath-4.0.12.1 compilers from PathScale
Where "PASS" is my usual "make all install check".

-Paul

On 1/19/2012 9:55 AM, Jeff Squyres wrote:

Please test:

 http://www.open-mpi.org/software/ompi/v1.4/



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[hwloc-devel] Create success (hwloc r1.5a1r4210)

2012-01-27 Thread MPI Team
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot:   hwloc 1.5a1r4210
Start time: Fri Jan 27 21:01:02 EST 2012
End time:   Fri Jan 27 21:04:29 EST 2012

Your friendly daemon,
Cyrador


Re: [OMPI devel] 1.4.5rc2 testing linux/ppc/IBM [SOLVED]

2012-01-27 Thread Paul H. Hargrove



On 1/27/2012 5:24 AM, Jeff Squyres wrote:
> On Jan 27, 2012, at 12:45 AM, Paul H. Hargrove wrote:
>
> > On this cluster, statfs() is returning ENOENT, which is breaking
> > opal_path_nfs().
> > So, these results are with test/opal/util/opal_path_nfs.c "disabled".
>
> Paul -- can you explain this a little more?  There should be logic in there to
> effectively handle ENOENT's, meaning that if we get a non-ESTALE error, we try
> again with the directory name.  This is repeated until we get to "/" -- so
> there should definitely be at least one case where statfs() is *not* returning
> ENOENT.
>
> Is that not happening?



I looked a bit deeper and found that the bug is in OMPI, but a simple 
one to fix.

I added 2 lines to opal/util/path.c:

--- openmpi-1.4.5rc2-orig/opal/util/path.c 2011-02-04 07:38:16.0 -0600
+++ openmpi-1.4.5rc2/opal/util/path.c 2012-01-27 12:46:30.0 -0600
@@ -476,6 +476,8 @@
 rc = statvfs (file, );
 #elif defined(linux) || defined (__BSD) || (defined(__APPLE__) && defined(__MACH__))
 rc = statfs (file, );
+#else
+  #error "No statvfs or statfs call"
 #endif
 } while (-1 == rc && ESTALE == errno && (0 < --trials));


Can you guess what happens when I "make" now?
There IS no call to statfs, and the ENOENT I saw must have been "left 
over" from some earlier libc call.
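
The "left over" errno effect is easy to reproduce. The sketch below is a hypothetical illustration, not OMPI code: DEMO_HAVE_FS_CALL, demo_guarded_call() and demo_stale_errno() are made-up names, and stat() stands in for statfs()/statvfs(). When no branch of the #if is compiled, rc keeps its initial -1 and errno still holds whatever an earlier, unrelated failure stored there.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the guarded statfs()/statvfs() call.
 * DEMO_HAVE_FS_CALL is deliberately left undefined, so no system call
 * is compiled in, just as when no platform macro matches. */
static int demo_guarded_call(const char *path)
{
    int rc = -1;
#if defined(DEMO_HAVE_FS_CALL)
    struct stat sb;
    rc = stat(path, &sb);
#else
    (void) path;   /* nothing is called: rc stays -1, errno is untouched */
#endif
    return rc;
}

/* Returns the errno value a caller would observe after the guarded call. */
int demo_stale_errno(void)
{
    struct stat sb;
    int rc;

    /* An earlier, unrelated libc call fails and sets errno to ENOENT. */
    rc = stat("/no-such-dir-demo/no-such-file", &sb);
    assert(-1 == rc);

    /* The guarded call does nothing, so the old errno is still visible. */
    rc = demo_guarded_call("/");
    return (-1 == rc) ? errno : 0;
}
```

Initializing rc to -1 is what makes the stale errno look like a real failure to the caller; the #error added in the patch above turns the silent no-op into a compile-time failure instead.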


The problem is that these compilers have not pre-defined "linux".
It does appear that they are defining "__linux" and "__linux__" 
(double-underscores).

So, a little change of the preprocessor logic should fix this problem:
   $ perl -pi -e 's/defined\(linux\)/defined\(__linux__\)/;' -- opal/util/path.c

[more compact than the corresponding diffs]

With that change (and without "disabling" opal_path_nfs.c) all 4 
compilers are PASSing "make all install check".


Source inspection suggests that the 1.5 branch has the same issue.
I've not inspected the HEAD, but somebody should.


FYI:
I've done a bit of grepping for linux,__linux,__linux__.
My search shows only 2 files checking for definition of "linux"
   opal/util/path.c
   opal/mca/memory/ptmalloc2/malloc.c
And exactly one looking for "__linux":
   test/event/event-test.c
Checks for "__linux__" appear in the following files:
   ompi/mca/io/romio/romio/adio/ad_lustre/ad_lustre.h
   ompi/mca/btl/openib/btl_openib_component.c
   opal/util/if.c
   opal/mca/memory/ptmalloc2/arena.c
   test/util/opal_path_nfs.c (IRONY!)
I suggest standardization to "__linux__" in the 3 files that currently 
use "linux" or "__linux".
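
As an illustration of the standardization suggested here, the following sketch funnels the platform test through one macro and reports which spellings a given compiler predefines. DEMO_OS_LINUX, demo_linux_spellings() and demo_is_linux() are hypothetical names, not anything in the OMPI tree. One fact worth keeping in mind: "linux" is in the user namespace, so GCC predefines it only in its GNU modes (for example -std=gnu99) and not in strict ISO modes such as -std=c99.

```c
#include <assert.h>

/* Hypothetical project-wide macro: test only the reserved-namespace
 * spelling, as proposed in this thread. */
#if defined(__linux__)
#define DEMO_OS_LINUX 1
#else
#define DEMO_OS_LINUX 0
#endif

/* Counts how many of the three spellings this compiler predefines. */
int demo_linux_spellings(void)
{
    int n = 0;
#if defined(__linux__)
    n++;
#endif
#if defined(__linux)
    n++;
#endif
#if defined(linux)
    n++;
#endif
    return n;
}

/* Single place where the rest of the code would ask "is this Linux?". */
int demo_is_linux(void)
{
    /* If __linux__ is defined, at least one spelling must be counted. */
    assert(demo_linux_spellings() >= DEMO_OS_LINUX);
    return DEMO_OS_LINUX;
}
```

Printing demo_linux_spellings() under each compiler would show directly whether a toolchain defines all three spellings or only the double-underscore ones.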



-Paul




Re: [OMPI devel] 1.4.5rc2 tests on MacOSX Lion (2 pass, 1 FAIL): orte_odls visibility issue

2012-01-27 Thread Paul Hargrove
On Fri, Jan 27, 2012 at 5:34 AM, Jeff Squyres wrote:
[snip]
>
>
> I'm not quite sure how that can happen -- orte_odls appears to be
> prototyped properly in orte/mca/odls/odls.h (i.e., it has ORTE_DECLSPEC,
> for visibility), and is properly instantiated in
> orte/mca/odls/base/odls_base_open.c.
>
> Paul: can you run some nm's and see how the orte_odls symbol appears in
> libopen-rte.a?
>
>

In the PGI build directory:

> $ find . -name '*.a' | while read lib; do
>   out=`nm $lib 2>/dev/null | grep -w _orte_odls`;
>   test -n "$out" && echo -e "${lib}:\n${out}";
>done
> ./orte/.libs/libopen-rte.a:
>  U _orte_odls
>  U _orte_odls
>  U _orte_odls
> 0038 C _orte_odls
>  U _orte_odls
>  U _orte_odls
> ./orte/mca/errmgr/.libs/libmca_errmgr.a:
>  U _orte_odls
> ./orte/mca/odls/.libs/libmca_odls.a:
> 0038 C _orte_odls
>  U _orte_odls
> ./orte/mca/plm/.libs/libmca_plm.a:
>  U _orte_odls


Meanwhile in the GCC build directory the same shell command yields
something quite different:

> ./orte/mca/errmgr/.libs/libmca_errmgr.a:
>  U _orte_odls
> ./orte/mca/odls/.libs/libmca_odls.a:
> 11c0 S _orte_odls
>  U _orte_odls
> ./orte/mca/plm/.libs/libmca_plm.a:
>  U _orte_odls


So the difference boils down to "C" vs "S".
According to "man nm" on this system
  "C" is "common"
  "S" is "other section not listed above"

I don't know much about visibility attributes and so can't follow the path
any further without some instructions to follow.  (Though I will read the
PGI manpages for anything related to common vs noncommon symbols).
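
For background on "common" symbols: in C, a file-scope variable declared without an initializer is a tentative definition, and a compiler using the traditional -fcommon behaviour may emit it as a common symbol ("C" in nm output), whereas a variable with an explicit initializer is a full definition placed in a regular data or BSS section. The sketch below uses made-up names (struct demo_module is not the real orte_odls type). Whether this is what separates the PGI and GCC builds here is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module structure: a small table of function pointers. */
struct demo_module {
    int (*launch)(void);
    int (*kill)(void);
};

/* Tentative definition: no initializer.  A compiler using -fcommon
 * semantics may emit this as a common symbol. */
struct demo_module demo_tentative;

/* Explicit initializer: a full definition in a data/BSS section. */
struct demo_module demo_initialized = { NULL, NULL };

/* Both objects start out zeroed; only their symbol type may differ. */
int demo_both_zeroed(void)
{
    return NULL == demo_tentative.launch && NULL == demo_tentative.kill
        && NULL == demo_initialized.launch && NULL == demo_initialized.kill;
}
```

Compiling this file with "cc -c" and running nm on the object would show how a given compiler classifies the two variables; with GCC, -fno-common makes the tentative definition behave like the initialized one.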

Hopefully those are the droids you're looking for,
-Paul



[OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-01-27 Thread nadia.derbey
Hi,

If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
is configured with:
   TaskPlugin=task/affinity
   TaskPluginParam=Cpusets

each rank of that job is in a cpuset that contains a single CPU.

Now, if we use carto on top of this, the following happens in
get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
   . opal_paffinity_base_get_processor_info() is called to get the
     number of logical processors (we get 1 due to the singleton cpuset).
   . We loop over that number of processors to check whether our process
     is bound to one of them. In our case the loop is executed only
     once, so we never get the correct binding information.
   . If the process is bound, we actually get the distance to the device.
     In our case we never execute that part of the code.
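
The effect of the short loop can be sketched in isolation. This is a simplified model, not the btl/openib code: the binding is reduced to a 64-bit mask, and DEMO_CPU_MAX and demo_find_bound_cpu() are hypothetical names. It assumes the single CPU in the cpuset keeps its system-wide id (5 in the usage below), so a scan bounded by num_processors = 1 only ever looks at CPU 0.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CPU_MAX 64   /* hypothetical stand-in for the bitmask width */

/* Returns the first CPU id in [0, num_processors) that is set in
 * bound_mask, or -1 if the scan finds none. */
int demo_find_bound_cpu(uint64_t bound_mask, int num_processors)
{
    int i;

    assert(num_processors >= 0);
    for (i = 0; i < num_processors && i < DEMO_CPU_MAX; i++) {
        if (bound_mask & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1;
}
```

With a mask that has only bit 5 set, demo_find_bound_cpu(mask, 1) misses the binding, while scanning up to DEMO_CPU_MAX finds CPU 5. That is the reason for falling back to the bitmask maximum when a single processor is reported.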

The attached patch is a proposal to fix the issue.

Regards,
Nadia
fix get_ib_dev_distance: it doesn't see processes as bound if the job has been launched by srun

diff -r 162c1c0c050a ompi/mca/btl/openib/btl_openib_component.c
--- a/ompi/mca/btl/openib/btl_openib_component.c	Thu Jan 26 21:42:13 2012 +
+++ b/ompi/mca/btl/openib/btl_openib_component.c	Fri Jan 27 16:36:29 2012 +0100
@@ -2332,11 +2332,20 @@ static int get_ib_dev_distance(struct ib
 {
 opal_paffinity_base_cpu_set_t cpus;
 opal_carto_base_node_t *device_node;
-int min_distance = -1, i, num_processors;
+int min_distance = -1, i;
+int num_processors = OPAL_PAFFINITY_BITMASK_CPU_MAX;
 const char *device = ibv_get_device_name(dev);

-if(opal_paffinity_base_get_processor_info(&num_processors) != OMPI_SUCCESS) {
-num_processors = 100; /* Choose something big enough */
+if (opal_paffinity_base_get_processor_info(&num_processors) != OMPI_SUCCESS
+|| 1 == num_processors) {
+/*
+ * We get num_processors=1 if we were launched using srun + binding:
+ * in that case we are placed in a cpuset that contains a single cpu
+ * (the one we were bound to)
+ * ==> the loop after won't be correctly executed
+ */
+/* Choose something big enough */
+num_processors = OPAL_PAFFINITY_BITMASK_CPU_MAX;
 }

 device_node = opal_carto_base_find_node(host_topo, device);



Re: [OMPI devel] Pessimist Event Logger

2012-01-27 Thread Aurélien Bouteiller
Hugo, 

It seems you want to implement some sort of remote pessimistic logging, à la 
MPICH-V1? 
MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes -- George 
Bosilca, Aurélien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fédak, 
Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky, Frédéric 
Magniette, Vincent Néri, Anton Selikhov -- In proceedings of The IEEE/ACM 
SC2002 Conference, Baltimore USA, November 2002

In the PML-V, unlike older designs, the payload of messages and the 
non-deterministic events follow a different path. The payload of messages is 
logged on the sender's volatile memory, while the non-deterministic events are 
sent to a stable event logger, before allowing the process to impact the state 
of others (the code you have found in the previous email). The best depiction 
of this distinction can be read in this paper 
@inproceedings{DBLP:conf/europar/BouteillerHBD11,
  author    = {Aurelien Bouteiller and
               Thomas H{\'e}rault and
               George Bosilca and
               Jack J. Dongarra},
  title     = {Correlated Set Coordination in Fault Tolerant Message Logging
               Protocols},
  booktitle = {Euro-Par 2011 Parallel Processing - 17th International
               Conference, Proceedings, Part II},
  month     = {September},
  year      = {2011},
  pages     = {51-64},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {6853},
  isbn      = {978-3-642-23396-8},
  doi       = {http://dx.doi.org/10.1007/978-3-642-23397-5_6},
}




If you intend to store both payload and message log on a remote node, I suggest 
you look at the "sender-based" hooks, as this is where the message payload is 
managed, and adapt from there. The event loggers can already manage only a 
subset of the processes (if you launch as many ELs as processes, you get a 1-1 
mapping), but they never handle message payload; you'll have to add all this 
yourself, if it so pleases you. 
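
To make the sender-based side concrete, here is a minimal sketch of a payload log kept in the sender's own memory. It is not the PML-V implementation: the structure, the fixed sizes, and the names demo_log_and_send() and demo_replay_count() are all hypothetical, and the idea that logged payloads are later replayed to a restarted peer is the usual sender-based logging model, stated here as an assumption.

```c
#include <assert.h>
#include <string.h>

#define DEMO_LOG_SLOTS   8
#define DEMO_PAYLOAD_MAX 32

/* One logged message, kept in the sender's own (volatile) memory. */
struct demo_logged_msg {
    int    dest;
    size_t len;
    char   payload[DEMO_PAYLOAD_MAX];
};

struct demo_sender_log {
    struct demo_logged_msg slots[DEMO_LOG_SLOTS];
    int count;
};

/* Copies the payload into the sender's log before it would be sent.
 * Returns 0 on success, -1 if the log is full or the payload too large. */
int demo_log_and_send(struct demo_sender_log *log, int dest,
                      const void *payload, size_t len)
{
    struct demo_logged_msg *m;

    assert(NULL != log && NULL != payload);
    if (log->count >= DEMO_LOG_SLOTS || len > DEMO_PAYLOAD_MAX) {
        return -1;
    }
    m = &log->slots[log->count++];
    m->dest = dest;
    m->len = len;
    memcpy(m->payload, payload, len);
    /* A real implementation would hand the message to the network here. */
    return 0;
}

/* Counts the logged messages that could be replayed to a given peer. */
int demo_replay_count(const struct demo_sender_log *log, int dest)
{
    int i, n = 0;

    assert(NULL != log);
    for (i = 0; i < log->count; i++) {
        if (log->slots[i].dest == dest) {
            n++;
        }
    }
    return n;
}
```

Storing the payload on a remote node instead would mean replacing the memcpy() into local slots with a transfer to that node, which is the part the event loggers do not provide today.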

Hope it clarifies. 
Aurelien





--
* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321









Re: [OMPI devel] Pessimist Event Logger

2012-01-27 Thread Hugo Daniel Meyer
Hello Aurélien.

Thanks for the clarification. Considering what you've mentioned, I will have
to make some adaptations because, in my case, every single message has to be
logged. So a sender will send messages not only to the receiver but also to
an event logger. Are there any considerations that I have to take into
account when modifying the code? My initial idea is to use the el_comm
with a group of event loggers (because every node uses a different event
logger in my approach), and then send the messages to them as you do when
using MPI_ANY_SOURCE.

Thanks for your help.

Hugo Meyer



Re: [OMPI devel] Pessimist Event Logger

2012-01-27 Thread Aurélien Bouteiller
Hugo, 

Your program does not have non-deterministic events. Therefore, there are no 
events to log. If you add MPI_ANY_SOURCE, you should see this code being 
called. Please contact me again if you need more help.
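
A toy model of why the flush path stays idle for deterministic programs: an event is recorded only when a receive is posted with a wildcard source, so with named sources the event buffer length remains 0 and the flush before a send has nothing to push. All names below (DEMO_ANY_SOURCE, struct demo_event_state, demo_recv_matched(), demo_flush_before_send()) are hypothetical and only mimic the behaviour described in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ANY_SOURCE (-1)   /* hypothetical stand-in for MPI_ANY_SOURCE */

struct demo_event_state {
    size_t event_buffer_length;  /* events waiting to go to the logger */
    size_t events_flushed;       /* events already sent to the logger  */
};

/* Records an event only when the receive was posted with a wildcard,
 * because only then is the matched source not known in advance. */
void demo_recv_matched(struct demo_event_state *s, int requested_src)
{
    assert(NULL != s);
    if (DEMO_ANY_SOURCE == requested_src) {
        s->event_buffer_length++;
    }
}

/* Called before a send: flushes pending events, if any.
 * Returns the number of events pushed to the (simulated) logger. */
size_t demo_flush_before_send(struct demo_event_state *s)
{
    size_t n;

    assert(NULL != s);
    n = s->event_buffer_length;
    s->events_flushed += n;
    s->event_buffer_length = 0;
    return n;
}
```

A receive from a named rank followed by a send flushes nothing; a receive posted with DEMO_ANY_SOURCE followed by a send flushes exactly one event.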

Aurelien





[OMPI devel] Pessimist Event Logger

2012-01-27 Thread Hugo Daniel Meyer
Hello @ll.

George, I'm using some pieces of the pessimist vprotocol. I've observed
that when you do a send, you call vprotocol_receiver_event_flush, and there
the macro __VPROTOCOL_RECEIVER_SEND_BUFFER is invoked. I've noticed that
there you try to send a copy of the message to process 0 using the el_comm.
This section of code is never executed, at least in my examples. So the
message is never sent to the Event Logger; am I correct?  I think
that this is happening because
mca_vprotocol_pessimist.event_buffer_length is always 0.

Is there something that I have to turn on, or will I have to modify this
behavior manually to connect and send messages to the EL?

Thanks in advance.

Hugo Meyer


Re: [OMPI devel] 1.4.5rc2 tests on MacOSX Lion (2 pass, 1 FAIL): orte_odls visibility issue

2012-01-27 Thread Jeff Squyres
On Jan 26, 2012, at 8:54 PM, Paul Hargrove wrote:

> libtool: link: pgcc -O -DNDEBUG -o orte-clean orte-clean.o  
> ../../../orte/.libs/libopen-rte.a 
> /Users/paul/openmpi-1.4.5rc2/BLD-pgi-11.10/opal/.libs/libopen-pal.a -lutil
> Undefined symbols for architecture x86_64:
>   "_orte_odls", referenced from:
>   _orte_errmgr_base_error_abort in libopen-rte.a(errmgr_base_fns.o)
> ld: symbol(s) not found for architecture x86_64

I'm not quite sure how that can happen -- orte_odls appears to be prototyped 
properly in orte/mca/odls/odls.h (i.e., it has ORTE_DECLSPEC, for visibility), 
and is properly instantiated in orte/mca/odls/base/odls_base_open.c.

Paul: can you run some nm's and see how the orte_odls symbol appears in 
libopen-rte.a?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] 1.4.5rc2 testing linux/ppc/IBM

2012-01-27 Thread Jeff Squyres
On Jan 27, 2012, at 12:45 AM, Paul H. Hargrove wrote:

> On this cluster, statfs() is returning ENOENT, which is breaking 
> opal_path_nfs().
> So, these results are with test/opal/util/opal_path_nfs.c "disabled".

Paul -- can you explain this a little more?  There should be logic in there to 
effectively handle ENOENT's, meaning that if we get a non-ESTALE error, we try 
again with the directory name.  This is repeated until we get to "/" -- so 
there should definitely be at least one case where statfs() is *not* returning 
ENOENT.

Is that not happening?
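
The fallback described above can be sketched as follows. This is not the opal_path_nfs() code: demo_existing_ancestor() is a hypothetical helper, stat() is swapped in for statfs() to keep the example portable, and where the real code retries on ESTALE the sketch simply gives up.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Copies path into buf and strips trailing components until stat()
 * succeeds.  Returns 0 with the existing ancestor in buf, or -1. */
int demo_existing_ancestor(const char *path, char *buf, size_t len)
{
    struct stat sb;
    char *slash;

    assert(NULL != path && NULL != buf);
    if (strlen(path) >= len) {
        return -1;
    }
    strcpy(buf, path);

    for (;;) {
        if (0 == stat(buf, &sb)) {
            return 0;          /* buf names an existing file or directory */
        }
        if (ESTALE == errno) {
            return -1;         /* the real code retries; the sketch gives up */
        }
        slash = strrchr(buf, '/');
        if (NULL == slash || 0 == strcmp(buf, "/")) {
            return -1;         /* nothing left to strip */
        }
        if (slash == buf) {
            buf[1] = '\0';     /* the parent of "/name" is "/" */
        } else {
            *slash = '\0';     /* drop the last path component */
        }
    }
}
```

For a path whose components do not exist, the walk ends at "/", which is why at least one call is expected to succeed whenever a system call is actually being made.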





[hwloc-devel] hwloc and HTX device ?

2012-01-27 Thread Brice Goglin
Hello,

I'd like to see what hwloc reports on AMD machines with an HTX card
(HyperTransport expansion card). The most widely known case would likely
be a 3-5-year-old AMD cluster with PathScale InfiniPath network cards.
But I think there are also some accelerators such as ClearSpeed, and the
NumaConnect single-image interconnect.

HTX slots do not involve PCI, but AMD may have implemented some glue to
make them appear in lspci anyway. So it's not clear whether hwloc will see
them or not.

If anybody has access to such a machine, could you please run lstopo
(>=1.3) there and tell us if the HTX device appears? If not, we'll need
to see if there are some /dev files to look at. If yes, does it appear
close to a single socket? If so, is this the right socket? (Feel free
to tell us what model of machine this is; I will check the motherboard
manual to make sure this is the right socket.)

Thanks
Brice



[OMPI devel] 1.4.5rc2 testing linux/ppc/IBM

2012-01-27 Thread Paul H. Hargrove

More positive results, with a caveat.

On this cluster, statfs() is returning ENOENT, which is breaking 
opal_path_nfs().

So, these results are with test/opal/util/opal_path_nfs.c "disabled".

PASS (defined as "make all install check")
   Linux/ppc32 with xlc-11.1 and xlf-13.1 compilers
   Linux/ppc64 with xlc-11.1 and xlf-13.1 compilers
   Linux/ppc64 with "Advance Toolchain 3.0" compilers
   Linux/ppc64 with "Advance Toolchain 4.0" compilers

Where "Advance Toolchain" are IBM's GCC and bintools variants for POWER7:
   gcc (GCC) 4.4.4 20100316 (Advance-Toolchain-3.0) [merged from redhat/gcc-4_4-branch, 162934]
   gcc (GCC) 4.5.4 20110524 (Advance-Toolchain-4.0-2) [ibm/gcc-4_5-branch revision 174864]


-Paul

On 1/19/2012 9:55 AM, Jeff Squyres wrote:

Please test:

 http://www.open-mpi.org/software/ompi/v1.4/


