Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
My build with the "2011_sp1.8.273" Intel compilers passes the same tests 
as I detailed below for "2011_sp1.7.256".
I don't suspect any longer that the compiler is at fault, but am willing 
to try additional/alternate tests to help confirm.


-Paul

On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:

Here are the first of the results of the testing I promised.
I am not 100% sure how to reach the code that Eugene reported as 
problematic, so I tried just running the ring test with various 
-bind-to-* options.   I am quite willing to run additional test 
cases.  All runs are w/ OMPI_MCA_btl=sm,self.


+ 2011.5.220
  FAIL: "make check" fails opal_datatype_test
  OK: mpirun -np 2 ./ring_c
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

+ 2011_sp1.7.256
  OK: "make check"
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

So, I don't think the "2011_sp1.7.256" compilers are broken (and are 
"better" than the ones I've been using).
I have a build with "2011_sp1.8.273" churning away right now (est. 
45 minutes to complete - should have disabled the Fortran bindings)


If there is something other than the -bind-to-* flags I should be 
using to reach the problematic code, let me know.
But based on what I've seen so far, I think we can probably rule out 
the compiler as the problem.


-Paul


On 2/21/2012 4:37 PM, Paul H. Hargrove wrote:
I have been testing v1.5 with slightly older Intel 
"composerxe-2011.5.220" compilers.
I see a "make check" failure in opal_datatype_test which is not 
present with any other compiler (such as gcc on the same node).
This has been seen most recently on the 1.5.5rc2r25990 tarball 
generated earlier today.
With "make check -k" I can confirm that opal_datatype_test is the 
ONLY failure I see with this compiler.
So, I have just assumed this was a buggy compiler and thought nothing 
more of it.


I have not yet tested them, but also have the same 
"composer_xe_2011_sp1.7.256" compiler and a more recent 
"composer_xe_2011_sp1.8.273".  I will test both ASAP and report back 
with my findings.


-Paul


On 2/21/2012 4:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE 
of our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel 
(composer_xe_2011_sp1.7.256) compilers.  I haven't poked around 
enough yet to figure out what the problematic characteristic of this 
configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call 
at line 224 led us to 
opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff 
left out):


static int module_get_socket_info(int *num_sockets) {
    hwloc_topology_t *t = &opal_hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is 
returning 0.


I can poke around more, but does someone want to advise?
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove

Here are the first of the results of the testing I promised.
I am not 100% sure how to reach the code that Eugene reported as 
problematic, so I tried just running the ring test with various 
-bind-to-* options.   I am quite willing to run additional test cases.  
All runs are w/ OMPI_MCA_btl=sm,self.


+ 2011.5.220
  FAIL: "make check" fails opal_datatype_test
  OK: mpirun -np 2 ./ring_c
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

+ 2011_sp1.7.256
  OK: "make check"
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

So, I don't think the "2011_sp1.7.256" compilers are broken (and are 
"better" than the ones I've been using).
I have a build with "2011_sp1.8.273" churning away right now (est. 
45 minutes to complete - should have disabled the Fortran bindings)


If there is something other than the -bind-to-* flags I should be using 
to reach the problematic code, let me know.
But based on what I've seen so far, I think we can probably rule out the 
compiler as the problem.


-Paul


On 2/21/2012 4:37 PM, Paul H. Hargrove wrote:
I have been testing v1.5 with slightly older Intel 
"composerxe-2011.5.220" compilers.
I see a "make check" failure in opal_datatype_test which is not 
present with any other compiler (such as gcc on the same node).
This has been seen most recently on the 1.5.5rc2r25990 tarball 
generated earlier today.
With "make check -k" I can confirm that opal_datatype_test is the ONLY 
failure I see with this compiler.
So, I have just assumed this was a buggy compiler and thought nothing 
more of it.


I have not yet tested them, but also have the same 
"composer_xe_2011_sp1.7.256" compiler and a more recent 
"composer_xe_2011_sp1.8.273".  I will test both ASAP and report back 
with my findings.


-Paul


On 2/21/2012 4:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE of 
our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel 
(composer_xe_2011_sp1.7.256) compilers.  I haven't poked around 
enough yet to figure out what the problematic characteristic of this 
configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call 
at line 224 led us to 
opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left 
out):


static int module_get_socket_info(int *num_sockets) {
    hwloc_topology_t *t = &opal_hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is 
returning 0.


I can poke around more, but does someone want to advise?




--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-21 Thread Paul H. Hargrove

A few things to note:

1) This is NOT a problem w/ the SS12.3 compilers on the same machine.
So, one could say "upgrade your compiler" (a free download) and not 
delay 1.5.5 for this issue.


2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 and 
SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus)


3) Testing the trunk I DON'T see the problem with either SS12.2 or SS12.3.
This is interesting, because it probably means that a u_char definition 
is SOMEWHERE in the headers (because libevent *is* getting built).


Whatever else may be done, I think this should be fixed "properly" 
(whatever that may equate to) for 1.6.
The way I see it now, it feels like OMPI is getting a definition of 
u_char only "by accident".


-Paul

On 2/21/2012 12:16 PM, Paul H. Hargrove wrote:
Building the v1.5 branch on Linux with the Solaris Studio 12.2 
compilers I see the following failure:
"[srcdir]/opal/event/event.h", line 797: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead 
of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only several files below opal/event/ contain any 
use of "u_char".

There is a typedef for u_char in hwloc, but no use that I could see.

To the best of my knowledge u_char is NOT defined by any standard, and 
thus there is no particular header one can reliably find it in.
The alternatives, of course are "unsigned char" or "uint8_t" (defined 
in stdint.h).


I had a look at the trunk and VISUALLY it appears the same problem 
exists in:

   opal/event/event.h
   opal/mca/event/libevent2013/libevent/event.h
However, my testing is currently confined to the v1.5 branch in the 
hopes of finally getting the next 1.5.5rc out the door.


-Paul



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
I have been testing v1.5 with slightly older Intel 
"composerxe-2011.5.220" compilers.
I see a "make check" failure in opal_datatype_test which is not present 
with any other compiler (such as gcc on the same node).
This has been seen most recently on the 1.5.5rc2r25990 tarball generated 
earlier today.
With "make check -k" I can confirm that opal_datatype_test is the ONLY 
failure I see with this compiler.
So, I have just assumed this was a buggy compiler and thought nothing 
more of it.


I have not yet tested them, but also have the same 
"composer_xe_2011_sp1.7.256" compiler and a more recent 
"composer_xe_2011_sp1.8.273".  I will test both ASAP and report back 
with my findings.


-Paul


On 2/21/2012 4:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE of 
our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel 
(composer_xe_2011_sp1.7.256) compilers.  I haven't poked around enough 
yet to figure out what the problematic characteristic of this 
configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call 
at line 224 led us to 
opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left 
out):


static int module_get_socket_info(int *num_sockets) {
    hwloc_topology_t *t = &opal_hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is 
returning 0.


I can poke around more, but does someone want to advise?


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Jeffrey Squyres
What's the output of running lstopo from hwloc 1.3.2?  (this is the version 
that's in the OMPI trunk and v1.5 branches)

http://www.open-mpi.org/software/hwloc/v1.3/

Is there any difference from v1.4 hwloc?

http://www.open-mpi.org/software/hwloc/v1.4/


On Feb 21, 2012, at 7:20 PM, Eugene Loh wrote:

> We have some amount of MTT testing going on every night and on ONE of our 
> systems v1.5 has been dead since r25914.  The system is
> 
> Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 
> x86_64 x86_64 x86_64 GNU/Linux
> 
> and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256) 
> compilers.  I haven't poked around enough yet to figure out what the 
> problematic characteristic of this configuration is.
> 
> In r25914, orte/mca/odls/base/odls_base_open.c, we get
> 
> 222     /* get the number of local sockets unless we were given a number */
> 223     if (0 == orte_default_num_sockets_per_board) {
> 224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
> 225     }
> 226     /* get the number of local processors */
> 227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
> 228     /* compute the base number of cores/socket, if not given */
> 229     if (0 == orte_default_num_cores_per_socket) {
> 230         orte_odls_globals.num_cores_per_socket =
>                 orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
> 231     }
> 
> Well, we execute the branch at line 224, but num_sockets remains 0.  This 
> leads to the divide-by-0 at line 230.  Digging deeper, the call at line 224 
> led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff 
> left out):
> 
> static int module_get_socket_info(int *num_sockets) {
>     hwloc_topology_t *t = &opal_hwloc_topology;
>     *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
>     return OPAL_SUCCESS;
> }
> 
> Anyhow, SOCKET is somehow an unknown layer, so num_sockets is returning 0.
> 
> I can poke around more, but does someone want to advise?


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Eugene Loh
We have some amount of MTT testing going on every night and on ONE of 
our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 
x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256) 
compilers.  I haven't poked around enough yet to figure out what the 
problematic characteristic of this configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call at 
line 224 led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c 
(lots of stuff left out):


static int module_get_socket_info(int *num_sockets) {
    hwloc_topology_t *t = &opal_hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is returning 0.

I can poke around more, but does someone want to advise?
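
A guard along the following lines would avoid the divide-by-0 described above when the topology reports no sockets. This is only a sketch under stated assumptions: cores_per_socket is a hypothetical helper, not an existing Open MPI function, and falling back to a single socket is just one possible policy, not necessarily the right fix for r25914.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Open MPI): compute the number of
 * cores per socket, treating a socket count of 0 or less as
 * "the topology did not report any sockets". */
static int cores_per_socket(int num_processors, int num_sockets)
{
    if (num_sockets <= 0) {
        /* Assumption: fall back to one socket instead of dividing by 0. */
        num_sockets = 1;
    }
    return num_processors / num_sockets;
}
```

With such a check, a machine where the SOCKET level is unknown would be handled as one socket holding all processors, and the original division is unchanged whenever a nonzero socket count is reported.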


Re: [OMPI devel] RFC: Allocate free list payload if free list isn't specified

2012-02-21 Thread Nathan Hjelm


On Tue, 21 Feb 2012, Rolf vandeVaart wrote:


I think I am OK with this.

Alternatively, you could have done something like what is done in the TCP BTL, where 
the payload and header are added together for the frag size?
To state it more clearly, I was trying to say you could do something similar to 
what is done at line 1015 in btl_tcp_component.c and end up with the same 
results?


That will more or less work for my current use case (I found those examples 
this morning). I would have to pad my fragments to ensure cache line alignment 
(if that ends up being faster for SMSG).


This is just making the payload buffer a different chunk of memory than the 
headers?


Yes.


I am just trying to understand the motivation for the change.


The motivation is to allow the payload to be aligned separately from the 
header. Currently, if I want payload alignment I have to pad the header to get 
the correct alignment on the data.
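
The idea of aligning the payload independently of the header can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: align_up and alloc_payload are hypothetical helpers, and the alignment is assumed to be a power of two that is a multiple of sizeof(void *), as posix_memalign requires.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#endif
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: round size up to a multiple of alignment
 * (alignment is assumed to be a power of two). */
static size_t align_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: allocate one aligned payload region for
 * num_elements items, separate from wherever the headers live.
 * Returns NULL on failure; the caller releases it with free(). */
static void *alloc_payload(size_t num_elements, size_t payload_size,
                           size_t alignment)
{
    void *payload = NULL;
    size_t elem_size = align_up(payload_size, alignment);

    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (0 != posix_memalign(&payload, alignment, num_elements * elem_size)) {
        return NULL;
    }
    return payload;
}
```

Because each element size is rounded up to the alignment, every payload in the region starts on an aligned boundary without any padding being added to the header.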


I think the way you have it is more correct so we can support the case where 
someone specifies the header size and the payload size differently and expects 
the free list code to do the right thing.


This is one of my motivations. When writing the uGNI BTL I expected the free list to do 
the "right thing" and allocate a payload buffer even if I didn't specify an 
mpool.

-Nathan


Re: [OMPI devel] RFC: Allocate free list payload if free list isn't specified

2012-02-21 Thread Rolf vandeVaart
I think I am OK with this.  

Alternatively, you could have done something like what is done in the TCP BTL, where 
the payload and header are added together for the frag size?
To state it more clearly, I was trying to say you could do something similar to 
what is done at line 1015 in btl_tcp_component.c and end up with the same 
results?

This is just making the payload buffer a different chunk of memory than the 
headers?

I am just trying to understand the motivation for the change.

I think the way you have it is more correct so we can support the case where 
someone specifies the header size and the payload size differently and expects 
the free list code to do the right thing.

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Tuesday, February 21, 2012 3:59 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] RFC: Allocate free list payload if free list isn't
>specified
>
>Oops, screwed up the title. Should be: RFC: Allocate requested free list
>payload even if an mpool isn't specified.
>
>-Nathan
>
>On Tue, 21 Feb 2012, Nathan Hjelm wrote:
>
>> What: Allocate free list payload if a payload size is specified,
>> even if no mpool is specified.
>>
>> When: Thursday, Feb 23, 2012
>>
>> Why: The current behavior is to ignore the payload size if no mpool is
>> specified. I see no reason why a payload buffer shouldn't be allocated
>> in the no mpool case. Thoughts?
>>
>> Patch is attached.
>>
>> -Nathan Hjelm
>> HPC-3, LANL
>>



Re: [OMPI devel] RFC: Allocate free list payload if free list isn't specified

2012-02-21 Thread Nathan Hjelm

Oops, screwed up the title. Should be: RFC: Allocate requested free list 
payload even if an mpool isn't specified.

-Nathan

On Tue, 21 Feb 2012, Nathan Hjelm wrote:

What: Allocate free list payload if a payload size is specified, even if 
no mpool is specified.


When: Thursday, Feb 23, 2012

Why: The current behavior is to ignore the payload size if no mpool is 
specified. I see no reason why a payload buffer shouldn't be allocated in the 
no mpool case. Thoughts?


Patch is attached.

-Nathan Hjelm
HPC-3, LANL



[OMPI devel] RFC: Allocate free list payload if free list isn't specified

2012-02-21 Thread Nathan Hjelm

What: Allocate free list payload if a payload size is specified, even if no 
mpool is specified.

When: Thursday, Feb 23, 2012

Why: The current behavior is to ignore the payload size if no mpool is 
specified. I see no reason why a payload buffer shouldn't be allocated in the no 
mpool case. Thoughts?

Patch is attached.

-Nathan Hjelm
HPC-3, LANL

diff --git a/ompi/class/ompi_free_list.c b/ompi/class/ompi_free_list.c
index d468a70..e3c0988 100644
--- a/ompi/class/ompi_free_list.c
+++ b/ompi/class/ompi_free_list.c
@@ -1,4 +1,4 @@
-/* -*- Mode: C; c-basic-offset:4 ; -*- */
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  * University Research and Technology
@@ -13,6 +13,8 @@
  * Copyright (c) 2006-2007 Mellanox Technologies. All rights reserved.
  * Copyright (c) 2010  Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2011  NVIDIA Corporation.  All rights reserved.
+ * Copyright (c) 2012  Los Alamos National Security, LLC. All rights
+ * reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -70,23 +72,19 @@ static void ompi_free_list_destruct(ompi_free_list_t* fl)
 }
 #endif
 
-if( NULL != fl->fl_mpool ) {
-while(NULL != (item = opal_list_remove_first(&(fl->fl_allocations)))) {
-fl_mem = (ompi_free_list_memory_t*)item;
+while(NULL != (item = opal_list_remove_first(&(fl->fl_allocations)))) {
+fl_mem = (ompi_free_list_memory_t*)item;
 
+if( NULL != fl->fl_mpool ) {
 fl->fl_mpool->mpool_free(fl->fl_mpool, fl_mem->ptr,
  fl_mem->registration);
-
-/* destruct the item (we constructed it), then free the memory chunk */
-OBJ_DESTRUCT(item);
-free(item);
-}
-} else {
-while(NULL != (item = opal_list_remove_first(&(fl->fl_allocations)))) {
-/* destruct the item (we constructed it), then free the memory chunk */
-OBJ_DESTRUCT(item);
-free(item);
+} else if (fl_mem->ptr) {
+free (fl_mem->ptr);
 }
+
+/* destruct the item (we constructed it), then free the memory chunk */
+OBJ_DESTRUCT(item);
+free(item);
 }
 
 OBJ_DESTRUCT(&fl->fl_allocations);
@@ -171,7 +169,7 @@ int ompi_free_list_init_ex_new(
 }
 int ompi_free_list_grow(ompi_free_list_t* flist, size_t num_elements)
 {
-unsigned char *ptr, *mpool_alloc_ptr = NULL;
+unsigned char *ptr, *mpool_alloc_ptr = NULL, *payload_ptr;
 ompi_free_list_memory_t *alloc_ptr;
 size_t i, alloc_size, head_size, elem_size = 0;
 mca_mpool_base_registration_t *reg = NULL;
@@ -201,7 +199,7 @@ int ompi_free_list_grow(ompi_free_list_t* flist, size_t num_elements)
 elem_size = OPAL_ALIGN(flist->fl_payload_buffer_size, 
 flist->fl_payload_buffer_alignment, size_t);
 if(elem_size != 0) {
-mpool_alloc_ptr = (unsigned char *) flist->fl_mpool->mpool_alloc(flist->fl_mpool,
+payload_ptr = mpool_alloc_ptr = (unsigned char *) flist->fl_mpool->mpool_alloc(flist->fl_mpool,
 num_elements * elem_size, flist->fl_payload_buffer_alignment,
 MCA_MPOOL_FLAGS_CACHE_BYPASS | MCA_MPOOL_FLAGS_CUDA_REGISTER_MEM, &reg);
 if(NULL == mpool_alloc_ptr) {
@@ -209,6 +207,26 @@ int ompi_free_list_grow(ompi_free_list_t* flist, size_t num_elements)
 return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
 }
 }
+} else if (0 != flist->fl_payload_buffer_size) {
+elem_size = OPAL_ALIGN(flist->fl_payload_buffer_size, 
+   flist->fl_payload_buffer_alignment, size_t);
+if(elem_size != 0) {
+#ifdef HAVE_POSIX_MEMALIGN
+posix_memalign ((void **) &mpool_alloc_ptr, flist->fl_payload_buffer_alignment,
+num_elements * elem_size);
+payload_ptr = mpool_alloc_ptr;
+#else
+mpool_alloc_ptr = malloc (num_elements * elem_size +
+  flist->fl_payload_buffer_alignment);
+payload_ptr = (void*)OPAL_ALIGN((uintptr_t)mpool_alloc_ptr, 
+flist->fl_payload_buffer_alignment,
+uintptr_t);
+#endif
+if(NULL == mpool_alloc_ptr) {
+free(alloc_ptr);
+return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
+}
+}
 }
 
 /* make the alloc_ptr a list item, save the chunk in the allocations list,
@@ -225,7 +243,7 @@ int ompi_free_list_grow(ompi_free_list_t* flist, size_t num_elements)
 for(i=0; i<num_elements; i++) {
 ompi_free_list_item_t* item = (ompi_free_list_item_t*)ptr;
 item->registration = reg;
-item->ptr = mpool_alloc_ptr;
+item->ptr = payload_ptr;
 
 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_frag_class);
 
@@ -236,7 +254,7 @@ int ompi_free_li

[OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-21 Thread Paul H. Hargrove
Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers 
I see the following failure:
"[srcdir]/opal/event/event.h", line 797: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead 
of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only several files below opal/event/ contain any 
use of "u_char".

There is a typedef for u_char in hwloc, but no use that I could see.

To the best of my knowledge u_char is NOT defined by any standard, and 
thus there is no particular header one can reliably find it in.
The alternatives, of course are "unsigned char" or "uint8_t" (defined in 
stdint.h).
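
As a small illustration of the uint8_t alternative, a declaration that would otherwise use u_char can be written against <stdint.h> alone. The function below is hypothetical and exists only for this example; it is not taken from event.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function, for illustration only: the byte type is spelled
 * uint8_t (from <stdint.h>) where nonstandard code might use u_char.
 * Returns a pointer to the first occurrence of `what`, or NULL. */
static const uint8_t *find_byte(const uint8_t *buf, size_t len, uint8_t what)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == what) {
            return &buf[i];
        }
    }
    return NULL;
}
```

Spelled this way, the prototype does not depend on whichever system header happens to provide u_char for a given compiler and platform.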


I had a look at the trunk and VISUALLY it appears the same problem 
exists in:

   opal/event/event.h
   opal/mca/event/libevent2013/libevent/event.h
However, my testing is currently confined to the v1.5 branch in the 
hopes of finally getting the next 1.5.5rc out the door.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] poor btl sm latency

2012-02-21 Thread Matthias Jurenz
Some supplements:

I tried several compilers for building Open MPI with enabled optimizations for 
the AMD Bulldozer architecture:

* gcc 4.6.2 (-Ofast -mtune=bdver1 -march=bdver1)
* Open64 5.0 (-O3 -march=bdver1 -mtune=bdver1 -mso)
* Intel 12.1 (-O3 -msse4.2)

They all result in similar latencies (~1.4us).

As I mentioned in my previous mail, I get the best results if the processes 
are bound so that they do not share an L2 cache (i.e. --bind-to-core --cpus-per-proc 2).
Just see what happens when doing this for Platform MPI:

Without process binding:

$ mpirun -np 2 ./NPmpi_pcmpi -u 4 -n 10
0: n091
1: n091
Now starting the main loop
  0:   1 bytes  1 times --> 16.89 Mbps in   0.45 usec
  1:   2 bytes  1 times --> 34.11 Mbps in   0.45 usec
  2:   3 bytes  1 times --> 51.01 Mbps in   0.45 usec
  3:   4 bytes  1 times --> 68.13 Mbps in   0.45 usec

With process binding using 'taskset':

$ mpirun -np 2 taskset -c 0,2 ./NPmpi_pcmpi -u 4 -n 1
0: n051
1: n051
Now starting the main loop
  0:   1 bytes  1 times --> 29.33 Mbps in   0.26 usec
  1:   2 bytes  1 times --> 58.64 Mbps in   0.26 usec
  2:   3 bytes  1 times --> 88.05 Mbps in   0.26 usec
  3:   4 bytes  1 times -->117.33 Mbps in   0.26 usec

I tried to change some parameters of the SM BTL described here: 
http://www.open-mpi.org/faq/?category=sm#sm-params - but also without success.

Do you have any further ideas?

Matthias

On Monday 20 February 2012 13:46:54 Matthias Jurenz wrote:
> If the processes are bound for L2 sharing (i.e. using neighboring cores
> pu:0 and pu:1) I get the *worst* latency results:
> 
> $ mpiexec -np 1 hwloc-bind pu:0 ./NPmpi -S -u 4 -n 10 : -np 1
> hwloc-bind pu:1 ./NPmpi -S -u 4 -n 10
> Using synchronous sends
> Using synchronous sends
> 0: n023
> 1: n023
> Now starting the main loop
>   0:   1 bytes 10 times -->  3.54 Mbps in   2.16 usec
>   1:   2 bytes 10 times -->  7.10 Mbps in   2.15 usec
>   2:   3 bytes 10 times --> 10.68 Mbps in   2.14 usec
>   3:   4 bytes 10 times --> 14.23 Mbps in   2.15 usec
> 
> As it should, I get the same result when using '-bind-to-core' *without*
> '--cpus-per-proc 2'.
> 
> When using two separate L2's (pu:0,pu:2 or '--cpus-per-proc 2') I get
> better results:
> 
> $ mpiexec -np 1 hwloc-bind pu:0 ./NPmpi -S -u 4 -n 10 : -np 1
> hwloc-bind pu:2 ./NPmpi -S -u 4 -n 10
> Using synchronous sends
> 0: n023
> Using synchronous sends
> 1: n023
> Now starting the main loop
>   0:   1 bytes 10 times -->  5.15 Mbps in   1.48 usec
>   1:   2 bytes 10 times --> 10.15 Mbps in   1.50 usec
>   2:   3 bytes 10 times --> 15.26 Mbps in   1.50 usec
>   3:   4 bytes 10 times --> 20.23 Mbps in   1.51 usec
> 
> So it seems that the process binding within Open MPI works and can be
> ruled out as the reason for the bad latency :-(
> 
> Matthias
> 
> On Thursday 16 February 2012 17:51:53 Brice Goglin wrote:
> > Le 16/02/2012 17:12, Matthias Jurenz a écrit :
> > > Thanks for the hint, Brice.
> > > I'll forward this bug report to our cluster vendor.
> > > 
> > > Could this be the reason for the bad latencies with Open MPI or does it
> > > only affect hwloc/lstopo?
> > 
> > It affects binding. So it may affect the performance you observed when
> > using "high-level" binding policies that end up binding on wrong cores
> > because of hwloc/kernel problems. If you specify binding manually, it
> > shouldn't hurt.
> > 
> > If the best latency case is supposed to be when L2 is shared, then try:
> > mpiexec -np 1 hwloc-bind pu:0 ./all2all : -np 1 hwloc-bind pu:1 ./all2all
> > Then, we'll see if you can get the same result with one of OMPI binding
> > options.
> > 
> > Brice
> > 
> > > Matthias
> > > 
> > > On Thursday 16 February 2012 15:46:46 Brice Goglin wrote:
> > >> Le 16/02/2012 15:39, Matthias Jurenz a écrit :
> > >>> Here is the output of lstopo from a single compute node. I'm surprised
> > >>> that the L1/L2 sharing isn't visible, not even in the
> > >>> graphical output...
> > >> 
> > >> That's a kernel bug. We're waiting for AMD to tell the kernel that L1i
> > >> and L2 are shared across dual-core modules. If you have some contact
> > >> at AMD, please tell them to look at
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=42607
> > >> 
> > >> Brice

Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/WORK AROUND]

2012-02-21 Thread Jeffrey Squyres
CMR filed; custom v1.5 patch attached:

https://svn.open-mpi.org/trac/ompi/ticket/3024

On Feb 20, 2012, at 4:52 PM, Jeff Squyres (jsquyres) wrote:

> Yo Brian --
> 
> Do we need to bring this to v1.5, too?
> 
> 
> On Feb 20, 2012, at 11:49 AM, Barrett, Brian W wrote:
> 
> > Hi Paul -
> >
> > Thanks for noticing this.  I guess we don't have many Altix developers.  I
> > think I've fixed it on the trunk with r25968, plus r25967 to make sure the
> > Altix component gets selected over the Linux component on Altix systems.
> > I don't have an Altix to test on; can you give it a go and let me know if
> > it works?  In the trunk right now, and should be in the trunk nightly
> > tarball tomorrow morning.
> >
> > The problem cropped up when we started running the configure macros for
> > components that couldn't possibly succeed (which we needed to make
> > Automake happy in a couple of situations) sometime late in the 1.5 series.
> > Before that, a component could never think it succeeded and then later be
> > told it didn't.  We added yet another macro to handle issues like this, so
> > it was a fairly easy fix.
> >
> > Thanks,
> >
> > Brian
> >
> > On 2/17/12 4:26 PM, "Paul H. Hargrove"  wrote:
> >
> >>
> >>
> >>
> >>   I've poked enough at the ompi configure magic to *think* I
> >>   understand the source of the problem I've seen w/ both trunk and
> >>   1.5.x on the Altix.
> >>
> >>   The problem appears to be that both timer/altix/configure.m4 and
> >>   timer/linux/configure.m4 are setting the value of
> >>   $timer_base_include and the LAST one "wins".  Meanwhile, only the
> >>   FIRST one is getting added to $static_components ("there can be only
> >>   one").  So, I suspect the difference I saw between trunk and 1.5 was
> >>   just a matter of which configure probe ran first.
> >>
> >>   The result of having FIRST and LAST "win" in different settings is a
> >>   mismatch.
> >>
> >>
> >> $ grep -e timer:linux -e timer:altix configure.out
> >> --- MCA component timer:linux (m4 configuration macro, priority 30)
> >> checking for MCA component timer:linux compile mode... static
> >> checking if MCA component timer:linux can compile... yes
> >> --- MCA component timer:altix (m4 configuration macro, priority 30)
> >> checking for MCA component timer:altix compile mode... static
> >> checking if MCA component timer:altix can compile... no
> >>
> >>
> >>   which picks timer:linux and rejects timer:altix, as compared to:
> >>
> >>
> >> $ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status
> >> S["MCA_opal_timer_DSO_SUBDIRS"]=""
> >> S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux"
> >> S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la "
> >> S["MCA_opal_timer_DSO_COMPONENTS"]=""
> >> S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux"
> >> D["MCA_timer_IMPLEMENTATION_HEADER"]=" \"opal/mca/timer/altix/timer_altix.h\""
> >>
> >>
> >>   Which will build timer:linux but has improperly picked up the
> >>   timer:altix HEADER!
> >>
> >>
> >>   For the present, an explicit --with-timer=altix works around the
> >>   problem in either branch.
> >>   However, the setting of the header variable by a NON-selected
> >>   component is ERRONEOUS and should get fixed.
> >>   In trunk, it may also make sense to raise the priority of
> >>   timer:altix above that of timer:linux.
> >>
> >>   -Paul
> >>
> >>   On 2/15/2012 12:41 AM, Paul Hargrove wrote:
> >>
> >> I've configured the ompi trunk (nightly tarball 1.7a1r25927)
> >>   on an SGI Altix.
> >> I used no special arguments indicating that this is an Altix,
> >>   and there does not appear to be an altix-specific file in
> >>   contrib/platform.
> >>
> >>
> >> My build fails as follows:
> >>
> >>
> >>
> >>
> >> make: Entering directory `/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrappers'
> >>   CC opal_wrapper.o
> >>   CCLD   opal_wrapper
> >> ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_mmdev_timer_addr'
> >> ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_freq'
> >> collect2: ld returned 1 exit status
> >>
> >>
> >>
> >>
> >>
> >>
> >> The configure-generated opal_config.h contains
> >> #define MCA_timer_IMPLEMENTATION_HEADER "opal/mca/timer/altix/timer_altix.h"
> >>
> >>
> >> Nothing appears to have been built in
> >>   BUILDDIR/opal/mca/timer/altix.
> >> However, BUILDDIR/opal/mca/timer/linux has been built.
> >>
> >>
> >> -Paul
> >>
> >>
> >>   --
> >>   Paul H. Hargrove  phhargr...@lbl.gov
> >>   Future Technologies Group
> >>   HPC Research Department   Tel: +1-510-495-2352
> >> 
> >>   Lawrence Berkeley National Laboratory Fax: +1-510-486-

Re: [OMPI devel] excessive warnings on some BSDs [w/ PATCH]

2012-02-21 Thread Jeffrey Squyres
Committed and CMR'ed.  Thanks!

On Feb 17, 2012, at 3:22 PM, Paul H. Hargrove wrote:

> When building trunk or 1.5.x on OpenBSD-5.0 (and maybe others), I get *LOTS* 
> of the following:
>> /usr/include/arpa/inet.h:74: warning: 'struct in_addr' declared inside parameter list
>> /usr/include/arpa/inet.h:74: warning: its scope is only this definition or declaration, which is probably not what you want
>> /usr/include/arpa/inet.h:75: warning: 'struct in_addr' declared inside parameter list
> 
> This is trivial to fix by including netinet/in.h before arpa/inet.h (see 
> attached patch).
> The patch applies cleanly to both the trunk and the 1.5 branch (perhaps to 
> hold back until 1.6)
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] Solaris/SOS build failure in trunk

2012-02-21 Thread Jeffrey Squyres
Should be fixed on the trunk in r25982.

On Feb 18, 2012, at 7:39 AM, Paul Hargrove wrote:

> Same has been seen on Solaris11/x86-64 w/ the Studio 12.3 compiler.
> However, a gcc build on the same system was fine.
> 
> -Paul
> 
> On Fri, Feb 17, 2012 at 10:49 AM, Paul H. Hargrove  wrote:
> Building last night's trunk tarball (1.7a1r25944) on Solaris10/SPARC w/ 
> Solaris Studio compilers is failing in "make check".
> This same problem is present with the 12.2 and 12.3 compilers and both v8plus 
> and v9 ABIs:
> 
> Making check in util
> make  opal_bit_ops opal_path_nfs  opal_sos
>  CC opal_bit_ops.o
>  CCLD   opal_bit_ops
>  CC opal_path_nfs.o
>  CCLD   opal_path_nfs
>  CC opal_sos.o
> "/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c", line 90: undefined symbol: OPAL_SOS_FUNCTION
> "/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c", line 90: warning: improper pointer/integer combination: arg #3
> "/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c", line 109: warning: improper pointer/integer combination: arg #3
> "/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c", line 129: warning: improper pointer/integer combination: arg #3
> "/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c", line 144: warning: improper pointer/integer combination: arg #3
> "/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c", line 153: warning: improper pointer/integer combination: arg #3
> cc: acomp failed for /home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u2-v9//openmpi-trunk/test/util/opal_sos.c
> 
> Let me know which bits are needed (config.log, opal_config.h, etc) and I'll 
> gladly send them (but figured the entire list didn't want to see them).
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Jeffrey Squyres
Right.

The revamped fortran stuff is *coming* -- it's off in a Mercurial bitbucket 
right now.  It's not on the trunk yet.  It's here, if you care:

https://bitbucket.org/jsquyres/mpi3-fortran


On Feb 21, 2012, at 7:57 AM, Paul H. Hargrove wrote:

> 
> 
> On 2/21/2012 2:55 AM, Jeff Squyres (jsquyres) wrote:
>> That is truly bizarre "make" behavior.
>> 
>> Heads up that in the upcoming fortran revamp, we *only* use FC. I.E., 
>> there's only the mpifort wrapper compiler (mpif77 and mpif90 still exist, but 
>> only as sym links to mpifort, signifying that mpifort is the way of the 
>> future).
> 
> 
> But 12 hours ago with the current setup of distinct mpif77 and mpif90, 
> combined with the crazy setting-FC-also-sets-F77 behavior on make, here is 
> what I would see on a Solaris build w/ f77 bindings, but not f90:
>> 
>> $ make hello_f77
>> mpif90 -g  -o hello_f77
>> --
>> Unfortunately, this installation of Open MPI was not compiled with
>> Fortran 90 support.  As such, the mpif90 compiler is non-functional.
>> --
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 


-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove



On 2/21/2012 2:55 AM, Jeff Squyres (jsquyres) wrote:

That is truly bizarre "make" behavior.

Heads up that in the upcoming fortran revamp, we *only* use FC. I.E., 
there's only the mpifort wrapper compiler (mpif77 and mpif90 still exist, 
but only as sym links to mpifort, signifying that mpifort is the way 
of the future).



But 12 hours ago with the current setup of distinct mpif77 and mpif90, 
combined with the crazy setting-FC-also-sets-F77 behavior on make, here 
is what I would see on a Solaris build w/ f77 bindings, but not f90:


$ make hello_f77
mpif90 -g  -o hello_f77
--
Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional.
--


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Jeffrey Squyres
On Feb 21, 2012, at 6:39 AM, TERRY DONTJE wrote:

>> Heads up that in the upcoming fortran revamp, we *only* use FC. I.E., 
>> there's only the mpifort wrapper compiler (mpif77 and mpif90 still exist, but 
>> only as sym links to mpifort, signifying that mpifort is the way of the 
>> future). 
>> 
>> This was done because there have been no f77 compilers for decades 
>> (literally), and no f90 compilers for 10+ years. All the fortran compiler 
>> vendors have long-since moved to a single compiler executable name (e.g., 
>> ifort, gfortran), so mpifort just reflects that. 
>> 
> Hmmm, well Oracle's compiler is still named f90 :-).   (now to duck and cover)

Yes, multiple vendors still have f90 (and/or f77)-named compilers.  
But these are just multiple entry points to a common back end, usually for 
legacy reasons (just like we'll still have mpif90 and mpif77).

Another fun fact: MPI-1 was never compliant with Fortran 77.  The most 
obvious/easiest point to cite is that F77 only allowed 6-character variable and 
subroutine names.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread TERRY DONTJE



On 2/21/2012 5:55 AM, Jeff Squyres (jsquyres) wrote:

That is truly bizarre "make" behavior.

Heads up that in the upcoming fortran revamp, we *only* use FC. I.E., 
there's only the mpifort wrapper compiler (mpif77 and mpif90 still exist, 
but only as sym links to mpifort, signifying that mpifort is the way 
of the future).


This was done because there have been no f77 compilers for decades 
(literally), and no f90 compilers for 10+ years. All the fortran 
compiler vendors have long-since moved to a single compiler executable 
name (e.g., ifort, gfortran), so mpifort just reflects that.


Hmmm, well Oracle's compiler is still named f90 :-).   (now to duck and 
cover)


--td

Sent from my phone. No type good.

On Feb 21, 2012, at 5:01 AM, "Paul H. Hargrove" wrote:



Thanks, Ralph.
Excellent point about not needing to use the "FC" name with its 
special (absurd?) behavior.


-Paul

On 2/21/2012 1:52 AM, Ralph Castain wrote:
I went ahead and applied this, with a tweak. There is no reason to 
call our flag "FC" as all we use it for is to call the right 
wrapper. So I renamed it to something less problematic.


On Feb 21, 2012, at 1:20 AM, Paul H. Hargrove wrote:

And while we are looking at examples/Makefile on Solaris-10, why 
are the F77 examples getting built w/ mpif90?
Because w/ the Solaris make setting FC also silently sets F77 (yes, 
I am NOT kidding)!
So, reordering the F77= and FC= lines in Makefile resolves that 
mis-behavior.


Attached is my patch to fix both F77/FC and the "better" ompi_info 
queries mentioned in my previous post.

This REPLACES the patch in the previous post.

-Paul

On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
The addition on Monday of the Java cases to examples/Makefile has 
shown that the default "make" in Solaris-10 will stop on the first 
failed grep command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a 
pre-existing problem which  was not evident in my prior testing 
because all language bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" and 
simply looks like:
@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then

I have *also* tested that approach, and it works fine too.

I *did* warn that the introduction of the java bindings would 
bring collateral damage.

I just didn't anticipate encountering it personally.

-Paul





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900





--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Jeff Squyres (jsquyres)
That is truly bizarre "make" behavior. 

Heads up that in the upcoming fortran revamp, we *only* use FC. I.E., there's 
only the mpifort wrapper compiler (mpif77 and mpif90 still exist, but only as sym 
links to mpifort, signifying that mpifort is the way of the future). 

This was done because there have been no f77 compilers for decades (literally), 
and no f90 compilers for 10+ years. All the fortran compiler vendors have 
long-since moved to a single compiler executable name (e.g., ifort, gfortran), 
so mpifort just reflects that. 

Sent from my phone. No type good. 

On Feb 21, 2012, at 5:01 AM, "Paul H. Hargrove"  wrote:

> Thanks, Ralph.
> Excellent point about not needing to use the "FC" name with its special 
> (absurd?) behavior.
> 
> -Paul
> 
> On 2/21/2012 1:52 AM, Ralph Castain wrote:
>> 
>> I went ahead and applied this, with a tweak. There is no reason to call our 
>> flag "FC" as all we use it for is to call the right wrapper. So I renamed it 
>> to something less problematic.
>> 
>> On Feb 21, 2012, at 1:20 AM, Paul H. Hargrove wrote:
>> 
>>> And while we are looking at examples/Makefile on Solaris-10, why are the 
>>> F77 examples getting built w/ mpif90?
>>> Because w/ the Solaris make setting FC also silently sets F77 (yes, I am 
>>> NOT kidding)!
>>> So, reordering the F77= and FC= lines in Makefile resolves that 
>>> mis-behavior.
>>> 
>>> Attached is my patch to fix both F77/FC and the "better" ompi_info queries 
>>> mentioned in my previous post.
>>> This REPLACES the patch in the previous post.
>>> 
>>> -Paul
>>> 
>>> On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
 
>>>> The addition on Monday of the Java cases to examples/Makefile has shown
>>>> that the default "make" in Solaris-10 will stop on the first failed grep
>>>> command in the "all" rule:
>>>>> $ make
>>>>> mpicc -g   -o hello_c hello_c.c
>>>>> mpicc -g   -o ring_c ring_c.c
>>>>> mpicc -g   -o connectivity_c connectivity_c.c
>>>>> mpic++ -g   -o hello_cxx hello_cxx.cc
>>>>> mpic++ -g   -o ring_cxx ring_cxx.cc
>>>>> mpif90 -g hello_f77.f -o hello_f77
>>>>> mpif90 -g ring_f77.f -o ring_f77
>>>>> mpif90 -g hello_f90.f90 -o hello_f90
>>>>> mpif90 -g ring_f90.f90 -o ring_f90
>>>>> *** Error code 1
>>>>> The following command caused the error:
>>>>> if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
>>>>> make Hello.class; \
>>>>> fi
>>>>> make: Fatal error: Command failed for target `all'
>>>>
>>>> The addition of java did NOT break anything, but exposed a pre-existing
>>>> problem which was not evident in my prior testing because all language
>>>> bindings were being built prior to adding java.
>>>>
>>>> The attached patch resolves the problem in my (admittedly minimal) testing
>>>> with the smallest possible change.
>>>> However, an entirely different approach avoids both "test" and "true" and
>>>> simply looks like:
>>>> @ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then
>>>> I have *also* tested that approach, and it works fine too.
>>>>
>>>> I *did* warn that the introduction of the java bindings would bring
>>>> collateral damage.
>>>> I just didn't anticipate encountering it personally.
>>>>
>>>> -Paul
>>> 
>>> -- 
>>> Paul H. Hargrove  phhargr...@lbl.gov
>>> Future Technologies Group
>>> HPC Research Department   Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> 
>> 
>> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove

Thanks, Ralph.
Excellent point about not needing to use the "FC" name with its special 
(absurd?) behavior.


-Paul

On 2/21/2012 1:52 AM, Ralph Castain wrote:
I went ahead and applied this, with a tweak. There is no reason to 
call our flag "FC" as all we use it for is to call the right wrapper. 
So I renamed it to something less problematic.


On Feb 21, 2012, at 1:20 AM, Paul H. Hargrove wrote:

And while we are looking at examples/Makefile on Solaris-10, why are 
the F77 examples getting built w/ mpif90?
Because w/ the Solaris make setting FC also silently sets F77 (yes, I 
am NOT kidding)!
So, reordering the F77= and FC= lines in Makefile resolves that 
mis-behavior.


Attached is my patch to fix both F77/FC and the "better" ompi_info 
queries mentioned in my previous post.

This REPLACES the patch in the previous post.

-Paul

On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
The addition on Monday of the Java cases to examples/Makefile has 
shown that the default "make" in Solaris-10 will stop on the first 
failed grep command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a 
pre-existing problem which  was not evident in my prior testing 
because all language bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" and 
simply looks like:

@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then
I have *also* tested that approach, and it works fine too.

I *did* warn that the introduction of the java bindings would bring 
collateral damage.

I just didn't anticipate encountering it personally.

-Paul





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Ralph Castain
I went ahead and applied this, with a tweak. There is no reason to call our 
flag "FC" as all we use it for is to call the right wrapper. So I renamed it to 
something less problematic.

On Feb 21, 2012, at 1:20 AM, Paul H. Hargrove wrote:

> And while we are looking at examples/Makefile on Solaris-10, why are the F77 
> examples getting built w/ mpif90?
> Because w/ the Solaris make setting FC also silently sets F77 (yes, I am NOT 
> kidding)!
> So, reordering the F77= and FC= lines in Makefile resolves that mis-behavior.
> 
> Attached is my patch to fix both F77/FC and the "better" ompi_info queries 
> mentioned in my previous post.
> This REPLACES the patch in the previous post.
> 
> -Paul
> 
> On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
>> 
>> The addition on Monday of the Java cases to examples/Makefile has shown that 
>> the default "make" in Solaris-10 will stop on the first failed grep command 
>> in the "all" rule: 
>>> $ make 
>>> mpicc -g   -o hello_c hello_c.c 
>>> mpicc -g   -o ring_c ring_c.c 
>>> mpicc -g   -o connectivity_c connectivity_c.c 
>>> mpic++ -g   -o hello_cxx hello_cxx.cc 
>>> mpic++ -g   -o ring_cxx ring_cxx.cc 
>>> mpif90 -g hello_f77.f -o hello_f77 
>>> mpif90 -g ring_f77.f -o ring_f77 
>>> mpif90 -g hello_f90.f90 -o hello_f90 
>>> mpif90 -g ring_f90.f90 -o ring_f90 
>>> *** Error code 1 
>>> The following command caused the error: 
>>> if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \ 
>>> make Hello.class; \ 
>>> fi 
>>> make: Fatal error: Command failed for target `all' 
>> 
>> The addition of java did NOT break anything, but exposed a pre-existing 
>> problem which  was not evident in my prior testing because all language 
>> bindings were being built prior to adding java. 
>> 
>> The attached patch resolves the problem in my (admittedly minimal) testing 
>> with the smallest possible change. 
>> However, an entirely different approach avoids both "test" and "true" and 
>> simply looks like: 
>> @ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then 
>> I have *also* tested that approach, and it works fine too. 
>> 
>> I *did* warn that the introduction of the java bindings would bring 
>> collateral damage. 
>> I just didn't anticipate encountering it personally. 
>> 
>> -Paul 
>> 
>> 
>> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove
And while we are looking at examples/Makefile on Solaris-10, why are the 
F77 examples getting built w/ mpif90?
Because w/ the Solaris make setting FC also silently sets F77 (yes, I am 
NOT kidding)!
So, reordering the F77= and FC= lines in Makefile resolves that 
mis-behavior.


Attached is my patch to fix both F77/FC and the "better" ompi_info 
queries mentioned in my previous post.

This REPLACES the patch in the previous post.

-Paul

On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
The addition on Monday of the Java cases to examples/Makefile has 
shown that the default "make" in Solaris-10 will stop on the first 
failed grep command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a 
pre-existing problem which  was not evident in my prior testing 
because all language bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" and 
simply looks like:

@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then
I have *also* tested that approach, and it works fine too.

I *did* warn that the introduction of the java bindings would bring 
collateral damage.

I just didn't anticipate encountering it personally.

-Paul





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Index: Makefile
===
--- Makefile	(revision 25980)
+++ Makefile	(working copy)
@@ -25,8 +25,8 @@
 CC = mpicc
 CXX = mpic++
 CCC = mpic++
+FC = mpif90
 F77 = mpif77
-FC = mpif90
 JAVAC = mpijavac

 # Using -g is not necessary, but it is helpful for example programs,
@@ -49,19 +49,19 @@
 # if Open MPI was build with the relevant language bindings.

 all: hello_c ring_c connectivity_c
-	@ if test "`ompi_info --parsable | grep bindings:cxx:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then \
 	$(MAKE) hello_cxx ring_cxx; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f77:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:f77:yes >/dev/null; then \
 	$(MAKE) hello_f77 ring_f77; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f90:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:f90:yes >/dev/null; then \
 	$(MAKE) hello_f90 ring_f90; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:java:yes >/dev/null; then \
 	$(MAKE) Hello.class; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:java:yes >/dev/null ; then \
 	$(MAKE) Ring.class; \
 	fi



[OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove
The addition on Monday of the Java cases to examples/Makefile has shown 
that the default "make" in Solaris-10 will stop on the first failed grep 
command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a pre-existing 
problem which  was not evident in my prior testing because all language 
bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" and simply 
looks like:

@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then
I have *also* tested that approach, and it works fine too.
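
For anyone comparing the two fixes, here is a minimal shell sketch of both
conditionals side by side. It assumes, as the failure above suggests, that the
Solaris make aborts the recipe when the grep inside the backticks exits
non-zero. fake_ompi_info is a hypothetical stand-in for "ompi_info --parsable",
and its output lines are invented purely for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for `ompi_info --parsable`; these lines are invented.
fake_ompi_info() {
    printf '%s\n' 'bindings:c:yes' 'bindings:cxx:yes' 'bindings:java:no'
}

# Variant 1: keep the string comparison, but append "|| true" so the
# command substitution exits 0 even when grep matches nothing.
has_binding_true() {
    if test "`fake_ompi_info | grep bindings:$1:yes || true`" != ""; then
        echo yes
    else
        echo no
    fi
}

# Variant 2: drop "test" and use grep's exit status as the condition.
# A non-zero status in an `if` condition is not treated as a failure.
has_binding_status() {
    if fake_ompi_info | grep "bindings:$1:yes" >/dev/null; then
        echo yes
    else
        echo no
    fi
}

set -e   # assumption: mimics a make that runs its recipes with "sh -e"
has_binding_true cxx
has_binding_true java
has_binding_status cxx
has_binding_status java
```

Both variants should report the same answer for every binding; the second one
simply has fewer moving parts, since it never captures grep's output.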

I *did* warn that the introduction of the java bindings would bring 
collateral damage.

I just didn't anticipate encountering it personally.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Index: Makefile
===
--- Makefile	(revision 25980)
+++ Makefile	(working copy)
@@ -49,19 +49,19 @@
 # if Open MPI was build with the relevant language bindings.

 all: hello_c ring_c connectivity_c
-	@ if test "`ompi_info --parsable | grep bindings:cxx:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:cxx:yes || true`" != ""; then \
 	$(MAKE) hello_cxx ring_cxx; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f77:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:f77:yes || true`" != ""; then \
 	$(MAKE) hello_f77 ring_f77; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f90:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:f90:yes || true`" != ""; then \
 	$(MAKE) hello_f90 ring_f90; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:java:yes || true`" != ""; then \
 	$(MAKE) Hello.class; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:java:yes || true`" != ""; then \
 	$(MAKE) Ring.class; \
 	fi



Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/ WORK AROUND]

2012-02-21 Thread Paul H. Hargrove
Testing tonight's trunk tarball on the Altix system I have access to 
looks fine now.


Thanks, Brian.
-Paul

On 2/20/2012 11:49 AM, Paul H. Hargrove wrote:

Brian,

Thanks for looking into this.
I'll plan to take a look at the trunk tarball tonight and report back.

-Paul

On 2/20/2012 8:49 AM, Barrett, Brian W wrote:

Hi Paul -

Thanks for noticing this.  I guess we don't have many Altix developers.  I
think I've fixed it on the trunk with r25968, plus r25967 to make sure the
Altix component gets selected over the Linux component on Altix systems.
I don't have an Altix to test on; can you give it a go and let me know if
it works?  It's in the trunk right now, and should be in the trunk nightly
tarball tomorrow morning.

The problem cropped up when we started running the configure macros for
components that couldn't possibly succeed (which we needed to make
Automake happy in a couple of situations) sometime late in the 1.5 series.
Before that, a component could never think it succeeded and then later be
told it didn't.  We added yet another macro to handle issues like this, so
it was a fairly easy fix.

Thanks,

Brian

On 2/17/12 4:26 PM, "Paul H. Hargrove"  wrote:




I've poked enough at the ompi configure magic to *think* I
understand the source of the problem I've seen w/ both trunk and
1.5.x on the Altix.

The problem appears to be that both timer/altix/configure.m4 and
timer/linux/configure.m4 are setting the value of
$timer_base_include and the LAST one "wins".  Meanwhile, only the
FIRST one is getting added to $static_components ("there can be only
one").  So, I suspect the difference I saw between trunk and 1.5 was
just a matter of which configure probe ran first.

The result of having FIRST and LAST "win" in different settings is a
mismatch.


$ grep -e timer:linux -e timer:altix configure.out
  --- MCA component timer:linux (m4 configuration macro, priority 30)
  checking for MCA component timer:linux compile mode... static
  checking if MCA component timer:linux can compile... yes
  --- MCA component timer:altix (m4 configuration macro, priority 30)
  checking for MCA component timer:altix compile mode... static
  checking if MCA component timer:altix can compile... no


which picks timer:linux and rejects timer:altix, as compared to:


$ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status
  S["MCA_opal_timer_DSO_SUBDIRS"]=""
  S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux"
  S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la "
  S["MCA_opal_timer_DSO_COMPONENTS"]=""
  S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux"
  D["MCA_timer_IMPLEMENTATION_HEADER"]=" \"opal/mca/timer/altix/timer_altix.h\""


Which will build timer:linux but has improperly picked up the
timer:altix HEADER!
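
The FIRST-versus-LAST mismatch suspected above can be illustrated with a toy shell model. This is only a sketch of the hypothesis, not the actual m4 configure logic: the variable names `static_components` and `timer_base_include` echo the ones named in this message, while the `probe` function and the header path format are hypothetical.

```shell
# Toy model only: it mimics the suspected behaviour, not the real macros.
static_components=
timer_base_include=

probe() {   # $1 = component name (hypothetical helper)
    # Every probe overwrites the header variable, so the LAST one wins.
    timer_base_include="timer/$1/timer_$1.h"
    # Only the FIRST probe is recorded as the component to build.
    if [ -z "$static_components" ]; then
        static_components=$1
    fi
}

probe linux
probe altix

echo "build: $static_components  header: $timer_base_include"
# prints: build: linux  header: timer/altix/timer_altix.h
```

Swapping the order of the two `probe` calls reverses which name lands in each variable, which would be consistent with the trunk and 1.5 behaving differently if their probes ran in different orders.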


For the present, an explicit --with-timer=altix works around the
problem in either branch.
However, the setting of the header variable by a NON-selected
component is ERRONEOUS and should get fixed.
In trunk, it may also make sense to raise the priority of
timer:altix above that of timer:linux.

-Paul

On 2/15/2012 12:41 AM, Paul Hargrove wrote:

  I've configured the ompi trunk (nightly tarball 1.7a1r25927) on an SGI Altix.
  I used no special arguments indicating that this is an Altix, and there does
  not appear to be an altix-specific file in contrib/platform.


  My build fails as follows:




make: Entering directory `/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrappers'
CC opal_wrapper.o
CCLD   opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_mmdev_timer_addr'
../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_freq'
collect2: ld returned 1 exit status






  The configure-generated opal_config.h contains
  #define MCA_timer_IMPLEMENTATION_HEADER "opal/mca/timer/altix/timer_altix.h"


  Nothing appears to have been built in BUILDDIR/opal/mca/timer/altix.
  However, BUILDDIR/opal/mca/timer/linux has been built.


  -Paul


--
Paul H. Hargr