Re: [OMPI devel] trunk warnings on x86

2014-08-03 Thread Gilles Gouaillardet
Paul,

I confirm that an ampersand was missing and that this was a bug
/* a similar bug was fixed by Ralph in r32357 */

I committed r32408 to fix these three bugs.

I also took the liberty of replacing the OMPI_CAST_RTE_NAME macro
with an inline function (in debug mode only), so that the compiler
warns on both 32-bit and 64-bit architectures in this case:

#if OPAL_ENABLE_DEBUG
static inline orte_process_name_t *
OMPI_CAST_RTE_NAME(opal_process_name_t * name);
#else
#define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a))
#endif
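
To illustrate why the function form helps, here is a minimal sketch. The
type definitions are simplified stand-ins invented for this example (the
real opal_process_name_t and orte_process_name_t live in the Open MPI
tree), and CAST_RTE_NAME_MACRO, cast_rte_name_checked and same_address are
hypothetical names. With a macro, passing the 64-bit value itself instead
of its address only draws "cast to pointer from integer of different size"
where pointers are not 64 bits wide; with a function, the parameter type
is checked on every architecture.

```c
#include <stdint.h>

/* Simplified stand-ins for illustration only; not the real Open MPI types. */
typedef uint64_t opal_process_name_t;
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} orte_process_name_t;

/* Macro form: the cast accepts any scalar, so a missing '&' is silent
 * wherever an integer and a pointer happen to have the same size. */
#define CAST_RTE_NAME_MACRO(a) ((orte_process_name_t *)(a))

/* Function form (hypothetical name): the argument must already be an
 * opal_process_name_t *, so passing the value itself is diagnosed by
 * the compiler regardless of pointer width. */
static inline orte_process_name_t *
cast_rte_name_checked(opal_process_name_t *name)
{
    return (orte_process_name_t *)name;
}

/* Correct usage: pass the address of the name. The result is not
 * dereferenced here; only the address conversion is shown. */
static inline int same_address(opal_process_name_t *name)
{
    return (void *)cast_rte_name_checked(name) == (void *)name
        && (void *)CAST_RTE_NAME_MACRO(name) == (void *)name;
}

/* Buggy usage, left commented out because it should not compile cleanly:
 *     opal_process_name_t n = 42;
 *     cast_rte_name_checked(n);   // missing '&': integer passed as pointer
 */
```

Keeping the macro for non-debug builds, as in the snippet above, means the
extra check costs nothing in optimized builds.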

Cheers,

Gilles

On 2014/08/03 14:49, Gilles GOUAILLARDET wrote:
> Paul,
>
> imho, the root cause is a missing ampersand.
>
> I will not be able to double-check this until tomorrow.
>
> Cheers,
>
> Gilles
>
> Ralph Castain  wrote:
>> Arg - that raises an interesting point. This is a pointer to a 64-bit 
>> number. Will uintptr_t resolve that problem on such platforms?
>>
>>
>> On Aug 2, 2014, at 8:12 PM, Paul Hargrove  wrote:
>>
>>
>> Looks like on a 32-bit platform a (uintptr_t) cast is desired in the 
>> OMPI_CAST_RTE_NAME() macro.
>>
>>
>> Warnings from current trunk tarball attributable to the missing cast include:
>>
>>
>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
>>  warning: cast to pointer from integer of different size
>>
>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
>>  warning: cast to pointer from integer of different size
>>
>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
>>  warning: cast to pointer from integer of different size
>>
>>
>> -Paul
>>
>>
>> -- 
>>
>> Paul H. Hargrove  phhargr...@lbl.gov
>>
>> Future Technologies Group
>>
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15481.php
>>
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15484.php



Re: [OMPI devel] [1.8.2rc3] static linking fails on linux (openpty undefined)

2014-08-03 Thread Paul Hargrove
Hmm,

On a different Linux/x86-64 host things work as expected with '-lutil'
linked explicitly:

$ ./INST/bin/mpicc -showme BLD/examples/hello_c.c
pgcc BLD/examples/hello_c.c
-I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
-L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
-Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath
-Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
-Wl,-rpath
-Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
-L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
-lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil

Searching for relevant differences now...

-Paul


On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove  wrote:

>
> I've configured the 1.8.2rc3 tarball with "--enable-static
> --disable-shared" on a fairly standard Linux/x86-64 platform.  While there
> are no problems on the same platform w/o these configure flags, with them I
> cannot link any application codes.
>
> $ mpicc -g hello_c.c -o hello_c
> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
> In function `opal_openpty':
> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>
> I checked "man openpty" and the manpage says to link with '-lutil'.
> The '-showme' does not show libutil:
>
> $ mpicc -showme hello_c.c
> gcc hello_c.c
> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
> -pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
> -Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
> -Wl,--enable-new-dtags
> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
> -lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm
>
>
> It looks like configure is doing the right thing on some level, but
> failing to add '-lutil' to the appropriate list of libs
> (OPAL_WRAPPER_EXTRA_LIBS?):
>
>
> 
> == Library and Function tests
>
> 
> checking if we need -lutil for openpty... yes
> checking for openpty... yes
>
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] [1.8.2rc3] static linking fails on linux (openpty undefined)

2014-08-03 Thread Paul Hargrove
I've configured the 1.8.2rc3 tarball with "--enable-static
--disable-shared" on a fairly standard Linux/x86-64 platform.  While there
are no problems on the same platform w/o these configure flags, with them I
cannot link any application codes.

$ mpicc -g hello_c.c -o hello_c
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
In function `opal_openpty':
opal_pty.c:(.text+0x1): undefined reference to `openpty'

I checked "man openpty" and the manpage says to link with '-lutil'.
The '-showme' does not show libutil:

$ mpicc -showme hello_c.c
gcc hello_c.c
-I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
-pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
-Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
-Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
-Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
-Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
-Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
-Wl,--enable-new-dtags
-L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
-lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm


It looks like configure is doing the right thing on some level, but failing
to add '-lutil' to the appropriate list of libs (OPAL_WRAPPER_EXTRA_LIBS?):


== Library and Function tests

checking if we need -lutil for openpty... yes
checking for openpty... yes
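
A minimal reproducer sketch for the missing symbol, assuming Linux with
glibc, where openpty() is declared in <pty.h>; the function name
try_openpty is invented for this example and is not part of Open MPI. On
glibc versions of this era the link step needs '-lutil' (newer glibc
releases fold the function into libc itself), which is the library the
wrapper compiler fails to pass in the static case.

```c
#include <stddef.h>
#include <pty.h>      /* openpty() on Linux/glibc */
#include <unistd.h>   /* close() */

/* Opens a pseudo-terminal pair and immediately closes it again.
 * Returns 0 on success, -1 on failure (mirroring openpty() itself). */
static int try_openpty(void)
{
    int master_fd, slave_fd;

    if (openpty(&master_fd, &slave_fd, NULL, NULL, NULL) != 0) {
        return -1;
    }
    close(master_fd);
    close(slave_fd);
    return 0;
}
```

Calling try_openpty() from a small main() and linking statically without
'-lutil' should reproduce the same "undefined reference to `openpty'"
error on an affected system.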


-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [1.8.2rc3] another openib bug (#4377)

2014-08-03 Thread Paul Hargrove
On Sun, Aug 3, 2014 at 12:49 PM, Paul Hargrove  wrote:

> BTW:
> Even with the "ignore_device=1" problem fixed, I can't get btl:openib
> running on x86.
> So, there may be additional reports in the next few hours.
>

That turned out to be the already-known issue in 1.8.2rc3 that has since
been fixed.
So, with manual application of r32395 plus the patch for ticket #4377, I
can run btl:openib on x86+tavor.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] [1.8.2rc3] another openib bug (#4377)

2014-08-03 Thread Paul Hargrove
I have a pair of x86/linux (32 bit) hosts connected by Mellanox Tavor HCAs.
 I have no idea if (or why) this has only appeared on this system, but I
find that btl:openib thinks the INI file says to ignore these HCAs.  See
the 4th line below:


[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_ip.c:364:add_rdma_addr]
Adding addr 172.18.0.105 (0x690012ac) subnet 0xac12 as mthca0:1
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_ini.c:170:ompi_btl_openib_ini_query]
Querying INI files for vendor 0x02c9, part ID 23108
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_ini.c:189:ompi_btl_openib_ini_query]
Found corresponding INI values: Mellanox Tavor Infinihost
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_component.c:1541:init_one_device]
device mthca0 skipped; ignore_device=1
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_component.c:988:device_destruct]
Failed to release mpool
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_component.c:1020:device_destruct]
Failed to destroy device resources
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1981:rdmacm_component_finalize]
rdmacm_component_finalize

Turns out this is known, and has been entered as trac ticket #4377,
currently assigned to miked.
Applying the 2-line patch attached to the ticket fixes the ignore_device=1
problem for me.

Mike,
Please apply that patch to trunk and CMR for 1.8.2

BTW:
Even with the "ignore_device=1" problem fixed, I can't get btl:openib
running on x86.
So, there may be additional reports in the next few hours.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] trunk warnings on x86

2014-08-03 Thread Gilles GOUAILLARDET
Paul,

imho, the root cause is a missing ampersand.

I will not be able to double-check this until tomorrow.

Cheers,

Gilles

Ralph Castain  wrote:
>Arg - that raises an interesting point. This is a pointer to a 64-bit number. 
>Will uintptr_t resolve that problem on such platforms?
>
>
>On Aug 2, 2014, at 8:12 PM, Paul Hargrove  wrote:
>
>
>Looks like on a 32-bit platform a (uintptr_t) cast is desired in the 
>OMPI_CAST_RTE_NAME() macro.
>
>
>Warnings from current trunk tarball attributable to the missing cast include:
>
>
>/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
> warning: cast to pointer from integer of different size
>
>/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
> warning: cast to pointer from integer of different size
>
>/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
> warning: cast to pointer from integer of different size
>
>
>-Paul
>
>
>-- 
>
>Paul H. Hargrove                          phhargr...@lbl.gov
>
>Future Technologies Group
>
>Computer and Data Sciences Department     Tel: +1-510-495-2352
>
>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
>
>


Re: [OMPI devel] trunk warnings on x86

2014-08-03 Thread Paul Hargrove
Whether just adding a (uintptr_t) cast is sufficient or not depends on the
usage, and I don't pretend to have looked much deeper than seeing that this
macro is common to the line numbers in the warnings I quoted.

If the intent is to uniformly store a pointer, then a (uintptr_t *) cast may
be appropriate, though that would use the most-significant 32 bits on ppc32
and the least-significant 32 bits on x86.  Again, the appropriate form for
the macro depends on how the field is used.
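
The byte-order point can be checked with a small sketch. It copies the
first four bytes of a 64-bit value, which is what a read through a pointer
to a 32-bit type at the same address would see. The helper names are made
up for this example, and memcpy is used instead of a pointer cast to stay
clear of aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host (e.g. x86), 0 on a big-endian one
 * (e.g. ppc32). */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Returns the 32-bit word stored at the lowest address of a 64-bit value. */
static uint32_t first_word(uint64_t value)
{
    uint32_t word;

    memcpy(&word, &value, sizeof word);
    return word;
}
```

For the value 0x1122334455667788, first_word() yields 0x55667788 on a
little-endian host and 0x11223344 on a big-endian one.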

-Paul


On Sat, Aug 2, 2014 at 9:14 PM, Ralph Castain  wrote:

> Arg - that raises an interesting point. This is a pointer to a 64-bit
> number. Will uintptr_t resolve that problem on such platforms?
>
> On Aug 2, 2014, at 8:12 PM, Paul Hargrove  wrote:
>
> Looks like on a 32-bit platform a (uintptr_t) cast is desired in the
> OMPI_CAST_RTE_NAME() macro.
>
> Warnings from current trunk tarball attributable to the missing cast
> include:
>
> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
> warning: cast to pointer from integer of different size
> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
> warning: cast to pointer from integer of different size
> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
> warning: cast to pointer from integer of different size
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15482.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] trunk warnings on x86

2014-08-03 Thread Paul Hargrove
Looks like on a 32-bit platform a (uintptr_t) cast is desired in the
OMPI_CAST_RTE_NAME() macro.

Warnings from current trunk tarball attributable to the missing cast
include:

/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
warning: cast to pointer from integer of different size
/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
warning: cast to pointer from integer of different size
/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
warning: cast to pointer from integer of different size
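
For reference, a minimal sketch of the pattern behind this class of
warning, under the assumption that a pointer is being carried in a 64-bit
integer (the helper names are hypothetical and this is not the actual
Open MPI code). On a 32-bit build, casting the 64-bit integer straight to
a pointer draws the warning above; going through uintptr_t makes the
width change explicit. Whether that is the right fix depends on how the
value is really used.

```c
#include <stdint.h>

/* Store a pointer in a 64-bit integer, widening via uintptr_t. */
static uint64_t pack_pointer(void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Recover the pointer, narrowing via uintptr_t so that no
 * "different size" cast is applied directly to the 64-bit value. */
static void *unpack_pointer(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```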

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900