[OMPI users] two problems with openmpi-1.9r28534

2013-05-24 Thread Siegmar Gross
Hi

I installed openmpi-1.9r28534 on "openSuSE Linux 12.1", "Solaris 10
x86_64", and "Solaris 10 sparc" with "Sun C 5.12" in 32- and 64-bit
versions. Unfortunately I have two problems with this version.


sunpc1 hello_1 104 mpiexec -np 3 -host sunpc1,linpc1,rs0 hostname
[sunpc1:18681] [[19223,0],0] ORTE_ERROR_LOG: Data unpack had
  inadequate space in file
  ../../../../openmpi-1.9r28534/orte/mca/plm/base/plm_base_launch_support.c
  at line 854

sunpc1 hello_1 105 which mpiexec
/usr/local/openmpi-1.9_64_cc/bin/mpiexec
sunpc1 hello_1 106 



My second problem is that "rank_files" don't work as expected.

sunpc1 rankfiles 108 mpiexec -report-bindings \
  -rf rf_ex_sunpc_linpc hostname
-
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots.  Please review your rank-slot
assignments and your host allocation to ensure a proper match.  Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: linpc1
-
sunpc1 rankfiles 109 [linpc1:03952] [[19223,0],1] ORTE_ERROR_LOG:
  Not found in file ../../openmpi-1.9r28534/orte/runtime/orte_globals.c
  at line 488
[linpc1:03952] [[19223,0],1] -> [[19223,0],0] (node: NULL) oob-tcp:
  Number of attempts to create TCP connection has been exceeded.
  Can not communicate with peer

sunpc1 rankfiles 109 



I don't have this problem with openmpi-1.6.5a1r28554.

sunpc1 rankfiles 105 mpiexec -report-bindings \
  -rf rf_ex_sunpc_linpc hostname
[sunpc1:17968] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[sunpc1:17968] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[sunpc1:17968] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
sunpc1
sunpc1
sunpc1
[linpc1:03246] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc1

sunpc1 rankfiles 106 which mpiexec
/usr/local/openmpi-1.6.5_32_cc/bin/mpiexec
sunpc1 rankfiles 107 
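
The contents of rf_ex_sunpc_linpc are not shown in this message. As a rough sketch only, the rank-to-slot assignments can be read back from the -report-bindings output of the working 1.6.5 run and printed as rankfile-style "rank N=host slot=..." lines. The names BINDING_RE, rankfile_lines and REPORT are made up for this illustration, and the sketch assumes that the "slot list" string in the report is the same text that appears in the rankfile; the real file may differ.

```python
import re

# Matches one -report-bindings entry, for example:
#   [sunpc1:17968] MCW rank 1 bound to socket 0[core 0-1]:
#     [B B][. .] (slot list 0:0-1)
# DOTALL lets ".*?" span the line wrap before "(slot list ...)".
BINDING_RE = re.compile(
    r"\[(?P<host>[^:\]\s]+):\d+\]\s+MCW rank (?P<rank>\d+) bound to "
    r".*?\(slot list (?P<slots>[^)]+)\)",
    re.DOTALL,
)


def rankfile_lines(report):
    """Return rankfile-style lines, sorted by rank, for each binding found."""
    entries = sorted(
        (int(m.group("rank")), m.group("host"), m.group("slots").strip())
        for m in BINDING_RE.finditer(report)
    )
    return ["rank %d=%s slot=%s" % entry for entry in entries]


# Output of the openmpi-1.6.5a1r28554 run, copied from the session above.
REPORT = """\
[sunpc1:17968] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[sunpc1:17968] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[sunpc1:17968] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
sunpc1
sunpc1
sunpc1
[linpc1:03246] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc1
"""

if __name__ == "__main__":
    print("\n".join(rankfile_lines(REPORT)))
```

For the report above this gives one line for rank 0 on linpc1 and three lines for ranks 1 to 3 on sunpc1. That makes linpc1 the only remote host in the assignment, and it is also the host named in the 1.9 error message.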



I would be grateful if somebody could fix the problems. Thank you
very much in advance for any help.


Kind regards

Siegmar



Re: [OMPI users] two problems with openmpi-1.9r28534

2013-05-24 Thread Ralph Castain
It sounds like you're saying that the OMPI trunk replicates the same behavior you
reported elsewhere, which isn't a surprise. Let's track this on your prior
messages, as the two problems are not related.

On May 24, 2013, at 12:28 AM, Siegmar Gross wrote:

> Hi
> 
> I installed openmpi-1.9r28534 on "openSuSE Linux 12.1", "Solaris 10
> x86_64", and "Solaris 10 sparc" with "Sun C 5.12" in 32- and 64-bit
> versions. Unfortunately I have two problems with this version.
> 
> 
> sunpc1 hello_1 104 mpiexec -np 3 -host sunpc1,linpc1,rs0 hostname
> [sunpc1:18681] [[19223,0],0] ORTE_ERROR_LOG: Data unpack had
>  inadequate space in file
>  ../../../../openmpi-1.9r28534/orte/mca/plm/base/plm_base_launch_support.c
>  at line 854
> 
> sunpc1 hello_1 105 which mpiexec
> /usr/local/openmpi-1.9_64_cc/bin/mpiexec
> sunpc1 hello_1 106 
> 
> 
> 
> My second problem is that "rank_files" don't work as expected.
> 
> sunpc1 rankfiles 108 mpiexec -report-bindings \
>  -rf rf_ex_sunpc_linpc hostname
> -
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots.  Please review your rank-slot
> assignments and your host allocation to ensure a proper match.  Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
> 
>  Host: linpc1
> -
> sunpc1 rankfiles 109 [linpc1:03952] [[19223,0],1] ORTE_ERROR_LOG:
>  Not found in file ../../openmpi-1.9r28534/orte/runtime/orte_globals.c
>  at line 488
> [linpc1:03952] [[19223,0],1] -> [[19223,0],0] (node: NULL) oob-tcp:
>  Number of attempts to create TCP connection has been exceeded.
>  Can not communicate with peer
> 
> sunpc1 rankfiles 109 
> 
> 
> 
> I don't have this problem with openmpi-1.6.5a1r28554.
> 
> sunpc1 rankfiles 105 mpiexec -report-bindings \
>  -rf rf_ex_sunpc_linpc hostname
> [sunpc1:17968] MCW rank 1 bound to socket 0[core 0-1]:
>  [B B][. .] (slot list 0:0-1)
> [sunpc1:17968] MCW rank 2 bound to socket 1[core 0]:
>  [. .][B .] (slot list 1:0)
> [sunpc1:17968] MCW rank 3 bound to socket 1[core 1]:
>  [. .][. B] (slot list 1:1)
> sunpc1
> sunpc1
> sunpc1
> [linpc1:03246] MCW rank 0 bound to socket 0[core 0-1]
>  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
> linpc1
> 
> sunpc1 rankfiles 106 which mpiexec
> /usr/local/openmpi-1.6.5_32_cc/bin/mpiexec
> sunpc1 rankfiles 107 
> 
> 
> 
> I would be grateful if somebody could fix the problems. Thank you
> very much in advance for any help.
> 
> 
> Kind regards
> 
> Siegmar
> 