Re: [OMPI devel] mpirun error when not using span

2018-09-10 Thread Ralph H Castain
Could you please send the output from “lstopo --of xml foo.xml” (the file 
foo.xml) so I can try to replicate here?


> On Sep 4, 2018, at 12:35 PM, Shrader, David Lee wrote:
> 
> Hello,
> 
> I have run this issue by Howard, and he asked me to forward it to the Open 
> MPI devel mailing list. When running on more than one node, I get an error if 
> I use PE=n with '--map-by numa' but without the span modifier:
> 
> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4 --bind-to 
> core --report-bindings true
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>    Bind to:     CORE
>    Node:        ba001
>    #processes:  2
>    #cpus:       1
> 
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --
> 
> The absolute values of the numbers passed to -n and PE don't really matter; 
> the error pops up as soon as those numbers are combined in such a way that an 
> MPI rank ends up on the second node.
> 
> If I add the "span" parameter, everything works as expected:
> 
> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4,span 
> --bind-to core --report-bindings true
> [ba002.localdomain:58502] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket 
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 9 bound to socket 0[core 4[hwt 0]], socket 
> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 10 bound to socket 0[core 8[hwt 0]], 
> socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 11 bound to socket 0[core 12[hwt 0]], 
> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]]: 
> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 12 bound to socket 1[core 18[hwt 0]], 
> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: 
> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
> [ba002.localdomain:58502] MCW rank 13 bound to socket 1[core 22[hwt 0]], 
> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]]: 
> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
> [ba002.localdomain:58502] MCW rank 14 bound to socket 1[core 26[hwt 0]], 
> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]]: 
> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
> [ba002.localdomain:58502] MCW rank 15 bound to socket 1[core 30[hwt 0]], 
> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]]: 
> [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
> [ba001.localdomain:11700] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 
> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 2 bound to socket 0[core 8[hwt 0]], socket 
> 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 3 bound to socket 0[core 12[hwt 0]], 
> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]]: 
> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 4 bound to socket 1[core 18[hwt 0]], 
> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: 
> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
> [ba001.localdomain:11700] MCW rank 5 bound to socket 1[core 22[hwt 0]], 
> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]]: 
> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
> [ba001.localdomain:11700] MCW rank 6 bound to socket 1[core 26[hwt 0]], 
> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]]: 
> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
> [ba001.localdomain:11700] MCW rank 7 bound to socket 1[core 30[hwt 0]], 
> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]]: 
> [.
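
The point at which a rank "ends up on the second node" can be worked out from the bindings quoted above. What follows is only a rough sketch of that arithmetic, not Open MPI's mapper, and it does not explain the failure itself. It assumes each NUMA domain corresponds to one socket with 18 cores (inferred from the core numbering in the --report-bindings output, where socket 1 starts at core 18) and two such domains per node. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Ranks that fit in one NUMA domain when each rank is bound to `pe` cores.
 * Cores left over after the integer division stay unused. */
static int ranks_per_numa(int cores_per_numa, int pe)
{
    assert(pe > 0);
    return cores_per_numa / pe;
}

/* Index of the first rank that no longer fits on the first node
 * (assumption: nodes are filled NUMA domain by NUMA domain). */
static int first_rank_on_second_node(int numa_per_node, int cores_per_numa,
                                     int pe)
{
    return numa_per_node * ranks_per_numa(cores_per_numa, pe);
}

/* Print which ranks, if any, need a second node for a given -n value. */
static void print_layout(int nranks, int numa_per_node, int cores_per_numa,
                         int pe)
{
    int cap = first_rank_on_second_node(numa_per_node, cores_per_numa, pe);

    printf("%d ranks fit on one node\n", cap);
    if (nranks > cap) {
        printf("ranks %d..%d need a second node\n", cap, nranks - 1);
    } else {
        printf("all %d ranks fit on the first node\n", nranks);
    }
}
```

Calling print_layout(16, 2, 18, 4) models the reported command line: 18 / 4 gives 4 ranks per NUMA domain and 8 per node, which matches the span output above, where ranks 0-7 are on ba001 and ranks 8-15 are on ba002.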

[OMPI devel] Will info keys ever be fixed?

2018-09-10 Thread Ralph H Castain
Still seeing this in today’s head of master:

info_subscriber.c: In function 'opal_infosubscribe_change_info':
../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated 
writing up to 36 bytes into a region of size 27 [-Wformat-truncation=]
 #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
   ^
info_subscriber.c:268:13: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
 OPAL_INFO_SAVE_PREFIX "%s", key);
 ^
info_subscriber.c:268:36: note: format string is defined here
 OPAL_INFO_SAVE_PREFIX "%s", key);
^~
In file included from 
/opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
 from ../../opal/class/opal_list.h:71,
 from ../../opal/util/info_subscriber.h:30,
 from info_subscriber.c:45:
info_subscriber.c:267:9: note: '__builtin_snprintf' output between 10 and 46 
bytes into a destination of size 36
 snprintf(modkey, OPAL_MAX_INFO_KEY,
 ^
In file included from 
/opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
 from ../../opal/class/opal_list.h:71,
 from ../../opal/util/info.h:30,
 from info.c:46:
info.c: In function 'opal_info_dup_mode.constprop':
../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated 
writing up to 36 bytes into a region of size 28 [-Wformat-truncation=]
 #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
   ^
info.c:212:22: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
  OPAL_INFO_SAVE_PREFIX "%s", pkey);
  ^
info.c:212:45: note: format string is defined here
  OPAL_INFO_SAVE_PREFIX "%s", pkey);
 ^~
In file included from 
/opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
 from ../../opal/class/opal_list.h:71,
 from ../../opal/util/info.h:30,
 from info.c:46:
info.c:211:18: note: '__builtin_snprintf' output between 10 and 46 bytes into a 
destination of size 37
  snprintf(savedkey, OPAL_MAX_INFO_KEY+1,
  ^
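
For readers unfamiliar with -Wformat-truncation: the notes above say the formatted output can need between 10 and 46 bytes (the 9-character prefix, a key of up to 36 bytes, and the terminating NUL), while the destinations hold 36 or 37. The following is a minimal sketch of one common way to handle this pattern, namely sizing the buffer for the worst case and checking the snprintf return value. It is not the Open MPI code or a proposed patch. MAX_KEY and make_saved_key are hypothetical stand-ins; only the prefix string is taken from the warning.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEY 36               /* hypothetical stand-in for OPAL_MAX_INFO_KEY */
#define SAVE_PREFIX "_OMPI_IN_"  /* 9-character prefix shown in the warning */

/* Write "<prefix><key>" into dst.
 * Returns 0 on success, -1 if the result was truncated or formatting failed. */
static int make_saved_key(char *dst, size_t dst_size, const char *key)
{
    assert(dst != NULL && dst_size > 0);

    /* snprintf returns the length it would have written without the NUL,
     * so a value >= dst_size means the output did not fit. */
    int n = snprintf(dst, dst_size, SAVE_PREFIX "%s", key);
    if (n < 0 || (size_t)n >= dst_size) {
        return -1;
    }
    return 0;
}
```

A destination declared as char buf[sizeof(SAVE_PREFIX) + MAX_KEY] is 46 bytes, which covers the worst case reported in the notes. Whether truncating, enlarging the buffer, or rejecting long keys is the right behavior for info keys is a design decision this sketch does not settle.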

