-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 14/11/11 21:27, Y.MATSUMOTO wrote:

> I'm a member of MPI library development team in Fujitsu,
> Takahiro Kawashima, who sent mail before, is my colleague.
> We start to feed back.

First of all I'd like to say congratulations on breaking 10PF,
and also a big thanks for working on contributing changes
back to Open-MPI!

Whilst I can't comment on the fix I can confirm that I also
see segfaults with Open-MPI 1.4.2 and 1.4.4 with your example
program.

Intel compilers 11.1:

- --------------------------------------------------------------------------
[bruce002:03973] *** Process received signal ***
[bruce002:03973] Signal: Segmentation fault (11)
[bruce002:03973] Signal code: Address not mapped (1)
[bruce002:03973] Failing at address: 0x100000009
[bruce002:03973] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:03973] [ 1] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0 [0x2aaaaab5d79d]
[bruce002:03973] [ 2] /usr/local/openmpi/1.4.4-intel/lib/libopen-pal.so.0(opal_progress+0x87) [0x2aaaab1fdc27]
[bruce002:03973] [ 3] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0 [0x2aaaaabce252]
[bruce002:03973] [ 4] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0(PMPI_Recv+0x213) [0x2aaaaab1e0f3]
[bruce002:03973] [ 5] ./tp_lb_ub_ng(main+0x29b) [0x4021ab]
[bruce002:03973] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994]
[bruce002:03973] [ 7] ./tp_lb_ub_ng [0x401e59]
[bruce002:03973] *** End of error message ***
- --------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 3973 on node bruce002 exited on signal 11 (Segmentation fault).
- --------------------------------------------------------------------------
[bruce002:03972] *** Process received signal ***
[bruce002:03972] Signal: Segmentation fault (11)
[bruce002:03972] Signal code: Address not mapped (1)
[bruce002:03972] Failing at address: 0xffffffffff84bad0
[bruce002:03972] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:03972] [ 1] ./tp_lb_ub_ng(__intel_new_memcpy+0x2c) [0x403c9c]
[bruce002:03972] *** End of error message ***

GCC 4.4.4:

- --------------------------------------------------------------------------
[bruce002:04049] *** Process received signal ***
[bruce002:04049] Signal: Segmentation fault (11)
[bruce002:04049] Signal code: Address not mapped (1)
[bruce002:04049] Failing at address: 0x100000009
[bruce002:04049] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:04049] [ 1] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaab51f27]
[bruce002:04049] [ 2] /usr/local/openmpi/1.4.4-gcc/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2aaaab14bb3a]
[bruce002:04049] [ 3] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabb9985]
[bruce002:04049] [ 4] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0(PMPI_Recv+0x12f) [0x2aaaaab1913f]
[bruce002:04049] [ 5] ./tp_lb_ub_ng(main+0x21c) [0x400dd0]
[bruce002:04049] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994]
[bruce002:04049] [ 7] ./tp_lb_ub_ng [0x400af9]
[bruce002:04049] *** End of error message ***
- --------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 4049 on node bruce002 exited on signal 11 (Segmentation fault).
- --------------------------------------------------------------------------
[bruce002:04048] *** Process received signal ***
[bruce002:04048] Signal: Segmentation fault (11)
[bruce002:04048] Signal code: Address not mapped (1)
[bruce002:04048] Failing at address: 0x2aaab0833000
[bruce002:04048] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:04048] [ 1] /lib64/libc.so.6(memcpy+0x3ff) [0x3e12a7c63f]
[bruce002:04048] [ 2] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaaafef7b]
[bruce002:04048] [ 3] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaab4fcdd]
[bruce002:04048] [ 4] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabc1563]
[bruce002:04048] [ 5] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabbce78]
[bruce002:04048] [ 6] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaab52036]
[bruce002:04048] [ 7] /usr/local/openmpi/1.4.4-gcc/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2aaaab14bb3a]
[bruce002:04048] [ 8] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabba5f5]
[bruce002:04048] [ 9] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0(MPI_Send+0x177) [0x2aaaaab1b1d7]
[bruce002:04048] [10] ./tp_lb_ub_ng(main+0x1e4) [0x400d98]
[bruce002:04048] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994]
[bruce002:04048] [12] ./tp_lb_ub_ng [0x400af9]
[bruce002:04048] *** End of error message ***

- --
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7BybUACgkQO2KABBYQAh9/mwCdEx6FrXaahHRlfIlKX+GqvScO
+tcAn0ieXCjxG5JrOvkgSy0YQ9EgA7S8
=nUtx
-----END PGP SIGNATURE-----