Hi Ben,
I noted the stack trace refers to opal_cuda_memcpy(). Is this issue
specific to CUDA environments?
The default coll/tuned collective module is known not to work when tasks
use matching but different type signatures.
For example, one task sends a single vector of N elements, while the
other task receives N individual elements.
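As a hypothetical illustration (names and counts invented here, not taken from the reporter's code), such a matching-but-different pair could look like the following: every rank contributes one element of a contiguous N-float type but receives the contributions as plain MPI_FLOATs. The type signatures match, so the call is legal MPI, yet it exercises exactly the case described above. This sketch needs an MPI environment (mpicc/mpirun) to build and run.

```c
/* Hypothetical sketch: an MPI_Allgatherv call where the send and
 * receive type signatures match but the datatypes differ. */
#include <mpi.h>
#include <stdlib.h>

enum { N = 4 };

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Datatype vec;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float sendbuf[N];
    float *recvbuf   = malloc(sizeof(float) * N * nprocs);
    int  *recvcounts = malloc(sizeof(int) * nprocs);
    int  *displs     = malloc(sizeof(int) * nprocs);
    for (int i = 0; i < nprocs; i++) {
        recvcounts[i] = N;                    /* received as N floats */
        displs[i]     = i * N;
    }
    for (int i = 0; i < N; i++)
        sendbuf[i] = (float)rank;

    MPI_Type_contiguous(N, MPI_FLOAT, &vec);  /* sent as ONE vector   */
    MPI_Type_commit(&vec);

    MPI_Allgatherv(sendbuf, 1, vec,           /* one N-float vector   */
                   recvbuf, recvcounts, displs,
                   MPI_FLOAT, MPI_COMM_WORLD);/* N individual floats  */

    MPI_Type_free(&vec);
    free(recvbuf); free(recvcounts); free(displs);
    MPI_Finalize();
    return 0;
}
```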
A workaround worth trying is to
mpirun --mca coll basic ...
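If falling back to the basic module entirely proves too slow, Open MPI's MCA component-exclusion syntax (the caret prefix) can also be used to disable just the tuned module, leaving the other coll components eligible for selection:

```shell
# Exclude only the coll/tuned component; other collective modules
# (basic, libnbc, ...) remain available for auto-selection.
mpirun --mca coll ^tuned ...
```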
Last but not least, could you please post a minimal example (and the
number of MPI tasks used) that evidences the issue?
Cheers,
Gilles
On 11/2/2018 7:59 AM, Ben Menadue wrote:
Hi,
One of our users is reporting an issue using MPI_Allgatherv with a
large derived datatype: it segfaults inside Open MPI. A debug build of
Open MPI 3.1.2 produces a flood of messages like this before the
segfault:
[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53
Pointer 0x2acd0121b010 size 131040 is outside
[0x2ac5ed268010,0x2ac980ad8010] for
base ptr 0x2ac5ed268010 count 1 and data
[r3816:50921] Datatype 0x42998b0[] size 5920000000 align 4 id 0 length
7 used 6
true_lb 0 true_ub 15360000000 (true_extent 15360000000) lb 0 ub
15360000000 (extent 15360000000)
nbElems 1480000000 loops 4 flags 104 (committed )-c-----GD--[---][---]
contain OPAL_FLOAT4:*
--C--------[---][---] OPAL_LOOP_S 4 times the next 2 elements extent
80000000
--C---P-D--[---][---] OPAL_FLOAT4 count 20000000 disp 0x380743000
(15040000000) blen 0 extent 4 (size 80000000)
--C--------[---][---] OPAL_LOOP_E prev 2 elements first elem
displacement 15040000000 size of data 80000000
--C--------[---][---] OPAL_LOOP_S 70 times the next 2 elements
extent 80000000
--C---P-D--[---][---] OPAL_FLOAT4 count 20000000 disp 0x0 (0) blen 0
extent 4 (size 80000000)
--C--------[---][---] OPAL_LOOP_E prev 2 elements first elem
displacement 0 size of data 80000000
-------G---[---][---] OPAL_LOOP_E prev 6 elements first elem
displacement 15040000000 size of data 1625032704
Optimized description
-cC---P-DB-[---][---] OPAL_UINT1 count 320000000 disp 0x380743000
(15040000000) blen 1 extent 1 (size 320000000)
-cC---P-DB-[---][---] OPAL_UINT1 count 1305032704 disp 0x0 (0)
blen 1 extent 1 (size 5600000000)
-------G---[---][---] OPAL_LOOP_E prev 2 elements first elem
displacement 15040000000 size of d
Here is the backtrace:
==== backtrace ====
0 0x000000000008987b memcpy() ???:0
1 0x00000000000639b6 opal_cuda_memcpy()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_cuda.c:99
2 0x000000000005cd7a pack_predefined_data()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.h:56
3 0x000000000005e845 opal_generic_simple_pack()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.c:319
4 0x000000000004ce6e opal_convertor_pack()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_convertor.c:272
5 0x000000000000e3b6 mca_btl_openib_prepare_src()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib.c:1609
6 0x0000000000023c75 mca_bml_base_prepare_src()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/bml/bml.h:341
7 0x0000000000027d2a mca_pml_ob1_send_request_schedule_once()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:995
8 0x000000000002473c mca_pml_ob1_send_request_schedule_exclusive()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:313
9 0x000000000002479d mca_pml_ob1_send_request_schedule()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:337
10 0x00000000000256fe mca_pml_ob1_frag_completion()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:321
11 0x000000000001baaf handle_wc()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3565
12 0x000000000001c20c poll_device()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3719
13 0x000000000001c6c0 progress_one_device()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3829
14 0x000000000001c763 btl_openib_component_progress()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3853
15 0x000000000002ff90 opal_progress()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/../../../../opal/runtime/opal_progress.c:228
16 0x000000000001114c ompi_request_wait_completion()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/request/request.h:413
17 0x0000000000013a80 mca_pml_ob1_send()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_isend.c:266
18 0x000000000010ca45 ompi_coll_base_sendrecv_actual()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_util.c:55
19 0x000000000010b5bc ompi_coll_base_sendrecv()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_util.h:67
20 0x000000000010ba1e ompi_coll_base_allgatherv_intra_bruck()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_allgatherv.c:184
21 0x0000000000005ac5 ompi_coll_tuned_allgatherv_intra_dec_fixed()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/tuned/../../../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:640
22 0x000000000007c40d PMPI_Allgatherv()
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mpi/c/profile/pallgatherv.c:143
23 0x0000000000401e25 main()
/short/z00/bjm900/help/pxs599/memtest.2/memtest1.c:182
24 0x000000000001ed20 __libc_start_main() ???:0
25 0x00000000004012b9 _start() ???:0
===================
The derived datatype is produced using
MPI_Type_contiguous(P, MPI_FLOAT, &mpitype_vec_nobs)
where P = 20000000 (so quite large).
Is there any restriction on how large a datatype can be? Or perhaps on
the extent a message can cover, given that the Allgatherv creates its
own internal datatypes?
Thanks,
Ben
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel