Hi Ben,

I noticed the stack trace refers to opal_cuda_memcpy(). Is this issue specific to CUDA environments?


The default coll/tuned collective module is known not to work when tasks use datatypes and counts that differ but have matching type signatures.

For example, one task sends a single vector of N elements, while the other task receives N individual elements.
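
To make that concrete, here is a minimal sketch of the pattern I mean (my own illustration, with placeholder names N and vec, not code from your report): every rank sends one element of a contiguous datatype of N floats but receives N individual MPI_FLOATs per rank, so the signatures match while the datatype/count pairs differ.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 4;                                  /* floats contributed per rank */
    float *sendbuf = malloc(N * sizeof(float));
    float *recvbuf = malloc((size_t)size * N * sizeof(float));
    int *recvcounts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        recvcounts[i] = N;                            /* receive N MPI_FLOATs from each rank */
        displs[i] = i * N;
    }
    for (int i = 0; i < N; i++)
        sendbuf[i] = (float)(rank * N + i);

    /* send side: 1 element of a contiguous type of N floats */
    MPI_Datatype vec;
    MPI_Type_contiguous(N, MPI_FLOAT, &vec);
    MPI_Type_commit(&vec);

    /* matching signatures, but different datatype/count on send and receive sides */
    MPI_Allgatherv(sendbuf, 1, vec,
                   recvbuf, recvcounts, displs, MPI_FLOAT, MPI_COMM_WORLD);

    MPI_Type_free(&vec);
    free(sendbuf); free(recvbuf); free(recvcounts); free(displs);
    MPI_Finalize();
    return 0;
}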


A workaround worth trying is to

mpirun --mca coll basic ...
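
If editing the mpirun command line is inconvenient (for example inside a batch script), the same selection can, as far as I know, also be made via the environment before launching:

export OMPI_MCA_coll=basic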


Last but not least, could you please post a minimal example (and the number of MPI tasks used) that evidences the issue?


Cheers,


Gilles


On 11/2/2018 7:59 AM, Ben Menadue wrote:
Hi,

One of our users is reporting an issue using MPI_Allgatherv with a large derived datatype — it segfaults inside OpenMPI. Using a debug build of OpenMPI 3.1.2 produces a ton of messages like this before the segfault:

[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53
Pointer 0x2acd0121b010 size 131040 is outside [0x2ac5ed268010,0x2ac980ad8010] for
base ptr 0x2ac5ed268010 count 1 and data
[r3816:50921] Datatype 0x42998b0[] size 5920000000 align 4 id 0 length 7 used 6 true_lb 0 true_ub 15360000000 (true_extent 15360000000) lb 0 ub 15360000000 (extent 15360000000)
nbElems 1480000000 loops 4 flags 104 (committed )-c-----GD--[---][---]
contain OPAL_FLOAT4:*
--C--------[---][---]   OPAL_LOOP_S 4 times the next 2 elements extent 80000000
--C---P-D--[---][---]   OPAL_FLOAT4 count 20000000 disp 0x380743000 (15040000000) blen 0 extent 4 (size 80000000)
--C--------[---][---]   OPAL_LOOP_E prev 2 elements first elem displacement 15040000000 size of data 80000000
--C--------[---][---]   OPAL_LOOP_S 70 times the next 2 elements extent 80000000
--C---P-D--[---][---]   OPAL_FLOAT4 count 20000000 disp 0x0 (0) blen 0 extent 4 (size 80000000)
--C--------[---][---]   OPAL_LOOP_E prev 2 elements first elem displacement 0 size of data 80000000
-------G---[---][---]   OPAL_LOOP_E prev 6 elements first elem displacement 15040000000 size of data 1625032704
Optimized description
-cC---P-DB-[---][---]     OPAL_UINT1 count 320000000 disp 0x380743000 (15040000000) blen 1 extent 1 (size 320000000)
-cC---P-DB-[---][---]     OPAL_UINT1 count 1305032704 disp 0x0 (0) blen 1 extent 1 (size 5600000000)
-------G---[---][---]   OPAL_LOOP_E prev 2 elements first elem displacement 15040000000 size of d

Here is the backtrace:

==== backtrace ====
 0 0x000000000008987b memcpy()  ???:0
 1 0x00000000000639b6 opal_cuda_memcpy() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_cuda.c:99
 2 0x000000000005cd7a pack_predefined_data() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.h:56
 3 0x000000000005e845 opal_generic_simple_pack() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.c:319
 4 0x000000000004ce6e opal_convertor_pack() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_convertor.c:272
 5 0x000000000000e3b6 mca_btl_openib_prepare_src() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib.c:1609
 6 0x0000000000023c75 mca_bml_base_prepare_src() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/bml/bml.h:341
 7 0x0000000000027d2a mca_pml_ob1_send_request_schedule_once() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:995
 8 0x000000000002473c mca_pml_ob1_send_request_schedule_exclusive() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:313
 9 0x000000000002479d mca_pml_ob1_send_request_schedule() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:337
10 0x00000000000256fe mca_pml_ob1_frag_completion() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:321
11 0x000000000001baaf handle_wc() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3565
12 0x000000000001c20c poll_device() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3719
13 0x000000000001c6c0 progress_one_device() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3829
14 0x000000000001c763 btl_openib_component_progress() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3853
15 0x000000000002ff90 opal_progress() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/../../../../opal/runtime/opal_progress.c:228
16 0x000000000001114c ompi_request_wait_completion() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/request/request.h:413
17 0x0000000000013a80 mca_pml_ob1_send() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_isend.c:266
18 0x000000000010ca45 ompi_coll_base_sendrecv_actual() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_util.c:55
19 0x000000000010b5bc ompi_coll_base_sendrecv() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_util.h:67
20 0x000000000010ba1e ompi_coll_base_allgatherv_intra_bruck() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/../../../../../../ompi/mca/coll/base/coll_base_allgatherv.c:184
21 0x0000000000005ac5 ompi_coll_tuned_allgatherv_intra_dec_fixed() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/coll/tuned/../../../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:640
22 0x000000000007c40d PMPI_Allgatherv() /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mpi/c/profile/pallgatherv.c:143
23 0x0000000000401e25 main() /short/z00/bjm900/help/pxs599/memtest.2/memtest1.c:182
24 0x000000000001ed20 __libc_start_main()  ???:0
25 0x00000000004012b9 _start()  ???:0
===================

The derived datatype is produced using
MPI_Type_contiguous(P, MPI_FLOAT, &mpitype_vec_nobs)
where P = 20000000 (so quite large).
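
The call is roughly along these lines (a simplified sketch with placeholder counts and buffer sizes, not the exact source):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int P = 20000000;          /* floats per derived-type element */
    const int nobs = 4;              /* placeholder: elements contributed per rank */

    MPI_Datatype mpitype_vec_nobs;
    MPI_Type_contiguous(P, MPI_FLOAT, &mpitype_vec_nobs);   /* ~80 MB per element */
    MPI_Type_commit(&mpitype_vec_nobs);

    int *recvcounts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        recvcounts[i] = nobs;        /* counted in derived-type elements */
        displs[i] = i * nobs;
    }

    float *sendbuf = calloc((size_t)nobs * P, sizeof(float));
    float *recvbuf = malloc((size_t)size * nobs * P * sizeof(float));

    MPI_Allgatherv(sendbuf, nobs, mpitype_vec_nobs,
                   recvbuf, recvcounts, displs, mpitype_vec_nobs,
                   MPI_COMM_WORLD);

    MPI_Type_free(&mpitype_vec_nobs);
    free(sendbuf); free(recvbuf); free(recvcounts); free(displs);
    MPI_Finalize();
    return 0;
}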

Is there any restriction on the maximum size a datatype can be? Or, perhaps on the extent a message can cover, since the Allgatherv creates its own internal datatypes?

Thanks,
Ben



_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
