Re: [OMPI devel] basename: a faulty warning 'extra operand --test-name' in tests causes test-driver to fail
I'm happy to provide you with an update on 'extra operand --test-name' occasionally being fed to 'basename' by Open MPI's test suite; this has been fixed by the Automake maintainers: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14840

You may still want to look at 'test/asm/run_tests' to see why it was passed through in the first place.

On Fri, Jul 12, 2013 at 9:30 PM, Vasiliy wrote:
> I've just gone through the test suite, and in 'test/asm/run_tests' there is a statement:
>
> progname="`basename $*`"
>
> where '--test-name' could accidentally get in, causing the reported issues, since 'basename' does not have such an option. Somebody familiar with the test suite may want to look into it.
>
> On Fri, Jul 12, 2013 at 5:17 PM, Vasiliy wrote:
>> Sorry again, my report was a stub because I didn't have enough time to investigate the issue. Because the verbosity level was set to zero, I had assumed from the log that 'basename' belongs to the Open MPI source, whereas it does not. Thank you for drawing my attention to the fact that it is actually a utility from the Cygwin 'coreutils' package. I'll report it to their team. I've also filed a report with the Automake team about their part.
>>
>> 1. I'm testing the patched Open MPI SVN source, that is, 1.9a1-svn, with the latest Autotools built from their git/svn sources plus my own patches, which still have to be polished.
>>
>> 2. Indeed, I'm running 'make check' when I see those failures. Unfortunately, the failure in 'test-driver' obscures how many (or how few) real tests, if any, have failed. I've just run it on the latest sources (by the way, there is still some old rot with 'trace.c') and, once I managed to get 'test-driver' working, it passed *ALL* the tests, except those with bogus 'test-driver' crashes, namely:
>>
>> atomic_spinlock_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_math_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_math_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_math_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_math_noinline.exe
>>
>> Clearly, these are inline/noinline issues that will need to be looked into at some later time.
>>
>> I can now give feedback on why I earlier reported warnings about shared libraries that were not created, and a flood of 'undefined symbols'. Indeed, that was a problem with the Makefile.am's. I've checked just two out of the roughly one hundred successfully compiled static libraries whose DSO counterparts were not created during compilation, even though they were requested:
>>
>> - 'ompi/datatype's Makefile compiles 'libdatatype' without the much-needed 'libopen-pal' and 'libmpi' libraries, which causes the shared library not to be created because of undefined symbols; by the way, even when they are added to the libtool (v2.4.2-374) invocation command line, the shared libraries are still not produced, whereas gcc does not have this kind of problem;
>>
>> - 'ompi/debuggers's Makefile does not create a 'libompi_dbg_msgq.dll.a' import library (though there is a shared library); the corresponding part has to be created manually.
>>
>> I haven't checked the other 95 or so.
>>
>> On Fri, Jul 12, 2013 at 2:26 PM, Jeff Squyres (jsquyres) wrote:
>>> I'm sorry, I'm still unclear on what you're trying to tell us. :-(
>>>
>>> 1. What version of Open MPI are you testing? If you're testing Open MPI 1.6.x with a very new Automake, I'm not surprised that there are some failures.
>>> We usually pick the newest GNU Autotools when we begin a release series, and then stick with those tool versions for the life of that series. We do not attempt to forward-port to newer Autotools on that series, meaning that sometimes newer versions of the Autotools will break the builds of that series. That's ok.
>>>
>>> 2. Assumedly, you're seeing this failure when you run "make check". Is that correct? What test, exactly, is failing? It's very difficult to grok what you're reporting when you only include the last few lines of output, which exclude the majority of the context that we need to know what you're talking about.
>>>
>>> Your bug reports have been *extremely* helpful in cleaning out some old kruft from our tree, but could you include more context in the future? E.g., include all the "compile problems" items from here:
>>>
>>> http://www.open-mpi.org/community/help/
>>>
>>> 3. We don't have a test named "basename" or "test-driver"; basename is usually an OS utility, and test-driver is part of the new Automake testing framework. If there's a mistake in how these are being invoked, it's coming from Automake, and you should report the bug to them.
>>>
>>> ...unless we're doing something wrong in our Makefile.am's in how we list the tests to be run. Is that what you're saying?
>>>
>>> On Jul 12, 2013, at 3:59 AM, Vasiliy wrote:
>>>
[OMPI devel] [bug] One-sided communication with a duplicated datatype
Hi,

I encountered an assertion failure in Open MPI trunk and found a bug.

See the attached program. This program can be run with mpiexec -n 1. It calls MPI_Put and writes one int value to the target side. The target-side datatype is equivalent to MPI_INT, but is a derived datatype created by MPI_Type_contiguous and MPI_Type_dup.

This program aborts with the following output.

==
dt1 (0x2626160)
type 2 count ints 1 count disp 0 count datatype 1
ints: 1
types:MPI_INT
dt2 (0x2626340)
type 1 count ints 0 count disp 0 count datatype 1
types:0x2626160
put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
[ppc:05244] *** Process received signal ***
[ppc:05244] Signal: Aborted (6)
[ppc:05244] Signal code: (-6)
[ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
[ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
[ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
[ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
[ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
[ppc:05244] [ 5] /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) [0x7fe58a4e8cf6]
[ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
[ppc:05244] [ 7] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) [0x7fe5839a3ae5]
[ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
[ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) [0x7fe58510bb04]
[ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
[ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
[ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) [0x7fe5839a1776]
[ppc:05244] [13] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) [0x7fe5839a84ab]
[ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
[ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
[ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
[ppc:05244] [17] put_dup_type() [0x400b09]
[ppc:05244] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on signal 6 (Aborted).
--
==

The __ompi_datatype_create_from_packed_description function, in which the assertion failure occurred, seems to expect data_id to be the ID of a predefined datatype. In my environment, the value of data_id is 68, which is the ID of the datatype created by MPI_Type_contiguous.

In one-sided communication, the target-side datatype is encoded as a 'description' on the origin side and then decoded on the target side. I think there are problems in both the encoding stage and the decoding stage.

The __ompi_datatype_pack_description function in the ompi/datatype/ompi_datatype_args.c file encodes the datatype. For MPI_COMBINER_DUP, on line 451, it encodes only create_type and id and returns immediately. It does not encode the information of the base datatype (in my case, the datatype created by MPI_Type_contiguous).

The __ompi_datatype_create_from_packed_description function in the same file decodes the description. For MPI_COMBINER_DUP, on line 557, it expects data_id to be the ID of a predefined datatype, which is not always true.

I cannot fix this problem yet because I am not familiar with the datatype code in Open MPI.
MPI_COMBINER_DUP is also used for predefined datatypes, and the calculation of total_pack_size is also involved, so it does not seem that simple.

Regards,
KAWASHIMA Takahiro

#include <stdio.h>
#include <stdint.h>
#include <mpi.h>

#define PRINT_ARGS

#ifdef PRINT_ARGS
/* defined in ompi/datatype/ompi_datatype_args.c */
extern int32_t ompi_datatype_print_args(const struct ompi_datatype_t *pData);
#endif

int main(int argc, char *argv[])
{
    MPI_Win win;
    MPI_Datatype dt1, dt2;
    int obuf[1], tbuf[1];

    obuf[0] = 77;
    tbuf[0] = 88;

    MPI_Init(&argc, &argv);

    MPI_Type_contiguous(1, MPI_INT, &dt1);
    MPI_Type_dup(dt1, &dt2);
    MPI_Type_commit(&dt2);

#ifdef PRINT_ARGS
    printf(" dt1 (%p) \n", (void *)dt1);
    ompi_datatype_print_args(dt1);
    printf(" dt2 (%p) \n", (void *)dt2);
    ompi_datatype_print_args(dt2);
    fflush(stdout);
#endif

    MPI_Win_create(tbuf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
    MPI_Win_fence(0, win);
    MPI_Put(obuf, 1, MPI_INT, 0, 0, 1, dt2, win);
    MPI_Win_fence(0, win);

    MPI_Type_free(&dt1);
    MPI_Type_free(&dt2);
    MPI_Win_free(&win);
    MPI_Finalize();

    if (tbuf[0] == 77) {
        printf("OK
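The same structure can also be seen from plain MPI, without Open MPI internals. The short sketch below is not part of the report above (the helper combiner_name is invented here); it relies only on the standard MPI_Type_get_envelope and MPI_Type_get_contents calls, and shows that the base type recorded behind an MPI_COMBINER_DUP combiner is itself a derived (contiguous) datatype, i.e. something that cannot be described by a predefined-type ID alone.

/* Hypothetical illustration (not the attached reproducer): inspect a
 * dup'd derived datatype with the standard envelope/contents calls. */
#include <stdio.h>
#include <mpi.h>

static const char *combiner_name(int combiner)
{
    if (combiner == MPI_COMBINER_NAMED)      return "NAMED (predefined)";
    if (combiner == MPI_COMBINER_DUP)        return "DUP";
    if (combiner == MPI_COMBINER_CONTIGUOUS) return "CONTIGUOUS";
    return "other";
}

int main(int argc, char *argv[])
{
    MPI_Datatype dt1, dt2, base;
    int ni, na, nd, combiner, ints[1];
    MPI_Aint addrs[1];

    MPI_Init(&argc, &argv);
    MPI_Type_contiguous(1, MPI_INT, &dt1);
    MPI_Type_dup(dt1, &dt2);

    MPI_Type_get_envelope(dt2, &ni, &na, &nd, &combiner);
    printf("dt2 combiner: %s\n", combiner_name(combiner));        /* DUP */

    /* The single datatype stored for a DUP combiner is the base type ... */
    MPI_Type_get_contents(dt2, ni, na, nd, ints, addrs, &base);
    MPI_Type_get_envelope(base, &ni, &na, &nd, &combiner);
    /* ... and here it is CONTIGUOUS, i.e. derived, so a predefined-type id
     * alone cannot reconstruct it on the target side. */
    printf("dt2's base combiner: %s\n", combiner_name(combiner));

    MPI_Type_free(&base);
    MPI_Type_free(&dt2);
    MPI_Type_free(&dt1);
    MPI_Finalize();
    return 0;
}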
Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype
Takahiro,

Nice catch. That particular code was an over-optimization … that failed. Please try with the patch below.

Let me know if it's working as expected, and I will push it to the trunk once confirmed.

George.

Index: ompi/datatype/ompi_datatype_args.c
===
--- ompi/datatype/ompi_datatype_args.c (revision 28787)
+++ ompi/datatype/ompi_datatype_args.c (working copy)
@@ -449,9 +449,10 @@
     }
     /* For duplicated datatype we don't have to store all the information */
     if( MPI_COMBINER_DUP == args->create_type ) {
-        position[0] = args->create_type;
-        position[1] = args->d[0]->id;  /* On the OMPI layer, copy the ompi_datatype.id */
-        return OMPI_SUCCESS;
+        ompi_datatype_t* temp_data = args->d[0];
+        return __ompi_datatype_pack_description(temp_data,
+                                                packed_buffer,
+                                                next_index );
     }
     position[0] = args->create_type;
     position[1] = args->ci;

On Jul 14, 2013, at 14:30, KAWASHIMA Takahiro wrote:

> Hi,
>
> I encountered an assertion failure in Open MPI trunk and found a bug.
>
> See the attached program. This program can be run with mpiexec -n 1. It calls MPI_Put and writes one int value to the target side. The target-side datatype is equivalent to MPI_INT, but is a derived datatype created by MPI_Type_contiguous and MPI_Type_dup.
>
> This program aborts with the following output.
>
> ==
> dt1 (0x2626160)
> type 2 count ints 1 count disp 0 count datatype 1
> ints: 1
> types:MPI_INT
> dt2 (0x2626340)
> type 1 count ints 0 count disp 0 count datatype 1
> types:0x2626160
> put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
> [ppc:05244] *** Process received signal ***
> [ppc:05244] Signal: Aborted (6)
> [ppc:05244] Signal code: (-6)
> [ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
> [ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
> [ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
> [ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
> [ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
> [ppc:05244] [ 5] /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) [0x7fe58a4e8cf6]
> [ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
> [ppc:05244] [ 7] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) [0x7fe5839a3ae5]
> [ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
> [ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) [0x7fe58510bb04]
> [ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
> [ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
> [ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) [0x7fe5839a1776]
> [ppc:05244] [13] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) [0x7fe5839a84ab]
> [ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
> [ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
> [ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
> [ppc:05244] [17] put_dup_type() [0x400b09]
> [ppc:05244] *** End of error message ***
> --
> mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on signal 6 (Aborted).
> --
> ==
>
> The __ompi_datatype_create_from_packed_description function, in which the assertion failure occurred, seems to expect data_id to be the ID of a predefined datatype. In my environment, the value of data_id is 68, which is the ID of the datatype created by MPI_Type_contiguous.
>
> In one-sided communication, the target-side datatype is encoded as a 'description' on the origin side and then decoded on the target side. I think there are problems in both the encoding stage and the decoding stage.
>
> The __ompi_datatype_pack_description function in the ompi/datatype/ompi_datatype_args.c file encodes the datatype. For MPI_COMBINER_DUP, on line 451, it encodes only create_type and id and returns immediately. It does not encode the information of the base datatype (in my case, the datatype created by MPI_Type_contiguous).
>
> The __ompi_datatype_create_from_packed_description function in the same file decodes the description. For MPI_COMBINER_DUP, on line 557, it expects data_id to be the ID of a predefined datatype. It
Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype
George,

Thanks. But no, your patch does not work correctly.

The assertion failure disappeared with your patch, but the value in the target buffer of MPI_Put is not correct.

In rdma OSC (and pt2pt OSC), the following data are packed into the send buffer in the ompi_osc_rdma_sendreq_send function on the origin side.

- header
- datatype description
- user data

User data are written at the offset of (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).

In the case of the program attached to my previous mail, total_pack_size is 32 because ompi_datatype_set_args sets 8 for MPI_COMBINER_DUP and 24 for MPI_COMBINER_CONTIGUOUS. See the following code.

int32_t ompi_datatype_set_args(... snip ...)
{
    ... snip ...
    switch(type){
    ... snip ...
    case MPI_COMBINER_DUP:
        /* Recompute the data description packed size based on the optimization
         * for MPI_COMBINER_DUP.
         */
        pArgs->total_pack_size = 2 * sizeof(int);        total_pack_size = 8
        break;
    ... snip ...
    }
    ...
    for( pos = 0; pos < cd; pos++ ) {
        ... snip ...
        if( !(ompi_datatype_is_predefined(d[pos])) ) {
            ... snip ...
            pArgs->total_pack_size +=
                ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;        total_pack_size += 24
            ... snip ...
        }
        ... snip ...
    }
    ... snip ...
}

But on the target side, user data are read at the offset of (sizeof(ompi_osc_rdma_send_header_t) + 24) because the ompi_osc_base_datatype_create function, which is called by the ompi_osc_rdma_sendreq_recv_put function, advances the offset by only 24 bytes, not 32 bytes.

So the wrong data are written to the target buffer.

We need to take care of total_pack_size on the origin side.

I modified the ompi_datatype_set_args function as a trial.

Index: ompi/datatype/ompi_datatype_args.c
===
--- ompi/datatype/ompi_datatype_args.c (revision 28778)
+++ ompi/datatype/ompi_datatype_args.c (working copy)
@@ -129,7 +129,7 @@
         /* Recompute the data description packed size based on the optimization
          * for MPI_COMBINER_DUP.
          */
-        pArgs->total_pack_size = 2 * sizeof(int);
+        pArgs->total_pack_size = 0;
         break;

     case MPI_COMBINER_CONTIGUOUS:

This patch, in addition to your patch, works correctly for my program. But I'm not sure this is a correct solution.

Regards,
KAWASHIMA Takahiro

> Takahiro,
>
> Nice catch. That particular code was an over-optimization … that failed. Please try with the patch below.
>
> Let me know if it's working as expected, and I will push it to the trunk once confirmed.
>
> George.
>
> Index: ompi/datatype/ompi_datatype_args.c
> ===
> --- ompi/datatype/ompi_datatype_args.c (revision 28787)
> +++ ompi/datatype/ompi_datatype_args.c (working copy)
> @@ -449,9 +449,10 @@
>      }
>      /* For duplicated datatype we don't have to store all the information */
>      if( MPI_COMBINER_DUP == args->create_type ) {
> -        position[0] = args->create_type;
> -        position[1] = args->d[0]->id;  /* On the OMPI layer, copy the ompi_datatype.id */
> -        return OMPI_SUCCESS;
> +        ompi_datatype_t* temp_data = args->d[0];
> +        return __ompi_datatype_pack_description(temp_data,
> +                                                packed_buffer,
> +                                                next_index );
>      }
>      position[0] = args->create_type;
>      position[1] = args->ci;
>
> On Jul 14, 2013, at 14:30, KAWASHIMA Takahiro wrote:
>
> > Hi,
> >
> > I encountered an assertion failure in Open MPI trunk and found a bug.
> >
> > See the attached program. This program can be run with mpiexec -n 1. It calls MPI_Put and writes one int value to the target side. The target-side datatype is equivalent to MPI_INT, but is a derived datatype created by MPI_Type_contiguous and MPI_Type_dup.
> >
> > This program aborts with the following output.
> >
> > ==
> > dt1 (0x2626160)
> > type 2 count ints 1 count disp 0 count datatype 1
> > ints: 1
> > types:MPI_INT
> > dt2 (0x2626340)
> > type 1 count ints 0 count disp 0 count datatype 1
> > types:0x2626160
> > put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
> > [ppc:05244] *** Process received signal ***
> > [ppc:05244] Signal: Aborted (6)
> > [ppc:05244] Signal code: (-6)
> > [ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
> > [ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
> > [ppc:05244] [ 2] /lib
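To make the offset mismatch described above concrete, here is a toy calculation. The header size is a made-up stand-in for sizeof(ompi_osc_rdma_send_header_t); only the 8-byte and 24-byte description sizes come from the analysis in this message.

/* Illustration only: mimics the size accounting described above; not
 * Open MPI code. */
#include <stdio.h>

int main(void)
{
    const unsigned header      = 64;               /* stand-in for sizeof(ompi_osc_rdma_send_header_t) */
    const unsigned dup_desc    = 2 * sizeof(int);  /* 8: what ompi_datatype_set_args adds for MPI_COMBINER_DUP */
    const unsigned contig_desc = 24;               /* what it adds for the contiguous base datatype */

    unsigned origin_offset = header + dup_desc + contig_desc;  /* origin writes user data here (total_pack_size = 32) */
    unsigned target_offset = header + contig_desc;             /* target starts reading here (only 24 bytes consumed) */

    printf("origin writes at %u, target reads at %u -> %u-byte skew\n",
           origin_offset, target_offset, origin_offset - target_offset);
    return 0;
}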
Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype
No. My patch doesn't work for a more simple case, just a duplicate of MPI_INT. Datatype is too complex for me ... Regards, KAWASHIMA Takahiro > George, > > Thanks. But no, your patch does not work correctly. > > The assertion failure disappeared by your patch but the value of the > target buffer of MPI_Put is not a correct one. > > In rdma OSC (and pt2pt OSC), the following data are packed into > the send buffer in ompi_osc_rdma_sendreq_send function on the > origin side. > > - header > - datatype description > - user data > > User data are written at the offset of > (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size). > > In the case of my program attached in my previous mail, total_pack_size > is 32 because ompi_datatype_set_args set 8 for MPI_COMBINER_DUP and > 24 for MPI_COMBINER_CONTIGUOUS. See the following code. > > > int32_t ompi_datatype_set_args(... snip ...) > { > ... snip ... > switch(type){ > ... snip ... > case MPI_COMBINER_DUP: > /* Recompute the data description packed size based on the > optimization > * for MPI_COMBINER_DUP. > */ > pArgs->total_pack_size = 2 * sizeof(int); total_pack_size = 8 > break; > ... snip ... > } > ... > for( pos = 0; pos < cd; pos++ ) { > ... snip ... > if( !(ompi_datatype_is_predefined(d[pos])) ) { > ... snip ... > pArgs->total_pack_size += > ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size; total_pack_size > += 24 > ... snip ... > } > ... snip ... > } > ... snip ... > } > > > But on the target side, user data are read at the offset of > (sizeof(ompi_osc_rdma_send_header_t) + 24) > because ompi_osc_base_datatype_create function, which is called > by ompi_osc_rdma_sendreq_recv_put function, progress the offset > only 24 bytes. Not 32 bytes. > > So the wrong data are written to the target buffer. > > We need to take care of total_pack_size in the origin side. > > I modified ompi_datatype_set_args function as a trial. > > Index: ompi/datatype/ompi_datatype_args.c > === > --- ompi/datatype/ompi_datatype_args.c (revision 28778) > +++ ompi/datatype/ompi_datatype_args.c (working copy) > @@ -129,7 +129,7 @@ > /* Recompute the data description packed size based on the > optimization > * for MPI_COMBINER_DUP. > */ > -pArgs->total_pack_size = 2 * sizeof(int); > +pArgs->total_pack_size = 0; > break; > > case MPI_COMBINER_CONTIGUOUS: > > This patch in addition to your patch works correctly for my program. > But I'm not sure this is a correct solution. > > Regards, > KAWASHIMA Takahiro > > > Takahiro, > > > > Nice catch. That particular code was an over-optimizations … that failed. > > Please try with the patch below. > > > > Let me know if it's working as expected, I will push it in the trunk once > > confirmed. > > > > George. 
> > > > > > Index: ompi/datatype/ompi_datatype_args.c > > === > > --- ompi/datatype/ompi_datatype_args.c (revision 28787) > > +++ ompi/datatype/ompi_datatype_args.c (working copy) > > @@ -449,9 +449,10 @@ > > } > > /* For duplicated datatype we don't have to store all the information > > */ > > if( MPI_COMBINER_DUP == args->create_type ) { > > -position[0] = args->create_type; > > -position[1] = args->d[0]->id; /* On the OMPI - layer, copy the > > ompi_datatype.id */ > > -return OMPI_SUCCESS; > > +ompi_datatype_t* temp_data = args->d[0]; > > +return __ompi_datatype_pack_description(temp_data, > > +packed_buffer, > > +next_index ); > > } > > position[0] = args->create_type; > > position[1] = args->ci; > > > > > > > > On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro > > wrote: > > > > > Hi, > > > > > > I encountered an assertion failure in Open MPI trunk and found a bug. > > > > > > See the attached program. This program can be run with mpiexec -n 1. > > > This program calls MPI_Put and writes one int value to the target side. > > > The target side datatype is equivalent to MPI_INT, but is a derived > > > datatype created by MPI_Type_contiguous and MPI_Type_Dup. > > > > > > This program aborts with the following output. > > > > > > == > > > dt1 (0x2626160) > > > type 2 count ints 1 count disp 0 count datatype 1 > > > ints: 1 > > > types:MPI_INT > > > dt2 (0x2626340) > > > type 1 count ints 0 count disp 0 count datatype 1 > > > types:0x2626160 > > >
Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype
George,

An improved patch is attached. The latter half is the same as your patch. But again, I'm not sure this is a correct solution.

It works correctly for my attached put_dup_type_3.c. Run it as "mpiexec -n 1 ./put_dup_type_3". It will print seven OKs if it succeeds.

Regards,
KAWASHIMA Takahiro

> No. My patch doesn't work for an even simpler case, just a duplicate of MPI_INT.
>
> Datatype is too complex for me ...
>
> Regards,
> KAWASHIMA Takahiro
>
> > George,
> >
> > Thanks. But no, your patch does not work correctly.
> >
> > The assertion failure disappeared with your patch, but the value in the target buffer of MPI_Put is not correct.
> >
> > In rdma OSC (and pt2pt OSC), the following data are packed into the send buffer in the ompi_osc_rdma_sendreq_send function on the origin side.
> >
> > - header
> > - datatype description
> > - user data
> >
> > User data are written at the offset of (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).
> >
> > In the case of the program attached to my previous mail, total_pack_size is 32 because ompi_datatype_set_args sets 8 for MPI_COMBINER_DUP and 24 for MPI_COMBINER_CONTIGUOUS. See the following code.
> >
> > int32_t ompi_datatype_set_args(... snip ...)
> > {
> >     ... snip ...
> >     switch(type){
> >     ... snip ...
> >     case MPI_COMBINER_DUP:
> >         /* Recompute the data description packed size based on the optimization
> >          * for MPI_COMBINER_DUP.
> >          */
> >         pArgs->total_pack_size = 2 * sizeof(int);        total_pack_size = 8
> >         break;
> >     ... snip ...
> >     }
> >     ...
> >     for( pos = 0; pos < cd; pos++ ) {
> >         ... snip ...
> >         if( !(ompi_datatype_is_predefined(d[pos])) ) {
> >             ... snip ...
> >             pArgs->total_pack_size +=
> >                 ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;        total_pack_size += 24
> >             ... snip ...
> >         }
> >         ... snip ...
> >     }
> >     ... snip ...
> > }
> >
> > But on the target side, user data are read at the offset of (sizeof(ompi_osc_rdma_send_header_t) + 24) because the ompi_osc_base_datatype_create function, which is called by the ompi_osc_rdma_sendreq_recv_put function, advances the offset by only 24 bytes, not 32 bytes.
> >
> > So the wrong data are written to the target buffer.
> >
> > We need to take care of total_pack_size on the origin side.
> >
> > I modified the ompi_datatype_set_args function as a trial.
> >
> > Index: ompi/datatype/ompi_datatype_args.c
> > ===
> > --- ompi/datatype/ompi_datatype_args.c (revision 28778)
> > +++ ompi/datatype/ompi_datatype_args.c (working copy)
> > @@ -129,7 +129,7 @@
> >          /* Recompute the data description packed size based on the optimization
> >           * for MPI_COMBINER_DUP.
> >           */
> > -        pArgs->total_pack_size = 2 * sizeof(int);
> > +        pArgs->total_pack_size = 0;
> >          break;
> >
> >      case MPI_COMBINER_CONTIGUOUS:
> >
> > This patch, in addition to your patch, works correctly for my program. But I'm not sure this is a correct solution.
> >
> > Regards,
> > KAWASHIMA Takahiro
> >
> > > Takahiro,
> > >
> > > Nice catch. That particular code was an over-optimization … that failed. Please try with the patch below.
> > >
> > > Let me know if it's working as expected, and I will push it to the trunk once confirmed.
> > >
> > > George.
> > > > > > > > > Index: ompi/datatype/ompi_datatype_args.c > > > === > > > --- ompi/datatype/ompi_datatype_args.c(revision 28787) > > > +++ ompi/datatype/ompi_datatype_args.c(working copy) > > > @@ -449,9 +449,10 @@ > > > } > > > /* For duplicated datatype we don't have to store all the > > > information */ > > > if( MPI_COMBINER_DUP == args->create_type ) { > > > -position[0] = args->create_type; > > > -position[1] = args->d[0]->id; /* On the OMPI - layer, copy the > > > ompi_datatype.id */ > > > -return OMPI_SUCCESS; > > > +ompi_datatype_t* temp_data = args->d[0]; > > > +return __ompi_datatype_pack_description(temp_data, > > > +packed_buffer, > > > +next_index ); > > > } > > > position[0] = args->create_type; > > > position[1] = args->ci; > > > > > > > > > > > > On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro > > > wrote: > > > > > > > Hi, > > > > > > > > I encountered an assertion failure in Open MPI trunk and found a bug. > > > > > > > > See the attached program. This program can be run with mpiexec -n 1. > > > > This program calls MPI_Put and writes one int value t
Re: [OMPI devel] RFC: remove opal_trace macro
I went ahead and committed this after finding only *one* reference to OPAL_TRACE anywhere in the ompi code base.

On Jul 11, 2013, at 9:05 AM, Ralph Castain wrote:

> WHAT: remove the old and stale "OPAL_TRACE" macro
>
> WHY: it is old, stale, no longer needed, and largely unused
>
> WHEN: since it is virtually unused, a short timeout seems appropriate, so let's set it for Tues 7/16
>
[OMPI devel] 'make re-install' : remove 'ortecc' symlink also
Makefile: please remove or check for the 'ortecc' symlink before proceeding with the install.

make[4]: Entering directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
test -z "/usr/bin" || /usr/bin/mkdir -p "/usr/bin"
make install-data-hook
(cd /usr/bin; rm -f ortecc.exe; ln -s opal_wrapper ortecc)
ln: failed to create symbolic link `ortecc': File exists
make[4]: Entering directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
make[4]: Nothing to be done for 'install-data-hook'.
make[4]: Leaving directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
Makefile:1668: recipe for target 'install-exec-hook-always' failed
Re: [OMPI devel] 'make re-install' : remove 'ortecc' symlink also
also 'opalcc', and others:

make[4]: Entering directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/opal/tools/wrappers'
(cd /usr/bin; rm -f opalcc.exe; ln -s opal_wrapper opalcc)
ln: failed to create symbolic link `opalcc': File exists
Makefile:1972: recipe for target 'install-exec-hook' failed
...

On Sun, Jul 14, 2013 at 11:35 PM, Vasiliy wrote:
> Makefile: please remove or check for the 'ortecc' symlink before proceeding with the install.
>
> make[4]: Entering directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
> test -z "/usr/bin" || /usr/bin/mkdir -p "/usr/bin"
> make install-data-hook
> (cd /usr/bin; rm -f ortecc.exe; ln -s opal_wrapper ortecc)
> ln: failed to create symbolic link `ortecc': File exists
> make[4]: Entering directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
> make[4]: Nothing to be done for 'install-data-hook'.
> make[4]: Leaving directory '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
> Makefile:1668: recipe for target 'install-exec-hook-always' failed
>
Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype
Takahiro,

Please find below another patch, this time hopefully fixing all issues. The problem with my original patch, and with yours, was that they tried to address the packing of the data representation without fixing the computation of the required length. As a result, the lengths on the packer and the unpacker differ, and the unpacking of the subsequent data is done from the wrong location.

I changed the code to force the preparation of the packed data representation before returning the length the first time. This way we can compute exactly how many bytes we need, including the potential alignment requirements. As a result, the amounts on both sides (the packer and the unpacker) are now identical, and the entire process works flawlessly (or so I hope).

Let me know if you still notice issues with this patch. I'll push it to the trunk tomorrow, so it can soak for a few days before propagating to the branches.

George.

packed.patch
Description: Binary data

On Jul 14, 2013, at 20:28, KAWASHIMA Takahiro wrote:

> George,
>
> An improved patch is attached. The latter half is the same as your patch. But again, I'm not sure this is a correct solution.
>
> It works correctly for my attached put_dup_type_3.c. Run it as "mpiexec -n 1 ./put_dup_type_3". It will print seven OKs if it succeeds.
>
> Regards,
> KAWASHIMA Takahiro
>
>> No. My patch doesn't work for an even simpler case, just a duplicate of MPI_INT.
>>
>> Datatype is too complex for me ...
>>
>> Regards,
>> KAWASHIMA Takahiro
>>
>>> George,
>>>
>>> Thanks. But no, your patch does not work correctly.
>>>
>>> The assertion failure disappeared with your patch, but the value in the target buffer of MPI_Put is not correct.
>>>
>>> In rdma OSC (and pt2pt OSC), the following data are packed into the send buffer in the ompi_osc_rdma_sendreq_send function on the origin side.
>>>
>>> - header
>>> - datatype description
>>> - user data
>>>
>>> User data are written at the offset of (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).
>>>
>>> In the case of the program attached to my previous mail, total_pack_size is 32 because ompi_datatype_set_args sets 8 for MPI_COMBINER_DUP and 24 for MPI_COMBINER_CONTIGUOUS. See the following code.
>>>
>>> int32_t ompi_datatype_set_args(... snip ...)
>>> {
>>>     ... snip ...
>>>     switch(type){
>>>     ... snip ...
>>>     case MPI_COMBINER_DUP:
>>>         /* Recompute the data description packed size based on the optimization
>>>          * for MPI_COMBINER_DUP.
>>>          */
>>>         pArgs->total_pack_size = 2 * sizeof(int);        total_pack_size = 8
>>>         break;
>>>     ... snip ...
>>>     }
>>>     ...
>>>     for( pos = 0; pos < cd; pos++ ) {
>>>         ... snip ...
>>>         if( !(ompi_datatype_is_predefined(d[pos])) ) {
>>>             ... snip ...
>>>             pArgs->total_pack_size +=
>>>                 ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;        total_pack_size += 24
>>>             ... snip ...
>>>         }
>>>         ... snip ...
>>>     }
>>>     ... snip ...
>>> }
>>>
>>> But on the target side, user data are read at the offset of (sizeof(ompi_osc_rdma_send_header_t) + 24) because the ompi_osc_base_datatype_create function, which is called by the ompi_osc_rdma_sendreq_recv_put function, advances the offset by only 24 bytes, not 32 bytes.
>>>
>>> So the wrong data are written to the target buffer.
>>>
>>> We need to take care of total_pack_size on the origin side.
>>>
>>> I modified the ompi_datatype_set_args function as a trial.
>>> >>> Index: ompi/datatype/ompi_datatype_args.c >>> === >>> --- ompi/datatype/ompi_datatype_args.c (revision 28778) >>> +++ ompi/datatype/ompi_datatype_args.c (working copy) >>> @@ -129,7 +129,7 @@ >>> /* Recompute the data description packed size based on the >>> optimization >>> * for MPI_COMBINER_DUP. >>> */ >>> -pArgs->total_pack_size = 2 * sizeof(int); >>> +pArgs->total_pack_size = 0; >>> break; >>> >>> case MPI_COMBINER_CONTIGUOUS: >>> >>> This patch in addition to your patch works correctly for my program. >>> But I'm not sure this is a correct solution. >>> >>> Regards, >>> KAWASHIMA Takahiro >>> Takahiro, Nice catch. That particular code was an over-optimizations … that failed. Please try with the patch below. Let me know if it's working as expected, I will push it in the trunk once confirmed. George. Index: ompi/datatype/ompi_datatype_args.c === --- ompi/datatype/ompi_datatype_args.c (revision 28787) +++ ompi/datatype/ompi_datatype_args.c (working copy) @@ -449,9 +449,10 @@ } /* For
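The approach George describes above, deriving the advertised length from the actually prepared packed representation so that packer and unpacker can never disagree, can be illustrated with a generic sketch. All names below (args_cache_t, pack_description, description_length) are invented for illustration; this is not the Open MPI code.

/* Generic sketch of the "pack once, then report the length" idea. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *packed;      /* cached packed description, built lazily   */
    size_t packed_len;  /* exact length, including alignment padding */
} args_cache_t;

/* Stand-in for the real recursive description packer: returns the padded
 * length; writes the bytes only when a buffer is supplied. */
static size_t pack_description(void *buf, size_t cap)
{
    const char payload[] = "type-description";
    size_t padded = (sizeof(payload) + 7u) & ~(size_t)7u;  /* 8-byte alignment */
    if (buf != NULL && cap >= padded) {
        memset(buf, 0, padded);
        memcpy(buf, payload, sizeof(payload));
    }
    return padded;
}

/* Both the sender and the receiver size their buffers from this value, so
 * they always agree on where the user data starts. */
static size_t description_length(args_cache_t *a)
{
    if (a->packed == NULL) {
        size_t len = pack_description(NULL, 0);  /* dry run: size only */
        a->packed = malloc(len);
        a->packed_len = pack_description(a->packed, len);
    }
    return a->packed_len;
}

int main(void)
{
    args_cache_t a = { NULL, 0 };
    printf("packed description length: %zu bytes\n", description_length(&a));
    free(a.packed);
    return 0;
}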