Re: [OMPI devel] basename: a faulty warning 'extra operand --test-name' in tests causes test-driver to fail

2013-07-14 Thread Vasiliy
I'm happy to provide an update on the 'extra operand --test-name'
occasionally being fed to 'basename' by Open MPI's test suite; it has
been fixed by the Automake maintainers:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14840

You may still want to look at 'test/asm/run_tests' to see why it was passed through in the first place.

On Fri, Jul 12, 2013 at 9:30 PM, Vasiliy  wrote:
> I've just gone through the test suite, and in 'test/asm/run_tests' there
> is a statement:
>
> progname="`basename $*`"
>
> where '--test-name' could accidentally get passed in, causing the reported
> issues, since 'basename' does not have such an option. Somebody
> familiar with the test suite may want to look into it.
>
> On Fri, Jul 12, 2013 at 5:17 PM, Vasiliy  wrote:
>> Sorry again, my report was a stub because I didn't have enough time to
>> investigate the issue. Because the verbosity level was set to zero, I
>> assumed from the log that 'basename' belongs to the Open MPI source,
>> whereas it does not. Thank you for pointing out that it's actually a
>> utility from the 'coreutils' Cygwin package. I'll report it to their team.
>> I've also filed a report with the Automake team about their part.
>>
>> 1. I'm testing the patched Open MPI SVN source, that is, 1.9a1-svn,
>> with the latest autotools built from their git/svn sources, plus my
>> humble patches, which have yet to be polished.
>>
>> 2. Indeed, I'm running 'make check' when seeing those failures.
>> Unfortunately, the failure in 'test-driver' obscures how many true
>> tests, if any, have actually failed. I've just run it now on the
>> latest sources (btw, there's still some old rot with 'trace.c') and,
>> once I manage to get 'test-driver' working, it passes *ALL* the
>> tests, except those with bogus 'test-driver' crashes, that is:
>>
>> atomic_spinlock_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_math_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_math_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_math_noinline.exe
>> atomic_spinlock_noinline.exe
>> atomic_cmpset_noinline.exe
>> atomic_math_noinline.exe
>>
>> Clearly, these are inline/noinline issues that need to be looked into
>> at some later time.
>>
>> I can now give feedback on why I got the earlier-reported warnings about
>> shared libraries not being created, and the flood of 'undefined symbols'.
>> Indeed, that was a problem with the Makefile.am's. I've checked just two
>> out of about a hundred successfully compiled static libraries whose DSO
>> counterparts were not created during the build, though they were
>> requested to be:
>>
>> - 'ompi/datatype's Makefile compiles 'libdatatype' without the much-needed
>> 'libopen-pal' and 'libmpi' libraries, which causes the shared library not
>> to be created because of undefined symbols; btw, even when they are added
>> to the libtool (v2.4.2-374) invocation command line, the shared library is
>> still not produced, whereas gcc doesn't have this kind of problem;
>>
>> - 'ompi/debuggers's Makefile does not create a 'libompi_dbg_msgq.dll.a'
>> import library (though there is a shared library); the corresponding
>> part has to be created manually.
>>
>> I haven't checked the other ~95.
>>
>>
>>
>> On Fri, Jul 12, 2013 at 2:26 PM, Jeff Squyres (jsquyres)
>>  wrote:
>>> I'm sorry, I'm still unclear what you're trying to tell us.  :-(
>>>
>>> 1. What version of Open MPI are you testing?  If you're testing Open MPI
>>> 1.6.x with a very new Automake, I'm not surprised that there are some failures.
>>>  We usually pick the newest GNU Autotools when we begin a release series, 
>>> and then stick with those tool versions for the life of that series.  We do 
>>> not attempt to forward-port to newer Autotools on that series, meaning that 
>>> sometimes newer versions of the Autotools will break the builds of that 
>>> series.  That's ok.
>>>
>>> 2. Assumedly, you're seeing this failure when you run "make check".  Is 
>>> that correct?  What test, exactly, is failing?  It's very difficult to grok 
>>> what you're reporting when you only include the last few lines of output, 
>>> which exclude the majority of the context that we need to know what you're 
>>> talking about.
>>>
>>> Your bug reports have been *extremely* helpful in cleaning out some old 
>>> kruft from our tree, but could you include more context in the future?  
>>> E.g., include all the "compile problems" items from here:
>>>
>>> http://www.open-mpi.org/community/help/
>>>
>>> 3. We don't have a test named "basename" or "test-driver"; basename is 
>>> usually an OS utility, and test-driver is part of the new Automake testing 
>>> framework.  If there's a mistake in how these are being invoked, it's 
>>> coming from Automake, and you should report the bug to them.
>>>
>>> ...unless we're doing something wrong in our Makefile.am's in how we list 
>>> the tests to be run.  Is that what you're saying?
>>>
>>>
>>> On Jul 12, 2013, at 3:59 AM, Vasiliy  wrote:
>>>

[OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread KAWASHIMA Takahiro
Hi,

I encountered an assertion failure in Open MPI trunk and found a bug.

See the attached program. This program can be run with mpiexec -n 1.
This program calls MPI_Put and writes one int value to the target side.
The target-side datatype is equivalent to MPI_INT, but is a derived
datatype created by MPI_Type_contiguous and MPI_Type_dup.

This program aborts with the following output.

==
 dt1 (0x2626160) 
type 2 count ints 1 count disp 0 count datatype 1
ints: 1 
types:MPI_INT 
 dt2 (0x2626340) 
type 1 count ints 0 count disp 0 count datatype 1
types:0x2626160 
put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: 
__ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
[ppc:05244] *** Process received signal ***
[ppc:05244] Signal: Aborted (6)
[ppc:05244] Signal code:  (-6)
[ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
[ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
[ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
[ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
[ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
[ppc:05244] [ 5] 
/ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) 
[0x7fe58a4e8cf6]
[ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
[ppc:05244] [ 7] 
/ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) 
[0x7fe5839a3ae5]
[ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
[ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) 
[0x7fe58510bb04]
[ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
[ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
[ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) 
[0x7fe5839a1776]
[ppc:05244] [13] 
/ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) 
[0x7fe5839a84ab]
[ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
[ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
[ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
[ppc:05244] [17] put_dup_type() [0x400b09]
[ppc:05244] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on signal 
6 (Aborted).
--
==

The __ompi_datatype_create_from_packed_description function, in which the
assertion failure occurred, seems to expect the value of data_id to be the
ID of a predefined datatype. In my environment, the value of data_id
is 68, which is the ID of the datatype created by MPI_Type_contiguous.

In one-sided communication, the target-side datatype is encoded as a
'description' on the origin side and then decoded on the target side.
I think there are problems in both the encoding and the decoding stages.

The __ompi_datatype_pack_description function in the
ompi/datatype/ompi_datatype_args.c file encodes the datatype.
For MPI_COMBINER_DUP on line 451, it encodes only create_type and id
and returns immediately. It doesn't encode the information of the base
datatype (in my case, the datatype created by MPI_Type_contiguous).

The __ompi_datatype_create_from_packed_description function in the
ompi/datatype/ompi_datatype_args.c file decodes the description.
For MPI_COMBINER_DUP on line 557, it expects the value of data_id to be
the ID of a predefined datatype. That is not always true.
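
As a rough illustration, here is a toy model of that mismatch. The constants
and layout are only stand-ins, not the real OMPI structures; 45 is taken from
the assert message and 68 from my environment.

/* Toy model: for a DUP datatype the origin packs only { create_type, id of
 * the base type }, while the decoder treats that id as the id of a
 * *predefined* datatype and asserts that it is below the predefined range. */
#include <assert.h>
#include <stdio.h>

enum { MAX_PREDEFINED = 45 };   /* bound taken from the assert message */
enum { COMBINER_DUP   = 1 };    /* placeholder for MPI_COMBINER_DUP    */

int main(void)
{
    /* What the origin side packs for dt2 = MPI_Type_dup(dt1): */
    int packed[2] = { COMBINER_DUP, 68 };   /* 68 = id of dt1 (contiguous) */

    /* What the target side does with it: */
    int data_id = packed[1];
    printf("decoder sees data_id = %d\n", data_id);
    assert(data_id < MAX_PREDEFINED);       /* fires, matching the reported abort */
    return 0;
}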

I cannot fix this problem yet because I'm not familiar with the datatype
code in Open MPI. MPI_COMBINER_DUP is also used for predefined datatypes,
and the calculation of total_pack_size is also involved. It doesn't seem
so simple.

Regards,
KAWASHIMA Takahiro
#include <stdio.h>
#include <stdint.h>
#include <mpi.h>

#define PRINT_ARGS

#ifdef PRINT_ARGS
/* defined in ompi/datatype/ompi_datatype_args.c */
extern int32_t ompi_datatype_print_args(const struct ompi_datatype_t *pData);
#endif

int main(int argc, char *argv[])
{
    MPI_Win win;
    MPI_Datatype dt1, dt2;
    int obuf[1], tbuf[1];

    obuf[0] = 77;
    tbuf[0] = 88;

    MPI_Init(&argc, &argv);

    MPI_Type_contiguous(1, MPI_INT, &dt1);
    MPI_Type_dup(dt1, &dt2);
    MPI_Type_commit(&dt2);

#ifdef PRINT_ARGS
    printf(" dt1 (%p) \n", (void *)dt1);
    ompi_datatype_print_args(dt1);
    printf(" dt2 (%p) \n", (void *)dt2);
    ompi_datatype_print_args(dt2);
    fflush(stdout);
#endif

    MPI_Win_create(tbuf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
    MPI_Win_fence(0, win);
    MPI_Put(obuf, 1, MPI_INT, 0, 0, 1, dt2, win);
    MPI_Win_fence(0, win);

    MPI_Type_free(&dt1);
    MPI_Type_free(&dt2);
    MPI_Win_free(&win);
    MPI_Finalize();

    if (tbuf[0] == 77) {
        printf("OK

Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread George Bosilca
Takahiro,

Nice catch. That particular code was an over-optimization … that failed.
Please try with the patch below.

Let me know if it's working as expected; I will push it into the trunk once
confirmed.

  George.


Index: ompi/datatype/ompi_datatype_args.c
===
--- ompi/datatype/ompi_datatype_args.c  (revision 28787)
+++ ompi/datatype/ompi_datatype_args.c  (working copy)
@@ -449,9 +449,10 @@
 }
 /* For duplicated datatype we don't have to store all the information */
 if( MPI_COMBINER_DUP == args->create_type ) {
-position[0] = args->create_type;
-position[1] = args->d[0]->id; /* On the OMPI - layer, copy the 
ompi_datatype.id */
-return OMPI_SUCCESS;
+ompi_datatype_t* temp_data = args->d[0];
+return __ompi_datatype_pack_description(temp_data,
+packed_buffer,
+next_index );
 }
 position[0] = args->create_type;
 position[1] = args->ci;



On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro  
wrote:

> Hi,
> 
> I encountered an assertion failure in Open MPI trunk and found a bug.
> 
> See the attached program. This program can be run with mpiexec -n 1.
> This program calls MPI_Put and writes one int value to the target side.
> The target side datatype is equivalent to MPI_INT, but is a derived
> datatype created by MPI_Type_contiguous and MPI_Type_Dup.
> 
> This program aborts with the following output.
> 
> ==
>  dt1 (0x2626160) 
> type 2 count ints 1 count disp 0 count datatype 1
> ints: 1 
> types:MPI_INT 
>  dt2 (0x2626340) 
> type 1 count ints 0 count disp 0 count datatype 1
> types:0x2626160 
> put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: 
> __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' 
> failed.
> [ppc:05244] *** Process received signal ***
> [ppc:05244] Signal: Aborted (6)
> [ppc:05244] Signal code:  (-6)
> [ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
> [ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
> [ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
> [ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
> [ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
> [ppc:05244] [ 5] 
> /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) 
> [0x7fe58a4e8cf6]
> [ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
> [ppc:05244] [ 7] 
> /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) 
> [0x7fe5839a3ae5]
> [ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
> [ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) 
> [0x7fe58510bb04]
> [ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
> [ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
> [ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) 
> [0x7fe5839a1776]
> [ppc:05244] [13] 
> /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) 
> [0x7fe5839a84ab]
> [ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
> [ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
> [ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
> [ppc:05244] [17] put_dup_type() [0x400b09]
> [ppc:05244] *** End of error message ***
> --
> mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on 
> signal 6 (Aborted).
> --
> ==
> 
> __ompi_datatype_create_from_packed_description function, in which the
> assertion failure occurred, seems to expect the value of data_id is an
> ID of a predefined datatype. In my environment, the value of data_id
> is 68, that is an ID of the datatype created by MPI_Type_contiguous.
> 
> On one-sided communication, the target side datatype is encoded as
> 'description' at the origin side and then it is decoded at the target
> side. I think there are problems in both encoding stage and decoding
> stage.
> 
> __ompi_datatype_pack_description function in
> ompi/datatype/ompi_datatype_args.c file encodes the datatype.
> For MPI_COMBINER_DUP on line 451, it encodes only create_type and id
> and returns immediately. It doesn't encode the information of the base
> dataype (in my case, the datatype created by MPI_Type_contiguous).
> 
> __ompi_datatype_create_from_packed_description function in
> ompi/datatype/ompi_datatype_args.c file decodes the description.
> For MPI_COMBINER_DUP in line 557, it expects the value of data_id is
> an ID of a predefined datatype. It 

Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread KAWASHIMA Takahiro
George,

Thanks. But no, your patch does not work correctly.

The assertion failure disappeared with your patch, but the value written to
the target buffer by MPI_Put is not correct.

In the rdma OSC (and pt2pt OSC), the following data are packed into
the send buffer in the ompi_osc_rdma_sendreq_send function on the
origin side.

  - header
  - datatype description
  - user data

The user data are written at the offset
(sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).

In the case of the program attached to my previous mail, total_pack_size
is 32 because ompi_datatype_set_args sets 8 for MPI_COMBINER_DUP and
24 for MPI_COMBINER_CONTIGUOUS. See the following code.


int32_t ompi_datatype_set_args(... snip ...)
{
    ... snip ...
    switch(type){
    ... snip ...
    case MPI_COMBINER_DUP:
        /* Recompute the data description packed size based on the optimization
         * for MPI_COMBINER_DUP.
         */
        pArgs->total_pack_size = 2 * sizeof(int);   <-- total_pack_size = 8
        break;
    ... snip ...
    }
    ...
    for( pos = 0; pos < cd; pos++ ) {
        ... snip ...
        if( !(ompi_datatype_is_predefined(d[pos])) ) {
            ... snip ...
            pArgs->total_pack_size +=
                ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;   <-- total_pack_size += 24
            ... snip ...
        }
        ... snip ...
    }
    ... snip ...
}


But on the target side, the user data are read at the offset
(sizeof(ompi_osc_rdma_send_header_t) + 24),
because the ompi_osc_base_datatype_create function, which is called
by the ompi_osc_rdma_sendreq_recv_put function, advances the offset
by only 24 bytes, not 32 bytes.

So the wrong data are written to the target buffer.

We need to take care of total_pack_size on the origin side.
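
To make the skew concrete, here is a minimal standalone sketch. HEADER_SIZE
is only a placeholder for sizeof(ompi_osc_rdma_send_header_t); its real value
does not matter, only the 8-byte difference between the two offsets does.

#include <stdio.h>

int main(void)
{
    const size_t HEADER_SIZE    = 40;               /* placeholder value             */
    const size_t PACK_SIZE_DUP  = 2 * sizeof(int);  /* 8, set for MPI_COMBINER_DUP   */
    const size_t PACK_SIZE_CONT = 24;               /* added for the contiguous type */

    /* Origin side: user data written after header + total_pack_size (32). */
    size_t origin_offset = HEADER_SIZE + PACK_SIZE_DUP + PACK_SIZE_CONT;
    /* Target side: user data read after header + 24 (contiguous part only). */
    size_t target_offset = HEADER_SIZE + PACK_SIZE_CONT;

    printf("origin writes at %zu, target reads at %zu (skew %zu bytes)\n",
           origin_offset, target_offset, origin_offset - target_offset);
    return 0;
}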

I modified the ompi_datatype_set_args function as a trial.

Index: ompi/datatype/ompi_datatype_args.c
===
--- ompi/datatype/ompi_datatype_args.c  (revision 28778)
+++ ompi/datatype/ompi_datatype_args.c  (working copy)
@@ -129,7 +129,7 @@
 /* Recompute the data description packed size based on the optimization
  * for MPI_COMBINER_DUP.
  */
-pArgs->total_pack_size = 2 * sizeof(int);
+pArgs->total_pack_size = 0;
 break;

 case MPI_COMBINER_CONTIGUOUS:

This patch, in addition to your patch, works correctly for my program,
but I'm not sure it is the correct solution.

Regards,
KAWASHIMA Takahiro

> Takahiro,
> 
> Nice catch. That particular code was an over-optimizations … that failed. 
> Please try with the patch below.
> 
> Let me know if it's working as expected, I will push it in the trunk once 
> confirmed.
> 
>   George.
> 
> 
> Index: ompi/datatype/ompi_datatype_args.c
> ===
> --- ompi/datatype/ompi_datatype_args.c(revision 28787)
> +++ ompi/datatype/ompi_datatype_args.c(working copy)
> @@ -449,9 +449,10 @@
>  }
>  /* For duplicated datatype we don't have to store all the information */
>  if( MPI_COMBINER_DUP == args->create_type ) {
> -position[0] = args->create_type;
> -position[1] = args->d[0]->id; /* On the OMPI - layer, copy the 
> ompi_datatype.id */
> -return OMPI_SUCCESS;
> +ompi_datatype_t* temp_data = args->d[0];
> +return __ompi_datatype_pack_description(temp_data,
> +packed_buffer,
> +next_index );
>  }
>  position[0] = args->create_type;
>  position[1] = args->ci;
> 
> 
> 
> On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro  
> wrote:
> 
> > Hi,
> > 
> > I encountered an assertion failure in Open MPI trunk and found a bug.
> > 
> > See the attached program. This program can be run with mpiexec -n 1.
> > This program calls MPI_Put and writes one int value to the target side.
> > The target side datatype is equivalent to MPI_INT, but is a derived
> > datatype created by MPI_Type_contiguous and MPI_Type_Dup.
> > 
> > This program aborts with the following output.
> > 
> > ==
> >  dt1 (0x2626160) 
> > type 2 count ints 1 count disp 0 count datatype 1
> > ints: 1 
> > types:MPI_INT 
> >  dt2 (0x2626340) 
> > type 1 count ints 0 count disp 0 count datatype 1
> > types:0x2626160 
> > put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: 
> > __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' 
> > failed.
> > [ppc:05244] *** Process received signal ***
> > [ppc:05244] Signal: Aborted (6)
> > [ppc:05244] Signal code:  (-6)
> > [ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
> > [ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
> > [ppc:05244] [ 2] /lib

Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread KAWASHIMA Takahiro
No. My patch doesn't work for a simpler case:
just a duplicate of MPI_INT.
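
For reference, a minimal sketch of that simpler case (not the exact program I
ran, but the same pattern) looks like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Win win;
    MPI_Datatype dup_int;
    int obuf[1] = { 77 }, tbuf[1] = { 88 };

    MPI_Init(&argc, &argv);

    MPI_Type_dup(MPI_INT, &dup_int);   /* duplicate of a predefined datatype */
    MPI_Type_commit(&dup_int);

    MPI_Win_create(tbuf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
    MPI_Win_fence(0, win);
    MPI_Put(obuf, 1, MPI_INT, 0, 0, 1, dup_int, win);   /* dup used on the target side */
    MPI_Win_fence(0, win);

    printf("%s\n", tbuf[0] == 77 ? "OK" : "NG");

    MPI_Type_free(&dup_int);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}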

Datatype is too complex for me ...

Regards,
KAWASHIMA Takahiro

> George,
> 
> Thanks. But no, your patch does not work correctly.
> 
> The assertion failure disappeared by your patch but the value of the
> target buffer of MPI_Put is not a correct one.
> 
> In rdma OSC (and pt2pt OSC), the following data are packed into
> the send buffer in ompi_osc_rdma_sendreq_send function on the
> origin side.
> 
>   - header
>   - datatype description
>   - user data
> 
> User data are written at the offset of
> (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).
> 
> In the case of my program attached in my previous mail, total_pack_size
> is 32 because ompi_datatype_set_args set 8 for MPI_COMBINER_DUP and
> 24 for MPI_COMBINER_CONTIGUOUS. See the following code.
> 
> 
> int32_t ompi_datatype_set_args(... snip ...)
> {
> ... snip ...
> switch(type){
> ... snip ...
> case MPI_COMBINER_DUP:
> /* Recompute the data description packed size based on the 
> optimization
>  * for MPI_COMBINER_DUP.
>  */
> pArgs->total_pack_size = 2 * sizeof(int);  total_pack_size = 8
> break;
> ... snip ...
> }
> ...
> for( pos = 0; pos < cd; pos++ ) {
> ... snip ...
> if( !(ompi_datatype_is_predefined(d[pos])) ) {
> ... snip ...
> pArgs->total_pack_size += 
> ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;  total_pack_size 
> += 24
> ... snip ...
> }
> ... snip ...
> }
> ... snip ...
> }
> 
> 
> But on the target side, user data are read at the offset of
> (sizeof(ompi_osc_rdma_send_header_t) + 24)
> because ompi_osc_base_datatype_create function, which is called
> by ompi_osc_rdma_sendreq_recv_put function, progress the offset
> only 24 bytes. Not 32 bytes.
> 
> So the wrong data are written to the target buffer.
> 
> We need to take care of total_pack_size in the origin side.
> 
> I modified ompi_datatype_set_args function as a trial.
> 
> Index: ompi/datatype/ompi_datatype_args.c
> ===
> --- ompi/datatype/ompi_datatype_args.c  (revision 28778)
> +++ ompi/datatype/ompi_datatype_args.c  (working copy)
> @@ -129,7 +129,7 @@
>  /* Recompute the data description packed size based on the 
> optimization
>   * for MPI_COMBINER_DUP.
>   */
> -pArgs->total_pack_size = 2 * sizeof(int);
> +pArgs->total_pack_size = 0;
>  break;
>  
>  case MPI_COMBINER_CONTIGUOUS:
> 
> This patch in addition to your patch works correctly for my program.
> But I'm not sure this is a correct solution.
> 
> Regards,
> KAWASHIMA Takahiro
> 
> > Takahiro,
> > 
> > Nice catch. That particular code was an over-optimizations … that failed. 
> > Please try with the patch below.
> > 
> > Let me know if it's working as expected, I will push it in the trunk once 
> > confirmed.
> > 
> >   George.
> > 
> > 
> > Index: ompi/datatype/ompi_datatype_args.c
> > ===
> > --- ompi/datatype/ompi_datatype_args.c  (revision 28787)
> > +++ ompi/datatype/ompi_datatype_args.c  (working copy)
> > @@ -449,9 +449,10 @@
> >  }
> >  /* For duplicated datatype we don't have to store all the information 
> > */
> >  if( MPI_COMBINER_DUP == args->create_type ) {
> > -position[0] = args->create_type;
> > -position[1] = args->d[0]->id; /* On the OMPI - layer, copy the 
> > ompi_datatype.id */
> > -return OMPI_SUCCESS;
> > +ompi_datatype_t* temp_data = args->d[0];
> > +return __ompi_datatype_pack_description(temp_data,
> > +packed_buffer,
> > +next_index );
> >  }
> >  position[0] = args->create_type;
> >  position[1] = args->ci;
> > 
> > 
> > 
> > On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro  
> > wrote:
> > 
> > > Hi,
> > > 
> > > I encountered an assertion failure in Open MPI trunk and found a bug.
> > > 
> > > See the attached program. This program can be run with mpiexec -n 1.
> > > This program calls MPI_Put and writes one int value to the target side.
> > > The target side datatype is equivalent to MPI_INT, but is a derived
> > > datatype created by MPI_Type_contiguous and MPI_Type_Dup.
> > > 
> > > This program aborts with the following output.
> > > 
> > > ==
> > >  dt1 (0x2626160) 
> > > type 2 count ints 1 count disp 0 count datatype 1
> > > ints: 1 
> > > types:MPI_INT 
> > >  dt2 (0x2626340) 
> > > type 1 count ints 0 count disp 0 count datatype 1
> > > types:0x2626160 
> > > 

Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread KAWASHIMA Takahiro
George,

An improved patch is attached. The latter half is the same as your patch.
But again, I'm not sure this is the correct solution.

It works correctly for my attached put_dup_type_3.c.
Run it as "mpiexec -n 1 ./put_dup_type_3".
It will print seven OKs if it succeeds.

Regards,
KAWASHIMA Takahiro

> No. My patch doesn't work for a more simple case,
> just a duplicate of MPI_INT.
> 
> Datatype is too complex for me ...
> 
> Regards,
> KAWASHIMA Takahiro
> 
> > George,
> > 
> > Thanks. But no, your patch does not work correctly.
> > 
> > The assertion failure disappeared by your patch but the value of the
> > target buffer of MPI_Put is not a correct one.
> > 
> > In rdma OSC (and pt2pt OSC), the following data are packed into
> > the send buffer in ompi_osc_rdma_sendreq_send function on the
> > origin side.
> > 
> >   - header
> >   - datatype description
> >   - user data
> > 
> > User data are written at the offset of
> > (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).
> > 
> > In the case of my program attached in my previous mail, total_pack_size
> > is 32 because ompi_datatype_set_args set 8 for MPI_COMBINER_DUP and
> > 24 for MPI_COMBINER_CONTIGUOUS. See the following code.
> > 
> > 
> > int32_t ompi_datatype_set_args(... snip ...)
> > {
> > ... snip ...
> > switch(type){
> > ... snip ...
> > case MPI_COMBINER_DUP:
> > /* Recompute the data description packed size based on the 
> > optimization
> >  * for MPI_COMBINER_DUP.
> >  */
> > pArgs->total_pack_size = 2 * sizeof(int);  total_pack_size = 8
> > break;
> > ... snip ...
> > }
> > ...
> > for( pos = 0; pos < cd; pos++ ) {
> > ... snip ...
> > if( !(ompi_datatype_is_predefined(d[pos])) ) {
> > ... snip ...
> > pArgs->total_pack_size += 
> > ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;  
> > total_pack_size += 24
> > ... snip ...
> > }
> > ... snip ...
> > }
> > ... snip ...
> > }
> > 
> > 
> > But on the target side, user data are read at the offset of
> > (sizeof(ompi_osc_rdma_send_header_t) + 24)
> > because ompi_osc_base_datatype_create function, which is called
> > by ompi_osc_rdma_sendreq_recv_put function, progress the offset
> > only 24 bytes. Not 32 bytes.
> > 
> > So the wrong data are written to the target buffer.
> > 
> > We need to take care of total_pack_size in the origin side.
> > 
> > I modified ompi_datatype_set_args function as a trial.
> > 
> > Index: ompi/datatype/ompi_datatype_args.c
> > ===
> > --- ompi/datatype/ompi_datatype_args.c  (revision 28778)
> > +++ ompi/datatype/ompi_datatype_args.c  (working copy)
> > @@ -129,7 +129,7 @@
> >  /* Recompute the data description packed size based on the 
> > optimization
> >   * for MPI_COMBINER_DUP.
> >   */
> > -pArgs->total_pack_size = 2 * sizeof(int);
> > +pArgs->total_pack_size = 0;
> >  break;
> >  
> >  case MPI_COMBINER_CONTIGUOUS:
> > 
> > This patch in addition to your patch works correctly for my program.
> > But I'm not sure this is a correct solution.
> > 
> > Regards,
> > KAWASHIMA Takahiro
> > 
> > > Takahiro,
> > > 
> > > Nice catch. That particular code was an over-optimizations … that failed. 
> > > Please try with the patch below.
> > > 
> > > Let me know if it's working as expected, I will push it in the trunk once 
> > > confirmed.
> > > 
> > >   George.
> > > 
> > > 
> > > Index: ompi/datatype/ompi_datatype_args.c
> > > ===
> > > --- ompi/datatype/ompi_datatype_args.c(revision 28787)
> > > +++ ompi/datatype/ompi_datatype_args.c(working copy)
> > > @@ -449,9 +449,10 @@
> > >  }
> > >  /* For duplicated datatype we don't have to store all the 
> > > information */
> > >  if( MPI_COMBINER_DUP == args->create_type ) {
> > > -position[0] = args->create_type;
> > > -position[1] = args->d[0]->id; /* On the OMPI - layer, copy the 
> > > ompi_datatype.id */
> > > -return OMPI_SUCCESS;
> > > +ompi_datatype_t* temp_data = args->d[0];
> > > +return __ompi_datatype_pack_description(temp_data,
> > > +packed_buffer,
> > > +next_index );
> > >  }
> > >  position[0] = args->create_type;
> > >  position[1] = args->ci;
> > > 
> > > 
> > > 
> > > On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro 
> > >  wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I encountered an assertion failure in Open MPI trunk and found a bug.
> > > > 
> > > > See the attached program. This program can be run with mpiexec -n 1.
> > > > This program calls MPI_Put and writes one int value t

Re: [OMPI devel] RFC: remove opal_trace macro

2013-07-14 Thread Ralph Castain
I went ahead and committed this after finding only *one* reference to 
OPAL_TRACE anywhere in the ompi code base.

On Jul 11, 2013, at 9:05 AM, Ralph Castain  wrote:

> WHAT: remove the old and stale "OPAL_TRACE" macro
> 
> WHY: it is old, stale, no longer needed, and largely unused
> 
> WHEN: since it is virtually unused, a short timeout seems appropriate
> so let's set it for Tues 7/16
> 




[OMPI devel] 'make re-install' : remove 'ortecc' symlink also

2013-07-14 Thread Vasiliy
Makefile: please remove or check for the 'ortecc' symlink before proceeding
with the install.

make[4]: Entering directory
'/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
test -z "/usr/bin" || /usr/bin/mkdir -p "/usr/bin"
make  install-data-hook
(cd /usr/bin; rm -f ortecc.exe; ln -s opal_wrapper ortecc)
ln: failed to create symbolic link `ortecc': File exists
make[4]: Entering directory
'/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
make[4]: Nothing to be done for 'install-data-hook'.
make[4]: Leaving directory
'/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
Makefile:1668: recipe for target 'install-exec-hook-always' failed



Re: [OMPI devel] 'make re-install' : remove 'ortecc' symlink also

2013-07-14 Thread Vasiliy
The same happens with 'opalcc' and others:

make[4]: Entering directory
'/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/opal/tools/wrappers'
(cd /usr/bin; rm -f opalcc.exe; ln -s opal_wrapper opalcc)
ln: failed to create symbolic link `opalcc': File exists
Makefile:1972: recipe for target 'install-exec-hook' failed
...

On Sun, Jul 14, 2013 at 11:35 PM, Vasiliy  wrote:
> Makefile: please, remove/check for 'ortecc' symlink before proceeding
> with install
> 
> make[4]: Entering directory
> '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
> test -z "/usr/bin" || /usr/bin/mkdir -p "/usr/bin"
> make  install-data-hook
> (cd /usr/bin; rm -f ortecc.exe; ln -s opal_wrapper ortecc)
> ln: failed to create symbolic link `ortecc': File exists
> make[4]: Entering directory
> '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
> make[4]: Nothing to be done for 'install-data-hook'.
> make[4]: Leaving directory
> '/usr/src/64bit/release/openmpi/openmpi-1.9.0-a1/build/orte/tools/wrappers'
> Makefile:1668: recipe for target 'install-exec-hook-always' failed
> 


Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread George Bosilca
Takahiro,

Please find below another patch, this time hopefully fixing all the issues.
The problem with my original patch and with yours was that they try to address
the packing of the data representation without fixing the computation of the
required length. As a result, the lengths on the packer and the unpacker
differ, and the unpacking of the subsequent data is done from the wrong
location.

I changed the code to force the preparation of the packed data representation
before returning the length the first time. This way we can compute exactly
how many bytes we need, including the potential alignment requirements. As a
result, the amounts on both sides (the packer and the unpacker) are now
identical, and the entire process works flawlessly (or so I hope).
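
Conceptually, the change amounts to something like the following plain-C
sketch (this is only an illustration, not the actual OMPI code): the
advertised size is obtained by performing the packing once, so the packer
and the unpacker can never disagree about it.

#include <stdio.h>
#include <string.h>

/* Hypothetical "pack" step: returns the number of bytes actually written. */
static size_t pack_description(char *buf, size_t cap)
{
    const char desc[] = "DUP -> CONTIGUOUS(1, MPI_INT)";  /* stand-in payload */
    size_t len = sizeof(desc);
    if (buf != NULL && cap >= len)
        memcpy(buf, desc, len);
    return len;
}

/* The size is computed by doing the packing once, not by estimating it. */
static size_t packed_description_size(void)
{
    char scratch[256];
    return pack_description(scratch, sizeof(scratch));
}

int main(void)
{
    char buf[256];
    size_t advertised = packed_description_size();
    size_t written    = pack_description(buf, sizeof(buf));
    printf("advertised = %zu, written = %zu -> %s\n",
           advertised, written, advertised == written ? "consistent" : "mismatch");
    return 0;
}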

Let me know if you still notice issues with this patch. I'll push it into the
trunk tomorrow, so it can soak for a few days before propagating to the
branches.

George.




packed.patch
Description: Binary data

On Jul 14, 2013, at 20:28 , KAWASHIMA Takahiro  
wrote:

> George,
> 
> A improved patch is attached. Latter half is same as your patch.
> But again, I'm not sure this is a correct solution.
> 
> It works correctly for my attached put_dup_type_3.c.
> Run as "mpiexec -n 1 ./put_dup_type_3".
> It will print seven OKs if succeeded.
> 
> Regards,
> KAWASHIMA Takahiro
> 
>> No. My patch doesn't work for a more simple case,
>> just a duplicate of MPI_INT.
>> 
>> Datatype is too complex for me ...
>> 
>> Regards,
>> KAWASHIMA Takahiro
>> 
>>> George,
>>> 
>>> Thanks. But no, your patch does not work correctly.
>>> 
>>> The assertion failure disappeared by your patch but the value of the
>>> target buffer of MPI_Put is not a correct one.
>>> 
>>> In rdma OSC (and pt2pt OSC), the following data are packed into
>>> the send buffer in ompi_osc_rdma_sendreq_send function on the
>>> origin side.
>>> 
>>>  - header
>>>  - datatype description
>>>  - user data
>>> 
>>> User data are written at the offset of
>>> (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).
>>> 
>>> In the case of my program attached in my previous mail, total_pack_size
>>> is 32 because ompi_datatype_set_args set 8 for MPI_COMBINER_DUP and
>>> 24 for MPI_COMBINER_CONTIGUOUS. See the following code.
>>> 
>>> 
>>> int32_t ompi_datatype_set_args(... snip ...)
>>> {
>>>... snip ...
>>>switch(type){
>>>... snip ...
>>>case MPI_COMBINER_DUP:
>>>/* Recompute the data description packed size based on the 
>>> optimization
>>> * for MPI_COMBINER_DUP.
>>> */
>>>pArgs->total_pack_size = 2 * sizeof(int);  total_pack_size = 8
>>>break;
>>>... snip ...
>>>}
>>>...
>>>for( pos = 0; pos < cd; pos++ ) {
>>>... snip ...
>>>if( !(ompi_datatype_is_predefined(d[pos])) ) {
>>>... snip ...
>>>pArgs->total_pack_size += 
>>> ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;  
>>> total_pack_size += 24
>>>... snip ...
>>>}
>>>... snip ...
>>>}
>>>... snip ...
>>> }
>>> 
>>> 
>>> But on the target side, user data are read at the offset of
>>> (sizeof(ompi_osc_rdma_send_header_t) + 24)
>>> because ompi_osc_base_datatype_create function, which is called
>>> by ompi_osc_rdma_sendreq_recv_put function, progress the offset
>>> only 24 bytes. Not 32 bytes.
>>> 
>>> So the wrong data are written to the target buffer.
>>> 
>>> We need to take care of total_pack_size in the origin side.
>>> 
>>> I modified ompi_datatype_set_args function as a trial.
>>> 
>>> Index: ompi/datatype/ompi_datatype_args.c
>>> ===
>>> --- ompi/datatype/ompi_datatype_args.c  (revision 28778)
>>> +++ ompi/datatype/ompi_datatype_args.c  (working copy)
>>> @@ -129,7 +129,7 @@
>>> /* Recompute the data description packed size based on the 
>>> optimization
>>>  * for MPI_COMBINER_DUP.
>>>  */
>>> -pArgs->total_pack_size = 2 * sizeof(int);
>>> +pArgs->total_pack_size = 0;
>>> break;
>>> 
>>> case MPI_COMBINER_CONTIGUOUS:
>>> 
>>> This patch in addition to your patch works correctly for my program.
>>> But I'm not sure this is a correct solution.
>>> 
>>> Regards,
>>> KAWASHIMA Takahiro
>>> 
 Takahiro,
 
 Nice catch. That particular code was an over-optimizations … that failed. 
 Please try with the patch below.
 
 Let me know if it's working as expected, I will push it in the trunk once 
 confirmed.
 
  George.
 
 
 Index: ompi/datatype/ompi_datatype_args.c
 ===
 --- ompi/datatype/ompi_datatype_args.c (revision 28787)
 +++ ompi/datatype/ompi_datatype_args.c (working copy)
 @@ -449,9 +449,10 @@
 }
 /* For