Re: [OMPI devel] [1.10.3rc2] testing summary

2016-05-23 Thread Paul Hargrove
On Sat, May 21, 2016 at 3:14 PM, Paul Hargrove  wrote:

>
> I did encounter one issue that still needs more investigating before I can
> attribute it to Solaris or to Open MPI:
> Since I last built Open MPI, the x86-64/Solaris systems were updated from
> Solaris 11.2 to 11.3.
> I can build this release tarball using "gmake" fine, but "gmake -j4" is
> failing.
> It appears to be libtool failing to create a ".libs" directory.
> My current guess is some issue between the updated Solaris and the NFS
> server for the filesystem where I am building.
>


The issue that was reproducible yesterday does not occur today.
So I am dropping my investigation.

-Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] modex getting corrupted

2016-05-23 Thread dpchoudh .
Hello Ralph

Thanks for your input. The routine that does the send is this:

static int btl_lf_modex_send(lfgroup lfgroup)
{
    char *grp_name = lf_get_group_name(lfgroup, NULL, 0);
    btl_lf_modex_t lf_modex;
    int rc;

    /* Note: strncpy() does not null-terminate grp_name if it is
       GRP_NAME_MAX_LEN bytes or longer, and lf_modex is not zeroed
       first, so trailing bytes may be uninitialized. */
    strncpy(lf_modex.grp_name, grp_name, GRP_NAME_MAX_LEN);
    OPAL_MODEX_SEND(rc, OPAL_PMIX_GLOBAL,
                    &mca_btl_lf_component.super.btl_version,
                    (char *)&lf_modex, sizeof(lf_modex));
    return rc;
}

This routine is called from the component init routine
(mca_btl_lf_component_init()). I have verified that the values in the modex
(lf_modex) are correct.

The receive happens in proc_create, and I call it like this:
OPAL_MODEX_RECV(rc, &mca_btl_lf_component.super.btl_version,
                &opal_proc->proc_name,
                (uint8_t **)&module_proc->proc_modex, &size);

In here, I get junk values in proc_modex. If I pass a malloc()'ed buffer in
place of module_proc->proc_modex, I still get bad data.


Thanks again for your help.

Durga

We learn from history that we never learn from history.

On Sat, May 21, 2016 at 8:38 PM, Ralph Castain  wrote:

> Please provide the exact code used for both send/recv - you likely have an
> error in the syntax
>
>
> On May 20, 2016, at 9:36 PM, dpchoudh .  wrote:
>
> Hello all
>
> I have a naive question:
>
> My 'cluster' consists of two nodes, connected back to back with a
> proprietary link as well as GbE (over a switch).
> I am calling OPAL_MODEX_SEND() and the modex consists of just this:
>
> struct modex
> { char name[20]; unsigned mtu; };
>
> The mtu field is not currently being used. I bzero() the struct and have
> verified that the value being written to the 'name' field (this is similar
> to a PKEY for infiniband; the driver will translate this to a unique
> integer) is correct at the sending end.
>
> When I do an OPAL_MODEX_RECV(), the value is completely corrupted. However,
> the size of the modex message is still correct (24 bytes).
> What could I be doing wrong? (Both nodes are little endian x86_64 machines)
>
> Thanks in advance
> Durga
>
> We learn from history that we never learn from history.
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/19012.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/19019.php
>


Re: [OMPI devel] modex getting corrupted

2016-05-23 Thread dpchoudh .
Hello Ralph and all

Please ignore this mail. It is indeed due to a syntax error in my code.
Sorry for the noise; I'll be more careful with my homework from now on.

Best regards
Durga

We learn from history that we never learn from history.

On Mon, May 23, 2016 at 2:13 AM, dpchoudh .  wrote:

> Hello Ralph
>
> Thanks for your input. The routine that does the send is this:
>
> static int btl_lf_modex_send(lfgroup lfgroup)
> {
>     char *grp_name = lf_get_group_name(lfgroup, NULL, 0);
>     btl_lf_modex_t lf_modex;
>     int rc;
>
>     strncpy(lf_modex.grp_name, grp_name, GRP_NAME_MAX_LEN);
>     OPAL_MODEX_SEND(rc, OPAL_PMIX_GLOBAL,
>                     &mca_btl_lf_component.super.btl_version,
>                     (char *)&lf_modex, sizeof(lf_modex));
>     return rc;
> }
>
> This routine is called from the component init routine
> (mca_btl_lf_component_init()). I have verified that the values in the modex
> (lf_modex) are correct.
>
> The receive happens in proc_create, and I call it like this:
> OPAL_MODEX_RECV(rc, &mca_btl_lf_component.super.btl_version,
>                 &opal_proc->proc_name,
>                 (uint8_t **)&module_proc->proc_modex, &size);
>
> In here, I get junk values in proc_modex. If I pass a malloc()'ed buffer
> in place of module_proc->proc_modex, I still get bad data.
>
>
> Thanks again for your help.
>
> Durga
>
> We learn from history that we never learn from history.
>
> On Sat, May 21, 2016 at 8:38 PM, Ralph Castain  wrote:
>
>> Please provide the exact code used for both send/recv - you likely have
>> an error in the syntax
>>
>>
>> On May 20, 2016, at 9:36 PM, dpchoudh .  wrote:
>>
>> Hello all
>>
>> I have a naive question:
>>
>> My 'cluster' consists of two nodes, connected back to back with a
>> proprietary link as well as GbE (over a switch).
>> I am calling OPAL_MODEX_SEND() and the modex consists of just this:
>>
>> struct modex
>> { char name[20]; unsigned mtu; };
>>
>> The mtu field is not currently being used. I bzero() the struct and have
>> verified that the value being written to the 'name' field (this is similar
>> to a PKEY for infiniband; the driver will translate this to a unique
>> integer) is correct at the sending end.
>>
>> When I do an OPAL_MODEX_RECV(), the value is completely corrupted.
>> However, the size of the modex message is still correct (24 bytes).
>> What could I be doing wrong? (Both nodes are little endian x86_64
>> machines)
>>
>> Thanks in advance
>> Durga
>>
>> We learn from history that we never learn from history.
>>
>
>


[OMPI devel] Github having issues ATM...

2016-05-23 Thread Jeff Squyres (jsquyres)
FYI: GitHub is having a minor outage (i.e., delays).  Looks like they're
working on it:

https://status.github.com/

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [1.10.3rc2] testing summary

2016-05-23 Thread Paul Hargrove
On Sat, May 21, 2016 at 3:14 PM, Paul Hargrove  wrote:

> I will note that I was not able to test IA64 (yet?) because that system is
> down.
> I've emailed the owners of that system and am hopeful that I can test it
> in the next few days.
>

The IA64 tests ran today and PASSED.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900