Re: [OMPI devel] SM component init unload

2012-07-05 Thread Jeff Squyres
Thanks George.  I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about 
this.


On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote:

> Thanks all of you for your time and early responses.
> 
> After applying the patch, SM can be used by raising its priority. It is 
> enough for me (I hope so). But it still fails when I specify --mca coll 
> sm,self on the command line (and also with tuned included).
> I am not going to use this release in production, only for playing with the 
> code :-)
> 
> Regards,
> Juan Antonio.
> 
> On 04/07/2012, at 02:59, George Bosilca wrote:
> 
>> Juan,
>> 
>> Something weird is going on there. The selection mechanisms for the SM coll 
>> and SM BTL should be very similar. However, the SM BTL successfully selects 
>> itself, while the SM coll fails to determine that all processes are local.
>> 
>> In the coll SM the issue is that the remote procs do not have the LOCAL flag 
>> set, even when they are on the local node (however the ompi_proc_local() 
>> return has a special flag stating that all processes in the job are local). 
>> I compared the initialization of the SM BTL and the SM coll. It turns out 
>> that somehow the procs returned by ompi_proc_all() and the procs provided to 
>> the add_proc of the BTLs are not identical. The latter have the local flag 
>> correctly set, so I went a little bit deeper.
>> 
>> Here is what I found while toying with gdb inside:
>> 
>> breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false, 
>> enable_mpi_threads=false) at coll_sm_module.c:132
>> 
>> (gdb) p procs[0]
>> $1 = (ompi_proc_t *) 0x109a1e8c0
>> (gdb) p procs[1]
>> $2 = (ompi_proc_t *) 0x109a1e970
>> (gdb) p procs[0]->proc_flags
>> $3 = 0
>> (gdb) p procs[1]->proc_flags
>> $4 = 4095
>> 
>> Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2, 
>> procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at 
>> btl_sm.c:427
>> 
>> (gdb) p procs[0]
>> $5 = (struct ompi_proc_t *) 0x109a1e8c0
>> (gdb) p procs[1]
>> $6 = (struct ompi_proc_t *) 0x109a1e970
>> (gdb) p procs[0]->proc_flags
>> $7 = 1920
>> (gdb) p procs[1]->proc_flags
>> $8 = 4095
>> 
>> Thus the problem seems to come from the fact that during the initialization 
>> of the SM coll the flags are not correctly set. However, this is somewhat 
>> expected, as the call to the initialization happens before the exchange of 
>> the business cards (and therefore there is no way to have any knowledge 
>> about the remote procs).
>> 
>> So, either something changed drastically in the way we set the flags for 
>> remote processes, or we have not actually used the SM coll for the last 3 
>> years. I think the culprit is r21967 
>> (https://svn.open-mpi.org/trac/ompi/changeset/21967), which added "selection" 
>> logic based on knowledge about remote procs to the coll SM initialization 
>> function. But this selection logic runs way too early!
>> 
>> I would strongly encourage you not to use this SM collective component in 
>> anything related to production runs.
>> 
>>   george.
>> 
>> PS: However, if you want to toy with the SM coll apply the following patch:
>> Index: coll_sm_module.c
>> ===
>> --- coll_sm_module.c (revision 26737)
>> +++ coll_sm_module.c (working copy)
>> @@ -128,6 +128,7 @@
>>  int mca_coll_sm_init_query(bool enable_progress_threads,
>> bool enable_mpi_threads)
>>  {
>> +#if 0
>>  ompi_proc_t *my_proc, **procs;
>>  size_t i, size;
>>  
>> @@ -158,7 +159,7 @@
>>  "coll:sm:init_query: no other local procs; 
>> disqualifying myself");
>>  return OMPI_ERR_NOT_AVAILABLE;
>>  }
>> -
>> +#endif
>>  /* Don't do much here because we don't really want to allocate any
>> shared memory until this component is selected to be used. */
>>  opal_output_verbose(10, mca_coll_base_output,
>> 
>> 
>> 
>> 
>> 
>> On Jul 4, 2012, at 02:05 , Ralph Castain wrote:
>> 
>>> Okay, please try this again with r26739 or above. You can remove the rest 
>>> of the "verbose" settings and the --display-map so we declutter the output. 
>>> Please add "-mca orte_nidmap_verbose 20" to your cmd line.
>>> 
>>> Thanks!
>>> Ralph
>>> 
>>> 
>>> On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico  wrote:
>>> Here is the output.
>>> 
>>> [jarico@Metropolis-01 examples]$ 
>>> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core 
>>> --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100  --mca 
>>> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca 
>>> mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 
>>> -mca grpcomm_base_verbose 5 ./bmem
>>> [Metropolis-01:24563] mca: base: components_open: Looking for hwloc 
>>> components
>>> [Metropolis-01:24563] mca: base: components_open: opening hwloc components
>>> [Metropolis-01:24563] mca: base: components_open: found loaded component 
>>> hwloc142
>>> [Metropolis-01:24563] mca: base
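
For reference, the early selection check George describes in 
mca_coll_sm_init_query looks roughly like the sketch below (paraphrased from 
the patch context above; the locality-flag test and the error paths are 
illustrative, not the exact source):

int mca_coll_sm_init_query(bool enable_progress_threads,
                           bool enable_mpi_threads)
{
    ompi_proc_t *my_proc, **procs;
    size_t i, size;

    /* Runs very early in MPI_INIT -- before the modex ("business card")
       exchange -- so remote procs may still show proc_flags == 0, which
       is exactly what the gdb output above shows. */
    if (NULL == (my_proc = ompi_proc_local()) ||
        NULL == (procs = ompi_proc_all(&size))) {
        return OMPI_ERR_NOT_AVAILABLE;
    }

    /* Disqualify the component unless some *other* proc is flagged as
       local to this node.  OMPI_PROC_FLAG_LOCAL is an illustrative name
       for OMPI's locality bit, not necessarily the real macro. */
    for (i = 0; i < size; ++i) {
        if (procs[i] != my_proc &&
            (procs[i]->proc_flags & OMPI_PROC_FLAG_LOCAL)) {
            break;
        }
    }
    if (i >= size) {
        opal_output_verbose(10, mca_coll_base_output,
                            "coll:sm:init_query: no other local procs; "
                            "disqualifying myself");
        return OMPI_ERR_NOT_AVAILABLE;
    }
    /* ... otherwise fall through to the normal "don't allocate anything
       yet" path shown in the patch above ... */
    return OMPI_SUCCESS;
}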

[OMPI devel] openib max_cqe

2012-07-05 Thread TERRY DONTJE
With Jeff's latest changes to how we set up the cq_size, I am now seeing 
error messages saying that my machine's memlock limits are too low.  I 
am concerned that it might actually be something else, because my max 
locked memory is set to unlimited on my machine.


So if I do a run of -np 2 across two separate nodes, then using the 
max_cqe of my IB device (4194303) is ok.  Once I go beyond 1 process on 
the node, I start getting the memlock limits message.  So how much 
memory does a CQE take?  Is it 1k by any chance?  I ask because the 
machine I am running on has 4GB of memory, so I am wondering whether I 
just don't have enough backing memory, and if that is so, how common 
a case this may be.
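
Back-of-the-envelope, if the 1k guess were right, a single CQ sized to 
the device's max_cqe would already need roughly 4,194,303 x 1 KB, i.e. 
about 4 GB of locked memory -- the node's entire physical memory -- so 
running out of backing memory would not be surprising once several such 
CQs exist on the node.  (This is only arithmetic under the hypothetical 
1 KB-per-CQE assumption, not a claim about the actual CQE size.)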


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] openib max_cqe

2012-07-05 Thread Shamis, Pavel
> So if I do a run of -np 2 across two separate nodes, then using the 
> max_cqe of my IB device (4194303) is ok.  Once I go beyond 1 process on the 
> node, I start getting the memlock limits message.  So how much memory does a 
> CQE take?  Is it 1k by any chance?  I ask because the machine I am 
> running on has 4GB of memory, so I am wondering whether I just don't have 
> enough backing memory, and if that is so, how common a case 
> this may be.

I mentioned on the call that for Mellanox devices (+OFA verbs) this resource is 
really cheap. Do you run a Mellanox HCA + OFA verbs?
Regards,
Pasha




Re: [OMPI devel] openib max_cqe

2012-07-05 Thread Jeff Squyres
On Jul 5, 2012, at 3:53 PM, Shamis, Pavel wrote:

> I mentioned on the call that for Mellanox devices (+OFA verbs) this resource 
> is really cheap. Do you run a Mellanox HCA + OFA verbs?

(I'll reply because I know Terry is offline for the rest of the day)

Yes, he does.

The heart of the question: is it incorrect to assume that we'll consume (num 
CQE * CQE size) registered memory for each QP opened?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] openib max_cqe

2012-07-05 Thread Shamis, Pavel
>> I mentioned on the call that for Mellanox devices (+OFA verbs) this resource 
>> is really cheap. Do you run a Mellanox HCA + OFA verbs?
> 
> (I'll reply because I know Terry is offline for the rest of the day)
> 
> Yes, he does.

I asked because Sun used to have its own verbs driver.

> 
> The heart of the question: is it incorrect to assume that we'll consume (num 
> CQE * CQE size) registered memory for each QP opened?

QP or CQ?  I think you don't want to assume anything there. Verbs (user and 
kernel) do their own magic there.
I think Mellanox should address this question.

Regards,
Pasha
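
For reference, the verbs calls under discussion look roughly like the sketch 
below (standard libibverbs API; the capping policy is only an illustration of 
honoring max_cqe, not necessarily what the openib BTL does):

#include <infiniband/verbs.h>

/* Query the device's max_cqe and create a CQ no larger than that.  How
   much locked memory the resulting CQ actually pins is up to the verbs
   provider (user and kernel parts), as noted above. */
static struct ibv_cq *create_capped_cq(struct ibv_context *ctx, int wanted)
{
    struct ibv_device_attr attr;

    if (0 != ibv_query_device(ctx, &attr)) {
        return NULL;
    }
    if (wanted > attr.max_cqe) {
        wanted = attr.max_cqe;   /* e.g. 4194303 on the HCA discussed here */
    }
    /* The cqe argument is a minimum; the provider may round it up. */
    return ibv_create_cq(ctx, wanted, NULL, NULL, 0);
}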


Re: [OMPI devel] SM component init unload

2012-07-05 Thread Ralph Castain
George: is there any reason for opening and selecting the coll framework so 
early in mpi_init? I'm wondering if we can move that code to the end of the 
procedure so we wouldn't need the locality info until later.

Sent from my iPad

On Jul 5, 2012, at 10:05 AM, Jeff Squyres  wrote:

> Thanks George.  I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about 
> this.
> 
> 
> On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote:
> 
>> Thanks all of you for your time and early responses.
>> 
>> After applying the patch, SM can be used by raising its priority. It is 
>> enough for me (I hope so). But it still fails when I specify --mca 
>> coll sm,self on the command line (and also with tuned included).
>> I am not going to use this release in production, only for playing with the 
>> code :-)
>> 
>> Regards,
>> Juan Antonio.
>> 
>> On 04/07/2012, at 02:59, George Bosilca wrote:
>> 
>>> Juan,
>>> 
>>> Something weird is going on there. The selection mechanisms for the SM coll 
>>> and SM BTL should be very similar. However, the SM BTL successfully selects 
>>> itself, while the SM coll fails to determine that all processes are local.
>>> 
>>> In the coll SM the issue is that the remote procs do not have the LOCAL 
>>> flag set, even when they are on the local node (however the 
>>> ompi_proc_local() return has a special flag stating that all processes in 
>>> the job are local). I compared the initialization of the SM BTL and the SM 
>>> coll. It turns out that somehow the procs returned by ompi_proc_all() and 
>>> the procs provided to the add_proc of the BTLs are not identical. The 
>>> latter have the local flag correctly set, so I went a little bit deeper.
>>> 
>>> Here is what I found while toying with gdb inside:
>>> 
>>> breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false, 
>>> enable_mpi_threads=false) at coll_sm_module.c:132
>>> 
>>> (gdb) p procs[0]
>>> $1 = (ompi_proc_t *) 0x109a1e8c0
>>> (gdb) p procs[1]
>>> $2 = (ompi_proc_t *) 0x109a1e970
>>> (gdb) p procs[0]->proc_flags
>>> $3 = 0
>>> (gdb) p procs[1]->proc_flags
>>> $4 = 4095
>>> 
>>> Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2, 
>>> procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at 
>>> btl_sm.c:427
>>> 
>>> (gdb) p procs[0]
>>> $5 = (struct ompi_proc_t *) 0x109a1e8c0
>>> (gdb) p procs[1]
>>> $6 = (struct ompi_proc_t *) 0x109a1e970
>>> (gdb) p procs[0]->proc_flags
>>> $7 = 1920
>>> (gdb) p procs[1]->proc_flags
>>> $8 = 4095
>>> 
>>> Thus the problem seems to come from the fact that during the initialization 
>>> of the SM coll the flags are not correctly set. However, this is somewhat 
>>> expected, as the call to the initialization happens before the exchange of 
>>> the business cards (and therefore there is no way to have any knowledge 
>>> about the remote procs).
>>> 
>>> So, either something changed drastically in the way we set the flags for 
>>> remote processes, or we have not actually used the SM coll for the last 3 
>>> years. I think the culprit is r21967 
>>> (https://svn.open-mpi.org/trac/ompi/changeset/21967), which added 
>>> "selection" logic based on knowledge about remote procs to the coll SM 
>>> initialization function. But this selection logic runs way too early!
>>> 
>>> I would strongly encourage you not to use this SM collective component in 
>>> anything related to production runs.
>>> 
>>>  george.
>>> 
>>> PS: However, if you want to toy with the SM coll apply the following patch:
>>> Index: coll_sm_module.c
>>> ===
>>> --- coll_sm_module.c(revision 26737)
>>> +++ coll_sm_module.c(working copy)
>>> @@ -128,6 +128,7 @@
>>> int mca_coll_sm_init_query(bool enable_progress_threads,
>>>bool enable_mpi_threads)
>>> {
>>> +#if 0
>>> ompi_proc_t *my_proc, **procs;
>>> size_t i, size;
>>> 
>>> @@ -158,7 +159,7 @@
>>> "coll:sm:init_query: no other local procs; 
>>> disqualifying myself");
>>> return OMPI_ERR_NOT_AVAILABLE;
>>> }
>>> -
>>> +#endif
>>> /* Don't do much here because we don't really want to allocate any
>>>shared memory until this component is selected to be used. */
>>> opal_output_verbose(10, mca_coll_base_output,
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jul 4, 2012, at 02:05 , Ralph Castain wrote:
>>> 
>>>> Okay, please try this again with r26739 or above. You can remove the rest 
>>>> of the "verbose" settings and the --display-map so we declutter the 
>>>> output. Please add "-mca orte_nidmap_verbose 20" to your cmd line.
>>>> 
>>>> Thanks!
>>>> Ralph
>>>> 
>>>> 
>>>> On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico  wrote:
>>>> Here is the output.
>>>> 
>>>> [jarico@Metropolis-01 examples]$ 
>>>> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core 
>>>> --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100  --mca 
>>>> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca 
 

Re: [OMPI devel] [EXTERNAL] ibarrier failures on MTT

2012-07-05 Thread Barrett, Brian W
On 7/3/12 5:08 PM, "Eugene Loh"  wrote:

>I'll look at this more, but for now I'll just note that the new ibarrier
>test is showing lots of failures on MTT (cisco and oracle).

I was initializing the MPI_ERROR field of the request status after calling
the request completion function, which was causing issues.  Should be
fixed now.

Brian
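
Schematically, the fix is just a matter of ordering; the sketch below uses 
OMPI-style request names from memory and is meant as an illustration, not the 
actual diff:

/* Buggy order: the completion function runs first, so a waiter can see
   an uninitialized MPI_ERROR in the returned status. */
ompi_request_complete(request, true);
request->req_status.MPI_ERROR = MPI_SUCCESS;

/* Fixed order: fill in the status (including MPI_ERROR) before marking
   the request complete. */
request->req_status.MPI_ERROR = MPI_SUCCESS;
ompi_request_complete(request, true);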