Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman

Everyone:

George thought this was okay after the discussion. I should have written
the wiki page prior to my commit, as the change did look very Open IB specific.


Please review:

https://svn.open-mpi.org/trac/ompi/wiki/BTLSemantics

Let me know if you want to discuss this further, and we can set up a
call early next week.



Thanks,

Galen



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
On Thu, Jun 07, 2007 at 02:38:51PM -0400, George Bosilca wrote:
> 
> On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote:
> 
> >On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:
> >>) I expect you to revise the patch in order to propose a generic
> >>solution or I'll trigger a vote against the patch. I vote for it to be
> >>backed out of the trunk as it exports way too much knowledge from the
> >>Open IB BTL into the PML layer.
> >The patch solves a real problem. If we want to back it out we need to
> >find another solution. I also didn't like this change too much, but I
> >thought about other solutions and haven't found anything better than
> >what Galen did. If you have something in mind, let's discuss it.
> 
> Well, I didn't really investigate this matter, as for all devices
> where I work this never happens. What really bothers me, and which is
> not as Galen describes it, is the following line:
>
> >frag->base.order = order
>
> As the value of "order" comes from the PML level and the Open IB BTL
The intention is that most of the time the PML will use MCA_BTL_NO_ORDER,
but sometimes order is important.

> uses it without any change, to me it means that some knowledge
> belonging to the BTL (BTL_OPENIB_LP_QP is clearly Open IB related,
> isn't it?) has been exported to the PML. You can turn it any way you
BTL_OPENIB_LP_QP is never used in PML code. It is just some kind of
cookie that is opaque to the PML. You don't claim that we expose BTL
internals if a BTL sets up a callback for the PML to use, do you?

> want, this clearly means that the PML is able to give direct orders
> to the BTL which allow it to put the fragment in the high or low
> priority queue (which is a VERY Open IB feature).
OpenIB and UDAPL are currently the only two BTLs that can reorder packets, so
we have two choices here. We can prohibit BTLs from reordering packets, or we
have to provide a way to ensure ordering between certain packets. The
former will limit the BTL interface, IMHO.

> 
> Now, if we create a constant MCA_BTL_HP_QUEUE it might look better.
> But again, as all other BTLs will completely ignore this, it looks
> like an Open IB feature exported to the PML to me.
>
Galen is currently working on adding an arbitrary number of different-sized
queues to the openib BTL. There will be many more than two queues. This kind
of thing is really internal to the BTL.

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman

Call-in details:

I have scheduled your requested audio conference "Open MPI" for today
from 2:30pm to 3:30pm Mountain Time, with 7 ports.

Dial in number: 5-4165 local  866-260-0475  toll free


- Galen






Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman


On Jun 7, 2007, at 12:49 PM, Don Kerr wrote:

> It would be difficult for me to attend this afternoon.  Tomorrow is
> much better for me.



Brian and I are both out tomorrow. I think what we will do is have a  
call today, report back to the group and then if necessary have  
another call on Monday/Tuesday.


- Galen







Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman

Okay, how is 2:30 Mountain Time for everyone?

I will set up a call-in if this works.

Thanks,

Galen






Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Brian Barrett

I'm available this afternoon...

Brian





Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Don Kerr
It would be difficult for me to attend this afternoon.  Tomorrow is much 
better for me.


-DON




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread George Bosilca

I'm available this afternoon.

  george.



Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread George Bosilca


On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote:

> On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:
> >) I expect you to revise the patch in order to propose a generic
> >solution or I'll trigger a vote against the patch. I vote for it to be
> >backed out of the trunk as it exports way too much knowledge from the
> >Open IB BTL into the PML layer.
> The patch solves a real problem. If we want to back it out we need to
> find another solution. I also didn't like this change too much, but I
> thought about other solutions and haven't found anything better than
> what Galen did. If you have something in mind, let's discuss it.


Well, I didn't really investigate this matter, as this never happens on
any of the devices I work with. What really bothers me, and which is
not as Galen describes it, is the following line:

> frag->base.order = order

As the value of "order" comes from the PML level and the Open IB BTL
uses it without any change, to me it means that some knowledge
belonging to the BTL (BTL_OPENIB_LP_QP is clearly Open IB related,
isn't it?) has been exported to the PML. You can turn it any way you
want; this clearly means that the PML is able to give direct orders
to the BTL, which allow it to put the fragment in the high or low
priority queue (which is a VERY Open IB feature).

Now, if we create a constant MCA_BTL_HP_QUEUE it might look better.
But again, as all other BTLs will completely ignore this, it looks
like an Open IB feature exported to the PML to me.





> As a general comment, this kind of discussion is why I prefer to send
> significant changes as a patch to the list for discussion before
> committing.
>
> >  george.
> >
> >PS: With Gleb's changes the problem is the same. The following snippet
> >reflects exactly the same behavior as the original patch.
> I didn't try to change the semantics. I just made the code match the
> semantics that Galen described.

You didn't change anything compared with Galen's patch. But the same
knowledge export happens with your patch (for the same reasons as
described above).

> --
> Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman


Are people available today to discuss this over the phone?

- Galen







[OMPI devel] BTL Semantics Teleconference: was : Re: [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman

I just had a discussion with Rich regarding the BTL semantics.
I think what might be helpful here is for us to have a telecon to
discuss this further.


I only have one goal out of this, and that is to firmly define the
ordering semantics of the BTL, or alternatively the local/remote
completion semantics of the BTL, whatever they may be.


I have created a wiki page to help describe the issue as I currently
see it. Please feel free to add to it with suggestions, etc.


https://svn.open-mpi.org/trac/ompi/wiki/BTLSemantics


- Galen







Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:
> ) I expect you to revise the patch in order to propose a generic
> solution or I'll trigger a vote against the patch. I vote for it to be
> backed out of the trunk as it exports way too much knowledge from the
> Open IB BTL into the PML layer.
The patch solves a real problem. If we want to back it out we need to find
another solution. I also didn't like this change too much, but I thought
about other solutions and haven't found anything better than what
Galen did. If you have something in mind, let's discuss it.

As a general comment, this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.

> 
>   george.
> 
> PS: With Gleb's changes the problem is the same. The following snippet
> reflects exactly the same behavior as the original patch.
I didn't try to change the semantics. I just made the code match the
semantics that Galen described.

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman


On Jun 7, 2007, at 9:11 AM, George Bosilca wrote:

> There is something weird with this change, and the patch reflects
> it. The new argument "order" comes from the PML level and might be
> MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or
> BTL_OPENIB_HP_QP (which are definitely Open IB related). Do you
> really intend to let the PML know about Open IB internal constants?

No, the PML knows only one thing about the order tag: it is either
MCA_BTL_NO_ORDER or it is something that the BTL assigns.
The PML has no idea about BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP; to
the PML it is just an order tag assigned to a fragment by the BTL.

So the semantics are that after a btl_send/put/get an order tag may
be assigned by the BTL to the descriptor.
This order tag can then be specified to subsequent calls to btl_alloc
or btl_prepare. The PML has no idea what the value means, other than
that it is requesting a descriptor that will be ordered w.r.t. a
previously transmitted descriptor.
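
The semantics described above can be sketched from the PML's point of view. This is only an illustration, not Open MPI code: every `sketch_` name, the descriptor layout, and the numeric value used for the "no order" constant are assumptions invented for the example. The only ideas taken from the thread are that the tag is opaque to the PML, that it is valid only after a send, and that it can be handed back when allocating a later descriptor.

```c
#include <assert.h>

/* Hypothetical stand-in for MCA_BTL_NO_ORDER; the real value is not
 * given in this thread. */
#define SKETCH_NO_ORDER 255

/* Hypothetical descriptor: the PML only stores and forwards "order". */
typedef struct {
    int order;
} sketch_descriptor_t;

/* BTL side: allocate a descriptor, optionally ordered with respect to
 * an earlier one by passing that descriptor's tag. */
static sketch_descriptor_t sketch_btl_alloc(int order)
{
    sketch_descriptor_t des = { order };
    return des;
}

/* BTL side: send. If the PML did not ask for ordering, the BTL picks
 * an internal channel (0 here, an arbitrary choice) and records it in
 * the descriptor so the PML can request it again later. */
static void sketch_btl_send(sketch_descriptor_t *des)
{
    if (SKETCH_NO_ORDER == des->order) {
        des->order = 0;
    }
    /* ... transmit on the channel named by des->order ... */
}

/* PML side: send one fragment with no ordering constraint, then a
 * second one that must be ordered after it. Returns 1 if both ended
 * up with the same tag. The PML never interprets the tag value. */
static int sketch_pml_send_ordered_pair(void)
{
    sketch_descriptor_t first = sketch_btl_alloc(SKETCH_NO_ORDER);
    sketch_btl_send(&first);            /* tag is valid only from here on */

    sketch_descriptor_t second = sketch_btl_alloc(first.order);
    sketch_btl_send(&second);

    return first.order == second.order;
}
```

In this sketch the PML compares tags only to show the round trip; in the scheme described in the thread it would simply pass the tag along.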




> If that's the case (which seems to be true from the following snippet
>
>     if(MCA_BTL_NO_ORDER == order) {
>         frag->base.order = BTL_OPENIB_LP_QP;
>     } else {
>         frag->base.order = order;
>     }

So I am choosing some ordering to use here because the PML told me it
doesn't care. What is wrong with this?




> ) I expect you to revise the patch in order to propose a generic
> solution or I'll trigger a vote against the patch.

This exports no knowledge of the Open IB BTL to the PML layer. The
PML doesn't know that this is a QP index, and it doesn't care! The PML
simply uses this value (if it wants to) to request ordering with
subsequent fragments. We use the QP index only as a BTL optimization;
it could have been anything. So the only new knowledge that the PML
has is how to request that ordering of fragments be enforced, and the
BTL doesn't even have to provide this if it doesn't want to; that is the
reason for MCA_BTL_NO_ORDER.

Please describe a use case where this is not a generic solution. Keep
in mind that MX, TCP, and GM can all provide ordering guarantees if they
wish. In fact, for MX you can simply always assign an order tag, say
the value 1. MX can then guarantee ordering of all fragments sent
over the same BTL.
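
The MX remark can be illustrated with a BTL that always reports the same tag. This is a sketch under assumptions: the `sketch_` names, the descriptor layout, and the constant values are hypothetical, and nothing here is taken from the actual MX BTL.

```c
#include <assert.h>

#define SKETCH_NO_ORDER 255            /* hypothetical "don't care" value */
#define SKETCH_SINGLE_CHANNEL_TAG 1    /* the "say the value 1" from the text */

typedef struct {
    int order;                         /* opaque to the PML */
} sketch_descriptor_t;

/* An interconnect that already delivers everything in order has a
 * single logical channel, so it can report the same tag for every
 * fragment regardless of what the PML asked for. */
static void sketch_inorder_btl_send(sketch_descriptor_t *des)
{
    des->order = SKETCH_SINGLE_CHANNEL_TAG;
    /* ... transmit on the one in-order channel ... */
}
```

Because every descriptor carries the same tag after a send, any later request for ordering against an earlier fragment is trivially satisfied.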



> I vote for it to be backed out of the trunk as it exports way too much
> knowledge from the Open IB BTL into the PML layer.

The only other option that I have identified that doesn't push PML-level
protocol into the BTL is to require that BTLs always guarantee
ordering of fragments sent/put/get over the same BTL.





>   george.
>
> PS: With Gleb's changes the problem is the same. The following
> snippet reflects exactly the same behavior as the original patch.

Gleb's changes don't change the semantic guarantees that I have
described above.

>     frag->base.order = order;
>     assert(frag->base.order != BTL_OPENIB_HP_QP);





Re: [OMPI devel] request help debugging openmpi on openib/ipath

2007-06-07 Thread Jeff Squyres

On May 31, 2007, at 7:25 PM, Ralph Campbell wrote:


> I can run the Intel MPI benchmarks OK at np=2 but at np=4,
> it hangs.

Bummer.

> If I change /usr/share/openmpi/mca-btl-openib-hca-params.ini
> [QLogic InfiniPath]
> use_eager_rdma = 0

FYI, you can change such values on the command line and/or
environment -- see
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
The MCA parameter in question is btl_openib_use_eager_rdma.



> Then, it gets much farther before hanging on 2MB+ messages.
> If I create .openmpi/mca-params.conf with
> min_rdma_size = 2147483648
> The benchmark completes reliably.

Yoinks.  I assume you mean btl_openib_min_rdma_size, right?  (Note
that the name changed slightly for the upcoming 1.3 [i.e., the SVN
trunk]; although the old name is deprecated, it'll still work.)



> When the hang happens, the ipath driver thinks all the posted
> work requests and completion entries have been generated
> and openmpi seems to think they haven't all completed.
>
> Can someone point me to the code where RDMA write is polled
> on the destination node?

All the OFA code in OMPI is in ompi/mca/btl/openib (i.e., the
"openib" BTL plugin).

The completion polling occurs in btl_openib_component.c, in two main
functions: btl_openib_component_progress() and
btl_openib_module_progress().  The component progress function mainly
checks for eager RDMA progress; if there is none (per your setting
use_eager_rdma to 0), it'll fall through to the module progress()
function.  There's one module "instance" for each HCA port, so we
basically loop over checking each module (port).
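
The control flow just described can be sketched roughly as follows. This is not the code in btl_openib_component.c: the `sketch_` names and the counters that stand in for real completion-queue polling are assumptions made purely to show the fall-through from the eager RDMA check to the per-port module loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port module state: a counter stands in for
 * completions waiting on that port's completion queue. */
typedef struct {
    int pending_completions;
} sketch_module_t;

/* Drain one module (port) and report how many events were handled. */
static int sketch_module_progress(sketch_module_t *module)
{
    int handled = module->pending_completions;
    module->pending_completions = 0;
    return handled;
}

/* Component-level progress: check eager RDMA first (if enabled); if
 * that yields nothing, fall through and poll every module in turn. */
static int sketch_component_progress(sketch_module_t *modules,
                                     size_t num_modules,
                                     int use_eager_rdma,
                                     int *eager_pending)
{
    int count = 0;
    size_t i;

    if (use_eager_rdma) {
        count = *eager_pending;
        *eager_pending = 0;
    }
    if (count > 0) {
        return count;
    }
    for (i = 0; i < num_modules; i++) {
        count += sketch_module_progress(&modules[i]);
    }
    return count;
}
```

With use_eager_rdma set to 0, as in Ralph's configuration, the first branch is skipped and every call goes straight to the per-port loop.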


Galen tells me that it may be a little more subtle than this, such as  
an ordering issue -- he's going to reply with more detail shortly.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread George Bosilca
There is something weird with this change, and the patch reflects it.
The new argument "order" comes from the PML level and might be
MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or
BTL_OPENIB_HP_QP (which are definitely Open IB related). Do you
really intend to let the PML know about Open IB internal constants?
If that's the case (which seems to be true from the following snippet

    if(MCA_BTL_NO_ORDER == order) {
        frag->base.order = BTL_OPENIB_LP_QP;
    } else {
        frag->base.order = order;
    }
) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote for it to be
backed out of the trunk as it exports way too much knowledge from the
Open IB BTL into the PML layer.


  george.

PS: With Gleb's changes the problem is the same. The following snippet
reflects exactly the same behavior as the original patch.


frag->base.order = order;
assert(frag->base.order != BTL_OPENIB_HP_QP);

On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote:


Hi Galen,

On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:



With current code this is not the case. Order tag is set during a
fragment allocation. It seems wrong according to your description.
Attached patch fixes this. If no specific ordering tag is provided to
allocation function order of the fragment is set to be
MCA_BTL_NO_ORDER. After call to send/put/get order is set to whatever
QP was used for communication. If order is set before send call it is
used to choose QP.



I do set the order tag during allocation/prepare, but the defined
semantics are that the tag is only valid after send/put/get. We can
set them up any where we wish in the BTL, the PML however cannot rely
on anything until after the send/put/get call. So really this is an
issue of semantics versus implementation. The implementation I
believe does conform to the semantics as the upper layer (PML)
doesn't use the tag value until after a call to send/put/get.

I will look over the patch however, might make more sense to delay
setting the value until the actual send/put/get call.


Have you had a chance to look over the patch?

--
Gleb.






Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
Hi Galen,

On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:
> 
> > With current code this is not the case. Order tag is set during a
> > fragment allocation. It seems wrong according to your description.
> > Attached patch fixes this. If no specific ordering tag is provided to
> > allocation function order of the fragment is set to be
> > MCA_BTL_NO_ORDER. After call to send/put/get order is set to whatever
> > QP was used for communication. If order is set before send call it is
> > used to choose QP.
> >
> 
> I do set the order tag during allocation/prepare, but the defined  
> semantics are that the tag is only valid after send/put/get. We can  
> set them up any where we wish in the BTL, the PML however cannot rely  
> on anything until after the send/put/get call. So really this is an  
> issue of semantics versus implementation. The implementation I  
> believe does conform to the semantics as the upper layer (PML)  
> doesn't use the tag value until after a call to send/put/get.
> 
> I will look over the patch however, might make more sense to delay  
> setting the value until the actual send/put/get call.
> 
Have you had a chance to look over the patch?
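
To make the semantics quoted above concrete, here is a small self-contained C sketch: the tag stays at "no order" after allocation unless one was requested, a tag set before the send selects the QP, and after the send the tag records whichever QP was used. This is a toy model, not the patch itself; the TOY_* constants, struct toy_frag, toy_alloc() and toy_send() are all made-up names, and the numeric values are arbitrary.

```c
#include <assert.h>

/* Illustrative values only; the real constants live in the Open MPI tree. */
enum { TOY_HP_QP = 0, TOY_LP_QP = 1, TOY_NO_ORDER = 255 };

/* Hypothetical fragment carrying only the ordering tag. */
struct toy_frag {
    int order;
};

/* Allocation: store the requested tag, which may be TOY_NO_ORDER.
 * The upper layer must not rely on this value yet. */
static void toy_alloc(struct toy_frag *frag, int requested_order)
{
    frag->order = requested_order;
}

/* Send: a tag set before the call chooses the QP; otherwise fall back
 * to default_qp. Afterwards the tag records the QP actually used, and
 * only from this point on is it meaningful to the caller. */
static int toy_send(struct toy_frag *frag, int default_qp)
{
    assert(default_qp == TOY_HP_QP || default_qp == TOY_LP_QP);

    int qp = (frag->order == TOY_NO_ORDER) ? default_qp : frag->order;
    frag->order = qp;
    return qp;
}
```

In this model the fallback QP is a parameter of the send call, so the caller never has to name a transport-specific constant just to say "no ordering required".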

--
Gleb.


Re: [OMPI devel] jnysal-openib-wireup branch

2007-06-07 Thread Jeff Squyres

Thanks!

On Jun 7, 2007, at 7:25 AM, Nysal Jan wrote:

I'll clean up the code and add the granular selection part. It should
be ready by Monday.

--Nysal

On 6/6/07, Jeff Squyres < jsquy...@cisco.com> wrote:

Ok -- so did you want to go ahead and make these changes, or did you
want me to do it?

Either way, I'd be in favor of all this stuff coming to the trunk in
the Very Near Future.  :-)



On Jun 6, 2007, at 7:02 AM, Nysal Jan wrote:

> Hi Jeff,
>
> 1. The logic for if_exclude was not correct.  I committed a fix for
> it.  https://svn.open-mpi.org/trac/ompi/changeset/14748
>
> Thanks
>
> 2. I'm a bit confused on a) how the new MCA params mca_num_hcas and
> map_num_procs_per_hca are supposed to be used and b) what their
> default values should be.
>
> Probably these params(and relevant code) should be removed now,
> since there is a plan for generic Socket/Core to HCA mapping
> scheme. mca_num_hcas is the maximum number of HCAs a task can use.
> E.g., if map_num_procs_per_hca is 3 and max_num_hcas is 2, then on any
> node, tasks 1/2/3 are mapped to hca1 & hca2, and tasks 4/5/6 are mapped
> to hca3 & hca4.
> Default values were set to 1 (that's what we needed at that point in
> time). They need to be modified so that OMPI's default behaviour
> remains unchanged (i.e., use all HCAs).
>
> 2a. I don't quite understand the logic of is_hca_allowed(); I could
> not get it to work properly.  Specifically, I have 2 machines each
> with 2 HCAs (mthca0 has 1 port, mthca1 has 2 ports).  If I ran 2
> procs (regardless of byslot or bynode), is_hca_allowed() would always
> return false for the 2nd proc.  So I put a temporary override in
> is_hca_allowed() to simply always return true.  Can you explain how
> the logic is supposed to work in that function?
>
> Explained above
>
> 2b. The default values of max_num_hcas and map_num_procs_per_hca are
> both 1.  Based on my (potentially flawed) understanding of how these
> MCA params are meant to be used, this is different than the current
> default behavior.  The current default is that all procs use all
> ACTIVE ports on all HCAs.  I *think* your new default param values
> will set each proc to use the ACTIVE ports on exactly one HCA,
> regardless how many there are in the host.  Did you mean to do that?
> Also: both values must currently be >=1; should we allow -1 for both
> of these values, meaning that they can be "infinite" (i.e., based on
> the number of HCAs in the host)?
>
> Yes,  the defaults need to be changed. I'll also make the selection
> logic more granular (eg. -mca mca_btl_openib_if_include
> mthca0:1,mthca1:1)
>
> --Nysal
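
One possible reading of the mapping example quoted above (3 procs per HCA group, at most 2 HCAs per task, so tasks 1/2/3 get hca1 & hca2 and tasks 4/5/6 get hca3 & hca4) can be written as a few lines of arithmetic. This is only a sketch of that reading, not the code on the branch; toy_map_task_to_hcas() and its parameters are hypothetical, and it does not check how many HCAs the node really has.

```c
#include <assert.h>

/* task is the 1-based task number on the node. The function writes
 * max_num_hcas consecutive 1-based HCA numbers into out[] and returns
 * how many entries it wrote. */
static int toy_map_task_to_hcas(int task, int procs_per_hca,
                                int max_num_hcas, int *out)
{
    assert(task >= 1 && procs_per_hca >= 1 && max_num_hcas >= 1);

    /* Tasks are grouped procs_per_hca at a time: 1..3 -> group 0, 4..6 -> group 1. */
    int group = (task - 1) / procs_per_hca;

    /* Each group owns its own run of max_num_hcas HCAs. */
    int first_hca = group * max_num_hcas + 1;

    for (int i = 0; i < max_num_hcas; i++) {
        out[i] = first_hca + i;
    }
    return max_num_hcas;
}
```

With procs_per_hca = 3 and max_num_hcas = 2, task 2 maps to HCAs {1, 2} and task 5 maps to HCAs {3, 4}, which matches the example in the message.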


--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] jnysal-openib-wireup branch

2007-06-07 Thread Nysal Jan

I'll clean up the code and add the granular selection part. It should be ready
by Monday.
--Nysal

On 6/6/07, Jeff Squyres  wrote:


Ok -- so did you want to go ahead and make these changes, or did you
want me to do it?

Either way, I'd be in favor of all this stuff coming to the trunk in
the Very Near Future.  :-)



On Jun 6, 2007, at 7:02 AM, Nysal Jan wrote:

> Hi Jeff,
>
> 1. The logic for if_exclude was not correct.  I committed a fix for
> it.  https://svn.open-mpi.org/trac/ompi/changeset/14748
>
> Thanks
>
> 2. I'm a bit confused on a) how the new MCA params mca_num_hcas and
> map_num_procs_per_hca are supposed to be used and b) what their
> default values should be.
>
> Probably these params(and relevant code) should be removed now,
> since there is a plan for generic Socket/Core to HCA mapping
> scheme. mca_num_hcas is the maximum number of HCAs a task can use.
> E.g., if map_num_procs_per_hca is 3 and max_num_hcas is 2, then on any
> node, tasks 1/2/3 are mapped to hca1 & hca2, and tasks 4/5/6 are mapped
> to hca3 & hca4.
> Default values were set to 1 (that's what we needed at that point in
> time). They need to be modified so that OMPI's default behaviour
> remains unchanged (i.e., use all HCAs).
>
> 2a. I don't quite understand the logic of is_hca_allowed(); I could
> not get it to work properly.  Specifically, I have 2 machines each
> with 2 HCAs (mthca0 has 1 port, mthca1 has 2 ports).  If I ran 2
> procs (regardless of byslot or bynode), is_hca_allowed() would always
> return false for the 2nd proc.  So I put a temporary override in
> is_hca_allowed() to simply always return true.  Can you explain how
> the logic is supposed to work in that function?
>
> Explained above
>
> 2b. The default values of max_num_hcas and map_num_procs_per_hca are
> both 1.  Based on my (potentially flawed) understanding of how these
> MCA params are meant to be used, this is different than the current
> default behavior.  The current default is that all procs use all
> ACTIVE ports on all HCAs.  I *think* your new default param values
> will set each proc to use the ACTIVE ports on exactly one HCA,
> regardless how many there are in the host.  Did you mean to do that?
> Also: both values must currently be >=1; should we allow -1 for both
> of these values, meaning that they can be "infinite" ( i.e., based on
> the number of HCAs in the host)?
>
> Yes,  the defaults need to be changed. I'll also make the selection
> logic more granular (eg. -mca mca_btl_openib_if_include
> mthca0:1,mthca1:1)
>
> --Nysal


--
Jeff Squyres
Cisco Systems
