Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

2009-02-13 Thread Eugene Loh

Got it, thanks.

Is anyone else looking at that ticket?  I'm still a newbie and I suspect 
someone else could figure this problem out a lot faster than I could.  
So, I'm curious how much I should be looking at this ticket.


If amateurs are allowed to speculate, however, my guess is that this 
isn't really a BTL thing.  It reminds me of trac ticket 1468 (aka 
1516).  In that case, there was a lot of one-way traffic.  We needed a 
way to return frags to the sender.  I guess that was solved.


So, the present problem is something different.  My guess is that 
senders are overrunning receivers.  Could it be that some receiver (like 
the root in the MPI_Reduce) ends up with too many incoming messages?  
It has to queue up unexpected messages, which slows it down further, 
which means it has to deal with even more unexpected messages, and so on.  
Those messages have to be placed somewhere, which means memory is 
allocated, etc.


Just a theory.  I don't know the PML well enough to judge its soundness.

But if this is the case, it's a PML issue rather than a BTL issue.  
Maybe there should be some flow control -- particularly in our 
implementation of collectives!


Ralph Castain wrote:

The connection is only that, if you are going to modify the sm BTL as 
you say, you might at least want to be aware that we have a problem 
in it so you (a) don't make it worse than it already is, and (b) 
might keep an eye open for the problem as you are changing things.


On Feb 12, 2009, at 3:58 PM, Eugene Loh wrote:

Sorry, what's the connection?  Are we talking about 
https://svn.open-mpi.org/trac/ompi/ticket/1791 ?  Are you simply 
saying that if I'm doing some sm BTL work, I should also look at 
1791?  I'm trying to figure out if there's some more specific 
connection I'm missing.


Ralph Castain wrote:

You might want to look at ticket #1791 while you are doing this - 
Brad added some valuable data earlier today.




Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

2009-02-13 Thread Jeff Squyres
George -- can you confirm/deny?  Is this something we need to fix for  
v1.3.1?


On Feb 12, 2009, at 10:15 PM, Eugene Loh wrote:


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

2009-02-13 Thread George Bosilca
I can't confirm or deny.  The only thing I can tell is that the same 
test works fine over other BTLs, so this tends to pinpoint either a 
problem in the sm BTL or a particular path in the PML (the one used 
by the sm BTL).  I'll have to dig a little bit more into it, but I was 
hoping to do it in the context of the new sm BTL (just to avoid having 
to do it twice).


  george.

On Feb 13, 2009, at 08:05, Jeff Squyres wrote:

George -- can you confirm/deny?  Is this something we need to fix  
for v1.3.1?






Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

2009-02-13 Thread Eugene Loh

George Bosilca wrote:

I can't confirm or deny.  The only thing I can tell is that the same 
test works fine over other BTLs, so this tends to pinpoint either a 
problem in the sm BTL or a particular path in the PML (the one 
used by the sm BTL).  I'll have to dig a little bit more into it, but 
I was hoping to do it in the context of the new sm BTL (just to avoid 
having to do it twice).


Okay.  I'll try to get "single queue" put back soon and might look at 
1791 along the way.


But here is what I wonder.  Let's say you have one-way traffic -- either 
rank A sending rank B messages without any traffic ever in the other 
direction, or repeated MPI_Reduce operations always with the same root 
-- and the senders somehow get well ahead of the receiver.  Say, A wants 
to pump 1,000,000 messages over and B is busy doing something else.  
What should happen?  What should the PML and BTL do?  The conditions 
could range from B not being in MPI at all to B listening to the BTL 
without yet having posted receives to match.  Should the connection 
become congested and force the sender to wait -- and if so, is this in 
the BTL or PML?  Or should B keep on queueing up the unexpected messages?


After some basic "single queue" putbacks, I'll try to look at the code 
and understand what the PML is doing in cases like this.


[OMPI devel] svn commit

2009-02-13 Thread Eugene Loh

I'm having trouble figuring out how to put my changes back to the trunk.

I've been looking at the wiki pages, but don't really see the one last 
piece of this puzzle that I need.  I've used 
https://svn.open-mpi.org/trac/ompi/wiki/UsingMercurial to get me through 
these steps:


svn check-out of trunk to make an svn workspace on milliways
turn this also into an hg repository
bring hg workspace over to a local (Sun) workspace
make changes
hg commit and push back to milliways

Now, two questions:

1) Why don't I see my changes on milliways?  If I look at a file I 
changed in my local workspace, I don't see that change when I look at 
the same file on milliways.  However, if I do a fresh hg clone from the 
milliways workspace, I do see the changes.  So, somehow my changes are 
on milliways, but only in a way that hg sees them.


2) How do I get the changes from my milliways svn/hg workspace back into 
the trunk?


The workspace on milliways is /u/eloh/hg/sm_latency in case that matters.


Re: [OMPI devel] svn commit

2009-02-13 Thread Ralph Castain
When you push something to an hg repo, you have to go to that repo and 
do an "hg up" to update it.  Hg holds your changes until you do the 
update.


Once you have them in the hg repo, you can do an "svn st" to see if 
you need to do anything further before committing back to the svn repo 
- e.g., add or remove files.  When you are ready, just do an "svn ci" 
to commit your changes to the svn repo.


Ralph

On Feb 13, 2009, at 1:08 PM, Eugene Loh wrote:





Re: [OMPI devel] svn commit

2009-02-13 Thread Eugene Loh

Ralph Castain wrote:

Once you have them in the hg repo, you can do an "svn st" to see if 
you need to do anything further before committing back to the svn 
repo - e.g., add or remove files.  When you are ready, just do an "svn 
ci" to commit your changes to the svn repo.


Thanks, but I get:

Authentication realm:  Open MPI Subversion 
repositories access

Password for 'eloh':
Authentication realm:  Open MPI Subversion 
repositories access

Username: eloh
Password for 'eloh':
Authentication realm:  Open MPI Subversion 
repositories access

Username: eloh
Password for 'eloh':
svn: Commit failed (details follow):
svn: CHECKOUT of 
'/svn/ompi/!svn/ver/20515/trunk/ompi/mca/btl/sm/btl_sm.c': authorization 
failed (https://svn.open-mpi.org)

svn: Your commit message was left in a temporary file:
svn:'/nfs/rontok/xraid/users/eloh/hg/sm_latency/svn-commit.tmp'
[eloh@milliways sm_latency]$

Do I need some other authorization for a trunk putback?


Re: [OMPI devel] svn commit

2009-02-13 Thread Ralph Castain

Yeah - it looks like Jeff and/or Tim didn't authorize you yet.


On Feb 13, 2009, at 1:53 PM, Eugene Loh wrote:






[OMPI devel] Announcing searchable OMPI source code tree

2009-02-13 Thread Jeff Squyres
Indiana U. has added another service to the Open MPI web site: a fully 
indexed and searchable database of Open MPI source code trees.  
There's a link under "Source Code Access" entitled "Searchable source 
tree" on the OMPI web site that takes you to 
https://svn.open-mpi.org/source/ (get the CA cert from 
http://www.cs.indiana.edu/Facilities/FAQ/Mail/csci.crt to avoid SSL 
certificate warnings from your browser).


This site, powered by OpenGrok, is a "wicked fast" searchable database  
of both the Open MPI SVN trunk and v1.3 release branch.


Enjoy!

--
Jeff Squyres
Cisco Systems