[OMPI devel] Next Tuesday call: review face-to-face meeting results

2015-01-28 Thread Jeff Squyres (jsquyres)
The major agenda item for next Tuesday's call will be to review the results of 
this week's face-to-face developer meeting.

We've only had 1 day so far, but it's fairly obvious that we need to share the 
results, topics, and decisions that have been made at the meeting with everyone 
who was not here.  

We're putting bullet points on the wiki for the major decisions, but that's no 
substitute for a) being here, b) talking through some of the details to give 
insight into how some of the decisions were made, or c) getting input from 
those who were not here.  Here's the wiki (scroll down to see the "Resolved" 
section):

https://github.com/open-mpi/ompi/wiki/Meeting-2015-01

One big item from yesterday, for example, is a plan to change how OMPI handles 
procs in each MPI process (for scalability reasons), and its implications for 
PML/BTL interactions (and potentially for PML/MTL interactions, but we probably 
need some feedback from MTL maintainers before making changes to cm).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] OMPI dev meeting today

2015-01-28 Thread Jeff Squyres (jsquyres)
I just created a webex for today's meeting.  We're likely to go over some 
topics that people care about (Nathan, Sandia).

Feel free to join whenever you want -- it's in-room audio, but of course, is no 
substitute for being here.  :-)

https://cisco.webex.com/cisco/e.php?MTID=m1f96c4d5e8bf6852973a8e31072adec1




Re: [OMPI devel] OMPI dev meeting today

2015-01-28 Thread Jeff Squyres (jsquyres)
The webex died when we went to lunch.  Here's the new one we just started.  
We're going to start with memkind:

   https://cisco.webex.com/cisco/e.php?MTID=m49f6a7eaea454fc876bb129d526a66bf



[OMPI devel] MTL interfaces

2015-01-28 Thread Jeff Squyres (jsquyres)
Ryan / Sandia (anyone else who cares about MTL interfaces):

Can you attend a webex tomorrow at 1pm US Central to discuss adding one-sided 
interfaces to the MTL?  (it must be before 2pm US Central)




Re: [OMPI devel] MTL interfaces

2015-01-28 Thread Friedley, Andrew
I care and can attend at that time.

Andrew


Re: [OMPI devel] MTL interfaces

2015-01-28 Thread Burette, Yohann
I care and will attend as well.

Will you create a new webex for tomorrow or should we use the same one as today?

Thank you,
Yohann



Re: [OMPI devel] MTL interfaces

2015-01-28 Thread Jeff Squyres (jsquyres)
Great, thanks guys.  I'll make a new webex for tomorrow and send it out.



[OMPI devel] Git tip of the week: "hub" script

2015-01-28 Thread Jeff Squyres (jsquyres)
Here's a totally sweet command line script that does a lot of common GitHub 
actions for you:

https://hub.github.com/

On OSX, you can install via Homebrew or ports.  There are precompiled binaries 
for Linux.

One common OMPI use case:

-----
$ cd path_to_my_ompi_release_clone
$ hub checkout https://github.com/open-mpi/ompi-release/pull/173

--> Note: that's just the URL of the PR (copied from my web browser)

Updating rhc54
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 6 (delta 0), reused 1 (delta 0)
Unpacking objects: 100% (6/6), done.
From git://github.com/rhc54/ompi-release
* [new branch]  cmr/george -> rhc54/cmr/george
Branch rhc54-cmr/george set up to track remote branch cmr/george from rhc54.
Switched to a new branch 'rhc54-cmr/george'
-----

And now I'm on a local branch representing that PR.  I can 
autogen/build/install/MTT/etc.  And then "git checkout v1.8" to go back to the 
stock v1.8 branch when I'm done.

Sweet!
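
If hub isn't installed, roughly the same result can be had with plain git, 
because GitHub publishes each pull request's head under refs/pull/<N>/head.  
The helper below is only a sketch: the function name pr_checkout_cmds and the 
local branch name pr-<N> are made up for illustration (hub picks its own branch 
name, as shown above), and it prints the commands rather than running them so 
you can check them first.

```shell
# Hypothetical helper (not part of hub or git): print the plain-git commands
# that fetch a GitHub pull request into a local branch named pr-<N>.
# Assumes a PR URL of the form https://github.com/<owner>/<repo>/pull/<N>.
pr_checkout_cmds() {
    url=$1
    # Drop everything up to and including "/pull/" to leave the PR number.
    num=${url##*/pull/}
    # Drop anything after the number (e.g. a trailing "/files").
    num=${num%%/*}
    # Drop "/pull/<N>..." to leave the repository URL.
    repo=${url%/pull/*}
    printf 'git fetch %s.git pull/%s/head:pr-%s\n' "$repo" "$num" "$num"
    printf 'git checkout pr-%s\n' "$num"
}

pr_checkout_cmds https://github.com/open-mpi/ompi-release/pull/173
```

Run the two printed commands inside the release clone; "git checkout v1.8" 
gets you back to the stock branch afterwards, same as with hub.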




[OMPI devel] For discussion tomorrow: MTL issues

2015-01-28 Thread Jeff Squyres (jsquyres)
MTL authors --

We had *some* discussion of MTL issues this afternoon in the room, but need 
your input (since most of you are not here).  Here's what we'd like to talk 
about tomorrow (and we realize you might not have answers for this tomorrow).

Short version: based on Mellanox's experience, why not ditch the CM PML and 
have all current MTLs move up to be PMLs?

More detail:

We all know that Mellanox moved their MXM MTL up to be a PML.  The short 
answer to "why did they do this?" is that CM really added no value for MXM.  
Literally, all it did was add overhead:

1. translate some OMPI data structures to a neutral/CM data structure
2. which was then translated into the MXM data structures
3. then call MXM

So why not chop out one of those layers:

1. translate OMPI data structures into MXM data structures
2. then call MXM

Taking a crass look at the existing MTLs, we wonder if it would be worthwhile 
to do the same thing for all of them.  It doesn't seem (to us) that it would be 
a lot of work -- the PML and MTL interfaces are quite similar.  And there could 
be message rate improvements for those MTLs-turned-PMLs, just as there were for 
MXM/yalla.

*If* this is a good assumption -- that MTLs should all become PMLs -- then MPI 
one-sided operations become the next logical question.  I.e., what happens when 
you call MPI_PUT / MPI_GET / etc.?

Right now, you'll end up using the osc/pt2pt component, which will use PML 
calls to effect MPI RMA functionality over the PML interface.  Which is fine, 
and will work correctly in all cases.

However, MTL-turned-PML authors will then have the option of writing an 
osc/YOUR_COMPONENT for doing optimized MPI-one-sided operations on your network.

This is what we would like to discuss with you tomorrow.  Tell us that this 
idea is crazy, or that it's ok, or that you need to think about it, ...etc. 
Let's chat.




[OMPI devel] mlx4 QP operation err

2015-01-28 Thread Dave Turner
I'm testing RoCE on 40 Gbps Mellanox ethernet cards and am getting a
mlx4 QP operation error every time it gets to testing 132 kB packets.  These
are aggregate tests in that 16 cores on one host are doing bi-directional
ping-pongs to 16 cores on another host across the Mellanox cards.

  I've found some old references to similar mlx4 errors dating back to
2009 that lead me to believe this may be a firmware error.  I believe we're
running the most up to date version of the firmware.

 Could someone comment on whether these are firmware issues, and
if so how to report them to Mellanox?  I've attached some files with more
detailed information on this problem.

 Dave Turner

-- 
Work: davetur...@ksu.edu (785) 532-7791
 118 Nichols Hall, Manhattan KS  66502
Home: drdavetur...@gmail.com
  cell: (785) 770-5929


mlx4_error.tar.gz
Description: GNU Zip compressed data


Re: [OMPI devel] mlx4 QP operation err

2015-01-28 Thread Christopher Samuel
Hi Dave,

On 29/01/15 11:31, Dave Turner wrote:

>   I've found some old references to similar mlx4 errors dating back to
> 2009 that lead me to believe this may be a firmware error.  I believe we're
> running the most up to date version of the firmware.

There was a new version released a few days ago, 2.33.5100:

http://www.mellanox.com/page/firmware_table_ConnectX3ProEN

Release notes are here:

http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_33_5100-release_notes.pdf

Bug fixes start on page 23; it looks like there are 29 fixes
in this version, and fix 1 is for RoCE (though of course it may
not be relevant) - "The first Read response was not treated as
implicit ACK" (discovered in 2.30.8000).

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] mlx4 QP operation err

2015-01-28 Thread Devendar Bureddy
Are you able to reproduce this error with an IB verbs bandwidth test?  I hope 
you are running on a lossless Ethernet fabric setup and selecting the correct 
VLAN.

-Devendar



Re: [OMPI devel] MTL interfaces

2015-01-28 Thread Todd Kordenbrock
Hi Jeff,

I can attend at that time.

todd

