Hey Alex,

On Wed, 2011-04-06 at 07:09 -0700, Alex Netes wrote:
> Hi Al, Jared,
> 
> On 14:31 Wed 23 Mar     , Albert Chu wrote:
> > > 
> > > 1) Port Shifting
> > > 
> > > This is similar to what was done with some of the LMC > 0 code.
> > > Congestion would occur due to "alignment" of routes w/ common traffic
> > > patterns.  However, we found that it was also necessary for LMC=0 and
> > > only for used-ports.  For example, let's say there are 4 ports (called A,
> > > B, C, D) and we are routing lids 1-9 through them.  Suppose only routing
> > > through A, B, and C will reach lids 1-9.
> > > 
> > > The LFT would normally be:
> > > 
> > > A: 1 4 7
> > > B: 2 5 8
> > > C: 3 6 9
> > > D:
> > > 
> > > The Port Shifting option would make this:
> > > 
> > > A: 1 6 8
> > > B: 2 4 9
> > > C: 3 5 7
> > > D:
> > > 
> > > This option by itself improved the mpiGraph average send/recv bandwidth
> > > from 420 MB/s and 508 MB/s to 991 MB/s and 1172 MB/s.
> > > 
> 
> After thinking about this a little more and reviewing Jared Carr's "Scatter
> ports" patch, I think we should combine these efforts into one framework, as
> Al suggested. Moreover, isn't "port_shifting" too fabric-oriented? Will
> general OpenSM users find it useful? And how can a user tell whether
> port_shifting may improve performance for them?

I will admit, I'm unsure of how much non-HPC users would benefit from
this option, be hurt by it, or if they would even care.  I can't speak
for all users, but here at LLNL and at most of the lab HPC sites, people
play with the options and experiment to find the best routing algorithm
+ settings that support their environment.  I would imagine the
port_shifting option would just be another option for people to
experiment with.

I think Jared's Scatter Ports patch would be easy to merge into my line
of patches.  Let me see if I can integrate it.

> Would providing a shift factor (more than the suggested 1) help to make it
> suitable for a general case?

That seems like a good idea.  We certainly could support an arbitrary
shift, allowing users to experiment and see if there is a better one for
their particular environment.
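
To make the idea concrete, here is a rough sketch of how a configurable
shift could reproduce the A/B/C tables from the example quoted above.  This
is only an illustration, not the patch code: build_lft and its shift
parameter are made-up names, and it assumes lids are handed out round-robin
over the used ports, with the starting port rotated by `shift` after each
full pass.

```python
def build_lft(lids, ports, shift=0):
    """Hypothetical helper: spread lids over the used ports round-robin,
    rotating the starting port by `shift` after every full pass."""
    table = {port: [] for port in ports}
    n = len(ports)
    for i, lid in enumerate(lids):
        # rnd = which pass over the ports we are on, pos = slot within it
        rnd, pos = divmod(i, n)
        port = ports[(pos + rnd * shift) % n]
        table[port].append(lid)
    return table


used_ports = ["A", "B", "C"]   # port D cannot reach lids 1-9, so it is skipped
lids = list(range(1, 10))

unshifted = build_lft(lids, used_ports, shift=0)
shifted = build_lft(lids, used_ports, shift=1)

for port in used_ports:
    print(port, unshifted[port], shifted[port])
```

With shift=0 this gives the "normal" table (A: 1 4 7, ...), and with
shift=1 it gives the shifted one (A: 1 6 8, ...).  Note that in this sketch
a shift that is a multiple of the number of used ports just reproduces the
unshifted table.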

> > > 2) Remote Guid Sorting
> > > 
> > > Most core/spine switches we've seen thus far have had line boards
> > > connected to spine boards in a consistent pattern.  However, we recently
> > > got some Qlogic switches that connect from line/leaf boards to spine
> > > boards in a (to the casual observer) random pattern.  I'm sure there was
> > > a good electrical/board reason for this design, but it does hurt routing
> > > b/c updn doesn't account for this.  Here's an output from iblinkinfo as
> > > an example.
> > > 
> 
> Why can't this problem be addressed by the guid_routing_order_file option?

The problem we encountered in our fabric is predominantly a
switch-to-switch routing issue with a spine switch.  The
guid_routing_order_file wouldn't be able to solve this, since its input
is just end ports.

Or, to put it another way, this option directly affects the routing
decisions made.  The guid_routing_order_file does not; it only affects
the order in which routes are chosen (which can have consequences, but
the routing algorithm itself is unchanged).

Al

> 
> --Alex
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
