Re: [ewg] [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

2010-05-17 Thread Liran Liss
If we have a dedicated ABI call for this mapping, then it seems reasonable to 
have it device independent.
However, this mapping is really only used when creating address handles.

So, we can base the mapping on the (device specific) create_ah() flow, but 
provide generic mapping functions for all devices to use (this is kind of what 
happens now).
Also, using create_ah() doesn't introduce an ABI call that is specific to 
ib-->eth mappings.

This is similar to how device-specific ib_reg_user_mr() functions call the 
generic ib_umem_get()...

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, May 13, 2010 10:18 PM
To: Sean Hefty
Cc: 'Eli Cohen'; Eli Cohen; Linux RDMA list; ewg
Subject: Re: [PATCHv8 07/11] ib_core: Add API to support IBoE from userspace

 > Basically, what I want to understand is why does this change make sense?
 >
 > @@ -1139,6 +1139,10 @@ struct ib_device {
 >                            struct ib_grh *in_grh,
 >                            struct ib_mad *in_mad,
 >                            struct ib_mad *out_mad);
 > +       int                (*get_eth_l2_addr)(struct ib_device *device, u8 port,
 > +                                             union ib_gid *dgid, int sgid_idx,
 > +                                             u8 *mac, u16 *vlan_id, u8 *tagged);
 > +

Yes, that was pretty much my original question.  Why do we have a verb for 
userspace to call a device-specific method to do the mapping?  The layering 
seems wrong somewhere if we have a generic verb to do this mapping, but then 
put the mapping in device-specific code.

 - R.
--
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] OFED-1.5.1 failure over iWarp

2010-02-04 Thread Liran Liss
The rdma_dev_addr refers to an L2 netdevice, so it makes perfect sense that the 
hw addresses stored in src/dst_dev_addr are macs for both iWarp and RoCEE (as 
is already the case).
Note that with this approach, dev_type is no longer sufficient to determine the 
ibdev type.
Following the "spirit" of the current code, it is probably cma_acquire_dev()'s 
job to fill in the missing ibdev type information after matching the netdev to 
an ibdev.

As for the match process, we could encode the mac in one of a RoCEE port's 
gids, but this entry would be a dummy, i.e., it would only serve for this 
matching process.
In contrast to iWARP, RoCEE gids really *are* gids, and serve as the port's 
*network* addresses.
In the current implementation, the link-local GID is a fully-qualified L3 
address, which borrows from IPv6's automatic configuration scheme; it is always 
present and usable.
So, the current suggestion of using the link-local gid for device matching has 
the advantage that the GID table contains only usable L3 gids - no dummies.
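
A minimal sketch of the link-local matching idea, assuming the GID is formed from the netdev MAC with the modified EUI-64 rule of RFC 4291, Appendix A. The struct and function names (sketch_gid_table, sketch_mac_to_link_local, sketch_match_port) are hypothetical and are not taken from the cma code; the table size of 8 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GID_LEN 16
#define SKETCH_MAC_LEN 6

/* Hypothetical stand-in for one port's GID table. */
struct sketch_gid_table {
	int len;
	uint8_t gids[8][SKETCH_GID_LEN];
};

/* Build fe80::/64 plus a modified EUI-64 interface identifier from a MAC. */
void sketch_mac_to_link_local(const uint8_t mac[SKETCH_MAC_LEN],
			      uint8_t gid[SKETCH_GID_LEN])
{
	assert(mac != NULL && gid != NULL);
	memset(gid, 0, SKETCH_GID_LEN);
	gid[0] = 0xfe;
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02;	/* flip the universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;		/* ff:fe is inserted in the middle */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}

/* Return the index of the GID that matches the netdev MAC, or -1 if none. */
int sketch_match_port(const struct sketch_gid_table *tbl,
		      const uint8_t mac[SKETCH_MAC_LEN])
{
	uint8_t want[SKETCH_GID_LEN];
	int i;

	assert(tbl != NULL);
	sketch_mac_to_link_local(mac, want);
	for (i = 0; i < tbl->len; i++)
		if (memcmp(tbl->gids[i], want, SKETCH_GID_LEN) == 0)
			return i;
	return -1;
}
```

With this approach every entry in the table remains a usable L3 address; the match simply looks for the one entry that the port's MAC would generate.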

I don't know which of these alternatives is "cleaner".
--Liran

P.S. - I really wish that we had a cleaner way to match an ibdev to a netdev 
without overloading the gid table entries.
Basically, it should be the job of the entity that created the netdev to make 
this association, and stuff a pointer in the netdev.
Another option is to register a list of "L2 HW addresses" with an ibdev's port 
(i.e., in a different structure than the gid table), so the lookup would be 
straightforward.



-Original Message-
From: Or Gerlitz [mailto:ogerl...@voltaire.com] 
Sent: Thursday, February 04, 2010 10:29 AM
To: Sean Hefty; Steve Wise; Liran Liss
Cc: 'Eli Cohen'; OpenFabrics EWG
Subject: Re: [ewg] OFED-1.5.1 failure over iWarp

Sean Hefty wrote:
> If I look at what's there today, we're trying to find some way to 
> match the net_device src_dev_addr with some sort of address associated with 
> an ib_device.
> In the case of actual IB, the net_device src_dev_addr contains the 
> SGID, which provides the mapping.

 
> Steve, can you please clarify the iWarp case for me?  For iWarp, 
> doesn't the src_dev_addr contain the MAC?  So, the 'GID's reported for 
> an iWarp device is really just the MAC.  Is this correct?


> If this is the case, then couldn't rocee (I hate that name) report its 
> MAC as one of its GIDs?  This would ensure that the mapping between 
> net_device and ib_device was correct.

Sean, AFAIK, reporting the MAC as one of the GIDs was part of the IBoE (feel 
free not to use names which you don't like) design presented a couple of times, 
wasn't it, Eli, Liran?

Or.



RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-12-01 Thread Liran Liss
The link local address that we are currently passing down from the
rdmacm encodes a MAC address that was obtained through neighbor
discovery; so we are safe. 

There are RDMAoE applications (some in the embedded space) that do not
use the rdmacm. Some of these rely on custom L2 address assignment and
would like to completely avoid the use of neighbor discovery. For these,
we can clearly state the requirement that the "Interface Identifier" in
the link local address that they pass down should be such that it
encodes a valid MAC address that the interface currently responds to.

In the future we also intend to allow the use of (non link local) IP
addresses encoded in the GIDs. And we will definitely use neighbor
discovery to translate those.
--Liran


-Original Message-
From: Roland Dreier [mailto:rdre...@cisco.com] 
Sent: Monday, November 30, 2009 7:34 AM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes


 > RFC 4291, Appendix A.

Thanks for the pointer.  As far as I can tell from reading some IPv6
stuff, it really is broken to try to go from a link-local IPv6 address
back to a L2 ethernet address.  For example, RFC 2464 (pointed to by RFC
4291) says:

Ethernet Address
   The 48 bit Ethernet IEEE 802 address, in canonical bit
   order.  This is the address the interface currently
   responds to, and may be different from the built-in
   address used to derive the Interface Identifier.

It really seems to be setting ourselves up for trouble not to use
neighbor discovery to map IPv6 addresses to link-layer addresses.

 - R.


RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-11-26 Thread Liran Liss
All addressing code now resides within a rdmaoe-specific flow in the
cma, so the changes do not seem invasive.
Is there any specific change that concerns you?

http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf describes the IBTA
definition in progress, which is in line with the current driver stack
implementation. 

--Liran


-Original Message-
From: Roland Dreier [mailto:rdre...@cisco.com] 
Sent: Monday, November 23, 2009 9:20 PM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes


 > In any case, this is not a correctness issue that prohibits
 > experimentation with rdmaoe multicast on any network today.

I agree -- nothing prevents experimentation.  I am just leery about
making invasive changes to the core stack in the absence of any
documented design for IBoE (that I've seen at least).

 - R.


RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-11-26 Thread Liran Liss
RFC 4291, Appendix A.
--Liran

-Original Message-
From: Roland Dreier [mailto:rdre...@cisco.com] 
Sent: Monday, November 23, 2009 9:18 PM
To: Liran Liss
Cc: Richard Frank; o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes


 > RFC 3041 deals with static global IP addresses on the Internet,
 > especially for portable devices.
 > rdmaoe allows using link-local GIDs for applications residing on the
 > same subnet, so I don't see the relevance.

I guess you're right -- I was confused about when random addresses are
used for generating stateless autoconfig addresses, and I guess even
with RFC3041 they are not for link-local scope.  However, do you know of
anything in the IPv6 RFCs that guarantees that link-local IPv6 addresses
are generated using ethernet addresses?

 - R.


RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-11-23 Thread Liran Liss
See below.
--Liran

> I understand that this is your assessment of the situation, looking on
> the series present at the ofed1.5 rdmaoe branch in a black box manner
> yields that many many files are touched, see below. Coming and saying
> that changes in your HW LL driver are out of the scope for other
> companies to discuss is not acceptable, since we provide enterprise
> ready stack based on your HW driver.

LL: Any comments on our low-level driver are more than welcome.
That being said, we have been running extensive testing on this code
base for several months now and see no stability issues.

> all the rdmaoe materials saying the lossless traffic class is a must,
> are you saying that this works well also without it? then why from
> architect point of view you have posed this requirement?

LL: lossless traffic can be achieved today using global pause, for
example. PFC is still important; we will submit initial patches that
support it next week.


RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-11-23 Thread Liran Liss
In the past few months of review, the responsibility for rdmaoe
addressing was moved to the rdmacm.
So, any future addressing enhancements can be confined to the rdmacm
module without breaking existing APIs.

RFC 3041 deals with static global IP addresses on the Internet,
especially for portable devices.
rdmaoe allows using link-local GIDs for applications residing on the
same subnet, so I don't see the relevance.
Note that for rdmacm apps, the intention is to map the IP addresses that
were assigned to the host's interfaces.
Please see http://www.t11.org/ftp/t11/pub/fc/study/09-543v0.pdf.

Regarding multicast, current switches will flood the traffic just as any
other non-IP multicast traffic (e.g., fcoe).
Using switches that support multicast pruning for additional ethertypes,
you can optimize the traffic and achieve the same link utilization as
normal IP multicast.
In any case, this is not a correctness issue that prohibits
experimentation with rdmaoe multicast on any network today.
--Liran
 

-Original Message-
From: ewg-boun...@lists.openfabrics.org
[mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
Sent: Thursday, November 19, 2009 9:35 PM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes


 > Having lots of testing exposure can help in validating that all the
 > edge cases are handled..

To some extent -- but there also needs to be some thinking involved to
make sure that the interface can actually handle future cases.

 > Are there a set of cases that you have in mind ?

For example -- how is multicast going to interact with IGMP on ethernet
switches?  How is address resolution going to be done (current patches
seem to assume that stateless IPv6 link-local addresses contain the
ethernet address, which is not valid if RFC 3041 is used)?  etc

 - R.


RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-11-23 Thread Liran Liss
90% of the changes are either in the mlx4 driver, or self-contained in
the rdmaoe flow of the cma, which handles rdmaoe addressing and
connection setup.
The rest of the changes indeed touch various locations of the stack, but
they are either definitions or follow the same logic:

    if (rdma_is_transport(ib_device, RDMA_TRANSPORT_RDMAOE))
        do_something_rdmaoe_specific();

The patches don't change the logic of existing flows at all, so we are
not risking *anything* in terms of the stability of the current stack.

As for vlan id and priorities - we are fully aware of the importance of
exposing vlan ids and priorities to the user, but thanks for pointing
this out.
There are deployments today that work fine with the current patches; but
in any case, we are planning to send a follow-up patch set that adds
vlan+priority support in the near future.

--Liran 

-Original Message-
From: ewg-boun...@lists.openfabrics.org
[mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Or Gerlitz
Sent: Friday, November 20, 2009 1:39 AM
To: Richard Frank
Cc: Sean Hefty; Roland Dreier; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

Richard Frank  wrote:

> How can 1500 lines out of 240k lines be a big change.. do I have these
> numbers right - is the big change you are referring to?

Rick, the change set is far from self-contained; it touches
various parts of the core IB stack (rdma-cm module, ib address
resolution module, ib uverbs module and even the mad module) and
of course some of the kernel and user space IB hw specific libraries.

> What is the risk area that you are worried about .. do you think it 
> will break current  transports or existing ULPs?

yes, this would simply not be supportable. Think about it: you want to
hand your customers code which didn't pass review or acceptance
by the Linux IB stack maintainers (Roland and Sean). Say a crash then
happens at this or that module / line; what do you expect the
maintainers to do?

> If it's just about how the implementation is done.. can this be 
> resolved concurrently with getting the bits available for evaluation
> now..

an rdmaoe branch was set up in the git tree and releases are maintained;
that's all you need for evaluation, yet five lines later you're talking about
deployments...

> As RoCEE is totally transparent to existing ULPs.. any potential 
> changes would not be visible.. and therefore not an issue for ULP /
> clients going forward.. right?

this is how you see things, since the IBTA IBXoE annex isn't released,
you just don't know what would be the bottom line.

> Oracle would like to see RoCEE get into 1.5

you guys have sent a note to the rds developer community saying that Oracle
recently moved from 1.3.x to 1.4.y, no special work is expected on 1.5.z
and that you have lots of plans for 1.6.w ... what's the urgency to get
these bits into 1.5?

> We are testing with RoCEE now and plan to deploy it fairly soon.. in 
> very large configuratio

the proposed patch set doesn't let you use non zero VLAN, aren't you
expecting Ethernet customers to trivially require that? also you can't
use non zero traffic class (priority bits), where all the IBXoE
materials are talking about how much working on a lossless traffic class
is a must... if indeed this is the case, the patch set is useless
without the ability to specify a traffic class, as CEE switches would
typically (always?) set only some of the traffic classes to be lossless
(e.g the ones used for FCoE, IBXoE) and the rest to be lossy


Or


RE: [ewg] Re: [ofw] SC'09 BOF - Meeting notes

2009-11-23 Thread Liran Liss
As far as core APIs go, the patch set introduces 2 basic additions
rather than changes:
- A new ABI function to resolve gids to macs - ib_get_mac()
- A new kernel ib_device function to get the port transport -
ib_get_port_transport().
There are no changes to the Verbs API.

All the address resolution stuff is contained in the cma code, so I
think we could extend its logic in the future without breaking things at
the interface level.
Do you have anything specific in mind?

--Liran
 

-Original Message-
From: ewg-boun...@lists.openfabrics.org
[mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Roland Dreier
Sent: Thursday, November 19, 2009 9:17 PM
To: Richard Frank
Cc: o...@lists.openfabrics.org; OpenFabrics EWG
Subject: Re: [ewg] Re: [ofw] SC'09 BOF - Meeting notes


 > How can 1500 lines out of 240k lines be a big change.. do I have these
 > numbers right - is the big change you are referring too?

If there are significant changes to the core APIs -- and IBoE has
exactly this impact -- then yes it can be a big change even if the line
count is small.

 > What is the risk area that you are worried about .. do you think it
 > will break current transports or existing ULPs ?

I am worried that no one has thought through all the issues and corner
cases around address resolution, multicast, etc, and that when we do get
a standardized version of IBoE, we'll have to break core APIs yet again.

 - R.


RE: [ofa-general] RE: [ewg] [PATCH 2/8 v3] ib_core: RDMAoE support only QP1

2009-07-14 Thread Liran Liss
Hi Robert,

Your suggestion to represent RDMAoE as a transport indeed makes the code
simpler.
Thus, we will have:
    switch (port_transport) {
    case RDMA_TRANSPORT_IB:
        ...
        break;
    case RDMA_TRANSPORT_RDMAOE:
        ...
        break;
    case RDMA_TRANSPORT_IWARP:
        ...
        break;
    }

instead of:
    switch (port_transport) {
    case RDMA_TRANSPORT_IB:
        if (port_type == IB) {
            ...
        } else {
            ...
        }
        break;
    case RDMA_TRANSPORT_IWARP:
        ...
        break;
    }

which is cleaner.
In addition, for places in which IB and RDMAOE behave the same, we will
have:
    case RDMA_TRANSPORT_IB:
    case RDMA_TRANSPORT_RDMAOE:
        ...
        break;

which will make this fact explicit.
The only difference is that the switch() will operate on port-transport
rather than node transport.
(We can add a wrapper so that if the ib_dev didn't register a
port-transport function, it will default to the node transport.)
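
The wrapper mentioned in the parenthesis could look roughly like the sketch below. It is a user-space mock, not kernel code: the struct, the enum, and the function names (sketch_ib_device, sketch_port_transport, sketch_mixed_hca) are all hypothetical, and the real ib_device layout is not assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_transport {
	SKETCH_TRANSPORT_IB,
	SKETCH_TRANSPORT_IWARP,
	SKETCH_TRANSPORT_RDMAOE
};

/* Hypothetical, stripped-down stand-in for struct ib_device. */
struct sketch_ib_device {
	enum sketch_transport node_transport;
	/* Optional callback; NULL when the driver did not register one. */
	enum sketch_transport (*get_port_transport)(struct sketch_ib_device *dev,
						    uint8_t port);
};

/* Per-port transport, falling back to the node transport. */
enum sketch_transport sketch_port_transport(struct sketch_ib_device *dev,
					    uint8_t port)
{
	assert(dev != NULL);
	if (dev->get_port_transport)
		return dev->get_port_transport(dev, port);
	return dev->node_transport;
}

/* Example callback for an imagined two-port HCA: port 1 IB, port 2 RDMAoE. */
enum sketch_transport sketch_mixed_hca(struct sketch_ib_device *dev,
				       uint8_t port)
{
	(void)dev;
	return port == 2 ? SKETCH_TRANSPORT_RDMAOE : SKETCH_TRANSPORT_IB;
}
```

Callers would then switch on sketch_port_transport(dev, port) everywhere, and drivers that have a single transport per node need no changes.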

Thanks!
--Liran


-Original Message-
From: Liran Liss 
Sent: Tuesday, July 14, 2009 11:53 AM
To: 'Woodruff, Robert J'; Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: RE: [ofa-general] RE: [ewg] [PATCH 2/8 v3] ib_core: RDMAoE
support onlyQP1

S.B.
--Liran 


> Trying to emulate IB for mad services is a total hack and not how this
> new transport should be added into the core. It should be its own
> transport type, just like iWarp was added.
> You should start with adding a new transport type to ib_verbs.h, e.g.,

LL: it is not a hack: RDMAoE will probably use mad services at least for
connection management, and additional ones in the future.


--- ib_verbs.h  2009-07-13 09:06:10.0 -0400
+++ ib_verbs_new.h  2009-07-14 03:00:23.0 -0400
@@ -64,12 +64,14 @@ enum rdma_node_type {
RDMA_NODE_IB_CA = 1,
RDMA_NODE_IB_SWITCH,
RDMA_NODE_IB_ROUTER,
-   RDMA_NODE_RNIC
+   RDMA_NODE_RNIC,
+   RDMA_NODE_IBXOE
 };

LL: a multi-port HCA can have both IB and Ethernet ports, so this is not
a per-node thing.

 enum rdma_transport_type {
RDMA_TRANSPORT_IB,
-   RDMA_TRANSPORT_IWARP
+   RDMA_TRANSPORT_IWARP,
+   RDMA_TRANSPORT_IBXOE
 };

LL: thanks, we will look into this. I am not sure that "transport" is
the right terminology, since we are using the IB transport layer.


 enum rdma_transport_type
___
general mailing list
gene...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


RE: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only

2009-07-14 Thread Liran Liss
Oops, I meant "exits" instead of "exists"...


-Original Message-----
From: Liran Liss 
Sent: Tuesday, July 14, 2009 11:16 AM
To: 'Woodruff, Robert J'; Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: RE: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict
IPoIB to work on IB ports only

This is exactly the same as for iWARP: IPoIB checks the node transport, and
if it is != IB, it exists.
For RDMAoE, we do the same check but at the port level.


-Original Message-
From: general-boun...@lists.openfabrics.org
[mailto:general-boun...@lists.openfabrics.org] On Behalf Of Woodruff,
Robert J
Sent: Tuesday, July 14, 2009 12:04 AM
To: Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB
to work on IB ports only

Eli Cohen wrote, 

>We don't want IPoIB to work over RDMAoE since it will give worse 
>performance than working directly on Ethernet interfaces which are a 
>prerequisite to RDMAoE anyway.

This is another reason why NOT to try to add IBxOE under the IB
transport, but rather add it as its own transport type. We should not
need to hack all the InfiniBand ULPs to now have to know the difference
between real IB and IBxOE.


RE: [ofa-general] RE: [ewg] [PATCH 2/8 v3] ib_core: RDMAoE support only QP1

2009-07-14 Thread Liran Liss
S.B.
--Liran 


> Trying to emulate IB for mad services is a total hack and not how this
> new transport should be added into the core. It should be its own
> transport type, just like iWarp was added.
> You should start with adding a new transport type to ib_verbs.h, e.g.,

LL: it is not a hack: RDMAoE will probably use mad services at least for
connection management, and additional ones in the future.


--- ib_verbs.h  2009-07-13 09:06:10.0 -0400
+++ ib_verbs_new.h  2009-07-14 03:00:23.0 -0400
@@ -64,12 +64,14 @@ enum rdma_node_type {
RDMA_NODE_IB_CA = 1,
RDMA_NODE_IB_SWITCH,
RDMA_NODE_IB_ROUTER,
-   RDMA_NODE_RNIC
+   RDMA_NODE_RNIC,
+   RDMA_NODE_IBXOE
 };

LL: a multi-port HCA can have both IB and Ethernet ports, so this is not
a per-node thing.

 enum rdma_transport_type {
RDMA_TRANSPORT_IB,
-   RDMA_TRANSPORT_IWARP
+   RDMA_TRANSPORT_IWARP,
+   RDMA_TRANSPORT_IBXOE
 };

LL: thanks, we will look into this. I am not sure that "transport" is
the right terminology, since we are using the IB transport layer.


 enum rdma_transport_type


RE: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB to work on IB ports only

2009-07-14 Thread Liran Liss
This is exactly the same as for iWARP: IPoIB checks the node transport, and
if it is != IB, it exists.
For RDMAoE, we do the same check but at the port level.


-Original Message-
From: general-boun...@lists.openfabrics.org
[mailto:general-boun...@lists.openfabrics.org] On Behalf Of Woodruff,
Robert J
Sent: Tuesday, July 14, 2009 12:04 AM
To: Eli Cohen; Hefty, Sean; Roland Dreier
Cc: ewg; general-list
Subject: [ofa-general] RE: [ewg] [PATCH 6/8 v3] IB/ipoib: restrict IPoIB
to work on IB ports only

Eli Cohen wrote, 

>We don't want IPoIB to work over RDMAoE since it will give worse 
>performance than working directly on Ethernet interfaces which are a 
>prerequisite to RDMAoE anyway.

This is another reason why NOT to try to add IBxOE under the IB
transport, but rather add it as its own transport type. We should not
need to hack all the InfiniBand ULPs to now have to know the difference
between real IB and IBxOE.


[ewg] RE: [ofa-general] [PATCH 0/9] RDMAoE - RDMA over Ethernet

2009-06-18 Thread Liran Liss

>Let's just say that at this point I completely disagree with where
>these patches try to abstract the differences, which are many.

>RDMA apps that want to use this and IB without going through an
>abstraction will need different code -- just like they would for iWarp,
>which also provides RDMA over Ethernet, and is a standard.  IB mad and
>SA query modules are not appropriate places for abstracting the
>differences between IB, iWarp, and whatever name we give this.

>This could change depending on whether this is really trying to be IB
>with a different L2, or is just another RDMA protocol that runs on
>Ethernet.

>- Sean

Sean,

These are indeed real concerns; I know that the cma is the natural place
for abstracting transport differences, but I am worried about non-cma
Infiniband ULPs which can work just as well with RDMAoE (perhaps we can
specifically expose RDMAoE "path queries" as a simple library function).

We will rethink our approach to SA queries and post new patches shortly.
Note that without SA query emulation, the RDMAoE patches really amount
to just a few cosmetic changes to ib_core...:)

Thanks for the feedback.
--Liran




[ewg] RE: [ofa-general] [PATCH 0/9] RDMAoE - RDMA over Ethernet

2009-06-17 Thread Liran Liss
S.B.
--Liran 


>RDMA over Ethernet (RDMAoE) allows running the IB transport protocol 
>over Ethernet, providing IB capabilities for Ethernet fabrics. The 
>packets are standard Ethernet frames with an Ethertype, an IB GRH,  
>unmodified IB transport headers and payload. HCA RDMAoE ports are no 
>different than regular IB ports from the RDMA stack perspective.

I would refer to this as IBoE, not RDMAoE.

The RDMA stack should see these ports different than regular IB HCA ports.
There are a lot of differences that should not simply be hidden or
incorrectly assumed: QP0, QoS, multiple paths, routing(?), no SA, etc.

LL: the RDMA stack will see that the port has different link types.
SLs map cleanly to VLAN user priorities.
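
The post does not spell out the SL-to-priority mapping, so the following is only a sketch under a stated assumption: SLs 0 through 7 map one-to-one onto the 3-bit 802.1Q priority field, and SLs 8 through 15 (IB SLs are 4 bits wide) are rejected. The function name sketch_sl_to_vlan_tci is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build an 802.1Q Tag Control Information word from an SL and a VLAN ID.
 * TCI layout: priority in bits 15..13, DEI in bit 12 (left at 0 here),
 * VLAN ID in bits 11..0.
 * Assumption: SL n (0..7) maps to priority n; higher SLs are not mapped.
 * Returns 0 on success, -1 if either input is out of range.
 */
int sketch_sl_to_vlan_tci(uint8_t sl, uint16_t vlan_id, uint16_t *tci)
{
	assert(tci != NULL);
	if (sl > 7 || vlan_id > 0x0fff)
		return -1;
	*tci = (uint16_t)(((uint16_t)sl << 13) | vlan_id);
	return 0;
}
```

For example, SL 3 on VLAN 100 yields a TCI of 0x6064 under this assumption.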

>IB subnet management and SA services are not required for RDMAoE 
>operation;

Then I would not try to emulate it at all.  As Hal mentioned in a
separate post, there are too many ways to interact with the SA that an
emulation won't cover.

LL: you need to emulate *enough* so that typical applications don't need
to worry about the link type. SA path queries is the best example.
Otherwise, every RDMA application (not necessarily a CMA app) will need
to have different code paths depending on the link type.

>Ethernet management practices are used instead. In Ethernet, nodes are 
>commonly referred to by applications by means of an IP address. RDMAoE 
>treats IP addresses that were assigned to the corresponding Ethernet 
>port as GIDs, and makes use of the IP stack to bind a destination 
>address to the corresponding netdevice (just as the CMA does today for 
>IB and iWARP) and to obtain its L2 MAC addresses.

Is the actual L3 address an IP address, or just an encoded IP address in
an IBoE L3 address?  What L3 protocol is being used and will it interoperate
with some peer L3 protocol (IP or IB)?

LL: RDMAoE uses GIDs that encode IP addresses. For IPv6, this is
straightforward. We use mapped addresses for IPv4 (::0x).
Currently, RDMAoE is not routable, as the IB routing specs are not
complete.
However, nothing prohibits making it so in the future (either Eth to Eth
or Eth to IB).
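
To illustrate how an IPv4 address could be carried in a 16-byte GID, here is a sketch that assumes the standard IPv4-mapped IPv6 layout from RFC 4291 (::ffff:a.b.c.d). The notation in the reply above is truncated, so the exact prefix the patches use is an assumption; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill a 16-byte GID with ::ffff:a.b.c.d.
 * ipv4 holds the four address bytes in network order.
 */
void sketch_ipv4_to_gid(const uint8_t ipv4[4], uint8_t gid[16])
{
	assert(ipv4 != NULL && gid != NULL);
	memset(gid, 0, 16);
	gid[10] = 0xff;
	gid[11] = 0xff;
	memcpy(&gid[12], ipv4, 4);
}

/* Return 1 if the GID carries the IPv4-mapped prefix, 0 otherwise. */
int sketch_gid_is_ipv4_mapped(const uint8_t gid[16])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	assert(gid != NULL);
	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}
```

An IPv6 address needs no conversion at all, since it already fills the 128-bit GID.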

- Sean



[ewg] RE: [ofa-general] [PATCH 3/9] ib_core: RDMAoE support only QP1

2009-06-17 Thread Liran Liss

> Which modules will use QP1 and for what purpose?  I see
> sa_query/multicast, but there's not an actual SA.  I'm guessing that the
> ib_cm works without changes.

Currently, QP1 will be used only for the CM, which indeed doesn't
require any changes for RDMAoE.
However, we can gradually extend the support for additional QP1 services
in the future.

> To clarify, do all IBoE packets carry a GRH? 

Yes.



[ewg] RE: [ofa-general] [PATCH 2/9] ib_core: kernel API for GID -->MAC translations

2009-06-17 Thread Liran Liss

> Why not just use IP to MAC calls?  Or use the MAC as the GUID?

We do use standard OS services to map the IP addresses (that were
encoded in the GID) to MACs.
GIDs encode IP addresses rather than MACs to enable users to use the
node names that they are used to.
Specifically, we will feed in all IP addresses that were assigned to the
Ethernet interface to the corresponding port GID table.
This will also enable routing in the future.

The only exception is IPv6 link-local addresses, which already encode
the MAC.
In this case, a simple algorithmic operation extracts the MAC without
requiring ARP, etc.
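
The "simple algorithmic operation" is presumably the inverse of the modified EUI-64 construction in RFC 4291, Appendix A. The sketch below assumes that, and it only works when the interface identifier was actually derived from the MAC; an identifier assigned some other way will either be rejected or yield a wrong address. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Recover a MAC from a link-local GID (fe80::/64) whose interface
 * identifier is a modified EUI-64.
 * Returns 0 on success, -1 if the GID is not link-local or does not
 * carry the ff:fe filler bytes that mark a MAC-derived identifier.
 */
int sketch_link_local_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
	static const uint8_t ll_prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

	assert(gid != NULL && mac != NULL);
	if (memcmp(gid, ll_prefix, sizeof(ll_prefix)) != 0)
		return -1;
	if (gid[11] != 0xff || gid[12] != 0xfe)
		return -1;
	mac[0] = gid[8] ^ 0x02;	/* undo the universal/local bit flip */
	mac[1] = gid[9];
	mac[2] = gid[10];
	mac[3] = gid[13];
	mac[4] = gid[14];
	mac[5] = gid[15];
	return 0;
}
```

For instance, fe80::202:c9ff:fe01:203 would map back to 00:02:c9:01:02:03.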

> Do the GIDs follow the IB GID format?
Yes.

