Re: When IBoE will be merged to upstream?

2010-07-20 Thread Jason Gunthorpe
On Mon, Jul 19, 2010 at 10:28:24PM -0700, Paul Grun wrote:

 This is incorrect.  The intent of the text is that a verbs consumer deals
 ONLY in layer 3 addresses (GIDs).  It doesn't make any sense to restrict the
 interpretation to mean 'LIDs', since LIDs are a NOP for RoCE.  
 
 CA16-17: When accessing the services of a RoCE verbs provider, the
 source and destination identifiers contained in the address vector shall
 consist of GIDs; the address vector shall not contain layer 2 references
 (e.g. local addresses). Layer 2 references include source and destination
 local identifiers and LID Path Bits.
 
 "Layer 2 references" refers to LIDs, MAC IDs or any other form of layer 2
 address.  If the wording of the text is insufficiently clear, please post a
 comment on the comment tracker on the IBTA website.  Nevertheless, that is the
 intent.  

Since it never actually says MAC address, it reads like it is just
excluding existing IB L2 addresses from use in RoCE, which makes a lot
of sense. If MAC address was meant, it should have been listed explicitly!

  Exactly how and where the MAC address comes about was never decided,
 
 Correct.  The IBTA IBXoE WG felt that defining the mapping from GID to MAC
 ID should be a function of the underlying fabric (Ethernet) and thus was out
 of scope for us to define the mapping mechanism.

IMHO, it is bad design to create an architecture that requires the L2
information in the AH, forbids the L2 information from being passed
into the AH APIs, and then does not specify how the L2 information is
created.

How is anyone supposed to implement this?

  BTW, I absolutely hate the mixing of 'Sometimes it is an IPv4,
  sometimes it is a GID, and sometimes it is an IPv6' in the same
  field. That is just so nasty. The GID is a GID, don't overload it in
  an ambiguous way to mean 2 other things!
 
 I am unaware of any overloading, at least in the RoCE spec.

This is a feature added as part of the proposed patch set that spurred
this whole discussion. It is this feature that made create_ah into a
blocking call ...

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: When IBoE will be merged to upstream?

2010-07-20 Thread Liran Liss
   Small correction needed regarding the multicast forwarding.
   Since we are talking about IPv6 multicast groups, which translate to
   33:33:xx:xx:xx:xx MAC address, the router listener notification protocol
   is going to be MLD and not IGMP. Still there are switches which support
   MLD forwarding to prevent the network flooding.
 
 Well as I said the mapping of IBoE MGID to Ethernet address is not
 specified.  However I agree that using the same mapping as IPv6 so we
 end up with 33:33:... addresses makes sense.

Agreed.

 
 Yes, you are right that MLD snooping is the mechanism for switches to
 discover IPv6 multicast group membership.  However for the IBoE case
 there is no requirement that IPv6 multicast group membership corresponds
 in any way to the IBoE multicast group membership for the interface (and
 indeed as far as I can tell from the IBoE spec, there is no requirement
 that any IPv6 interface be configured on an IBoE port).
 
 Furthermore, even if an IBoE interface sends MLD messages for a given
 IPv6 group, there is no requirement that a switch use the membership
 information for that group to forward multicast packets with a non-IPv6
 ethertype.

Right. Initially there can be flooding within the VLAN. In the future we can
evolve to use a group-membership protocol when customers that care about
the efficiency drive their switch vendors to support it.

Are there any other issues that you would like us to address before updating 
the patches?


RE: When IBoE will be merged to upstream?

2010-07-19 Thread Paul Grun


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
 Sent: Monday, July 12, 2010 9:22 AM
 To: Liran Liss
 Cc: Or Gerlitz; Roland Dreier; Hefty, Sean; Aleksey Senin; linux-rdma;
 mo...@voltaire.com; aleks...@voltaire.com; yift...@voltaire.com; Tziporet
 Koren; al...@voltaire.com
 Subject: Re: When IBoE will be merged to upstream?
 
 On Mon, Jul 12, 2010 at 10:58:19AM +0300, Liran Liss wrote:
 
    ...A verbs consumer using a RoCE network relies strictly on so-called
    Layer 3 addressing (GIDs); layer 2 addresses (e.g. subnet local
    identifiers) are not passed across the verbs interface...
   
   Ah, hmm, well, I was on that list during this time and I don't think
   this statement means what you are saying it does  :)
  
 
  ?? It doesn't get any clearer than this.
 
 'subnet local identifier' == LID
 
 The text is saying that the specification does not use any of the LID
 fields in the verbs interface, that is it. It isn't talking about MAC
 addresses.
 

This is incorrect.  The intent of the text is that a verbs consumer deals
ONLY in layer 3 addresses (GIDs).  It doesn't make any sense to restrict the
interpretation to mean 'LIDs', since LIDs are a NOP for RoCE.  

CA16-17: When accessing the services of a RoCE verbs provider, the
source and destination identifiers contained in the address vector shall
consist of GIDs; the address vector shall not contain layer 2 references
(e.g. local addresses). Layer 2 references include source and destination
local identifiers and LID Path Bits.

"Layer 2 references" refers to LIDs, MAC IDs or any other form of layer 2
address.  If the wording of the text is insufficiently clear, please post a
comment on the comment tracker on the IBTA website.  Nevertheless, that is the
intent.  

 Exactly how and where the MAC address comes about was never decided,

Correct.  The IBTA IBXoE WG felt that defining the mapping from GID to MAC
ID should be a function of the underlying fabric (Ethernet) and thus was out
of scope for us to define the mapping mechanism.


 and at least some participants thought it should be a 1:1 algorithmic
 mapping from the GID.
 
 Ditto for VLANs, how and where the vlan tag comes about is not part of
 the spec.
 
  Good idea! This is exactly what we do today for addresses that the
  user explicitly declares as link-local addresses.  But, we can't
  mandate an overload of the GID in a way that it prevents its use as
  a true L3 address (eventually routable).
 
 We are very unlikely to see routable IBoE, ever.. But, even if we do
 get there some day then we could extend the AH.
 

Please keep in mind that IB routing is not yet defined.  Although the RoCE
spec doesn't explicitly say so, the intent is that routing for RoCE can be
defined once the work on IB routing has been completed. 

 BTW, I absolutely hate the mixing of 'Sometimes it is an IPv4,
 sometimes it is a GID, and sometimes it is an IPv6' in the same
 field. That is just so nasty. The GID is a GID, don't overload it in
 an ambiguous way to mean 2 other things!

I am unaware of any overloading, at least in the RoCE spec.

 
   create_ah does not accept any sort of source address
   specifier
 
  You are wrong -- sgid_index specifies it.
 
 So, what do you propose to put in sgid_index? It isn't big enough to
 store an IPv6 address. You can't exactly number every IP assigned to
 every ethernet interface.
 
 The other fields you mention are not a superset of socket parameters,
 they are only IPv6 parameters, IPv4 uses a different set.
 
  Jason, bottom line, I think that we both agree that the rdmacm
  should do the address resolution.  The difference is that by having
  the rdmacm initially only bind to the device and complete the
  resolution later (by a call from create_ah()), we don't change the
  user API for *all* gid types.  Having addressed your concerns
  regarding resolution below the Verbs, we continue to believe that
  this is the best approach.
 
 Again, I don't see how what I've outlined changes the API in any
 way.
 
 Doing two routing lookups for the same connection is bad design, it is
 racy. L2 parameters have to flow from the first routing lookup in
 RDMA-CM to everything else.
 
 Liran, I don't think you have at all come close to addressing my
 concerns, you still haven't explained how a full route lookup is even
 possible in create_ah, for instance. Let alone my other concerns!
 
 Jason

Paul (Chair, IBTA IBXoE WG)



RE: When IBoE will be merged to upstream?

2010-07-15 Thread Liran Liss
   The text is saying that the specification does not use any of the
   LID fields in the verbs interface, that is it. It isn't talking
   about MAC addresses.
   
   Exactly how and where the MAC address comes about was never decided,
   and at least some participants thought it should be a 1:1
   algorithmic mapping from the GID.
   
   Ditto for VLANs, how and where the vlan tag comes about is not part
   of the spec.
 
  You are trying to rewrite history.
  Read the spec, address handle fields are fixed.
 
 Not really, this was all discussed on this list before the 
 IBxoE working group was formed,

The paragraph above is about the RoCE spec. And *this list* did not write the 
RoCE spec.

 it was discussed in the 
 working group,

The RoCE spec adopts the verbs defined in the base IB spec and does not add any 
new input modifiers to the AH verb. You may not agree with it but that does not 
change the spec.

 I objected to the draft spec leaving this area 
 absent, even.

You should submit a comment on this matter using the IBTA comment tracker 
database if you intend your concern to be taken into account.

 The spec doesn't say squat about how MAC and 
 VLAN values get into the AH,

True. The spec does not say it because there are no MAC and VLAN input 
modifiers to the create AH verb. The spec assumes the resolution from the L3 
address happens below the channel interface.

 and you have already heard how 
 my opinion on this subject differs from others.

I never attempted to misrepresent your opinion. I am just pointing out what the 
RoCE spec says.

 
   But, even if we do get there some day then we could extend the AH.
  
  This is unacceptable - we are not going to add another L3 identifier.
 
 It wouldn't be adding another L3 identifier, it would be an L2
 next hop MAC address for the router. It would be nice to do
 this from the start but if growing the AH is really that
 scary then it should wait until someone figures out how to
 solve the lossless routing problem on ethernet.

Augmenting the AH has a significant cost. There is a tradeoff here between 
preserving the verbs api vs. dealing with the implementation challenges 
associated with doing address resolution below the verbs. The RoCE spec 
deliberately chooses one direction. You seem to favor the other one. But in the 
interest of progress and since we all seem to agree on the way things work when 
we use link local GIDs, let us move forward with that approach for now. And we 
can get back to non local GIDs later.


RE: When IBoE will be merged to upstream?

2010-07-15 Thread Liran Liss
   But, we can't mandate an overload of the GID in a way that it
   prevents its use as a true L3 address (eventually routable).
 
 Actually I'm beginning to think that the only possible way we 
 can use the GID in IBoE is as a link-local IPv6 address 
 containing an Ethernet address.  Trying to hide neighbour 
 discovery or ARP below the verbs doesn't seem workable -- 
 being forced to change the locking rules we've had for the 
 past 5+ years about create_ah is just the beginning.  We get 
 further problems if a remote address should ever change and 
 I'm probably missing other issues.
 

We believe the problems are workable. But let us stop arguing for a while and
make progress with link-local addresses since we all seem to agree with that.
We can get back to non-local GIDs later.

 So the best solution I can see is to declare that an IBoE GID must be an
 IPv6 address coming from an EUI-64 Ethernet address for the
 corresponding port; for MGIDs I guess we use the standard IPv6 mapping
 to Ethernet address 33:33:xx:xx:xx:xx.
 
 I'm not sure how we want to handle IPv4 -- presumably unicast 
 ARP can be done within the RDMA CM, which will then create a 
 DGID with the appropriate Ethernet address.  However it's not 
 clear to me whether we need a way to create IPv4 
 (01:00:5e:xx:xx:xx) multicast addresses.
 
 Also, since there is no way to map a link-local IPv6 address 
 to a particular interface, then I guess we need a way to pass 
 in the VLAN tag to be used -- presumably we can steal some 
 other field for the 12 bits.
 (The fact that the IBoE annex does not mention VLANs or 
 802.1q a single time is just another thing that shows how 
 rushed and incomplete it is)
 
 With all this said, I think it means we do not need to do the 
 mapping from GID to Ethernet address in the kernel for IBoE 
 user verbs, since it is so simple -- we can simply add a 
 fairly trivial helper to libibverbs.
 
  - R.


RE: When IBoE will be merged to upstream?

2010-07-15 Thread Liran Liss
   A quibble about multicast - AFAIK this is unsolved. I think some spec
   needs to be agreed that documents what sort of multicast snooping
   operations switches need to do, ie if IGMP joins imply that IBoE
   traffic for the same DMAC is included in the join, or if IBoE requires
   a separate IGMP type process on its own ether-type. That would make it
   much clearer what to do with MGIDs.


It would be quite naïve to require *new* snooping functionality in Eth 
switches. Some switches will gracefully apply to non-ip traffic the filtering 
information acquired through IGMP snooping. And some will just flood non-ip MC 
frames within the corresponding VLAN which is benign (e.g. that is the way FIP 
works). A cleaner solution would be based on MMRP but that, AFAIK, is not very 
widely deployed so it is less practical at this stage. 
 
 I agree -- the current spec is rather broken for multicast.  
 Choosing a different ethertype and then saying that all 
 switches will just flood multicast traffic is half-baked at best.


It is a realistic approach. Do you claim that there are switches that will not 
forward the packets?
 
   It would be nice to at least have a plan on how to integrate a
   non-link local address, if that is ever necessary in future. An
   extended AH with an additional 48-bit DMAC field seems reasonable to me?
 
 You mean have a next-hop destination + a final destination?  
 Could be done I guess.  But I'm not sure how having a routing 
 table where you have to look up 48-bit Ethernet addresses is 
 all that different from just having a standard Ethernet 
 forwarding table.

I guess Jason suggests regarding the GID as a true L3 address and using a new 
added L2 field for the next hop L2 address.

 
 I suppose something based on MAC-in-MAC (a la 802.1ah) could 
 be done but to be honest the IBoE spec that the IBTA came up 
 with looks rather broken for routing.

Routing is out of the scope of the current RoCE spec.
And I do not see how .1ah would be relevant for this purpose.


Re: When IBoE will be merged to upstream?

2010-07-15 Thread Jason Gunthorpe
On Thu, Jul 15, 2010 at 01:55:32PM +0300, Liran Liss wrote:

  I objected to the draft spec leaving this area 
  absent, even.
 
 You should submit a comment on this matter using the IBTA comment
 tracker database if you intend your concern to be taken into account.

The position of IBTA is that the L2 layer is not specified as part of
the spec, so of course there is no talk of how to get/create L2
information. The spec is *silent* on the issue of L2 addressing, so,
IMHO, it is completely wrong to assume it specs one approach over
another, just because it omits L2 addressing related
discussion/fields/etc.

It, unfortunately, becomes implementation defined - and if that means
an implementation chooses to extend the AH, then so be it.

This is the problem with rushing incomplete specs through :)

  It wouldn't be adding another L3 identifier, it would be an L2 
  next hop MAC address for the router. It would be nice to do 
  this from the start but if growing the AH is really that 
  scary then it should wait until someone figures out how to 
  solve the lossless routing problem on ethernet.

 Augmenting the AH has a significant cost. There is a tradeoff here
 between preserving the verbs api vs. dealing with the implementation
 challenges associated with doing address resolution below the
 verbs. The RoCE spec deliberately chooses one direction. You seem to
 favor the other one. But in the interest of progress and since we
 all seem to agree on the way things work when we use link local
 GIDs, let us move forward with that approach for now. And we can get
 back to non local GIDs later.

You still have to solve the problem with vlan tags, and either each
vlan interface has a separate rdma interface or the tag has to flow
into the AH from the RDMA-CM.

Jason


Re: When IBoE will be merged to upstream?

2010-07-15 Thread Roland Dreier
  No, I think all switches will flood unknown multicast packets.  But
  there is a reason that IGMP snooping was invented -- it is inefficient
  (to say the least) to flood all multicast traffic.

And by the way I view the fact that the IBoE spec does not say anything
at all about how to map MGIDs to Ethernet addresses as another serious
shortcoming of the spec.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


RE: When IBoE will be merged to upstream?

2010-07-15 Thread Alex Rosenbaum
 No, I think all switches will flood unknown multicast packets.  But
 there is a reason that IGMP snooping was invented -- it is inefficient
 (to say the least) to flood all multicast traffic.

 - R.


Small correction needed regarding the multicast forwarding.
Since we are talking about IPv6 multicast groups, which translate to
33:33:xx:xx:xx:xx MAC address, the router listener notification protocol
is going to be MLD and not IGMP. Still there are switches which support
MLD forwarding to prevent the network flooding.

Alex



Re: When IBoE will be merged to upstream?

2010-07-15 Thread Roland Dreier
  Small correction needed regarding the multicast forwarding.
  Since we are talking about IPv6 multicast groups, which translate to
  33:33:xx:xx:xx:xx MAC address, the router listener notification protocol
  is going to be MLD and not IGMP. Still there are switches which support
  MLD forwarding to prevent the network flooding.

Well as I said the mapping of IBoE MGID to Ethernet address is not
specified.  However I agree that using the same mapping as IPv6 so we
end up with 33:33:... addresses makes sense.
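
For reference, the IPv6-over-Ethernet rule (RFC 2464) builds the multicast MAC from the fixed prefix 33:33 followed by the low-order 32 bits of the group address. Below is a minimal sketch of applying that same rule to a 16-byte MGID written in IPv6 notation. The helper name `mgid_to_mac` is made up for illustration, and nothing in the IBoE annex mandates this mapping.

```python
import ipaddress


def mgid_to_mac(mgid: str) -> str:
    """Map a multicast GID (IPv6 notation) to an Ethernet multicast MAC
    using the IPv6 rule: 33:33 followed by the low 32 bits of the address.
    Hypothetical helper, not part of any existing library."""
    addr = ipaddress.IPv6Address(mgid)
    if not addr.is_multicast:
        raise ValueError(f"{mgid} is not a multicast (ff00::/8) address")
    low32 = addr.packed[-4:]  # last four bytes of the 16-byte GID
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + low32)


print(mgid_to_mac("ff02::1"))
print(mgid_to_mac("ff12:601b:ffff::1:ff28:cf2a"))
```

Only 32 of the 128 GID bits survive, so distinct MGIDs can share one MAC and receivers still have to filter on the full MGID.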

Yes, you are right that MLD snooping is the mechanism for switches to
discover IPv6 multicast group membership.  However for the IBoE case
there is no requirement that IPv6 multicast group membership corresponds
in any way to the IBoE multicast group membership for the interface (and
indeed as far as I can tell from the IBoE spec, there is no requirement
that any IPv6 interface be configured on an IBoE port).

Furthermore, even if an IBoE interface sends MLD messages for a given
IPv6 group, there is no requirement that a switch use the membership
information for that group to forward multicast packets with a non-IPv6
ethertype.

 - R.


RE: When IBoE will be merged to upstream?

2010-07-13 Thread Liran Liss
    ...A verbs consumer using a RoCE network relies strictly on so-called
    Layer 3 addressing (GIDs); layer 2 addresses (e.g. subnet local
    identifiers) are not passed across the verbs interface...
   
   Ah, hmm, well, I was on that list during this time and I don't think
   this statement means what you are saying it does  :)
  
 
  ?? It doesn't get any clearer than this.
   
 'subnet local identifier' == LID
 
 The text is saying that the specification does not use any of the LID
 fields in the verbs interface, that is it. It isn't talking about MAC
 addresses.
 
 Exactly how and where the MAC address comes about was never decided,
 and at least some participants thought it should be a 1:1 algorithmic
 mapping from the GID.
 
 Ditto for VLANs, how and where the vlan tag comes about is not part of
 the spec.
 

You are trying to rewrite history.
Read the spec, address handle fields are fixed.

  Good idea! This is exactly what we do today for addresses that the
  user explicitly declares as link-local addresses.  But, we can't
  mandate an overload of the GID in a way that it prevents its use as
  a true L3 address (eventually routable).
 
 We are very unlikely to see routable IBoE, ever..

Says who?

 But, even if we do get there some day then we could extend the AH.

This is unacceptable - we are not going to add another L3 identifier.
 
 BTW, I absolutely hate the mixing of 'Sometimes it is an IPv4,
 sometimes it is a GID, and sometimes it is an IPv6' in the same
 field. That is just so nasty. The GID is a GID, don't overload it in
 an ambiguous way to mean 2 other things!

A GID is a GID indeed -- in a RoCE environment, it's the layer 3 identifier.
All of our intended values are standard ipv6 encapsulations.

   create_ah does not accept any sort of source address 
   specifier
  
  You are wrong -- sgid_index specifies it.
 
 So, what do you propose to put in sgid_index? It isn't big enough to
 store an IPv6 address. You can't exactly number every IP assigned to
 every ethernet interface.

An iboe device is associated with a specific Ethernet interface. Thus, its gid
table only needs to map the ip addresses assigned to that interface.

 The other fields you mention are not a superset of socket parameters,
 they are only IPv6 parameters, IPv4 uses a different set.

Like what?

  Jason, bottom line, I think that we both agree that the rdmacm
  should do the address resolution.  The difference is that by having
  the rdmacm initially only bind to the device and complete the
  resolution later (by a call from create_ah()), we don't change the
  user API for *all* gid types.  Having addressed your concerns
  regarding resolution below the Verbs, we continue to believe that
  this is the best approach.
 
 Again, I don't see how what I've outlined changes the API in any
 way.

We currently support link-local gids, but the architecture must not limit the 
scope.

 
 Doing two routing lookups for the same connection is bad design, it is
 racy. L2 parameters have to flow from the first routing lookup in
 RDMA-CM to everything else.
 

So is caching L3--L2 mappings that change a second later...
So what?

 Liran, I don't think you have at all come close to addressing my
 concerns, you still haven't explained how a full route lookup is even
 possible in create_ah, for instance. Let alone my other concerns!
 
 Jason


Re: When IBoE will be merged to upstream?

2010-07-13 Thread Jason Gunthorpe
On Tue, Jul 13, 2010 at 11:26:41AM +0300, Liran Liss wrote:

  'subnet local identifier' == LID
  
  The text is saying that the specification does not use any of the LID
  fields in the verbs interface, that is it. It isn't talking about MAC
  addresses.
  
  Exactly how and where the MAC address comes about was never decided,
  and at least some participants thought it should be a 1:1 algorithmic
  mapping from the GID.
  
  Ditto for VLANs, how and where the vlan tag comes about is not part of
  the spec.

 You are trying to rewrite history.
 Read the spec, address handle fields are fixed.

Not really, this was all discussed on this list before the IBxoE
working group was formed, it was discussed in the working group, I
objected to the draft spec leaving this area absent, even. The spec
doesn't say squat about how MAC and VLAN values get into the AH, and
you have already heard how my opinion on this subject differs from
others.

  But, even if we do get there some day then we could extend the AH.
 
 This is unacceptable - we are not going to add another L3 identifier.

It wouldn't be adding another L3 identifier, it would be an L2 next hop
MAC address for the router. It would be nice to do this from the start
but if growing the AH is really that scary then it should wait until
someone figures out how to solve the lossless routing problem on ethernet.

  BTW, I absolutely hate the mixing of 'Sometimes it is an IPv4,
  sometimes it is a GID, and sometimes it is an IPv6' in the same
  field. That is just so nasty. The GID is a GID, don't overload it in
  an ambiguous way to mean 2 other things!
 
 A GID is a GID indeed -- in a RoCE environment, it's the layer 3 identifier.
 All of our intended values are standard ipv6 encapsulations.

What makes a GID a GID is the fact that it is a separate addressing
space from IPv6! If it is a GID then you don't overload it, if it is
an IPv6 then you don't get to special case certain things, like link
local!

   create_ah does not accept any sort of source address specifier
   
   You are wrong -- sgid_index specifies it.
  
  So, what do you propose to put in sgid_index? It isn't big enough to
  store an IPv6 address. You can't exactly number every IP assigned to
  every ethernet interface.
 
 An iboe device is associated with a specific Ethernet
 interface. Thus, its gid table only needs to map the ip addresses
 assigned to that interface.

A few messages ago you said there was only one RDMA device per
physical ethernet interface, not one per vlan! VLAN interfaces can
have overlapping addresses (ie IPv6 link local) so I really don't see
how creating a GID table helps disambiguate these cases.
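
A small sketch of the overlap being described, assuming VLAN sub-interfaces inherit the parent port's MAC (which is typical but not guaranteed) and therefore derive the same EUI-64 link-local address. The interface names, VLAN IDs and address below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical interfaces sharing one physical port and one MAC address,
# so their EUI-64 link-local addresses are identical.
interfaces = [
    ("eth0", None, "fe80::211:22ff:fe33:4455"),
    ("eth0.10", 10, "fe80::211:22ff:fe33:4455"),
    ("eth0.20", 20, "fe80::211:22ff:fe33:4455"),
]

# Group interfaces by GID, as a per-device GID table keyed on the address would.
by_gid = defaultdict(list)
for name, vlan, gid in interfaces:
    by_gid[gid].append((name, vlan))

# Any GID used by more than one interface cannot identify the VLAN by itself.
ambiguous = {gid: users for gid, users in by_gid.items() if len(users) > 1}
for gid, users in ambiguous.items():
    print(gid, "is shared by", users)
```

In this toy table the GID alone cannot select a VLAN; some extra key (a distinct table index per VLAN, or an explicit tag) would be needed.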

  Doing two routing lookups for the same connection is bad design, it is
  racy. L2 parameters have to flow from the first routing lookup in
  RDMA-CM to everything else.
 
 So is caching L3--L2 mappings that change a second later...
 So what?

No, it is not the same.

If you do a route lookup you get an atomic result from the routing
table that represents something an admin configured. If you do two
lookups and use information from both then the net result might be a
configuration that was never admin configured - ie you lose the
atomicity of route configuration change.
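
To make the atomicity point concrete, here is a toy model. It is not kernel code, and the table layout, field names and values are all made up. One lookup returns a consistent snapshot, while combining fields from two lookups taken around a configuration change yields a pair the admin never configured.

```python
# Toy routing table: one admin-configured entry per destination,
# always replaced as a whole.
routes = {"dst": {"vlan": 10, "next_hop_mac": "00:11:22:33:44:55"}}


def lookup(dst):
    """Return a snapshot of the entry, as a single route lookup would."""
    return dict(routes[dst])


def admin_reconfigure():
    """The admin swaps the whole entry in one step."""
    routes["dst"] = {"vlan": 20, "next_hop_mac": "66:77:88:99:aa:bb"}


# The only two states the admin ever configured.
configured = [
    {"vlan": 10, "next_hop_mac": "00:11:22:33:44:55"},
    {"vlan": 20, "next_hop_mac": "66:77:88:99:aa:bb"},
]

# One lookup: both L2 parameters come from the same snapshot.
single = lookup("dst")

# Two lookups with a change in between (say, one at connection setup
# and another at address handle creation), each contributing one field.
first = lookup("dst")
admin_reconfigure()
second = lookup("dst")
mixed = {"vlan": first["vlan"], "next_hop_mac": second["next_hop_mac"]}

print(single in configured)  # True
print(mixed in configured)   # False
```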

Normally ND mappings (L3-L2) track updates through the things that
use them. The fact this cannot happen with IBoE is another bug, and
again, a reason why it is unsuitable to treat a GID as an IPv6 address
when you cannot provide the same functionality.

Jason


Re: When IBoE will be merged to upstream?

2010-07-13 Thread Jason Gunthorpe
On Mon, Jul 12, 2010 at 02:20:22PM -0700, Roland Dreier wrote:

 So the best solution I can see is to declare that an IBoE GID must be an
 IPv6 address coming from an EUI-64 Ethernet address for the
 corresponding port; for MGIDs I guess we use the standard IPv6 mapping
 to Ethernet address 33:33:xx:xx:xx:xx.

This is what I have been advocating..

A quibble about multicast - AFAIK this is unsolved. I think some spec
needs to be agreed that documents what sort of multicast snooping
operations switches need to do, ie if IGMP joins imply that IBoE
traffic for the same DMAC is included in the join, or if IBoE requires
a separate IGMP type process on its own ether-type. That would make it
much clearer what to do with MGIDs.

IPv4 could be handled by embedding an IPv4 multicast address in an
IPv4-mapped IPv6 address, if necessary.
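
One reading of this suggestion, sketched with Python's standard `ipaddress` module: place the IPv4 group address in the `::ffff:a.b.c.d` (IPv4-mapped) form and carry that in the 16-byte GID field. The helper name is hypothetical, and whether this is the exact encoding intended here is an assumption.

```python
import ipaddress


def ipv4_to_mapped_gid(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the IPv4-mapped IPv6 form ::ffff:a.b.c.d.
    Hypothetical helper for illustration only."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + v4.packed)


gid = ipv4_to_mapped_gid("239.1.2.3")
print(gid.packed.hex())   # 16-byte GID as hex
print(gid.ipv4_mapped)    # recovers the embedded IPv4 address
```

Note that the resulting value does not begin with 0xff, so it would not look like a multicast GID by prefix alone; the consumer would have to infer that from the embedded IPv4 address.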

It would be nice to at least have a plan on how to integrate a
non-link local address, if that is ever necessary in future. An
extended AH with an additional 48-bit DMAC field seems reasonable to me?

Jason


Re: When IBoE will be merged to upstream?

2010-07-13 Thread Roland Dreier
  A quibble about multicast - AFAIK this is unsolved. I think some spec
  needs to be agreed that documents what sort of multicast snooping
  operations switches need to do, ie if IGMP joins imply that IBoE
  traffic for the same DMAC is included in the join, or if IBoE requires
  a separate IGMP type process on its own ether-type. That would make it
  much clearer what to do with MGIDs.

I agree -- the current spec is rather broken for multicast.  Choosing a
different ethertype and then saying that all switches will just flood
multicast traffic is half-baked at best.

  It would be nice to at least have a plan on how to integrate a
  non-link local address, if that is ever necessary in future. An
  extended AH with an additional 48-bit DMAC field seems reasonable to me?

You mean have a next-hop destination + a final destination?  Could be
done I guess.  But I'm not sure how having a routing table where you
have to look up 48-bit Ethernet addresses is all that different from
just having a standard Ethernet forwarding table.

I suppose something based on MAC-in-MAC (a la 802.1ah) could be done but
to be honest the IBoE spec that the IBTA came up with looks rather
broken for routing.

 - R.


Re: When IBoE will be merged to upstream?

2010-07-12 Thread Roland Dreier
  But, we can't mandate an overload of the GID in a way that it
  prevents its use as a true L3 address (eventually routable).

Actually I'm beginning to think that the only possible way we can use
the GID in IBoE is as a link-local IPv6 address containing an Ethernet
address.  Trying to hide neighbour discovery or ARP below the verbs
doesn't seem workable -- being forced to change the locking rules we've
had for the past 5+ years about create_ah is just the beginning.  We get
further problems if a remote address should ever change and I'm probably
missing other issues.

So the best solution I can see is to declare that an IBoE GID must be an
IPv6 address coming from an EUI-64 Ethernet address for the
corresponding port; for MGIDs I guess we use the standard IPv6 mapping
to Ethernet address 33:33:xx:xx:xx:xx.
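To make the proposed mapping concrete, here is a minimal sketch of both directions mentioned above: deriving a link-local GID from a port MAC using the usual IPv6 modified EUI-64 rule, and mapping an MGID to a 33:33:xx:xx:xx:xx Ethernet address using the standard IPv6 multicast mapping (low 32 bits of the group address). The function names are made up for illustration and are not part of any existing library; whether IBoE should adopt exactly these rules is the open question in this thread.

```python
import ipaddress

LINK_LOCAL_PREFIX = bytes.fromhex("fe80000000000000")


def mac_to_link_local_gid(mac: str) -> ipaddress.IPv6Address:
    """Hypothetical helper: build fe80::/64 + modified EUI-64 from a MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Modified EUI-64: flip the universal/local bit of the first octet
    # and insert ff:fe between the OUI and the device-specific half.
    eui64 = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    return ipaddress.IPv6Address(LINK_LOCAL_PREFIX + eui64)


def mgid_to_multicast_mac(mgid: str) -> str:
    """Hypothetical helper: 33:33 followed by the low 32 bits of the MGID."""
    packed = ipaddress.IPv6Address(mgid).packed
    if packed[0] != 0xFF:
        raise ValueError("not a multicast GID")
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + packed[12:])


if __name__ == "__main__":
    print(mac_to_link_local_gid("00:02:c9:01:02:03"))
    print(mgid_to_multicast_mac("ff02::1:ff01:203"))
```

Note that the multicast direction is lossy: many MGIDs share the same low 32 bits and so land on the same Ethernet address.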

I'm not sure how we want to handle IPv4 -- presumably unicast ARP can be
done within the RDMA CM, which will then create a DGID with the
appropriate Ethernet address.  However it's not clear to me whether we
need a way to create IPv4 (01:00:5e:xx:xx:xx) multicast addresses.
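For reference, the IPv4 multicast mapping mentioned here places the low 23 bits of the group address under the fixed 01:00:5e prefix. The sketch below only illustrates that standard Ethernet rule; the function name is hypothetical, and nothing in this thread settles whether or how IBoE would expose it.

```python
import ipaddress


def ipv4_multicast_to_mac(group: str) -> str:
    """Hypothetical helper: map an IPv4 multicast group to 01:00:5e:xx:xx:xx."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError("not an IPv4 multicast address")
    # Only the low 23 bits of the group survive; the top bit of the
    # second MAC half is always zero.
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(ipv4_multicast_to_mac("224.0.0.1"))
    print(ipv4_multicast_to_mac("239.255.255.250"))
```

Because 5 bits of the group address are dropped, 32 different IPv4 groups share each Ethernet address (for example 224.0.0.1 and 224.128.0.1).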

Also, since there is no way to map a link-local IPv6 address to a
particular interface, then I guess we need a way to pass in the VLAN tag
to be used -- presumably we can steal some other field for the 12 bits.
(The fact that the IBoE annex does not mention VLANs or 802.1q a single
time is just another thing that shows how rushed and incomplete it is)

With all this said, I think it means we do not need to do the mapping
from GID to Ethernet address in the kernel for IBoE user verbs, since it
is so simple -- we can simply add a fairly trivial helper to libibverbs.
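The kind of trivial helper meant here could look roughly like the sketch below, which recovers the Ethernet address from a link-local, EUI-64 based GID. It is written in Python only to show the logic; the name link_local_gid_to_mac is an assumption for illustration and no such function exists in libibverbs as of this discussion.

```python
import ipaddress

LINK_LOCAL_PREFIX = bytes.fromhex("fe80000000000000")


def link_local_gid_to_mac(gid: str) -> str:
    """Hypothetical helper: extract the MAC embedded in an EUI-64 link-local GID."""
    packed = ipaddress.IPv6Address(gid).packed
    if packed[:8] != LINK_LOCAL_PREFIX:
        raise ValueError("GID is not in fe80::/64")
    iid = packed[8:]
    if iid[3:5] != b"\xff\xfe":
        raise ValueError("interface ID is not derived from a 48-bit MAC")
    # Undo modified EUI-64: drop the ff:fe filler and flip the
    # universal/local bit back.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    print(link_local_gid_to_mac("fe80::202:c9ff:fe01:203"))
```

A GID that is not in this form is rejected rather than guessed at, which is exactly the case that would need ARP or neighbour discovery somewhere above the verbs.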

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


RE: When IBoE will be merged to upstream?

2010-07-10 Thread Liran Liss
This discussion has turned into a debate over whether we need to expose Eth L2 
params across the Verbs interface.
This point has been discussed extensively in the IBTA during the development of 
the RoCE spec. And the direction chosen by the spec is clear:

...A verbs consumer using a RoCE network relies strictly on so-called Layer
3 addressing (GIDs); layer 2 addresses (e.g. subnet local identifiers) are not 
passed across the verbs interface...

The motivation behind this direction had to do with preserving API transparency 
for applications.
The goal was to allow existing and future applications to run above RoCE and 
native IB without changes. 

As opposed to what it may seem at first sight, adding Eth L2 parameters to the 
address vector, *does not* make RoCE closer to IB.
It actually goes the other way around.
Here is a quick list of what would have to be changed if we were to include Eth 
L2 address parameters to the Address Vector and other structures/functions that 
expose L2 params:

Structure changes:
- ibv_wc
- ibv_ah_attr
-- ibv_qp_attr
-- rdma_ud_param
--- rdma_cm_event
- ibv_port_attr

Verb API changes:
- ibv_poll_cq()
- ibv_init_ah_from_wc()
- ibv_create_ah()

- ibv_query_qp()
- ibv_modify_qp()

- ibv_query_port()

- ibv_attach_mcast()
- ibv_detach_mcast()

rdmacm API changes:
- rdma_post_ud_send()
- rdma_get_cm_event()
- rdma_ack_cm_event()
 
As a result of this:
- Existing IB binaries would cease working over RoCE.
- Due to added fields in structures, even just recompiling existing 
applications from source would be problematic.
- To make future applications work on both ib and RoCE transparently, you would 
need additional wrappers such as init_ah(), copy_ah(), and ah_is_equal(), and 
never inspect address handle fields directly.

So why introduce differences between RoCE and IB (for the Application 
writers!!) when they *aren't* needed? Using rdmacm won't solve this either (UD 
traffic).
By following the direction set forth by the RoCE spec none of this is required. 
Existing (rdmacm) application binaries do run over RoCE or IB unchanged.

Granted, the RoCE spec approach introduces 2 *implementation* issues that we 
need to tackle:
1. Address resolution, which is a generic function, should not be a 
device-specific call.
In this matter, we already proposed a solution where resolution is done, as 
required, in generic functions in the kernel.
Specifically, we provide L2 information to user-space drivers via create_ah(), 
avoiding the need to add a new ABI call altogether, while the resolution would 
take place in a generic CMA routine.

2. The Kernel currently assumes that create_ah() can execute in atomic context.
One option is to distinguish between the create_ah() calls (in the kernel) that 
are done for iboe, which are very few, and the rest of the calls that are 
ib-only.
There are other approaches to solve this as well.

It seems clear that our goal should be to solve these issues inside the kernel, 
in the cleanest manner possible, while preserving transparency to the 
applications.

Comments are welcome.

Liran 

 -Original Message-
 From: Or Gerlitz [mailto:ogerl...@voltaire.com] 
 Sent: Wednesday, July 07, 2010 9:00 AM
 To: Liran Liss
 Cc: Roland Dreier; Jason Gunthorpe; Hefty, Sean; Aleksey 
 Senin; linux-rdma; mo...@voltaire.com; aleks...@voltaire.com; 
 yift...@voltaire.com; Tziporet Koren; al...@voltaire.com
 Subject: Re: When IBoE will be merged to upstream?
 
 Liran Liss wrote:
  but keeping ib_create_ah() callable from any context is not 
 a goal by itself.
 
 Going with your approach, if your proposed design is 
 accepted, I believe you would probably need to patch all the 
 call chains that make calls under the current assumption.
 
  I am looking for constructive ideas for supporting iboe without 
  breaking Verbs/CQE/CM syntax.
 
 I don't agree that exposing the Ethernet L2 related 
 information to the caller breaks anything; on the 
 contrary, it is a required enhancement. 
 
 I think we need to let the consumer resolve, through the rdma-cm, 
 the source / destination MACs, VLAN id and VLAN priority 
 used by an IBoE QP, in exactly the way all the IB 
 equivalents (src/dst lid, pkey, sl) are resolved by the 
 rdma-cm and exposed to the consumer app for an IB QP.
 
 Or.
 


Re: When IBoE will be merged to upstream?

2010-07-10 Thread Jason Gunthorpe
On Sat, Jul 10, 2010 at 11:55:18AM +0300, Liran Liss wrote:

 ...A verbs consumer using a RoCE network relies strictly on
 so-called Layer 3 addressing (GIDs); layer 2 addresses (e.g. subnet
 local identifiers) are not passed across the verbs interface...

Ah, hmm, well, I was on that list during this time and I don't think
this statement means what you are saying it does :)
 
 As opposed to what it may seem at first sight, adding Eth L2
 parameters to the address vector, *does not* make RoCE closer to IB.
 It actually goes the other way around.  Here is a quick list of what
 would have to be changed if we were to include Eth L2 address
 parameters to the Address Vector and other structures/functions that
 expose L2 params:

Umh, no.. Stick the L2 address in the GID itself, stick the vlan tag
in either the DLID or the PKey and you are done. No structures get
bigger, nothing really changes.
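As a rough illustration of why no structure needs to grow: a 16-bit field such as the DLID or the PKey has room for a 12-bit VLAN ID and a 3-bit priority. The sketch below borrows the 802.1Q TCI bit layout purely as an assumption; the thread does not specify any encoding, and the function names are hypothetical.

```python
def pack_vlan_field(vlan_id: int, priority: int = 0) -> int:
    """Hypothetical encoding: 3-bit priority in bits 15..13, 12-bit VLAN ID in bits 11..0."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    return (priority << 13) | vlan_id


def unpack_vlan_field(field: int) -> tuple:
    """Inverse of pack_vlan_field: returns (vlan_id, priority)."""
    if not 0 <= field <= 0xFFFF:
        raise ValueError("field must fit in 16 bits")
    return field & 0xFFF, (field >> 13) & 0x7


if __name__ == "__main__":
    field = pack_vlan_field(100, priority=5)
    print(hex(field), unpack_vlan_field(field))
```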

There are already AH differences between iwarp and IB and generic code
to handle them.

rdmacm can support address resolution only for UD applications.

Still not seeing how this is a big issue? 

You cannot hide destination resolution in create_ah. You just
can't. create_ah does not accept any sort of source address specifier
which is absolutely critical to resolve a destination ip address in
all cases, ie for instance IPv6 link local addresses, but there are
other cases too.

If IP semantics are going to be used then *ALL* linux IP semantics
must be supported, not just those that are convenient to implement, and
that means you need a source IP address, device, QOS parameters and
destination IP address to make a route determination.

The only subsystem in verbs that has this information is RDMA CM.

To me this is a fundamental problem that completely nixes L3 path 
resolution in create_ah as a possible solution - as you explained
you need to do the L3 path resolution to figure out the vlan tag to
use, to get the desired IP netdevice, to execute the ND query..

Jason


Re: When IBoE will be merged to upstream?

2010-07-07 Thread Or Gerlitz
Liran Liss wrote:
 but keeping ib_create_ah() callable from any context is not a goal by itself.

Going with your approach, if your proposed design is accepted, I believe you 
would probably need to patch all the call chains that make calls under the 
current assumption.

 I am looking for constructive ideas for supporting iboe without breaking 
 Verbs/CQE/CM syntax. 

I don't agree that exposing the Ethernet L2 related information to the caller 
breaks anything; on the contrary, it is a required enhancement. 

I think we need to let the consumer resolve, through the rdma-cm, the source / 
destination MACs, VLAN id and VLAN priority used by an IBoE QP, in exactly the 
way all the IB equivalents (src/dst lid, pkey, sl) are resolved by the rdma-cm 
and exposed to the consumer app for an IB QP.

Or.



Re: When IBoE will be merged to upstream?

2010-07-03 Thread Roland Dreier
  Third, RoCE is not IB; it's all about making RDMA user-friendly to Ethernet 
  users.

This is utter nonsense.  RoCE (or IBoE as I prefer ;) is absolutely
IB-over-Ethernet and it is all about making minimal changes to IB and IB
applications to run on Ethernet.

  Most importantly, we don't want to change the way Ethernet networks are 
  managed.

That makes sense.  However let's be honest with ourselves -- the
fraction of Ethernet networks using IPv6 as their only or even main
address scheme is pretty small.  Of course having a migration path to
work with IPv6 is important, but for the moment users want to use IPv4
addresses to specify destinations.

  - RoCE gids are L3 addresses, which are not (necessarily) of link-local
scope; people will mostly use IP-mapped gids of global scope.
  - These gids will map to an IP address, which then can resolve to an
outgoing vlan device exactly as in Ethernet.

At that level it all makes sense, but the problem is the specifics of
where, when and how the mapping is done.

  We have a specification, we have an implementation, and we have clean
  way of passing RoCE L2 information to user-space via address handles.

We may have an implementation but we absolutely don't have a
specification.  Or at least the IBA annex has nothing beyond this:

A16.5.1 ADDRESS ASSIGNMENT AND RESOLUTION

Layer 2 local addresses (i.e. SMAC, DMAC), and the methods by which
those addresses are assigned, are outside the scope of this annex.

The means for resolving a GID to a local port address (i.e. SMAC or
DMAC) are outside the scope of this annex. It is assumed that
standard Ethernet mechanisms, such as ARP or Neighbor Discovery are
used to maintain an appropriate address cache for RoCE ports.

which was really pretty unfortunate, since it means the exact point
we're talking about is completely unspecified.  Or is there some other
spec you can point to?

(This also means it's pretty important that we get this right, since
every future implementation is going to have a lot of pressure to follow
what Linux does)

  I don't see any substantial reason to change the basic approach.

I don't really even know what the basic approach is.  For example what's
the plan for handling GIDs that aren't derived from a MAC address?  For
a long time we've assumed that the create_ah verb can't sleep, so where
are you going to do neighbor discovery?

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: When IBoE will be merged to upstream?

2010-07-02 Thread Jason Gunthorpe
On Thu, Jul 01, 2010 at 02:50:55PM +0300, Liran Liss wrote:

 We have a specification, we have an implementation, and we have
 clean way of passing RoCE L2 information to user-space via address
 handles.  I don't see any substantial reason to change the basic
 approach.

Actually, we have a spec that omits how to do the L3 to L2 address
mapping - and there is much disagreement on this point. Since no
standard was reached, I think agreement will have to be reached at
least with the Linux maintainers before anything can realistically
be merged.

The basic approach must be something people can agree on, and it seems
pretty clear to me at least that the current approach is not agreeable
to many people.

All three of your points are entirely avoidable if you simply
stick to the idea that the L3 address is a GID and not an IP, and rely
on IPoIB like RDMA-CM mechanisms to go from IP to GID to L2 MAC/VLAN.

Jason


Re: When IBoE will be merged to upstream?

2010-07-02 Thread Jason Gunthorpe
On Thu, Jul 01, 2010 at 05:20:05PM -0700, Hefty, Sean wrote:

  The most evident example is the CM protocol, which has L2 fields in its
  payloads.
 
 How does moving the L3 to L2 mapping from outside create AH result
 in the CM protocol needing to change?  Why is hiding this under modify
 QP desirable?

I'm not too sure about that either.. The concept of a reduced
functionality TCA does not seem to have been considered as part of
RoCE, there is nothing like APM or path selection, so it is reasonable
to assume that the target in a CM can do the PR lookup equivalent to
get from a GID to the L2 information. The RoCE spec does not specify
that DMAC addresses must be exact match to QP data, so this is not a
problem like it would be in IB.

Jason


RE: When IBoE will be merged to upstream?

2010-07-01 Thread Hefty, Sean
 There are also good reasons why the RoCE standard left the syntax of
 address handles unchanged.
 First, it keeps the Verbs unchanged. Even if you are using rdmacm to make
 connections, you still have to inspect address handles when connecting to
 UD QPs or joining multicast addresses.

Verbs isn't an API, and the verbs usage changed anyway - the L2 is no longer 
provided by the user.  The spec is wrong to make the L3 to L2 mapping device 
specific, if that was the intent.  ARP or neighbor discovery should not be 
hidden under create AH or modify QP.

 Second, making Ethernet L2 fields explicit has implications beyond the
 address handle and CQE formats. Specifically, a lot of the IBTA defined
 MADs must be modified as well.

Specifically, which MADs and why?

 The most evident example is the CM protocol, which has L2 fields in its
 payloads.

How does moving the L3 to L2 mapping from outside create AH result in the CM 
protocol needing to change?  Why is hiding this under modify QP desirable?

 Third, RoCE is not IB; it's all about making RDMA user-friendly to Ethernet
 users.

Then why is it part of the IBTA?

 Most importantly, we don't want to change the way Ethernet networks are
 managed.
 This means that admins configure their normal network interfaces, define
 VLAN sub-interfaces, assign IP addresses (or use DHCP), and then work with
 RoCE using IP-mapped addresses, which reference the same IP addresses they
 use for their Ethernet interfaces.
 So, regarding our VLAN discussion:
 - RoCE gids are L3 addresses, which are not (necessarily) of link-local
 scope; people will mostly use IP-mapped gids of global scope.
 - These gids will map to an IP address, which then can resolve to an
 outgoing vlan device exactly as in Ethernet.
 
 We have a specification, we have an implementation, and we have clean way
 of passing RoCE L2 information to user-space via address handles.
 I don't see any substantial reason to change the basic approach.

All feedback on this has been discarded since the initial patches were 
submitted, and I still don't see a substantial reason why L3 to L2 mappings are 
device specific, or why significant network protocols should be hidden under 
verbs calls.


RE: When IBoE will be merged to upstream?

2010-06-25 Thread Liran Liss
VLANs are part of L2 in Ethernet -- when you resolve a destination L3 address 
to an L2 address, you get the outgoing interface, which also determines the 
VLAN.
I think this approach has an advantage over an RDMA device per VLAN in that you 
keep the standard OS VLAN management (vconfig).

I wouldn't judge the RoCE spec so quickly --- it guarantees that rdma 
application binaries could run on any network.
What do you gain by exposing Eth-specific L2 params in the address handle?
--Liran


-Original Message-
From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com] 
Sent: Thursday, June 24, 2010 11:37 PM
To: Liran Liss
Cc: Hefty, Sean; Roland Dreier; Aleksey Senin; linux-rdma; mo...@voltaire.com; 
aleks...@voltaire.com; yift...@voltaire.com; Tziporet Koren; al...@voltaire.com
Subject: Re: When IBoE will be merged to upstream?

 The current behavior of ibv_create_ah() requires that the caller 
 provide the L2, and if needed, L3 addressing.  Any translation between 
 the L3 and L2 addressing must be done before the call is made.  E.g. 
 ibv_create_ah does not use the GID to query the SA to obtain LIDs.  
 Why doesn't IBoE follow this same model?
 
 LL: because of the RoCE spec, which states that only GID addressing is 
 used at the Verbs level. The address handle fields are unchanged, and 
 the L2 fields (e.g., lid) are reserved.  Note that in Ethernet, you 
 normally don't specify L2 addresses at the transport level (i.e., 
 sockets).

We do not have to slavishly follow the IBTA spec in the Linux implementation, 
especially if it makes no sense.

I think Sean is on the right track here, the AH should take the L2 as input 
just like for IB, and the resolution is done in librdmacm, or somehow manually.

The verbs layer is not really analogous to sockets anyhow, the librdmacm is 
much closer to a socket like interface, and it having a GID go into rdmacm and 
a full AH with L2/L3 info come out seems entirely reasonable.

BTW, whatever was decided about VLAN tagging? Is that part of the AH or do 
you use separate RDMA devices per VLAN? Seems like a point worth considering 
now.

Jason


Re: When IBoE will be merged to upstream?

2010-06-25 Thread Jason Gunthorpe
On Fri, Jun 25, 2010 at 11:04:28AM +0300, Liran Liss wrote:

 VLANs are part of L2 in Ethernet -- when you resolve a destination
 L3 address to an L2 address, you get the outgoing interface, which
 also determines the VLAN.  I think this approach has an advantage
 over an RDMA device per VLAN in that you keep the standard OS VLAN
 management (vconfig).

Except that in RoCE all L3 addresses are link local GIDs, which must
be scoped to an interface and cannot be resolved by routing to a
specific interface. vconfig creates child ethernet devices, I think
you have no choice but to do the same for RDMA. The GID, when it is
resolved, must be scoped to the RDMA device it is going to be bound
to, which in turn must be bound to a VLAN.

(BTW, Sean, did AF_IB's sockaddr include a scoping field, and did you
figure out some way to make that work?)

 I wouldn't judge the RoCE spec so quickly --- it guarantees that
 rdma application binaries could run on any network.  What do you
 gain by exposing Eth-specific L2 params in the address handle?

Well, 1) invariably that is how the hardware must work, and verbs is
about exposing that interface to userspace 2) You don't suddenly make
AH setup require network traffic, and potentially large time
delays 3) it keeps the whole RoCE architecture far more consistent
with IB.

You can pose the same question for IB, why doesn't AH resolution
resolve the GID? There are lots of good answers :)

Also bear in mind that APM is entirely possible over RoCE and doing
that will require a finer touch for managing the data in the AH's.

What do you get by doing all this extra work? I say nothing at
all. Users won't even be able to tell the difference as long as they
use rdmacm to setup the connections.

Jason


RE: When IBoE will be merged to upstream?

2010-06-25 Thread Hefty, Sean
 (BTW, Sean, did AF_IB's sockaddr include a scoping field, and did you
 figure out some way to make that work?)

The af_ib does include a scoping field, but the last set of patches doesn't 
make use of it.


RE: When IBoE will be merged to upstream?

2010-06-24 Thread Liran Liss
Regarding GID to Eth mappings, we discussed using the standard create_ah() Verb 
for this.
In the kernel, create_ah() will call a generic address resolution function in 
the cma.
The returned information will be copied back to user-space in a device-specific 
structure (since address handles are device-specific).

This eliminates adding a new get_eth_l2_addr() ABI call (device-specific or 
not).
In fact, this approach eliminates adding new ABIs for any kind of address 
translation...

Does this seem reasonable?
--Liran


-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, June 24, 2010 2:24 AM
To: Aleksey Senin
Cc: linux-rdma; mo...@voltaire.com; aleks...@voltaire.com; 
yift...@voltaire.com; Tziporet Koren; al...@voltaire.com
Subject: Re: When IBoE will be merged to upstream?

  This is actually a continuation of the RAW_ET() issue. We want to
  submit the patches upstream, but there is no support for IB
  transport in Ethernet devices, and the mlx4_en driver version is a
  bit outdated: 1.4.1.1 upstream versus 1.5.1 in OFED. VLAN support
  that is already present in OFED is also missing.
  When are you planning to submit the changes from OFED to upstream?

 - I do not search for more things to merge upstream.  I have enough
   work reviewing things that are sent to me.  So I will never look
   through OFED for changes.

 - I do not handle the mlx4_en driver.  Changes for mlx4_en should go to
   netdev and Dave Miller.

 - I will try to get back to the IBoE changes when I have time, and I
   will admit that my time to spend as RDMA maintainer is nowhere near
   full time and less than it was in the past.

 - I did allocate a fair amount of time to spend on IBoE recently but
   unfortunately the patches were not really in a suitable state to
   merge, and I exhausted that time slice before we reached the end.
   When patch sets sit outside of the upstream kernel and are shipped in
   OFED for months and years, it would probably make upstream merging
   easier if that time was used to fix the patch set.

 - Specifically for the IBoE patches, shouldn't someone have realized
   that having a device-specific interface to do the standard mapping of
   GID to Ethernet address makes no sense?
--
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


RE: When IBoE will be merged to upstream?

2010-06-24 Thread Liran Liss
S.B.
--Liran 

-Original Message-
From: Hefty, Sean [mailto:sean.he...@intel.com] 
Sent: Thursday, June 24, 2010 9:06 PM
To: Liran Liss; Roland Dreier; Aleksey Senin
Cc: linux-rdma; mo...@voltaire.com; aleks...@voltaire.com; 
yift...@voltaire.com; Tziporet Koren; al...@voltaire.com
Subject: RE: When IBoE will be merged to upstream?

 Regarding GID to Eth mappings, we discussed using the standard 
 create_ah() Verb for this.
 In the kernel, create_ah() will call a generic address resolution 
 function in the cma.
 The returned information will be copied back to user-space in a 
 device- specific structure (since address handles are device-specific).
 
 This eliminates adding a new get_eth_l2_addr() ABI call 
 (device-specific or not).
 In fact, this approach eliminates adding new ABIs for any kind of 
 address translation...
 
 Does this seem reasonable?

The current behavior of ibv_create_ah() requires that the caller provide the 
L2, and if needed, L3 addressing.  Any translation between the L3 and L2 
addressing must be done before the call is made.  E.g. ibv_create_ah does not 
use the GID to query the SA to obtain LIDs.  Why doesn't IBoE follow this same 
model?

LL: because of the RoCE spec, which states that only GID addressing is used at 
the Verbs level. The address handle fields are unchanged, and the L2 fields 
(e.g., lid) are reserved.
Note that in Ethernet, you normally don't specify L2 addresses at the transport 
level (i.e., sockets).

Callers can use some out of band mechanism for the mapping, call 
rdma_resolve_addr, or use some standard networking routine.
LL: this would require changes to the Verbs API. rdmacm programs addresses 
using user-space Verbs, or even just passes address handle attributes to the 
application...


Re: When IBoE will be merged to upstream?

2010-06-23 Thread Roland Dreier
  This is actually a continuation of the RAW_ET() issue. We want to
  submit the patches upstream, but there is no support for IB
  transport in Ethernet devices, and the mlx4_en driver version is a
  bit outdated: 1.4.1.1 upstream versus 1.5.1 in OFED. VLAN support
  that is already present in OFED is also missing.
  When are you planning to submit the changes from OFED to upstream?

 - I do not search for more things to merge upstream.  I have enough
   work reviewing things that are sent to me.  So I will never look
   through OFED for changes.

 - I do not handle the mlx4_en driver.  Changes for mlx4_en should go to
   netdev and Dave Miller.

 - I will try to get back to the IBoE changes when I have time, and I
   will admit that my time to spend as RDMA maintainer is nowhere near
   full time and less than it was in the past.

 - I did allocate a fair amount of time to spend on IBoE recently but
   unfortunately the patches were not really in a suitable state to
   merge, and I exhausted that time slice before we reached the end.
   When patch sets sit outside of the upstream kernel and are shipped in
   OFED for months and years, it would probably make upstream merging
   easier if that time was used to fix the patch set.

 - Specifically for the IBoE patches, shouldn't someone have realized
   that having a device-specific interface to do the standard mapping of
   GID to Ethernet address makes no sense?
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html