Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding

2010-05-14 Thread Roland Dreier
   I'm having a hard time working out why the iboe case needs to schedule
   to a work queue here since it's already in process context, right?  It
   seems it would be really preferable to avoid all the extra pointer
   munging and reference counting, and just call things directly.

  I assume that the caller might attempt to acquire the same lock when
  calling join and in the callback. Specifically, ucma_join_multicast()
  calls rdma_join_multicast() with file->mut acquired and
  ucma_event_handler() does the same.

I see... we can't call the consumer's callback directly since it might
have locking assumptions.

It would be nice if we didn't have this reference counting sometimes
used and sometimes not used.  I'll have to think about whether this can
be made cleaner.
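To make the constraint concrete, here is a hedged userspace sketch (illustrative names only, not the kernel API) of why the IBoE path defers the consumer callback: the caller invokes the join while holding its own mutex, and the callback wants that same mutex, so a synchronous callback would self-deadlock; queuing the callback and running it later, outside the lock, avoids this.

```c
/* Hedged sketch of the deferred-callback pattern; a one-slot "work
 * queue" stands in for the cma work queue.  All names are illustrative. */
#include <assert.h>

struct work { void (*fn)(void *); void *arg; };

static struct work pending;          /* one-slot work queue */
static int has_pending;
static int file_mut_held;            /* stand-in for the caller's file->mut */

static void queue_work_item(void (*fn)(void *), void *arg)
{
	pending.fn = fn;
	pending.arg = arg;
	has_pending = 1;
}

static void flush_work_queue(void)   /* runs with no caller locks held */
{
	if (has_pending) {
		has_pending = 0;
		pending.fn(pending.arg);
	}
}

static void event_handler(void *arg)
{
	/* Like ucma_event_handler(): needs the caller's mutex, so it
	 * must not be invoked while the caller still holds it. */
	assert(!file_mut_held);
	*(int *)arg = 1;
}

static int join_multicast(void (*cb)(void *), void *arg)
{
	/* Like the IBoE rdma_join_multicast() path: complete locally,
	 * but defer the callback instead of calling it under the lock. */
	queue_work_item(cb, arg);
	return 0;
}
```

The deferral costs the extra pointer munging and reference counting Roland objects to, but it keeps the consumer's locking assumptions intact.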

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding

2010-05-13 Thread Eli Cohen
On Wed, May 12, 2010 at 01:14:33PM -0700, Roland Dreier wrote:
   Multicast GIDs are always mapped to multicast MACs
   as is done in IPv6. Some helper functions are added to ib_addr.h.  IPv4
   multicast is enabled by translating IPv4 multicast addresses to IPv6 multicast
   as described in
   http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.
 
 I guess it's a bit unfortunate that the RoCE annex completely ignored
 how to map multicast GIDs to ethernet addresses (I suppose as part of
 the larger decision to ignore address resolution entirely).  Anyway,
 looking at the email message you reference, it seems to be someone
 asking what the right way to map IPv4 multicast addresses to IPv6
 addresses is.  Is there a more definitive document you can point to?

I am not aware of any definitive document that addresses this issue.

 
 It seems that unfortunately the way the layering of addresses is done,
 there's no way to just use the standard mapping of IPv4 multicast
 addresses to Ethernet addresses (since IBoE does addressing via
 the CMA mapping to GIDs followed by an unspecified mapping from GIDs to
 Ethernet addresses).
 

It is natural to treat GIDs as IPv6 addresses, so using the standard
mapping from IPv6 multicast addresses to MAC addresses seems reasonable.
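That standard mapping (RFC 2464) takes the low-order 32 bits of the IPv6 multicast address and prepends the fixed 33:33 prefix. A minimal sketch, with an illustrative function name (the patch adds similar helpers to ib_addr.h):

```c
#include <stdint.h>
#include <string.h>

/* Map a multicast GID, treated as an IPv6 multicast address, to an
 * Ethernet multicast MAC per RFC 2464: 33:33 + low 32 bits of the GID. */
static void mcast_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
	mac[0] = 0x33;                  /* fixed IPv6-multicast prefix */
	mac[1] = 0x33;
	memcpy(mac + 2, gid + 12, 4);   /* low 32 bits of the group ID */
}
```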



Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding

2010-05-12 Thread Roland Dreier
   int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
  - struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr)
  + struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr,
  + int force_grh)

Rather than this change in API, could we just have this function look at
the link layer, and if it's ethernet, then always set the GRH flag?
Seems simpler than requiring the upper layers to do this and then pass
the result in?
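The suggested simplification can be sketched as follows (hedged, with illustrative names; the real code would consult rdma_port_link_layer() inside ib_init_ah_from_path() itself): on Ethernet (IBoE) ports there is no LRH, so a GRH is always required, while IB ports keep whatever the path record asked for.

```c
/* Hedged sketch of deriving the GRH flag from the port's link layer
 * instead of a force_grh parameter.  Names are illustrative. */
enum link_layer { LL_INFINIBAND, LL_ETHERNET };

static int ah_needs_grh(enum link_layer ll, int path_has_grh)
{
	/* IBoE has no LRH, so a GRH is mandatory on Ethernet ports;
	 * otherwise honor the path record. */
	return ll == LL_ETHERNET ? 1 : path_has_grh;
}
```

With this, callers such as cm_init_av_by_path() would not need to compute and pass the flag themselves.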


Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding

2010-05-12 Thread Roland Dreier
  Multicast GIDs are always mapped to multicast MACs
  as is done in IPv6. Some helper functions are added to ib_addr.h.  IPv4
  multicast is enabled by translating IPv4 multicast addresses to IPv6 multicast
  as described in
  http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.

I guess it's a bit unfortunate that the RoCE annex completely ignored
how to map multicast GIDs to ethernet addresses (I suppose as part of
the larger decision to ignore address resolution entirely).  Anyway,
looking at the email message you reference, it seems to be someone
asking what the right way to map IPv4 multicast addresses to IPv6
addresses is.  Is there a more definitive document you can point to?

It seems that unfortunately the way the layering of addresses is done,
there's no way to just use the standard mapping of IPv4 multicast
addresses to Ethernet addresses (since IBoE does addressing via
the CMA mapping to GIDs followed by an unspecified mapping from GIDs to
Ethernet addresses).

 - R.


[PATCHv8 04/11] ib_core: IBoE CMA device binding

2010-02-18 Thread Eli Cohen
Add support for IBoE device binding and IP-to-GID resolution. Path resolution
and multicast joining are implemented within cma.c by filling in the responses
and pushing the callbacks to the cma work queue. IP-to-GID resolution always
yields IPv6 link-local addresses; remote GIDs are derived from the destination MAC
address of the remote port. Multicast GIDs are always mapped to multicast MACs
as is done in IPv6. Some helper functions are added to ib_addr.h.  IPv4
multicast is enabled by translating IPv4 multicast addresses to IPv6 multicast
as described in
http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.
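The exact IPv4-to-IPv6 translation comes from the referenced mail thread; the sketch below shows one plausible form of it (an assumption, not necessarily the convention the patch uses): embed the 32-bit IPv4 group address in the low 32 bits of an IPv6 multicast address with an assumed ff0e scope prefix. Names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Hedged sketch: fold an IPv4 multicast group into an IPv6 multicast
 * address so the standard IPv6-to-MAC mapping can then be applied.
 * The ff0e (global-scope) prefix is an assumption for illustration. */
static void ipv4_mcast_to_ipv6(const uint8_t v4[4], uint8_t v6[16])
{
	memset(v6, 0, 16);
	v6[0] = 0xff;                 /* IPv6 multicast prefix */
	v6[1] = 0x0e;                 /* scope byte (assumed) */
	memcpy(v6 + 12, v4, 4);       /* IPv4 group in the low 32 bits */
}
```

Because the group ends up in the low 32 bits, the resulting MAC is the same 33:33-prefixed address the IPv6 mapping would produce.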

Signed-off-by: Eli Cohen e...@mellanox.co.il
---

Changes from v7:

1. Add a force_grh flag to ib_init_ah_from_path() to request IB_AH_GRH for
   IB_LINK_LAYER_ETHERNET ports, thus allowing hop limit 1 to be used in path
   records.
2. cma_acquire_dev() finds the cma_dev by first assuming an IBoE-type device
   for non-ARPHRD_INFINIBAND dev types. If that fails, it falls back to the
   old method.
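Change (2) is essentially a two-pass lookup. A hypothetical userspace sketch (all names illustrative, not the kernel's):

```c
#include <string.h>

struct dev_entry { unsigned char gid[16]; };

/* Hedged sketch of the lookup order: first search assuming an
 * IBoE-derived GID, then fall back to the classic IB GID.
 * Returns the index of the matching device, or -1 if none matches. */
static int acquire_dev(const struct dev_entry *devs, int n,
		       const unsigned char iboe_gid[16],
		       const unsigned char ib_gid[16])
{
	int i;

	for (i = 0; i < n; i++)         /* pass 1: IBoE-style GID */
		if (!memcmp(devs[i].gid, iboe_gid, 16))
			return i;
	for (i = 0; i < n; i++)         /* pass 2: fall back to old method */
		if (!memcmp(devs[i].gid, ib_gid, 16))
			return i;
	return -1;
}
```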


 drivers/infiniband/core/cm.c  |5 +-
 drivers/infiniband/core/cma.c |  283 +++--
 drivers/infiniband/core/sa_query.c|5 +-
 drivers/infiniband/core/ucma.c|   45 -
 drivers/infiniband/ulp/ipoib/ipoib_main.c |2 +-
 include/rdma/ib_addr.h|   98 ++-
 include/rdma/ib_sa.h  |3 +-
 7 files changed, 412 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5130fc5..6513b1c 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -351,6 +351,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	unsigned long flags;
 	int ret;
 	u8 p;
+	int force_grh;
 
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
@@ -371,8 +372,10 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 		return ret;
 
 	av->port = port;
+	force_grh = rdma_port_link_layer(cm_dev->ib_device, port->port_num) ==
+		IB_LINK_LAYER_ETHERNET ? 1 : 0;
 	ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
-			     &av->ah_attr);
+			     &av->ah_attr, force_grh);
 	av->timeout = path->packet_life_time + 1;
 	return 0;
 }
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..df5f636 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -58,6 +58,7 @@ MODULE_LICENSE("Dual BSD/GPL");
 #define CMA_CM_RESPONSE_TIMEOUT 20
 #define CMA_MAX_CM_RETRIES 15
 #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
+#define IBOE_PACKET_LIFETIME 18
 
 static void cma_add_one(struct ib_device *device);
 static void cma_remove_one(struct ib_device *device);
@@ -157,6 +158,7 @@ struct cma_multicast {
 	struct list_head	list;
 	void			*context;
 	struct sockaddr_storage	addr;
+	struct kref		mcref;
 };
 
 struct cma_work {
@@ -173,6 +175,12 @@ struct cma_ndev_work {
 	struct rdma_cm_event	event;
 };
 
+struct iboe_mcast_work {
+   struct work_struct   work;
+   struct rdma_id_private  *id;
+   struct cma_multicast*mc;
+};
+
 union cma_ip_addr {
struct in6_addr ip6;
struct {
@@ -281,6 +289,8 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 	atomic_inc(&cma_dev->refcount);
 	id_priv->cma_dev = cma_dev;
 	id_priv->id.device = cma_dev->device;
+	id_priv->id.route.addr.dev_addr.transport =
+		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
@@ -290,6 +300,14 @@ static inline void cma_deref_dev(struct cma_device *cma_dev)
 		complete(&cma_dev->comp);
 }
 
+static inline void release_mc(struct kref *kref)
+{
+	struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref);
+
+	kfree(mc->multicast.ib);
+	kfree(mc);
+}
+
+
 static void cma_detach_from_dev(struct rdma_id_private *id_priv)
 {
 	list_del(&id_priv->list);
@@ -330,15 +348,29 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 	union ib_gid gid;
 	int ret = -ENODEV;
 
-	rdma_addr_get_sgid(dev_addr, &gid);
+	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
+		iboe_addr_get_sgid(dev_addr, &gid);
+		list_for_each_entry(cma_dev, &dev_list, list) {
+			ret = ib_find_cached_gid(cma_dev->device, &gid,
+						 &id_priv->id.port_num, NULL);
+			if (!ret)
+				goto out;
+		}
+	}
+
+	memcpy(&gid, dev_addr->src_dev_addr +
+	       rdma_addr_gid_offset(dev_addr), sizeof gid);