Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding
> > I'm having a hard time working out why the iboe case needs to
> > schedule to a work queue here, since it's already in process context,
> > right?  It seems it would be really preferable to avoid all the extra
> > pointer munging and reference counting, and just call things directly.
>
> I assume that the caller might attempt to acquire the same lock when
> calling join and in the callback.  Specifically, ucma_join_multicast()
> calls rdma_join_multicast() with file->mut acquired, and
> ucma_event_handler() does the same.

I see... we can't call the consumer's callback directly since it might
have locking assumptions.  It would be nice if we didn't have this
reference counting sometimes used and sometimes not used.  I'll have to
think about whether this can be made cleaner.

 - R.

-- 
Roland Dreier <rola...@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding
On Wed, May 12, 2010 at 01:14:33PM -0700, Roland Dreier wrote:
> > Multicast GIDs are always mapped to multicast MACs, as is done in
> > IPv6.  Some helper functions are added to ib_addr.h.  IPv4 multicast
> > is enabled by translating IPv4 multicast addresses to IPv6 multicast
> > as described in
> > http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.
>
> I guess it's a bit unfortunate that the RoCE annex completely ignored
> how to map multicast GIDs to ethernet addresses (I suppose as part of
> the larger decision to ignore address resolution entirely).  Anyway,
> looking at the email message you reference, it seems to be someone
> asking what the right way to map IPv4 multicast addresses to IPv6
> addresses is.  Is there a more definitive document you can point to?

I am not aware of any definitive document that addresses this issue.

> It seems that unfortunately, the way the layering of addresses is
> done, there's no way to just use the standard mapping of IPv4
> multicast addresses to Ethernet addresses (since IBoE does addressing
> via the CMA mapping to GIDs, followed by an unspecified mapping from
> GIDs to Ethernet addresses).

It is natural to treat GIDs as IPv6 addresses, so using the standard
mapping from IPv6 multicast addresses to MAC addresses seems reasonable.
Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding
>  int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
> -			 struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr)
> +			 struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr,
> +			 int force_grh)

Rather than this change in API, could we just have this function look at
the link layer, and if it's Ethernet, then always set the GRH flag?
That seems simpler than requiring the upper layers to do this and then
pass the result in.
Re: [PATCHv8 04/11] ib_core: IBoE CMA device binding
> Multicast GIDs are always mapped to multicast MACs, as is done in
> IPv6.  Some helper functions are added to ib_addr.h.  IPv4 multicast
> is enabled by translating IPv4 multicast addresses to IPv6 multicast
> as described in
> http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.

I guess it's a bit unfortunate that the RoCE annex completely ignored
how to map multicast GIDs to ethernet addresses (I suppose as part of
the larger decision to ignore address resolution entirely).

Anyway, looking at the email message you reference, it seems to be
someone asking what the right way to map IPv4 multicast addresses to
IPv6 addresses is.  Is there a more definitive document you can point
to?

It seems that unfortunately, the way the layering of addresses is done,
there's no way to just use the standard mapping of IPv4 multicast
addresses to Ethernet addresses (since IBoE does addressing via the CMA
mapping to GIDs, followed by an unspecified mapping from GIDs to
Ethernet addresses).

 - R.
[PATCHv8 04/11] ib_core: IBoE CMA device binding
Add support for IBoE device binding and IP -> GID resolution. Path
resolving and multicast joining are implemented within cma.c by filling
the responses and pushing the callbacks to the cma work queue. IP->GID
resolution always yields IPv6 link-local addresses; remote GIDs are
derived from the destination MAC address of the remote port. Multicast
GIDs are always mapped to multicast MACs, as is done in IPv6. Some
helper functions are added to ib_addr.h. IPv4 multicast is enabled by
translating IPv4 multicast addresses to IPv6 multicast as described in
http://www.mail-archive.com/i...@sunroof.eng.sun.com/msg02134.html.

Signed-off-by: Eli Cohen <e...@mellanox.co.il>
---
Changes from v7:
1. Add a force_grh flag to ib_init_ah_from_path() to request IB_AH_GRH
   for IB_LINK_LAYER_ETHERNET ports, thus allowing the use of hop
   limit 1 in path records.
2. cma_acquire_dev() finds the cma_dev by first assuming an iboe type
   device for non-ARPHRD_INFINIBAND dev types. If that fails, it falls
   back to the old method.

 drivers/infiniband/core/cm.c              |    5 +-
 drivers/infiniband/core/cma.c             |  283 +++++++++++++++++++--
 drivers/infiniband/core/sa_query.c        |    5 +-
 drivers/infiniband/core/ucma.c            |   45 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    2 +-
 include/rdma/ib_addr.h                    |   98 ++++++-
 include/rdma/ib_sa.h                      |    3 +-
 7 files changed, 412 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5130fc5..6513b1c 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -351,6 +351,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	unsigned long flags;
 	int ret;
 	u8 p;
+	int force_grh;
 
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
@@ -371,8 +372,10 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 		return ret;
 
 	av->port = port;
+	force_grh = rdma_port_link_layer(cm_dev->ib_device, port->port_num) ==
+		    IB_LINK_LAYER_ETHERNET ? 1 : 0;
 	ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
-			     &av->ah_attr);
+			     &av->ah_attr, force_grh);
 	av->timeout = path->packet_life_time + 1;
 	return 0;
 }
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..df5f636 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -58,6 +58,7 @@ MODULE_LICENSE("Dual BSD/GPL");
 #define CMA_CM_RESPONSE_TIMEOUT 20
 #define CMA_MAX_CM_RETRIES 15
 #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
+#define IBOE_PACKET_LIFETIME 18
 
 static void cma_add_one(struct ib_device *device);
 static void cma_remove_one(struct ib_device *device);
@@ -157,6 +158,7 @@ struct cma_multicast {
 	struct list_head	list;
 	void			*context;
 	struct sockaddr_storage	addr;
+	struct kref		mcref;
 };
 
 struct cma_work {
@@ -173,6 +175,12 @@ struct cma_ndev_work {
 	struct rdma_cm_event	event;
 };
 
+struct iboe_mcast_work {
+	struct work_struct	 work;
+	struct rdma_id_private	*id;
+	struct cma_multicast	*mc;
+};
+
 union cma_ip_addr {
 	struct in6_addr ip6;
 	struct {
@@ -281,6 +289,8 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 	atomic_inc(&cma_dev->refcount);
 	id_priv->cma_dev = cma_dev;
 	id_priv->id.device = cma_dev->device;
+	id_priv->id.route.addr.dev_addr.transport =
+		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
@@ -290,6 +300,14 @@ static inline void cma_deref_dev(struct cma_device *cma_dev)
 		complete(&cma_dev->comp);
 }
 
+static inline void release_mc(struct kref *kref)
+{
+	struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref);
+
+	kfree(mc->multicast.ib);
+	kfree(mc);
+}
+
 static void cma_detach_from_dev(struct rdma_id_private *id_priv)
 {
 	list_del(&id_priv->list);
@@ -330,15 +348,29 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 	union ib_gid gid;
 	int ret = -ENODEV;
 
-	rdma_addr_get_sgid(dev_addr, &gid);
+	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
+		iboe_addr_get_sgid(dev_addr, &gid);
+		list_for_each_entry(cma_dev, &dev_list, list) {
+			ret = ib_find_cached_gid(cma_dev->device, &gid,
+						 &id_priv->id.port_num, NULL);
+			if (!ret)
+				goto out;
+		}
+	}
+
+	memcpy(&gid, dev_addr->src_dev_addr +
+	       rdma_addr_gid_offset(dev_addr), sizeof gid);