[openib-general] RE: [iSER] How to send the login request PDU?
> As is known to all, the iSCSI layer at the Initiator should send a Login Request PDU to the Target after allocating the connection resources. My question is: how should this PDU be sent? Could the Send_Control Primitive Operation be used? But it is not in the iSER-assisted mode at present. Or else, should the dapl API be used directly?

Yes, you should use send_control to send the login PDU. With this iSER implementation you can do it before allocating connection resources.

> There is a similar problem at the Target when sending the Login Response PDU.

Same answer.

> Hope my description is clear. Any suggestion is appreciated.
>
> Ian Jiang [EMAIL PROTECTED]
> Computer Architecture Laboratory
> Institute of Computing Technology
> Chinese Academy of Sciences
> Beijing, P.R.China
> Zip code: 100080
> Tel: +86-10-62564394 (office)

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [openib-general] RDMA connection and address translation API
Sean wrote:
> It looks like this would work. If a client wanted to create multiple connections to the same remote service (for example, to separate control and data), then it seems more efficient to move the asynchronous at outside of the connect call.
> - Sean

That's a good point. What I had in mind was mainly simplicity for the consumer: it saves him from dealing with another upcall. Maybe caching in the at module would make things better, but I agree that for multiple connections to the same remote service, the asynchronous at approach seems more appropriate.

OTOH, after thinking about it some more, there might be problems in letting each and every consumer do his own caching. at.c has a (not yet implemented) mechanism with invalidate for caching tables. Do we really want to let the consumer handle all the cases of routing tables changing on the fly etc., or centralize it in one place (i.e. at.c)?

What do you think, Sean?

Guy
[openib-general] Payload Length in first RMPP sent segment
Hi,

On the RMPP send side, while the Payload Length field in the last segment is clear that it indicates the number of valid bytes in Transferred Data, there seems to be some ambiguity in the optional Payload Length field in the first segment. I think it can work either way, but I also think the intent was to reflect the valid bytes. Maybe it is this way to allow flexibility (choice in the implementation).

What is the correct interpretation? Should I enter a comment on this?

Thanks.

-- Hal

IBA 1.2 p. 775 line 37:
In the first packet of an RMPP transfer (RMPPFlags.First=1), PayloadLength may indicate the sum of the lengths, in bytes, of the TransferredData fields in all packets of the entire multipacket response; this is done by using a nonzero value for PayloadLength in the first packet.

IBA 1.2 p. 776 line 8:
In the last packet of an RMPP transfer (RMPPFlags.Last=1), PayloadLength indicates the number of valid bytes in the TransferredData field, allowing data transfers that are not an integral multiple of the length of the TransferredData field.

A transfer terminates when either:
(a) a packet containing RMPPFlags.Last=1 is received; or
(b) a nonzero PayloadLength was given in the first packet of a transfer, and a packet is received containing sufficient TransferredData bytes to equal or exceed the PayloadLength originally provided.

If case (b) occurs and RMPPFlags.Last is not 1 for that packet, the Receiver sends an ABORT packet with RMPPStatus of Inconsistent Last and PayloadLength and terminates the transfer.
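The termination rules quoted above can be sketched as a small receive-side state tracker. This is only an illustration of the quoted spec text, not the kernel's mad_rmpp.c logic: the type and function names (rmpp_rx_state, rmpp_rx_segment) are hypothetical, and counting a full TransferredData field for every non-last segment is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of feeding one received segment to the tracker. */
enum rmpp_rx_result {
    RMPP_RX_CONTINUE,   /* more segments expected */
    RMPP_RX_DONE,       /* case (a): segment with Last=1 received */
    RMPP_RX_ABORT       /* case (b) without Last: Inconsistent Last and PayloadLength */
};

/* Hypothetical receiver state; not a structure from the OpenIB sources. */
struct rmpp_rx_state {
    uint32_t announced_len; /* PayloadLength of the first segment, 0 if not given */
    uint32_t received_len;  /* TransferredData bytes counted so far */
};

/*
 * payload_len is the PayloadLength field of this segment, data_size the
 * size of the TransferredData field of one segment.
 */
enum rmpp_rx_result rmpp_rx_segment(struct rmpp_rx_state *st, bool first,
                                    bool last, uint32_t payload_len,
                                    uint32_t data_size)
{
    if (first) {
        st->announced_len = payload_len;
        st->received_len = 0;
    }

    /* Assumption: only the last segment may be partially filled. */
    st->received_len += last ? payload_len : data_size;

    if (last)
        return RMPP_RX_DONE;
    if (st->announced_len != 0 && st->received_len >= st->announced_len)
        return RMPP_RX_ABORT;
    return RMPP_RX_CONTINUE;
}
```

With 220-byte data fields, a first segment announcing 640 bytes followed by a middle segment and a last segment carrying 200 valid bytes ends normally under either reading of the first-segment field (640 valid bytes or 660 field bytes); a first segment announcing 400 bytes followed by a second segment without Last=1 triggers the abort case.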
Re: [openib-general] [PATCH][iWARP] IW CM Verbs
On Fri, 26 Aug 2005, Tom Tucker wrote:

> Index: ib_verbs.h
> ===
> --- ib_verbs.h	(revision 3120)
> +++ ib_verbs.h	(working copy)
> @@ -804,6 +806,7 @@
>  	struct ib_gid_cache **gid_cache;
>  };
>  
> +struct iw_cm;
>  struct ib_device {
>  	struct device *dma_device;
> @@ -820,6 +823,8 @@
>  	u32 flags;
>  
> +	struct iw_cm *iwcm;
> +
>  	int (*query_device)(struct ib_device *device,
>  			    struct ib_device_attr *device_attr);
>  	int (*query_port)(struct ib_device *device,

Why does the ib_device need a cm structure for iWARP but not IB?

If you used either Guy or Roland's generic RDMA connection API and did the iWARP implementation, would you still need to add the iw_cm structure?
[openib-general] Re: [PATCH] RMPP: Fix length in first segment of multipacket sends
Hi Michael,

On Mon, 2005-08-29 at 03:23, Michael S. Tsirkin wrote:
> Quoting Hal Rosenstock [EMAIL PROTECTED]:
> > Index: mad_rmpp.c
> > ===
> > --- mad_rmpp.c	(revision 3197)
> > +++ mad_rmpp.c	(working copy)
> > @@ -593,7 +593,8 @@
>
> Hal, could you diff with -p in the future please? This makes the function name visible in the patch, making it possible to understand what is being changed without applying it.

I'll try harder to remember to do this.

> >  	rmpp_mad->rmpp_hdr.paylen_newwin =
> >  		cpu_to_be32(mad_send_wr->total_seg *
> >  			    (sizeof(struct ib_rmpp_mad) -
> > -			     offsetof(struct ib_rmpp_mad, data)));
> > +			     offsetof(struct ib_rmpp_mad, data)) -
> > +			    mad_send_wr->pad);
>
> BTW, I just noticed that whitespace was (and remains) broken in these lines: indentation is done by spaces.

The whitespace is preceded by tabs and is to make the parameters line up. I thought that was allowable coding style. It has been used in many places in OpenIB code.

-- Hal
[openib-general] Re: [PATCH] uDAPL added ibv_query support
On Thu, 25 Aug 2005, Arlin Davis wrote:

arlin> James,
arlin>
arlin> Support for ibv_query_port, device, and gid.
arlin>
arlin> Thanks,
arlin>
arlin> -arlin

Committed in revision 3227.
[openib-general] Re: Question on the best approach to debug an infiniband connection problem
Hi Michael,

On Sun, 2005-08-28 at 02:34, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock [EMAIL PROTECTED]:
> > I think you are referring to a Set of PortInfo which causes an event to IPoIB.
>
> Yes. There seems to be some bug related to local MAD handling:

Can you elaborate more on this?

-- Hal
[openib-general] [PATCH] RMPP: Fix payload length of middle RMPP sent segments
RMPP: Fix payload length of middle RMPP sent segments. Middle payload lengths should be 0 on the send side. (This is a compliance issue and should not be an interop issue, as middle payload lengths are supposed to be ignored on receive.)

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

Note also that diff -p did not show the routine name for this 1 line change.

Index: mad_rmpp.c
===
--- mad_rmpp.c	(revision 3197)
+++ mad_rmpp.c	(working copy)
@@ -602,6 +603,7 @@
 		mad_send_wr->sg_list[1].length = sizeof(struct ib_rmpp_mad) -
 						 mad_send_wr->data_offset;
 		mad_send_wr->sg_list[1].lkey = mad_send_wr->sg_list[0].lkey;
+		rmpp_mad->rmpp_hdr.paylen_newwin = 0;
 	}
 
 	if (mad_send_wr->seg_num == mad_send_wr->total_seg) {
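As a rough illustration of the send-side rule, the sketch below picks the PayloadLength value by segment position. Only the zero for middle segments comes from this patch. The first-segment value follows the related "Fix length in first segment" patch in this thread, the last-segment value follows the IBA text quoted earlier (valid bytes in TransferredData), and the 220-byte data size is the hard-coded figure mentioned in the thread. The function name rmpp_send_paylen is hypothetical, and the result is in host byte order.

```c
#include <stdint.h>

/* Assumed size of the TransferredData field of one RMPP segment. */
#define RMPP_DATA_SIZE 220u

/*
 * Host-order PayloadLength for segment seg_num (1-based) out of total_seg,
 * where pad is the number of unused bytes at the end of the last segment.
 */
uint32_t rmpp_send_paylen(uint32_t seg_num, uint32_t total_seg, uint32_t pad)
{
    uint32_t len = 0;   /* middle segments: ignored on receive, so send 0 */

    if (seg_num == 1)
        len = total_seg * RMPP_DATA_SIZE - pad;   /* valid bytes in the whole transfer */
    if (seg_num == total_seg)
        len = RMPP_DATA_SIZE - pad;               /* valid bytes in the last segment */
    return len;
}
```

For a single-segment transfer both conditions hold and give the same value, so the order of the two checks does not matter.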
[openib-general] RDMA Generic Connection Management
Hi,

After receiving feedback from people here, I want to see if we can agree on a generic CM API, so we can start implementing it. I will try to summarize the 2 options, the way I understand them. If I am missing or misrepresenting something, please don't hesitate to correct me.

Both suggestions include the following verbs (or semantically equivalent): ib_cma_get_device, ib_cma_create_qp, ib_cma_connect, ib_cma_disconnect, ib_cma_listen, ib_cma_destroy, ib_cma_accept, ib_cma_reject, ib_cma_get_src_ip.

A connect flow will be something like:
- ib_cma_get_device (...) /* get device(1) or device+path(2) */
- pd = ib_alloc_pd(...) /* pd allocated in the given device */
- qp = ib_cma_create_qp(...) /* qp returned in init state */
- ib_post_recv(qp, ...);
- ib_cma_connect (qp, dst_addr(1)/path(2), ...);

Now, there are 2 suggestions for the device discovery:
1. get_device returns device and port, according to the local routing tables, synchronously. ib_cma_connect calls the at module for address resolution (cache handled) before calling the cm_connect.
2. get_device registers an upcall and returns the data path to the consumer in the upcall. In this case caching is done by the consumer.

I prefer option 1, because it makes the consumer code simpler, without having to handle upcalls for address translations (which are not asynchronous in iWARP) or hold the transport's data information. It also saves the consumer the trouble of caching routes to destinations.

I would like to hear what other people on the list think of it ...

Thanks,
Guy
[openib-general] Re: [PATCH] RMPP: Fix length in first segment of multipacket sends
Hal, first, sorry about nitpicking.

Quoting Hal Rosenstock [EMAIL PROTECTED]:
> > >  	rmpp_mad->rmpp_hdr.paylen_newwin =
> > >  		cpu_to_be32(mad_send_wr->total_seg *
> > >  			    (sizeof(struct ib_rmpp_mad) -
> > > -			     offsetof(struct ib_rmpp_mad, data)));
> > > +			     offsetof(struct ib_rmpp_mad, data)) -
> > > +			    mad_send_wr->pad);
> >
> > BTW, I just noticed that whitespace was (and remains) broken in these lines: indentation is done by spaces.
>
> The whitespace is preceded by tabs and is to make the parameters line up. I thought that was allowable coding style. It has been used in many places in OpenIB code.

Sure, whitespace preceded by tabs is OK. But please take a look at the original file, or the patch; I think you'll see that it's not preceded by tabs in this instance.

Some more nitpicks:

In this specific instance, it's probably best to just add a temp variable and write

	w = mad_send_wr->total_seg * (sizeof(struct ib_rmpp_mad) - offsetof(struct ib_rmpp_mad, data)) - mad_send_wr->pad;
	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(w)

to avoid placing the descendant cpu_to_be32 left of the = sign. Documentation/CodingStyle says about this:

	Descendants are always substantially shorter than the parent and are placed substantially to the right.

And by the way, wouldn't it be a good idea to replace hard-coded 220 and such in ib_mad.h by a symbolic constant, and then we'd just have

	w = mad_send_wr->total_seg * IB_RMMP_DATA_SIZE - mad_send_wr->pad;
	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(w)

What do you think?

Thanks,
MST

-- MST
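The temp-variable suggestion can be tried outside the kernel with a small user-space sketch. Everything here is a stand-in under stated assumptions: rmpp_mad_sketch mimics a 256-byte MAD with a 36-byte header region and 220 data bytes, RMPP_DATA_SIZE plays the role of the symbolic constant proposed above, and htonl() substitutes for cpu_to_be32(), which is not available in user space.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>  /* htonl(): user-space stand-in for cpu_to_be32() */

/* Simplified stand-in for struct ib_rmpp_mad: 36 header bytes + 220 data bytes. */
struct rmpp_mad_sketch {
    uint8_t hdr[36];
    uint8_t data[220];
};

/* Symbolic constant instead of a hard-coded 220. */
#define RMPP_DATA_SIZE \
    (sizeof(struct rmpp_mad_sketch) - offsetof(struct rmpp_mad_sketch, data))

/* Compute the length in host order first, then convert exactly once. */
uint32_t first_seg_paylen_be(uint32_t total_seg, uint32_t pad)
{
    uint32_t w = total_seg * (uint32_t)RMPP_DATA_SIZE - pad;

    return htonl(w);
}
```

Keeping the arithmetic in a host-order temporary makes the expression short enough to stay on one line and leaves a single, easy to spot byte-order conversion.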
[openib-general] Re: Question on the best approach to debug an infiniband connection problem
Quoting r. Hal Rosenstock [EMAIL PROTECTED]:
> > There seems to be some bug related to local MAD handling:
>
> Can you elaborate more on this ?

I have a back-to-back setup. I sometimes start with all modules unloaded, load them back, and bring up ipoib with a fixed IP address. I then start opensm on one node and try to ping one node from the other. This does not work until I down and up ib0 on the node with opensm running. Down and up on the other node does not help, which made me think local MAD handling is the culprit.

-- MST
Re: [openib-general] [PATCH] OSM vendor layer: In umad_receiver, handle allocating RMPP large MADs from OSM MAD pool
Hi Grant,

On Fri, 2005-08-26 at 13:23, Grant Grundler wrote:
> On Fri, Aug 26, 2005 at 12:34:58PM -0400, Hal Rosenstock wrote:
> > -						MAD_BLOCK_SIZE, &osm_addr))) {
> > +						length > MAD_BLOCK_SIZE ?
> > +						length : MAD_BLOCK_SIZE,
> > +						&osm_addr))) {
>
> Can max(length, MAD_BLOCK_SIZE) be used instead?

Yes; I made that change. There is a MAX macro in complib/cl_math.h.

Thanks.

-- Hal
[openib-general] Re: [PATCH] uDAPL common code fix for default attribute settings
On Thu, 25 Aug 2005, Arlin Davis wrote:

arlin> James,
arlin>
arlin> Please review this common code patch that fixes default
arlin> settings so they don't exceed device maximums.
arlin>
arlin> Thanks,
arlin>
arlin> -arlin

I've moved the check into dapli_ep_default_attrs() so all future callers will also benefit from this.

Index: dapl/common/dapl_ep_util.c
===
--- dapl/common/dapl_ep_util.c	(revision 3231)
+++ dapl/common/dapl_ep_util.c	(working copy)
@@ -260,7 +260,9 @@
 void
 dapli_ep_default_attrs (
     IN DAPL_EP *ep_ptr )
 {
+    DAT_EP_ATTR ep_attr_limit;
     DAT_EP_ATTR *ep_attr;
+    DAT_RETURN dat_status;
 
     ep_attr = &ep_ptr->param.ep_attr;
     /* Set up defaults */
@@ -295,7 +297,36 @@ dapli_ep_default_attrs (
      *- provider_specific_params: 0
      */
 
-    return;
+    dat_status = dapls_ib_query_hca (ep_ptr->header.owner_ia->hca_ptr,
+                                     NULL, &ep_attr_limit, NULL);
+    /* check against HCA maximums */
+    if (dat_status == DAT_SUCCESS)
+    {
+        ep_ptr->param.ep_attr.max_mtu_size =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_mtu_size,
+                     ep_attr_limit.max_mtu_size);
+        ep_ptr->param.ep_attr.max_rdma_size =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_size,
+                     ep_attr_limit.max_rdma_size);
+        ep_ptr->param.ep_attr.max_recv_dtos =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_recv_dtos,
+                     ep_attr_limit.max_recv_dtos);
+        ep_ptr->param.ep_attr.max_request_dtos =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_request_dtos,
+                     ep_attr_limit.max_request_dtos);
+        ep_ptr->param.ep_attr.max_recv_iov =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_recv_iov,
+                     ep_attr_limit.max_recv_iov);
+        ep_ptr->param.ep_attr.max_request_iov =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_request_iov,
+                     ep_attr_limit.max_request_iov);
+        ep_ptr->param.ep_attr.max_rdma_read_in =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_read_in,
+                     ep_attr_limit.max_rdma_read_in);
+        ep_ptr->param.ep_attr.max_rdma_read_out =
+            DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_read_out,
+                     ep_attr_limit.max_rdma_read_out);
+    }
 }
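The clamping idea in this patch, stripped of the DAPL types, might look like the following minimal sketch. The struct is a hypothetical subset whose field names follow the patch, ATTR_MIN is a local stand-in for DAPL_MIN, and clamp_ep_attrs is an invented helper name; the real code queries the limits from the HCA first and only clamps on success.

```c
#include <stdint.h>

/* Local stand-in for DAPL_MIN; arguments are evaluated more than once. */
#define ATTR_MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Hypothetical subset of the endpoint attributes touched by the patch. */
struct ep_attr_sketch {
    uint32_t max_recv_dtos;
    uint32_t max_request_dtos;
    uint32_t max_recv_iov;
    uint32_t max_request_iov;
};

/* Lower every default that exceeds the corresponding device limit. */
void clamp_ep_attrs(struct ep_attr_sketch *attr,
                    const struct ep_attr_sketch *limit)
{
    attr->max_recv_dtos    = ATTR_MIN(attr->max_recv_dtos, limit->max_recv_dtos);
    attr->max_request_dtos = ATTR_MIN(attr->max_request_dtos, limit->max_request_dtos);
    attr->max_recv_iov     = ATTR_MIN(attr->max_recv_iov, limit->max_recv_iov);
    attr->max_request_iov  = ATTR_MIN(attr->max_request_iov, limit->max_request_iov);
}
```

Defaults that are already within the device limits pass through unchanged, so the helper is safe to call unconditionally once the limits are known.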
[openib-general] [PATCH] sdp: use linux/list.h in sdp_link.c
The following kills sdp_link.h and converts sdp_link.c to use linux/list.h

Locking is still missing here.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

Index: linux-2.6.12.2/drivers/infiniband/ulp/sdp/sdp_link.c
===
--- linux-2.6.12.2.orig/drivers/infiniband/ulp/sdp/sdp_link.c
+++ linux-2.6.12.2/drivers/infiniband/ulp/sdp/sdp_link.c
@@ -35,17 +35,105 @@
 #include "ipoib.h"
 #include "sdp_main.h"
-#include "sdp_link.h"
 
-static kmem_cache_t *wait_cache = NULL;
-static kmem_cache_t *info_cache = NULL;
+#define SDP_LINK_F_VALID 0x01 /* valid path info record. */
+#define SDP_LINK_F_ARP   0x02 /* arp request in progress. */
+#define SDP_LINK_F_PATH  0x04 /* arp request in progress. */
+/*
+ * wait for an ARP event to complete.
+ */
+struct sdp_path_info {
+	u32 src;		/* source IP address. */
+	u32 dst;		/* destination IP address */
+	int dif;		/* bound device interface option */
+	u32 gw;			/* gateway IP address */
+	int qid;		/* path record query ID */
+	u8 port;		/* HCA port */
+	u32 flags;		/* record flags */
+	int sa_time;		/* path_rec request timeout */
+	unsigned long arp_time;	/* ARP request timeout */
+	unsigned long use;	/* last time accessed. */
+	struct ib_device *ca;	/* HCA device. */
+	struct ib_sa_path_rec path;	/* path record info */
+	struct ib_sa_query *query;
 
-static struct sdp_path_info *info_list = NULL;
+	struct work_struct timer;	/* arp request timers. */
+
+	struct list_head info_list;
+
+	struct list_head wait_list;
+};
+
+struct sdp_path_wait {
+	u64 id;			/* request identifier */
+	void (*completion)(u64 id,
+			   int status,
+			   u32 dst_addr,
+			   u32 src_addr,
+			   u8 hw_port,
+			   struct ib_device *ca,
+			   struct ib_sa_path_rec *path,
+			   void *arg);
+	void *arg;
+	int retry;
+	struct list_head list;
+};
+
+struct sdp_work {
+	struct work_struct work;
+	void *arg;
+};
+
+struct sdp_link_arp {
+	/*
+	 * generic arp header
+	 */
+	u16 addr_type;		/* format of hardware address */
+	u16 proto_type;		/* format of protocol address */
+	u8 addr_len;		/* length of hardware address */
+	u8 proto_len;		/* length of protocol address */
+	u16 op;			/* ARP opcode (command) */
+	/*
+	 * begin IB specific section
+	 */
+	u32 src_qpn;		/* MSB = reserved, low 3 bytes=QPN */
+	union ib_gid src_gid;
+	u32 src_ip;
+
+	u32 dst_qpn;		/* MSB = reserved, low 3 bytes=QPN */
+	union ib_gid dst_gid;
+	u32 dst_ip;
+
+} __attribute__ ((packed)); /* sdp_link_arp */
+
+#define SDP_LINK_SWEEP_INTERVAL (10 * (HZ))	/* frequency of sweep function */
+#define SDP_LINK_INFO_TIMEOUT (300UL * (HZ))	/* unused time */
+#define SDP_LINK_SA_RETRY (3)	/* number of SA retry requests */
+#define SDP_LINK_ARP_RETRY (3)	/* number of ARP retry requests */
+
+#define SDP_LINK_SA_TIME_MIN (500)	/* milliseconds. */
+#define SDP_LINK_SA_TIME_MAX (1)	/* milliseconds. */
+#define SDP_LINK_ARP_TIME_MIN (HZ)
+#define SDP_LINK_ARP_TIME_MAX (32UL * (HZ))
+
+#if 0
+#define SDP_IPOIB_RETRY_VALUE 3	/* number of retries. */
+#define SDP_IPOIB_RETRY_INTERVAL (HZ * 1)	/* retry frequency */
+
+#define SDP_DEV_PATH_WAIT (5 * (HZ))
+#define SDP_PATH_TIMER_INTERVAL (15 * (HZ))	/* cache sweep frequency */
+#define SDP_PATH_REAPING_AGE (300 * (HZ))	/* idle time before reaping */
+#endif
+
+static kmem_cache_t *wait_cache;
+static kmem_cache_t *info_cache;
+
+static LIST_HEAD(info_list);
 
 static struct workqueue_struct *link_wq;
 static struct work_struct link_timer;
 
-static u64 path_lookup_id = 0;
+static u64 path_lookup_id;
 
 #define _SDP_PATH_LOOKUP_ID() \
 	((++path_lookup_id) ? path_lookup_id : ++path_lookup_id)
@@ -95,42 +183,6 @@ static void sdp_link_path_complete(u64 i
 }
 
 /*
- * sdp_path_wait_add - add a wait entry into the wait list for a path
- */
-static void sdp_path_wait_add(struct sdp_path_info *info,
-			      struct sdp_path_wait *wait)
-{
-
-	wait->next = info->wait_list;
-	info->wait_list = wait;
-	wait->pext = &info->wait_list;
-
-	if (wait->next)
-		wait->next->pext = &wait->next;
-}
-
-/*
- * sdp_path_wait_destroy - destroy an entry for a wait element
- */
-static void sdp_path_wait_destroy(struct sdp_path_wait *wait)
-{
-	/*
-	 * if it's in the list, pext will not be null
-	 */
-	if
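For readers unfamiliar with the linux/list.h style that this patch converts to, here is a user-space sketch of the same idea: a circular doubly linked list whose links are embedded in the entries. It is a simplified re-implementation for illustration, not the kernel header; list_node, list_init, list_add_front, list_remove, list_container, and wait_entry are all hypothetical names.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* An empty list is a head that points to itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add_front(struct list_node *entry, struct list_node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry; no list head argument and no NULL checks are needed. */
static void list_remove(struct list_node *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;
    entry->prev = entry;
}

/* Recover the containing structure from a pointer to its embedded link. */
#define list_container(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical wait entry with an embedded link, like sdp_path_wait.list. */
struct wait_entry {
    unsigned long long id;
    struct list_node list;
};
```

Compared with the hand-rolled next/pext pointers removed by the patch, removal needs no special case for the first element, because the head is itself a node in the ring.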
[openib-general] [PATCH] OpenSM vendor layer: Use MAX macro in umad_receiver
OpenSM vendor layer: Use MAX macro in umad_receiver rather than open coding it.

Pointed out by Grant Grundler [EMAIL PROTECTED]

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

Index: osm_vendor_ibumad.c
===
--- osm_vendor_ibumad.c	(revision 3200)
+++ osm_vendor_ibumad.c	(working copy)
@@ -271,8 +271,7 @@ umad_receiver(void *p_ptr)
 
 		if (!(madw_p = osm_mad_pool_get(p_bind->p_mad_pool,
 						(osm_bind_handle_t)p_bind,
-						length > MAD_BLOCK_SIZE ?
-						length : MAD_BLOCK_SIZE,
+						MAX(length, MAD_BLOCK_SIZE),
 						&osm_addr))) {
 			osm_log( p_vend->p_log, OSM_LOG_ERROR,
 				 "umad_receiver: request for a new madw failed -- dropping packet\n" );
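A small sketch of what the change amounts to: the macro form and the open-coded conditional pick the same size. The MAX definition below is a local stand-in for the one the thread says lives in complib/cl_math.h, the MAD_BLOCK_SIZE value of 256 is an assumption for illustration, and madw_alloc_size is a hypothetical helper name.

```c
#include <stddef.h>

/* Local stand-in for the MAX macro from complib/cl_math.h. */
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Assumed value for illustration only. */
#define MAD_BLOCK_SIZE 256

/* Size to request from the MAD pool: never less than one MAD block. */
size_t madw_alloc_size(size_t length)
{
    return MAX(length, (size_t)MAD_BLOCK_SIZE);
}
```

Like most function-like macros of this shape, MAX evaluates its arguments more than once, so it should only be given expressions without side effects, as is the case here.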
[openib-general] OpenIB user space includes location
Hi,

Should the OpenIB user space library includes be under /usr/include/infiniband or /usr/include/rdma now (similar to the kernel move of includes)? The two that make me think of this are user verbs and perhaps CM. IB management (OpenSM and diags) as well as AT (address translation) are IB specific.

-- Hal
[openib-general] Re: RDMA connection and address translation API
Michael> What about using an Externally Administrated Service ID?
Michael> Openib gets Service ID = 0x1H00 1405 where H is
Michael> any digit.

That would work. I think we've already converged on picking a service ID range for our iWARP emulation spec. The only question is whether it should be in the IBTA or IETF service ID range, and I don't think that really matters much.

- R.
[openib-general] Re: [PATCH] RMPP: Fix length in first segment of multipacket sends
Hi Michael,

On Mon, 2005-08-29 at 10:47, Michael S. Tsirkin wrote:
> Hal, first, sorry about nitpicking.
>
> Quoting Hal Rosenstock [EMAIL PROTECTED]:
> > > >  	rmpp_mad->rmpp_hdr.paylen_newwin =
> > > >  		cpu_to_be32(mad_send_wr->total_seg *
> > > >  			    (sizeof(struct ib_rmpp_mad) -
> > > > -			     offsetof(struct ib_rmpp_mad, data)));
> > > > +			     offsetof(struct ib_rmpp_mad, data)) -
> > > > +			    mad_send_wr->pad);
> > >
> > > BTW, I just noticed that whitespace was (and remains) broken in these lines: indentation is done by spaces.
> >
> > The whitespace is preceded by tabs and is to make the parameters line up. I thought that was allowable coding style. It has been used in many places in OpenIB code.
>
> Sure, whitespace preceded by tabs is OK. But please take a look at the original file, or the patch; I think you'll see that it's not preceded by tabs in this instance.

I think this is in the original file.

> Some more nitpicks:
>
> In this specific instance, it's probably best to just add a temp variable and write
>
> 	w = mad_send_wr->total_seg * (sizeof(struct ib_rmpp_mad) - offsetof(struct ib_rmpp_mad, data)) - mad_send_wr->pad;
> 	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(w)
>
> to avoid placing the descendant cpu_to_be32 left of the = sign. Documentation/CodingStyle says about this:
>
> 	Descendants are always substantially shorter than the parent and are placed substantially to the right.
>
> And by the way, wouldn't it be a good idea to replace hard-coded 220 and such in ib_mad.h by a symbolic constant, and then we'd just have
>
> 	w = mad_send_wr->total_seg * IB_RMMP_DATA_SIZE - mad_send_wr->pad;
> 	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(w)
>
> What do you think?

All seem reasonable to me. Sean should comment and has the final say on this.

-- Hal
RE: [openib-general] [PATCH][iWARP] IW CM Verbs
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of James Lentini
> Sent: Monday, August 29, 2005 8:42 AM
> To: Tom Tucker
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] [PATCH][iWARP] IW CM Verbs
>
> On Fri, 26 Aug 2005, Tom Tucker wrote:
> > Index: ib_verbs.h
> > ===
> > --- ib_verbs.h	(revision 3120)
> > +++ ib_verbs.h	(working copy)
> > @@ -804,6 +806,7 @@
> >  	struct ib_gid_cache **gid_cache;
> >  };
> >  
> > +struct iw_cm;
> >  struct ib_device {
> >  	struct device *dma_device;
> > @@ -820,6 +823,8 @@
> >  	u32 flags;
> >  
> > +	struct iw_cm *iwcm;
> > +
> >  	int (*query_device)(struct ib_device *device,
> >  			    struct ib_device_attr *device_attr);
> >  	int (*query_port)(struct ib_device *device,
>
> Why does the ib_device need a cm structure for iWARP but not IB?

Some RNIC devices fully establish the connection in hardware. However, _all_ openib IB devices export the MAD interface, which is used to send low level connection setup primitives, thus allowing connection setup in the host.

> If you used either Guy or Roland's generic RDMA connection API and did the iWARP implementation, would you still need to add the iw_cm structure?

Yes. The struct iw_cm allows the RNIC device to export a set of high level connection verbs, if that device supports it.

Hope this helps...

Steve.
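The "export connection verbs only if the device supports them" pattern described above can be sketched with an optional operations table. The members of the real struct iw_cm are not shown in this thread, so every name below (cm_ops_sketch, dev_sketch, dev_connect, hw_connect) is hypothetical and the single connect hook is an assumption made for illustration.

```c
#include <stddef.h>

struct dev_sketch;

/* Hypothetical table of high level connection verbs exported by a device. */
struct cm_ops_sketch {
    int (*connect)(struct dev_sketch *dev);
};

/* Hypothetical device: cm is NULL when no connection verbs are exported. */
struct dev_sketch {
    const struct cm_ops_sketch *cm;
};

/* Example provider hook that pretends the hardware set up the connection. */
static int hw_connect(struct dev_sketch *dev)
{
    (void)dev;
    return 0;
}

/* Returns the provider's result, or -1 if the device has no such verb. */
int dev_connect(struct dev_sketch *dev)
{
    if (dev->cm == NULL || dev->cm->connect == NULL)
        return -1;
    return dev->cm->connect(dev);
}
```

A device that sets up connections in the host, as the IB devices do through the MAD interface, would simply leave the pointer NULL, and callers would fall back to the host-side path.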
RE: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
Caitlin,

For the openib folk: keep in mind in the following that iWARP runs on top of TCP, and the TCP is offloaded, so TOE enters the iWARP picture.

I second the requirement that an iWARP RNIC needs to integrate with host configuration, reporting, and security mechanisms, and this is the approach taken in the Chelsio TOE patches that we have submitted. Standard tools such as netstat and ifconfig work with the Chelsio TOE today, and there's nothing to prevent netfilter from working, etc. For those who are interested, there is more information available at https://service.chelsio.com/open_toe and in particular in Chelsio_toe_arch.pdf.

I have to disagree that this means that the connection is necessarily set up by the host stack. The Chelsio 10GE NIC/TOE/iSCSI/iWARP products set up the connection on the card, but the setup includes an ASK host phase initiated by the card, where the host can filter the connect request, modify any of the initial TCP values chosen by the card, etc., and then respond with accept or reject.

In general the iWARP connection manager needs to accommodate three possible TCP connection setup models in use today in iWARP devices:
a) what you seem to be advocating, i.e. TCP connection setup on the host stack,
b) what I brought up, i.e. offloaded TCP connection setup with an ASK phase, and
c) what was brought up previously, i.e. full TCP connection setup offload.

'Asgeir

> From: Caitlin Bestler [EMAIL PROTECTED]
> To: Roland Dreier [EMAIL PROTECTED], Tom Tucker [EMAIL PROTECTED]
> CC: openib-general@openib.org
> Subject: RE: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
> Date: Thu, 25 Aug 2005 16:46:38 -0700
>
> Device vendors would jump at the opportunity to have a stable interface with the host stack. Things like routing, protection from denial of service attacks, rules for logging and filtering connection requests and more all *should* be handled by the host stack. That's where the end user wants to control them, it's where the security code can be kept most current and most robust. It is also largely on packets that do not require offload optimization.
>
> But we also need time to ensure that the community understands this as giving the host stack control of an offload connection during connection establishment -- rather than as the offload device stealing the connection from the host stack.
>
> Moving the entire TCP connection logic to the offload device not only increases the work that the offload device must do, it reduces the auditability of the system and the user's control over their network activity.
>
> So the intent is not to evade the stack, rather it is to allow time for proper integration with host stack control. The tradeoffs are complex, and neither side fully understands the other's issues yet. We need to work together to determine how to provide the acceleration that our users want without sacrificing the OS provided security that they assume will not be sacrificed.
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Roland Dreier
> > Sent: Thursday, August 25, 2005 4:21 PM
> > To: Tom Tucker
> > Cc: openib-general@openib.org
> > Subject: Re: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
> >
> > Tom> RNIC Verbs imply that the modify qp verb takes a handle to a
> > Tom> connection -- presumably a socket. This CAN'T be done on
> > Tom> Linux in any fashion that is acceptable to the netdev
> > Tom> crowd. SOOO we modeled this after DAPL. Trust me, I would
> > Tom> LOVE to be able to establish the connection using bind,
> > Tom> listen, etc..., query the Linux connection state and then
> > Tom> pass this down to the qp modify verb...but I can't.
> >
> > Let's not be too quick to say that this is impossible. I think we should work with the Linux networking community and come up with the right answer, and not accept a bad solution just because it lets us go around the networking people.
> >
> > Has there been any real discussion of this on netdev?
> >
> > - R.
[openib-general] Re: RDMA connection and address translation API
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: RDMA connection and address translation API
>
> Michael> What about using an Externally Administrated Service ID?
> Michael> Openib gets Service ID = 0x1H00 1405 where H is
> Michael> any digit.
>
> That would work. I think we've already converged on picking a service ID range for our iWARP emulation spec. The only question is whether it should be in the IBTA or IETF service ID range, and I don't think that really matters much.

Or neither :)

Are there disadvantages to an Externally Administrated Service ID? This avoids any need for approvals from either IETF or IBTA.

-- MST
Re: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
Asgeir> ...this is the approach taken in the Chelsio TOE patches
Asgeir> that we have submitted.

What are your plans for these patches? I am not subscribed to netdev, but from reading the archives, it seems that your most recent submission was rejected quite strongly.

- R.
Re: [openib-general] [PATCH] ipoib: device removal races
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: [openib-general] [PATCH] ipoib: device removal races
>
> Thanks, I finally applied this and put it in my git queue for 2.6.14.
>
> I'm still thinking about the bigger patch that adds a second work queue. Having one extra work queue because of the rtnl_lock issues is ugly enough, and I'd really like to find a way to avoid two queues.

I thought about this some more. I think I see even more problems, like potential deadlocks where ipoib needs to flush the link wq, the link wq waits for an sa query to time out, and that needs the default wq to run.

It seems that most of the complexity comes from the way core uses work queues to pass events to upper layers. And I wonder: couldn't core be simplified by passing up events directly in the interrupt context? I think IPoIB could then use plain spinlocks for most synchronisations.

-- MST
[openib-general] Re: RDMA connection and address translation API
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: RDMA connection and address translation API
>
> Michael> What about using an Externally Administrated Service ID?
> Michael> Openib gets Service ID = 0x1H00 1405 where H is
> Michael> any digit.
>
> That would work.

So I'm saying, we don't need reserved bits in cm req then, do we?

-- MST
[openib-general] Re: RDMA connection and address translation API
    Michael> So I'm saying, we don't need reserved bits in cm req then,
    Michael> do we?

Right.

 - R.
Re: [openib-general] [PATCH] ipoib: device removal races
    Michael> It seems that most of the complexity comes from the way
    Michael> core uses work queues to pass events to upper layers.
    Michael> And I wonder: couldn't core be simplified by passing up
    Michael> events directly in the interrupt context?

I'm confused -- which core and which events are you talking about? The ib_core module just passes on asynchronous events directly from the low-level driver, without using work queues.

 - R.
[openib-general] [PATCH] kdapl: Change for new include location
kdapl: Change for new include location (rdma rather than infiniband)

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

Index: dapl_openib_cm.h
===================================================================
--- dapl_openib_cm.h	(revision 3232)
+++ dapl_openib_cm.h	(working copy)
@@ -34,9 +34,9 @@
 #ifndef DAPL_OPENIB_CM_H
 #define DAPL_OPENIB_CM_H
 
-#include <ib_cm.h>
-#include <ib_sa.h>
-#include <ib_at.h>
+#include <rdma/ib_cm.h>
+#include <rdma/ib_sa.h>
+#include <rdma/ib_at.h>
 
 struct dapl_cm_ctx {
 	struct ib_at_ib_route dapl_rt;
Index: dapl_openib_util.h
===================================================================
--- dapl_openib_util.h	(revision 3232)
+++ dapl_openib_util.h	(working copy)
@@ -33,8 +33,8 @@
 #define DAPL_OPENIB_UTIL_H
 
 #include "dapl.h"
-#include <ib_verbs.h>
-#include <ib_cm.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_cm.h>
 
 enum dapl_async_handler_type {
 	DAPL_ASYNC_UNAFILIATED,
RE: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
From my reading of the thread, there is resistance to TOE in general. The patch is just the messenger. The principal opponent is Dave Miller, who strongly believes that stateless acceleration such as TSO (TCP Segmentation Offload) suffices for all needs. Ironically, this requires a much higher level of stack integration than TOE does.

TOE for the purposes of RDMA may have more legs within the community; however, this has yet to be tested.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
Sent: Monday, August 29, 2005 11:24 AM
To: Asgeir Eiriksson
Cc: openib-general@openib.org
Subject: Re: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods

    Asgeir> ...this is the approach taken in the Chelsio TOE patches
    Asgeir> that we have submitted.

What are your plans for these patches? I am not subscribed to netdev, but from reading the archives, it seems that your most recent submission was rejected quite strongly.

 - R.
RE: [openib-general][PATCH][kdapl]: FMR and EVD patch
Hi Guy,

I agree with you on the problems posed by the current interface. I hope we can find a solution that fixes the problem. Note that the same problem must be handled by a ULP using the native verbs.

I still think that there may be a race condition with this patch. Here's the scenario I'm concerned about:

 - Receive an evd upcall
 - Disable evd upcall policy
 - Wakeup polling thread
 - Dequeue all events
 - Enable evd upcall policy by:
   1. Call dapl_evd_modify_upcall() to enable the evd upcall
   2. Obtain the EVD spin lock via spin_lock_irqsave, thus disabling local interrupts
   3. Check that the EVD's ring buffer is empty (there are no DAPL software events)
   4. A DTO completion occurs on the EVD's CQ
   5. Enable the CQ's upcall via ib_req_notify_cq()

If I understand you correctly, you are asserting that event #4, the CQ's DTO completion, cannot occur because the local interrupts are disabled by spin_lock_irqsave(). Have I understood you correctly?

My belief is that the completion will occur on the card regardless of the interrupt state. Can you provide me with a reference that guarantees this will not happen?

james

On Thu, 18 Aug 2005, Guy German wrote:

Hi James,

I will try to explain the reason behind this patch. In IB, a normal working flow for a consumer is:

 - Receive a CQ notification callback
 - Wakeup polling thread
 - Poll for completion (empty the queue)
 - Request completion notification

There is no problem here. In kdapl, however, the consumer will keep getting upcalls until he sets the upcall policy to disable. So a working flow will be:

 - Receive an evd upcall
 - Disable evd upcall policy
 - Wakeup polling thread
 - Dequeue all evds
 - Enable evd upcall policy

There is a race here: a completion can come after the last dequeue and before the enabling. The provider won't call the consumer (policy is disabled) and the consumer will not dequeue any more because he believes the queue is empty.
I think it is a very bad idea to solve this race by adding another evd_dequeue after you enable the upcall policy. If you do that you would have a polling thread (because while you dequeue one completion you can have many more following) and at the same time you will receive upcalls from the dapl provider. Besides the fact that this is an expensive and unnecessary context switch, you have an upcall and a thread racing. You will have a situation where the upcall has an event at hand and the thread has an event, both not handled yet; you will have to queue them again internally or something to keep the order. And I think that is only a partial list of the problems in this case.

So my suggestion is simple: it solves the race, it saves the unnecessary context switch and it spares the consumer side the complexity. The solution is to notify the consumer, when he tries to enable the upcall policy, that the queue is actually not empty, and force him to continue polling (in the same thread context he is in now). dat_evd_modify_upcall is guarded by a spin_lock_irqsave when it checks the queue, and so the race would not occur.

BTW, I'm not sure if it is still the case, but I think that one of the ulps in openib did not use a kernel thread for dequeueing. This is a very bad design, as the upcall can be polling for *long* periods of time in a tasklet/interrupt context.

That's it. Sorry for the long mail. I hope it was not too blurry.

Guy.
-----Original Message-----
From: James Lentini [mailto:[EMAIL PROTECTED]
Sent: Thu 8/18/2005 10:28 PM
To: Guy German
Cc: Openib
Subject: Re: [openib-general][PATCH][kdapl]: FMR and EVD patch

Hi Guy,

The one piece of this patch that remains unaccepted is:

Index: ib/dapl_evd.c
===================================================================
--- ib/dapl_evd.c	(revision 3136)
+++ ib/dapl_evd.c	(working copy)
@@ -1028,6 +1028,7 @@
 {
 	struct dapl_evd *evd;
 	int status = 0;
+	int pending_events;
 
 	evd = (struct dapl_evd *)evd_handle;
 	dapl_dbg_log (DAPL_DBG_TYPE_API, "%s: (evd=%p, upcall_policy=%d)\n",
@@ -1035,14 +1036,25 @@
 
 	spin_lock_irqsave(&evd->common.lock, evd->common.flags);
 	if ((upcall_policy != DAT_UPCALL_TEARDOWN) &&
-	    (upcall_policy != DAT_UPCALL_DISABLE) &&
-	    (evd->evd_flags & DAT_EVD_DTO_FLAG)) {
-		status = ib_req_notify_cq(evd->cq, IB_CQ_NEXT_COMP);
-		if (status) {
-			printk(KERN_ERR "%s: dapls_ib_completion_notify failed "
-			       "(status=0x%x)\n",__func__, status);
+	    (upcall_policy != DAT_UPCALL_DISABLE)) {
+		pending_events = dapl_rbuf_count(&evd->pending_event_queue);
+		if (pending_events) {
+			dapl_dbg_log(DAPL_DBG_TYPE_WARN,
+
[openib-general] Re: [PATCH] IBAT resolve_ats_route
On Fri, 2005-08-26 at 16:42, James Lentini wrote:
> I was reading through the IBAT sources when I noticed that in
> resolve_ats_route() you set req->pend.sa_query to null on line 1127
> and then check to see if it is null a few lines later. I don't think
> you need to do that.

Yes, it looks like that code path could never be taken.

Thanks. Applied.

-- Hal
RE: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Tucker
Sent: Monday, August 29, 2005 9:47 AM
To: Roland Dreier; Asgeir Eiriksson
Cc: openib-general@openib.org
Subject: RE: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods

> From my reading of the thread, there is resistance to TOE in general.
> The patch is just the messenger. The principal opponent is Dave Miller,
> who strongly believes that stateless acceleration such as TSO (TCP
> Segmentation Offload) suffices for all needs. Ironically, this requires
> a much higher level of stack integration than TOE does.
>
> TOE for the purposes of RDMA may have more legs within the community;
> however, this has yet to be tested.

And even once we have consensus to do it, we then need to work through, and reach consensus on, issues such as connect-on-chip-with-host-approval and/or connect-on-host-then-transfer. For example, I think the host stack should support either, leaving the tradeoffs between NIC and host processing to be resolved in the marketplace.
[openib-general] [PATCH] iser: Make iser Makefile like other OpenIB ULP makefiles
Make iser Makefile like other OpenIB ULP makefiles

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

Index: Makefile
===================================================================
--- Makefile	(revision 3232)
+++ Makefile	(working copy)
@@ -1,16 +1,14 @@
-ISER_OBJ = iser_mod.o
-ISER_OBJ += iser_conn.o
-ISER_OBJ += iser_initiator.o
-ISER_OBJ += iser_memory.o
-ISER_OBJ += iser_task.o
-ISER_OBJ += iser_utils.o
-ISER_OBJ += iser_dto.o
-ISER_OBJ += iser_lkdapl.o
+EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/ulp/kdapl \
+	-I$(src)/include -DLINUX_KDAT
 
-EXTRA_CFLAGS += -Idrivers/infiniband/include
-EXTRA_CFLAGS += -Idrivers/infiniband/ulp/kdapl
-EXTRA_CFLAGS += -I$(src)/include
-EXTRA_CFLAGS += -DLINUX_KDAT
+obj-$(CONFIG_INFINIBAND_ISER) += ib_iser.o
 
-obj-$(CONFIG_INFINIBAND_ISER) += $(ISER_OBJ)
+ib_iser-y := iser_mod.o \
+	iser_conn.o \
+	iser_initiator.o \
+	iser_memory.o \
+	iser_task.o \
+	iser_utils.o \
+	iser_dto.o \
+	iser_lkdapl.o
[openib-general] Re: [PATCH] kdapl: Change for new include location
On Mon, 29 Aug 2005, Hal Rosenstock wrote:

halr> kdapl: Change for new include location (rdma rather than infiniband)

Committed in revision 3235.
Re: [openib-general] [PATCH] ipoib: device removal races
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> I'm confused -- which core and which events are you talking about?

OK, I wasn't exactly clear. I was talking about activating the sa queries: if starting and cancelling sa queries could be done from under spinlock/interrupt context, IPoIB could use straight spinlocks for synchronisation, and avoid using workqueues altogether.

Is this feasible?

-- MST
Re: [openib-general] RDMA Generic Connection Management
    James> What happens if multiple devices can reach the destination
    James> address? How will they be enumerated to the consumer?

I guess we need to move towards the full horror of getaddrinfo(). Probably we need some unusable native API, and then library functions layered on top for consumers that don't care.

Although maybe it's not necessary -- are there any consumers of this API that really want to choose among different equal-metric routes?

 - R.
RE: [openib-general] RDMA Generic Connection Management
> a connect flow will be something like:
>
>  - ib_cma_get_device (...)     /* get device(1) or device+path(2) */
>  - pd = ib_alloc_pd(...)       /* pd allocated in the given device */
>  - qp = ib_cma_create_qp(...)  /* qp returned in init state */
>  - ib_post_recv(qp, ...);
>  - ib_cma_connect (qp, dst_addr(1)/path(2), ...);
>
> Now, there are 2 suggestions for the device discovery:
>
> 1. get_device returns device and port, according to the local routing
>    tables, synchronously. ib_cma_connect calls the at module for address
>    resolving (cache handled) before calling the cm_connect.
> 2. get_device registers an upcall and returns the data path to the
>    consumer in the upcall. In this case caching is done by the consumer.

> What happens if multiple devices can reach the destination address?
> How will they be enumerated to the consumer?

At the DAT layer the assumption was that multiple paths would be chosen based upon the Class of Service. So either the CoS must be passed down, or get_device must return an array of devices with the required info to allow the DAT Provider to make the determination. Passing it down sounds simpler to me.
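To make the second alternative concrete (get_device returning an array that the caller filters by Class of Service), here is a minimal userspace sketch. Everything in it is hypothetical: `struct cma_candidate`, its fields and `cma_pick_by_cos()` are illustration only and are not part of any proposed or existing OpenIB interface.

```c
#include <stddef.h>

/* Hypothetical record for one way of reaching the destination. */
struct cma_candidate {
    int device_index;   /* which local device */
    int port;           /* port on that device */
    int service_class;  /* class of service this path can offer */
};

/*
 * Return the first candidate that offers the requested class of
 * service, or NULL if none does. Candidates with the same class are
 * treated as interchangeable, which is the assumption described above.
 */
const struct cma_candidate *
cma_pick_by_cos(const struct cma_candidate *c, size_t n, int wanted_cos)
{
    for (size_t i = 0; i < n; i++) {
        if (c[i].service_class == wanted_cos)
            return &c[i];
    }
    return NULL;
}
```

Passing the CoS down instead would move this loop below the API, so the consumer never sees the array at all.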
RE: [openib-general] RDMA Generic Connection Management
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
Sent: Monday, August 29, 2005 12:41 PM
To: James Lentini
Cc: openib-general@openib.org
Subject: Re: [openib-general] RDMA Generic Connection Management

>     James> What happens if multiple devices can reach the destination
>     James> address? How will they be enumerated to the consumer?
>
> I guess we need to move towards the full horror of getaddrinfo().
> Probably we need some unusable native API, and then library functions
> layered on top for consumers that don't care.
>
> Although maybe it's not necessary -- are there any consumers of this
> API that really want to choose among different equal-metric routes?

The assumption implicit in the DAT connection APIs is that there are none (i.e., if you can't distinguish based on Class of Service then you don't care which actual path you get).
[openib-general] Re: [PATCH] ipoib: device removal races
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: [PATCH] ipoib: device removal races
>
>     Michael> OK, I wasn't exactly clear. I was talking about
>     Michael> activating the sa queries: if starting and cancelling sa
>     Michael> queries could be done from under spinlock/interrupt
>     Michael> context, IPoIB could use straight spinlocks for
>     Michael> synchronisation, and avoid using workqueues altogether.
>
> I'd have to audit the code to make sure, but as far as I know it should
> be fine to call the SA query API with spinlocks held.

Okay, but it also seems that, at least to cancel a query, it's insufficient to call ib_sa_cancel_query: you then have to wait until you get a callback, which seems to be performed from a work queue.

Can sa query be changed to perform the callback directly, and so guarantee that the query isn't used after ib_sa_cancel_query returns?

-- MST
Re: [openib-general] RDMA Generic Connection Management
    Caitlin> The assumption implicit in the DAT connection APIs is
    Caitlin> that there are none (i.e., if you can't distinguish based
    Caitlin> on Class of Service then you don't care which actual path
    Caitlin> you get).

Let's forget about what DAT specified and just try to come up with the right answer. In any case, DAT ignored routing completely, so I don't think it's helpful to consider it.

 - R.
[openib-general] Re: [PATCH] ipoib: device removal races
    Michael> Can sa query be changed to perform the callback directly,
    Michael> and so guarantee that query isn't used after
    Michael> ib_sa_cancel_query returns?

Hmm, that gets into the MAD layer design, but I think it gets very tricky. For example, how do we know that the query isn't already completing on a different CPU as we enter the cancel call?

 - R.
RE: [openib-general][PATCH][kdapl]: FMR and EVD patch
James Lentini wrote:
> I agree with you on the problems posed by the current interface. I hope
> we can find a solution that fixes the problem. Note that the same
> problem must be handled by a ULP using the native verbs.

I don't think we have the same problem in the verbs. In the current Mellanox hw (which is AFAIK the only available hw in openib) there is no race at all (because of the proprietary, more considerate, completion notification implementation):

 - Receive a CQ notification callback
 - Wakeup polling thread
 - Poll for completion (empty the queue)
 - Request completion notification [you will get a completion notification even for old completions on the queue]
 - Exit thread

For other, stricter IB-compliant future hw implementations, a "request completion notification" extended verb could encapsulate:

 - request CQ notification
 - if cq !empty, request CQ notification _again_ (note that you are not *polling* the cq, just checking the queue. This is different than draining the evd one more time)

And the race is solved. Indeed, it is not as efficient as sparing the context switch (to interrupt and back to thread) altogether.

> I still think that there may be a race condition with this patch.
> Here's the scenario I'm concerned about:
>
>  - Receive an evd upcall
>  - Disable evd upcall policy
>  - Wakeup polling thread
>  - Dequeue all events
>  - Enable evd upcall policy by:
>    1. Call dapl_evd_modify_upcall() to enable the evd upcall
>    2. Obtain the EVD spin lock via spin_lock_irqsave, thus disabling
>       local interrupts
>    3. Check that the EVD's ring buffer is empty (there are no DAPL
>       software events)
>    4. A DTO completion occurs on the EVD's CQ
>    5. Enable the CQ's upcall via ib_req_notify_cq()
>
> If I understand you correctly, you are asserting that event #4, the
> CQ's DTO completion, cannot occur because the local interrupts are
> disabled by spin_lock_irqsave(). Have I understood you correctly?

Not quite. The *consumer's upcall* would not be called, due to the irq disable. The race would not occur, OTOH, because the Mellanox hw will initiate a completion notification even if the completions in the cq arrived before the notification request. If you want to be more ib compliant, for future possible implementations, you can apply the extended-notify-routine (as mentioned above).

> My belief is that the completion will occur on the card regardless of
> the interrupt state.

True, but the consumer will be notified only as soon as the irq is enabled again.

> Can you provide me with a reference that guarantees this will not
> happen?

I'm not saying that it won't ;) but I don't think there will be a race...

Guy
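The "arm, then look at the queue again" idea discussed in this thread can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct toy_cq`, `toy_drain()` and `toy_poll_and_arm()` are hypothetical names, the `late` array artificially injects completions into the window between the last drain and the arming step, and no real verbs or kDAPL calls are used.

```c
#include <stdbool.h>

/* Toy completion queue: a hypothetical, single-threaded model. */
struct toy_cq {
    int  pending;  /* completions waiting to be dequeued */
    bool armed;    /* true once a notification has been requested */
};

/* Dequeue everything currently queued; returns how many were handled. */
static int toy_drain(struct toy_cq *cq)
{
    int n = cq->pending;

    cq->pending = 0;
    return n;
}

/*
 * Drain, arm, then check the queue again. If a completion slipped in
 * between the last drain and the arming step, keep polling in the same
 * context instead of going idle. late[i] is the number of completions
 * that "arrive" in the race window after the i-th drain.
 */
int toy_poll_and_arm(struct toy_cq *cq, const int *late, int nlate)
{
    int handled = 0;
    int round = 0;

    for (;;) {
        handled += toy_drain(cq);

        if (round < nlate)           /* simulated race window */
            cq->pending += late[round];
        round++;

        cq->armed = true;            /* request notification */
        if (cq->pending == 0)        /* re-check after arming */
            return handled;          /* nothing was missed: safe to idle */

        cq->armed = false;           /* something was missed: poll again */
    }
}
```

With 3 completions queued and 2 more arriving in the first race window, `toy_poll_and_arm()` handles all 5 before returning with the queue armed and empty, which is the property both sides of the thread are trying to guarantee.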
RE: [openib-general] RDMA Generic Connection Management
James> What happens if multiple devices can reach the destination
James> address? How will they be enumerated to the consumer?

Roland> I guess we need to move towards the full horror of getaddrinfo().
Roland> Probably we need some unusable native API, and then library functions
Roland> layered on top for consumers that don't care.

Roland> Although maybe it's not necessary -- are there any consumers of this
Roland> API that really want to choose among different equal-metric routes?

I don't think iSER does. Anyway, I think we need to agree on the basic API in principle, and if we want to extend it along the way during implementation (like an array of suitable devices instead of a chosen one), we would be able to patch it.

Guy
[openib-general] Re: RDMA Generic Connection Management
Quoting r. Guy German [EMAIL PROTECTED]:
> 1. get_device returns device and port, according the local routing
> tables, synchronously. ib_cma_connect calls the at module for address
> resolving (cache handled) before calling the cm_connect.

How does one cancel address resolution request?

> 2. get_device registers an upcall and return in the upcall the data
> path to the consumer. In this case caching is done by the consumer.
>
> I prefer option 1, because it makes the consumer code simpler, without
> having to handle upcalls for address translations (which are not
> asynchronous in iWARP) or hold the transport's data information. Also
> it saves the consumer the trouble of caching routes to destinations.
> I would like to hear what other people in the list think of it ...

In the case of callback (option 2) I really hope functions will work with some kind of object pointer, avoiding another layer of hash lookups and stuff. Something like

struct ib_cma_path {
	struct ib_device *device;
	struct list_head arp_list;
	struct ib_sa_query *query;
	int id;
	.
	void (*comp_handler)(struct ib_cma_path *, int status);
};

Users should simply pass this object back to the cancel request.

I am also in favor of making this structure public, making it possible for users to add arbitrary amount of private data by simply inheriting the structure and using container_of in comp_handler, but this is a separate issue.

-- MST
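The "inherit the structure and use container_of" idea can be shown in plain userspace C. This is a sketch under assumptions: the `ib_cma_path` below is a trimmed stand-in that keeps only two of the fields from the proposal, `struct my_conn` and its functions are hypothetical consumer code, and `container_of` is redefined locally in the usual way because the kernel macro is not available outside the kernel.

```c
#include <stddef.h>

/* Local userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed-down stand-in for the proposed structure. */
struct ib_cma_path {
    int id;
    void (*comp_handler)(struct ib_cma_path *path, int status);
};

/* Hypothetical consumer object: private data plus the embedded path. */
struct my_conn {
    int last_status;          /* consumer-private data */
    struct ib_cma_path path;  /* "inherited" public object */
};

/* The handler only receives the path pointer and recovers its owner. */
static void my_comp_handler(struct ib_cma_path *path, int status)
{
    struct my_conn *conn = container_of(path, struct my_conn, path);

    conn->last_status = status;
}

void my_conn_init(struct my_conn *conn, int id)
{
    conn->last_status = 0;
    conn->path.id = id;
    conn->path.comp_handler = my_comp_handler;
}
```

Because the consumer gets back the same pointer it embedded, no hash lookup or separate context pointer is needed to find its private state.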
[openib-general] Re: RDMA Generic Connection Management
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: RDMA Generic Connection Management
>
>     James> What happens if multiple devices can reach the destination
>     James> address? How will they be enumerated to the consumer?
>
> I guess we need to move towards the full horror of getaddrinfo().
> Probably we need some unusable native API, and then library functions
> layered on top for consumers that don't care.

I see a problem in that the number of paths may be very big. It just does not make sense to me to pass them all up the layer to let the ULP deal with selecting one.

> Although maybe it's not necessary -- are there any consumers of this
> API that really want to choose among different equal-metric routes?

I think that yes: APM might be one good reason to want to get more than one path, would it not?

-- MST
[openib-general] Re: [PATCH] ipoib: device removal races
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: [PATCH] ipoib: device removal races
>
>     Michael> Can sa query be changed to perform the callback directly,
>     Michael> and so guarantee that query isn't used after
>     Michael> ib_sa_cancel_query returns?
>
> Hmm, that gets into the MAD layer design, but I think it gets very
> tricky. For example, how do we know that the query isn't already
> completing on a different CPU as we enter the cancel call?

Something like: remove it from the idr before completing, under a spinlock. Now if it's in the idr, it's not completing.

Could this work?

-- MST
RE: [openib-general] Re: RDMA Generic Connection Management
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin
Sent: Monday, August 29, 2005 1:19 PM
To: Roland Dreier
Cc: openib-general@openib.org
Subject: [openib-general] Re: RDMA Generic Connection Management

> I think that yes: APM might be one good reason to want to get more
> than one path, would it not?

But you would have to define automatic path migration in generic/transport neutral terms. I've actually come up with some definitions that are inclusive of IB and SCTP, but a definition of Automatic Path Migration that includes TCP isn't going to be very meaningful since the migration of a TCP connection occurs one layer lower than for IB.

If APM can only be defined for IB then it does not have to be addressed in the generic interface.
RE: [openib-general] Re: RMPP Message Format Errors
>> In my interpretation, partial data is indicated by the PayloadLength
>> field in the last segment only. It's quite possible that my
>> interpretation is incorrect, in which case the calculation in the
>> RMPP code is off.

> I agree the text might be missing an example or two for clarification.
> Anyway, we probably can use the IB Analyzer as the ultimate
> interpretation test. Note that there are IB implementations that use
> the first segment payload length as the source of packet length and
> count on it to represent the correct DATA length. We can take your
> interpretation to the IBTA MGTWG for further discussion. Is the effort
> for fixing it big?

It's not a big deal to change it. If the common interpretation is to only include the partial data size, I will change it.

- Sean
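Whichever segment ends up carrying which value, the two quantities in dispute (the total data length and the number of data bytes in the final segment) are related by simple arithmetic. The sketch below shows that relation only; it is not taken from the RMPP code under discussion. The per-segment data capacity is passed in as a parameter because it is an assumption here, and `struct seg_layout` and `seg_layout_for()` are hypothetical names.

```c
#include <stddef.h>

struct seg_layout {
    size_t segments;    /* number of segments needed for the message */
    size_t last_bytes;  /* data bytes carried by the final segment */
};

/*
 * Split `total` data bytes into segments holding at most `per_seg`
 * bytes each. A zero total or zero capacity yields an empty layout.
 */
struct seg_layout seg_layout_for(size_t total, size_t per_seg)
{
    struct seg_layout l = { 0, 0 };

    if (total == 0 || per_seg == 0)
        return l;

    l.segments = (total + per_seg - 1) / per_seg;        /* round up */
    l.last_bytes = total - (l.segments - 1) * per_seg;   /* remainder */
    return l;
}
```

A receiver that trusts the first segment's length needs `total` to be right up front, while one that trusts only the last segment needs `last_bytes`; a sender that fills in both consistently satisfies either reading.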
RE: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods
Roland,

We're planning to go back with a new submission which addresses the concerns that were directly relevant to the patch itself. In the process, we'll be porting the patch to 2.6.14.

A couple of comments:

If one were to look at the patch in its current form, you'd find that it is already quite minimal compared to the changes needed for 10GE TCP/IP alternatives.

The architecture that we propose also accommodates different TOE approaches, e.g. different connection setup models, etc.

We currently have the proposed architecture running on Linux in conjunction with a regular NIC, iWARP RNIC, and iSCSI HBA.

Finally, with the new submission, we're hoping to get a more constructive dialogue going, which focuses on the patch itself, because it is clear that there is user interest in the technology, and Linux support would be beneficial to all parties.

'Asgeir

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]
Sent: Monday, August 29, 2005 9:24 AM
To: Asgeir Eiriksson
Cc: openib-general@openib.org
Subject: Re: [openib-general] [PATCH][iWARP] Added provider CM verbs and query provider methods

    Asgeir> ...this is the approach taken in the Chelsio TOE patches
    Asgeir> that we have submitted.

What are your plans for these patches? I am not subscribed to netdev, but from reading the archives, it seems that your most recent submission was rejected quite strongly.

 - R.
[openib-general] Re: [PATCH] ipoib: device removal races
    Michael> Something like:
    Michael> Remove it from the idr before completing, under a
    Michael> spinlock. Now if it's in idr it's not completing.
    Michael> Could this work?

I think you have to hold the spinlock across the consumer callback to avoid all races. And that's kind of a bummer, because it means you can't do anything that might sleep (like modify a QP) from the callback.

 - R.
[openib-general] Re: Re: RDMA Generic Connection Management
Quoting Caitlin Bestler [EMAIL PROTECTED]:
>> APM might be one good reason to want to get more than one path,
>> would it not?
>
> But you would have to define automatic path migration in
> generic/transport neutral terms. I've actually come up with

I don't see a problem. For the sake of this argument, let's assume APM can't be done with iWARP. How is an iWARP card different from an HCA on a fabric where there's only a single path to a specific node then?

I don't have the IB spec in front of me now. Is APM support optional or required in IB?

> If APM can only be defined for IB then it does not have to be
> addressed in the generic interface.

I don't see how you can layer this on top without support for reporting multiple paths, so it will need to be addressed as part of this module.

-- MST
Re: [openib-general] [PATCH][iWARP] IW CM Verbs
James Lentini wrote:
> Why does the ib_device need a cm structure for iWARP but not IB?
>
> If you used either Guy or Roland's generic RDMA connection API and did
> the iWARP implementation, would you still need to add the iw_cm
> structure?

Their connection protocol is implemented in hardware. Even with a generic CM API, I believe that they'll need these calls.

- Sean
Re: [openib-general] RDMA Generic Connection Management
Guy German wrote:
>  - ib_cma_get_device (...)     /* get device(1) or device+path(2) */
>  - pd = ib_alloc_pd(...)       /* pd allocated in the given device */
>  - qp = ib_cma_create_qp(...)  /* qp returned in init state */
>  - ib_post_recv(qp, ...);
>  - ib_cma_connect (qp, dst_addr(1)/path(2), ...);

To focus on something a little different... do we want an API that returns a pointer to a device structure? Specifically, how does this affect device removal?

- Sean
[openib-general] RE: Re: RDMA Generic Connection Management
-----Original Message-----
From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
Sent: Monday, August 29, 2005 1:46 PM
To: Caitlin Bestler
Cc: Roland Dreier; openib-general@openib.org
Subject: Re: Re: RDMA Generic Connection Management

> Quoting Caitlin Bestler [EMAIL PROTECTED]:
>>> APM might be one good reason to want to get more than one path,
>>> would it not?
>>
>> But you would have to define automatic path migration in
>> generic/transport neutral terms. I've actually come up with
>
> I don't see a problem. For the sake of this argument, let's assume APM
> can't be done with iWARP. How is an iWARP card different from an HCA
> on a fabric where there's only a single path to a specific node then?

There is a very important difference. The iWARP card *can* support automatic path migration that is not visible to the RDMA layer, i.e., it can move an L3 address to a new L2 address (port migration). As such it is very different from an IB device where all path migration that exists is visible.

Making it even more complex, iWARP/SCTP could support both L4 and L3 migration.
Re: [openib-general] RDMA Generic Connection Management
Sean> To focus on something a little different... do we want an API that returns a pointer to a device structure? Specifically, how does this affect device removal?
Hey, that's a really good point. We should make sure that our API makes it easy to handle device hotplug. One solution is to start reference counting device references, but that inevitably leads to bugs in ULPs -- protocol authors won't get it right unless we make it really easy. And I don't see how to make the reference counting trivial. Anyone have a better idea?
- R.
[openib-general] Re: [PATCH] ipoib: device removal races
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: [PATCH] ipoib: device removal races
> Michael> Something like: Remove it from the idr before completing, under a spinlock. Now if it's in the idr, it's not completing. Could this work?
> I think you have to hold the spinlock across the consumer callback to avoid all races.
Hmm. I think I see what you mean. Would setting the completion callback to NULL in the query structure under the idr spinlock work? It now seems to me it will.
> And that's kind of a bummer, because it means you can't do anything that might sleep (like modify a QP) from the callback.
It's an SA query, so I'm not sure why you would want to modify a QP there. Further, please note that in the current API the callback is always called even if the query is cancelled. And clearly you can't allow cancel under a spinlock and at the same time ensure that the callback is performed and is allowed to sleep. I think it's not a big problem to let cancel return a code meaning the completion was cancelled; the caller can perform the callback itself if it wants to. I imagine ULPs may special-case cancellation anyway. Would such an API change be OK?
-- MST
[openib-general] Re: RDMA Generic Connection Management
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: RDMA Generic Connection Management
> Sean> To focus on something a little different... do we want an API that returns a pointer to a device structure? Specifically, how does this affect device removal?
> Hey, that's a really good point. We should make sure that our API makes it easy to handle device hotplug. One solution is to start reference counting device references, but that inevitably leads to bugs in ULPs -- protocol authors won't get it right unless we make it really easy. And I don't see how to make the reference counting trivial. Anyone have a better idea?
Roland, could you please explain what the problem is? If you have an outstanding request, and all devices went down, can't it simply be completed with an error status?
-- MST
Re: [openib-general] Re: [PATCH] ipoib: device removal races
Michael S. Tsirkin wrote:
> It's an SA query, so I'm not sure why you would want to modify a QP there. Further, please note that in the current API the callback is always called even if the query is cancelled. And clearly you can't allow cancel under a spinlock and at the same time ensure that the callback is performed and is allowed to sleep. I think it's not a big problem to let cancel return a code meaning the completion was cancelled; the caller can perform the callback itself if it wants to. I imagine ULPs may special-case cancellation anyway. Would such an API change be OK?
This is similar to some of the discussions that went into cancelling MADs. It should be possible for the SA to return a value from cancel that indicates that no callback will occur. However, it's not possible for it to return a value that indicates that one will occur. In the latter case, the callback could have already occurred or may be in progress. This means that a user calling cancel has to be able to deal with a callback occurring anyway.
- Sean
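The cancel contract discussed here can be sketched in a few lines of userspace C. This is a minimal model under stated assumptions, not the ib_sa code: the names sa_query_model, query_init, query_complete, query_cancel and count_callback are hypothetical, and a C11 atomic compare-and-swap stands in for whatever locking a real SA query layer would use. It models the proposed behaviour (a cancelled query gets no callback), and it shows the asymmetry in the return value: a successful cancel can promise that no callback will occur, while a failed cancel only says the callback has already run or may be running.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states for one outstanding query (illustration only). */
enum query_state { QUERY_PENDING, QUERY_CALLBACK_CLAIMED, QUERY_CANCELLED };

struct sa_query_model {
    _Atomic int state;
    void (*callback)(int status, void *context);
    void *context;
};

void query_init(struct sa_query_model *q,
                void (*callback)(int status, void *context), void *context)
{
    atomic_init(&q->state, QUERY_PENDING);
    q->callback = callback;
    q->context = context;
}

/*
 * Completion path: claim the query, then run the consumer callback.
 * The callback runs outside any lock, so it would be free to sleep.
 * Returns true if the callback was invoked.
 */
bool query_complete(struct sa_query_model *q, int status)
{
    int expected = QUERY_PENDING;

    if (!atomic_compare_exchange_strong(&q->state, &expected,
                                        QUERY_CALLBACK_CLAIMED))
        return false; /* cancel won the race: no callback */
    q->callback(status, q->context);
    return true;
}

/*
 * Cancel path. true means "no callback will occur". false means only
 * that the callback already ran or may be in progress on another CPU,
 * so the caller must still be prepared for it.
 */
bool query_cancel(struct sa_query_model *q)
{
    int expected = QUERY_PENDING;

    return atomic_compare_exchange_strong(&q->state, &expected,
                                          QUERY_CANCELLED);
}

/* Example consumer callback: counts invocations through its context. */
void count_callback(int status, void *context)
{
    (void)status;
    ++*(int *)context;
}
```

A caller that gets false from query_cancel still has to wait for the callback to finish before freeing anything the callback touches; the model deliberately leaves that wait out.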
[openib-general] Re: Re: [PATCH] ipoib: device removal races
Quoting Sean Hefty [EMAIL PROTECTED]:
> In the latter case, the callback could have already occurred or may be in progress. Which means that a user calling cancel has to be able to deal with a callback occurring anyway.
Won't a bit in the query structure suffice?
-- MST
[openib-general] Re: RDMA Generic Connection Management
Michael> Roland, could you please explain what the problem is? If you have an outstanding request, and all devices went down, can't it simply be completed with an error status?
Something like:
    get_device_for_route(device);
    /* hot unplug device */
    ib_create_qp(device); /* how do we handle this? */
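One way to picture the question in the snippet above is a small single-threaded model of a device that is looked up, unplugged, and then used. This is only a sketch: model_device and the model_* functions are hypothetical names that do not exist in ib_verbs, and a real implementation would need locking or atomics around the counter. It shows the reference-counting option from this thread: the structure stays valid while a reference is held, and later calls fail with -ENODEV instead of touching a freed pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device object (illustration only, not struct ib_device). */
struct model_device {
    int refcount;  /* references handed out by lookups */
    bool removed;  /* set once the hardware has been unplugged */
};

/* Lookup: hand out a counted reference, or NULL if the device is gone. */
struct model_device *model_get_device(struct model_device *dev)
{
    if (dev->removed)
        return NULL;
    dev->refcount++;
    return dev;
}

/* Drop a reference taken by model_get_device(). */
void model_put_device(struct model_device *dev)
{
    dev->refcount--;
}

/* Hot unplug: mark the device dead but keep the memory around. */
void model_remove_device(struct model_device *dev)
{
    dev->removed = true;
}

/* The structure may be freed only after removal and the last put. */
bool model_can_free(const struct model_device *dev)
{
    return dev->removed && dev->refcount == 0;
}

/* Stand-in for a verb such as QP creation: fails cleanly after unplug. */
int model_create_qp(struct model_device *dev)
{
    return dev->removed ? -ENODEV : 0;
}
```

The cost Roland points out is visible even here: every consumer has to remember the matching model_put_device() call, which is exactly the bookkeeping protocol authors tend to get wrong.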
[openib-general] Re: RDMA Generic Connection Management
Quoting r. Sean Hefty [EMAIL PROTECTED]:
> Subject: Re: RDMA Generic Connection Management
> Guy German wrote:
>> - ib_cma_get_device (...) /* get device(1) or device+path(2) */
>> - pd = ib_alloc_pd(...) /* pd allocated in the given device */
>> - qp = ib_cma_create_qp(...) /* qp returned in init state */
>> - ib_post_recv(qp, ...);
>> - ib_cma_connect (qp, dst_addr(1)/path(2), ...);
> To focus on something a little different... do we want an API that returns a pointer to a device structure?
Yes, I think it's much better than dealing with type-unsafe indexes, wasting memory on tables and/or forcing table lookups on each call.
> Specifically, how does this affect device removal? - Sean
How is this different from what we have with ib_verbs now? I think that reasonable ULPs must register for hotplug events in the ib layer anyway. So when they get a device removal callback, they close the QPs etc. Makes sense?
-- MST
[openib-general] Re: [PATCH] ipoib: device removal races
Sean said it well, but to repeat: the problem you run into is what to do when a consumer tries to cancel while the callback is running. For example, one CPU might be in the middle of jumping to the consumer's callback when the other CPU enters the cancel function.
- R.
[openib-general] Re: RDMA Generic Connection Management
Quoting r. Roland Dreier [EMAIL PROTECTED]:
> Subject: Re: RDMA Generic Connection Management
> Michael> Roland, could you please explain what the problem is? If you have an outstanding request, and all devices went down, can't it simply be completed with an error status?
> Something like:
>     get_device_for_route(device);
>     /* hot unplug device */
>     ib_create_qp(device); /* how do we handle this? */
Register with the ib layer for hotplug events, and flush the queue that does this.
-- MST
[openib-general] Re: RDMA Generic Connection Management
Michael S. Tsirkin wrote:
> How is this different from what we have with ib_verbs now?
With ib_verbs, users receive notification of device addition/removal. This interface doesn't require receiving that notification.
> I think that reasonable ULPs must register for hotplug events in the ib layer, anyway. So when they get a device removal callback, they close the qps etc. Makes sense?
This opens up the possibility for a user to receive a reference to a device that they may not have received previous notification for. Similarly, the device could have been removed before the call returned, making the pointer invalid.
- Sean
[openib-general] license mismatches
I have reviewed the licenses used by files in https://openib.org/svn/gen2/trunk. The following .c and .h files do not match the OpenIB licenses:
https://openib.org/svn/gen2/trunk/src/userspace/tvflash/src/tvflash.c
https://openib.org/svn/gen2/trunk/src/userspace/tvflash/src/firmware.h
https://openib.org/svn/gen2/trunk/src/userspace/examples/aio/ttcp.aio.c
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/complib/Makefile.mlx
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/opensm/osm_indent
The same applies to all files in these directories:
https://openib.org/svn/gen2/trunk/src/userspace/mstflint/
https://openib.org/svn/gen2/trunk/src/userspace/mpi/
Files in the directory https://openib.org/svn/gen2/trunk/src/userspace/libsdp/src/ have the right licenses, but the copyright message does not match the OpenIB copyright.
Several files do not have any license, such as Makefile, configure and map files. For example:
https://openib.org/svn/gen2/trunk/src/userspace/libibcm/src/libibcm.map
https://openib.org/svn/gen2/trunk/src/userspace/libibcm/Makefile.am
I think this is OK.
I suspect that all these are oversights and all the files should be available under both BSD and GPL2 licenses.
Thanks,
Arkady
Arkady Kanevsky, email: [EMAIL PROTECTED]
Network Appliance, phone: 781-768-5395
375 Totten Pond Rd., Fax: 781-895-1195
Waltham, MA 02451-2010, central phone: 781-768-5300
Re: [openib-general] license mismatches
Arkady> I suspect that all these are oversights and all the files should be available under both BSD and GPL2 licenses.
tvflash at least is licensed correctly. It links to the pciutils library, which is licensed under the GPL.
- R.
[openib-general] rc ping pong error
I have the latest openib code on a 2.16 machine. When I run the rc pingpong program I get the following error. (The first time it passed, but subsequent runs got an error. I tried changing the iteration count to a large number, 10 after the first time)
#dmesg
ib_mthca :05:00.0: Mapped page at 395aa000 to 8 for ICM.
ib_mthca :05:00.0: CQ overrun on CQN 5b0083
=
ib_mthca :05:00.0: Unmapping 1 pages at 8 from ICM.
[EMAIL PROTECTED] ./ibv_rc_pingpong 192.169.8.117
local address: LID 0x0003, QPN 0x440405, PSN 0xd6ae4e
remote address: LID 0x0001, QPN 0x3a0405, PSN 0x9317a4
[ 0] 00440405
[ 4]
[ 8]
[ c]
[10] 1581
[14]
[18] 8002
[1c] ff10
Failed status 12 for wr_id 2
[openib-general] Re: [PATCH] uat: make uat.c compile on 2.6.13-rc3
On Thu, 2005-07-28 at 14:50, Tom Duffy wrote:
> This patch is similar to the one for ucm. It updates the class code to work with 2.6.13-rc3.
Thanks. (Finally) applied now that 2.6.13 is out :-)
-- Hal
[openib-general] Re: [mgtwg] Payload Length in first RMPP sent segment
Hal,
My take is that there's no ambiguity. Then again, I wrote it, so I would think that, right? :-)
The idea is that we're trying to allow *either* of the usual two options for specifying a string of stuff: (a) start out by giving the length; or (b) go until you reach a special mark meaning the end. The thing is it gets complicated when there is only one packet. So take two cases: >1 packet, and ==1 packet.
length >1 packet:
-- PayloadLength >0 on 1st packet means case (a). Just read until you get that many bytes, which may use only part of the last packet. If the last packet isn't also marked last, scream about inconsistency.
-- PayloadLength=0 on first packet - case (b). Read until you get a marked last packet. PayloadLength in that last packet tells you how many are valid in that packet (zero in that case -- I'm not sure; whole packet, I think).
length ==1 packet, meaning RMPPFlags.Last=1 and RMPPFlags.First=1 in the same packet:
-- Interpretation is the same as the last packet case above, i.e., RMPPFlags.Last=1 dominates the interpretation.
As far as I know, that's it. Any comments from others?
(This may not forward to openib-general, since I'm not on that list; if it doesn't, please forward.)
Greg Pfister
IBM Distinguished Engineer, Member IBM Academy of Technology
IBM Systems Technology Group, Server Technology Architecture
(512) 838-8338 | IBM tieline 678-8338 | FAX (512) 838-3418
Sic Crustulum Frangitur
Hal Rosenstock [EMAIL PROTECTED] 08/29/2005 08:14 AM
To: [EMAIL PROTECTED]
cc: openib-general@openib.org
Subject: [mgtwg] Payload Length in first RMPP sent segment
> Hi,
> On the RMPP send side, while it is clear that the Payload Length field in the last segment indicates the number of valid bytes in Transferred Data, there seems to be some ambiguity in the optional Payload Length field in the first segment. I think it can work either way but I also think the intent was to reflect the valid bytes. Maybe it is this way to allow flexibility (choice in the implementation). What is the correct interpretation? Should I enter a comment on this? Thanks.
> -- Hal
> IBA 1.2 p.775 line 37: In the first packet of an RMPP transfer (RMPPFlags.First=1), PayloadLength may indicate the sum of the lengths, in bytes, of the TransferredData fields in all packets of the entire multipacket response; this is done by using a nonzero value for PayloadLength in the first packet.
> IBA 1.2 p.776 line 8: In the last packet of an RMPP transfer (RMPPFlags.Last=1), PayloadLength indicates the number of valid bytes in the TransferredData field, allowing data transfers that are not an integral multiple of the length of the TransferredData field. A transfer terminates when either: (a) a packet containing RMPPFlags.Last=1 is received; or (b) a nonzero PayloadLength was given in the first packet of a transfer, and a packet is received containing sufficient TransferredData bytes to equal or exceed the PayloadLength originally provided. If case (b) occurs and RMPPFlags.Last is not 1 for that packet, the Receiver sends an ABORT packet with RMPPStatus of Inconsistent Last and PayloadLength and terminates the transfer.
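The termination rules quoted from the specification can be written down as a small receive-side state machine. The following is a sketch under stated assumptions rather than any existing MAD-layer code: rmpp_rx_state and rmpp_rx_segment are hypothetical names, the size of the TransferredData field is passed in by the caller (the thread uses 220 for SA), and a PayloadLength of zero in a packet marked Last is treated as a full packet, which is only the tentative reading given in this thread. It does not check whether the first-packet total agrees with the last-packet length.

```c
#include <stdbool.h>
#include <stdint.h>

enum rmpp_rx_result { RMPP_RX_CONTINUE, RMPP_RX_DONE, RMPP_RX_ABORT };

/* Hypothetical receive-side state (illustration only). */
struct rmpp_rx_state {
    uint32_t declared_total; /* nonzero PayloadLength from first packet, else 0 */
    uint32_t received;       /* valid TransferredData bytes counted so far */
};

/*
 * Feed one received segment into the state machine.
 * seg_data_len is the size of the TransferredData field in one packet;
 * it is an assumption supplied by the caller.
 */
enum rmpp_rx_result rmpp_rx_segment(struct rmpp_rx_state *rx,
                                    bool first, bool last,
                                    uint32_t payload_length,
                                    uint32_t seg_data_len)
{
    if (first) {
        rx->received = 0;
        /* Last dominates: in a one-packet transfer the field is a
         * last-packet length, not a total for the whole transfer. */
        rx->declared_total = last ? 0 : payload_length;
    }

    if (last) {
        /* Termination case (a): a packet marked Last ends the transfer. */
        uint32_t valid = payload_length;

        if (valid == 0 || valid > seg_data_len)
            valid = seg_data_len; /* assumption: count a full packet */
        rx->received += valid;
        return RMPP_RX_DONE;
    }

    rx->received += seg_data_len;

    /* Termination case (b) on a packet that is not marked Last:
     * "Inconsistent Last and PayloadLength", so abort. */
    if (rx->declared_total != 0 && rx->received >= rx->declared_total)
        return RMPP_RX_ABORT;

    return RMPP_RX_CONTINUE;
}
```

For a transfer that declares 500 bytes up front with 220-byte segments, the first two segments return RMPP_RX_CONTINUE and a third segment marked Last with PayloadLength 60 returns RMPP_RX_DONE with 500 bytes counted.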
[openib-general] Re: [PATCH] sdp: use linux/list.h in sdp_link.c
On Aug 29, 2005, at 8:00 AM, Michael S. Tsirkin wrote:
> The following kills sdp_link.h and converts sdp_link.c to use linux/list.h. Locking is still missing here.
Cool, cool. I was going to get to this eventually. I just got back from vacation and I am still waiting for a machine so I can set up a rudimentary IB network at home to test my code. I have a patch that converts sdp_buff.[ch] to use linux/list.h (glad you didn't decide to work on that), but I want to test it before submitting to the list (it compiles!).
-tduffy
[openib-general] Re: [mgtwg] Payload Length in first RMPP sent segment
Hi Greg,
On Mon, 2005-08-29 at 18:56, Greg Pfister wrote:
> Hal, My take is that there's no ambiguity. Then again, I wrote it, so I would think that, right? :-) The idea is that we're trying to allow *either* of the usual two options for specifying a string of stuff: (a) start out by giving the length; or (b) go until you reach a special mark meaning the end.
The latter being streaming mode.
> The thing is it gets complicated when there is only one packet. So take two cases: >1 packet, and ==1 packet.
It seems more complicated (perhaps 2 options when there is more than 1 packet).
> length >1 packet: -- PayloadLength >0 on 1st packet means case (a). Just read until you get that many bytes, which may use only part of the last packet. If the last packet isn't also marked last, scream about inconsistency.
So if one is using this option, does the payload length in the 1st packet, reduced by 220 * (number of packets - 1), need to match the payload length in the last packet? That's a slightly different inconsistency from the packet not being marked last but the original length not exhausted.
> -- PayloadLength=0 on first packet - case (b). Read until you get a marked last packet. PayloadLength in that last packet tells you how many are valid in that packet (zero in that case -- I'm not sure; whole packet, I think).
For SA, wouldn't anything less than 20 be an error in the last packet? If it were 20, it would be legal but an inefficient implementation (as really the previous packet was full and could have terminated the RMPP send).
> length ==1 packet, meaning RMPPFlags.Last=1 and RMPPFlags.First=1 in the same packet. -- Interpretation is the same as the last packet case above, i.e., RMPPFlags.Last=1 dominates the interpretation. As far as I know, that's it. Any comments from others? (This may not forward to openib-general, since I'm not on that list; if it doesn't please forward.)
It made it to openib. It's an open list as far as posting goes. Thanks.
-- Hal
Re: [openib-general] rc ping pong error
viswanath> I have the latest openib code on 2.16 machine, when I run the rc pingpong program I get the following error (The first time it passed, but subsequent ones got an error, I tried changing the iteration count to a large number, 10 after the first time)
I left ibv_rc_pingpong -n 10 running in a loop between two of my machines with no problems, so there's something specific to your setup. When you say latest openib code, what does this mean? Are you running something from subversion or a standard Linux kernel? Do you have 1-port or 2-port HCAs? What HCA firmware version are you running?
- R.