Hi,

- Here is a header file for a proposed cm abstraction API.
- This is just a preliminary suggestion, for review.
- All comments are welcome.
- Please read the notes in the header remarks.
- I am attaching the file and will also send it to the list later, in a
separate message.
- I think that the ib_ prefix should be changed to rdma_, but that
should be done for the rest of the verbs as well, if we are claiming
that the ib verbs abstract iwarp.
- I think that the main difference between the 2 propositions is the
question of whether or not to expose the consumer to the address
resolution. I believe this suggestion (of covering it in the cma) is
simpler, because it saves unnecessary upcall handling for the consumer.
In any case - I don't believe this is clear cut, and would like to hear
other opinions from people on the list.
- Also, please see my inline answers to this mail below.


Thanks,
Guy.

> We already discussed the problem with having the listen callback pass
> the consumer a remote source address -- doing this requires the
> connection handling module to do an ATS reverse lookup in the IB case,
> which the consumer might not want.  I think there's agreement that the
> correct thing here is for the listen callback to pass a transport
> address to the consumer and provide a function that the consumer can
> call to perform an ATS reverse lookup if desired.  This isn't a major
> problem and can be dealt with.

I agree. This is corrected in the current suggestion.

> However, there's another problem with trying to lump address
> translation and connection into a single "connect" call, and this
> problem looks fundamental and fatal to me.  The connect call takes a
> QP pointer, but to create a QP the consumer needs to know which local
> device to use.  However, the consumer doesn't know which device to use
> until the destination address has been resolved to a route, including
> a local interface.

The proposal, also presented (I believe) at the OpenIB workshop,
includes a function called ib_cma_get_device that retrieves the device
(for qp creation purposes) according to the destination address and the
local routing table. This is done synchronously, and it is implemented
today in the at module. If using link-local IPv6 addresses, I think that
this function isn't even necessary (if I understand it correctly, you
already need to know which device to go out through).
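
To make the idea concrete, here is a rough userspace sketch of such a
synchronous lookup. It is only an illustration and not the at module
code: the toy_ names, the two-entry table and the IPv4-only
longest-prefix match are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ib_device. */
struct toy_device {
	const char *name;
};

/* Hypothetical routing entry: prefix/mask in host byte order. */
struct toy_route {
	uint32_t prefix;
	uint32_t mask;
	struct toy_device *dev;
};

static struct toy_device toy_ib0 = { "ib0" };
static struct toy_device toy_ib1 = { "ib1" };

static const struct toy_route toy_table[] = {
	{ 0x0A000000u, 0xFF000000u, &toy_ib0 },	/* 10.0.0.0/8  */
	{ 0x0A010000u, 0xFFFF0000u, &toy_ib1 },	/* 10.1.0.0/16 */
};

/*
 * Synchronous lookup: pick the device of the most specific matching
 * route. Returns 0 and sets *dev on success, -1 if there is no route.
 */
static int toy_get_device(uint32_t dst, struct toy_device **dev)
{
	const struct toy_route *best = NULL;
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		const struct toy_route *r = &toy_table[i];

		if ((dst & r->mask) != r->prefix)
			continue;
		/* contiguous masks: a larger value means a longer prefix */
		if (!best || r->mask > best->mask)
			best = r;
	}
	if (!best)
		return -1;
	*dev = best->dev;
	return 0;
}
```

The point is only that nothing here needs to block or call back, so the
consumer can create the qp right after the call returns.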

> As far as I can tell, kDAPL punts on this and simply requires the
> consumer to handle the route lookup itself before calling
> dat_ep_connect().  It seems that current kDAPL consumers similarly
> punt on this issue: the iSER initiator and the NFS-RDMA client both
> just use a single device which is statically discovered at init time.
> 
> It seems that the kDAPL connection model has a serious flaw, in that
> it pushes the complexity of route lookup into the consumer.  Further,
> we have strong evidence that this routing code is hard to write and
> that consumers will just ignore this complexity and hard-code
> solutions that don't work under all configurations.
> With this in mind, I believe that the connection API needs to be
> something more like the following:
> 
>     rdma_resolve_address():
>         inputs: dest IP address, qos, npaths,
>             done callback, opaque context
>       done callback params: status, local RDMA device,
>             RDMA transport address, context
> 
>         This function starts the process of resolving an IP address to
>         an RDMA device and address.  When the resolution is complete,
>         the callback is called with a status.  If the status is
>         "success" then the callback also gets the device pointer and
>         transport address (as well as the original context that the
>         consumer passed in).

In the address resolution you have two upcalls (from ip to gid and from
gid to path). So, if you are already covering one upcall in the cma, why
not cover both?
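
As a rough illustration of what I mean by covering both, the two stages
can be chained inside the cma so that the consumer only ever sees one
completion. Everything below is a made-up userspace sketch (the toy_
names, the fake gid/path arithmetic and the inline completions are
assumptions, not proposed code):

```c
#include <assert.h>
#include <stddef.h>

/* The single upcall the consumer registers. */
typedef void (*toy_done_cb)(int status, void *context);

struct toy_resolve {
	toy_done_cb done;
	void *context;
	int gid;	/* toy stand-in for a resolved GID */
	int path;	/* toy stand-in for a path record */
};

/* Second stage (gid -> path): completes the whole request. */
static void toy_path_done(struct toy_resolve *req, int status)
{
	if (!status)
		req->path = req->gid + 1000;	/* pretend path lookup */
	req->done(status, req->context);
}

/* First stage (ip -> gid): on success, chain to the second stage. */
static void toy_gid_done(struct toy_resolve *req, int status)
{
	if (status) {
		req->done(status, req->context);
		return;
	}
	toy_path_done(req, 0);
}

/* Entry point; ip 0 is treated as unresolvable in this toy model. */
static void toy_resolve_start(struct toy_resolve *req, int ip,
			      toy_done_cb done, void *context)
{
	assert(done != NULL);
	req->done = done;
	req->context = context;
	req->gid = ip * 2;	/* pretend ip -> gid lookup */
	req->path = 0;
	toy_gid_done(req, ip ? 0 : -1);
}

/* Example consumer upcall: counts calls and records the status. */
struct toy_result {
	int calls;
	int status;
};

static void toy_consumer_done(int status, void *context)
{
	struct toy_result *r = context;

	r->calls++;
	r->status = status;
}
```

In a real implementation each stage would complete asynchronously, but
the consumer-visible shape stays the same: one request, one upcall.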

>         The "RDMA transport address" type is a union containing
>         transport-dependent data.  In the IB case, it's all of the
>         SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
>         iWARP case, it's the source IP, destination IP and QOS.
> 
>         npaths can be either 1 or 2 in the IB case; if it's 2, then
>         the resolver will try to find a primary and alternate path for
>         APM.  In the iWARP case, I guess npaths will always be 1, and
>         I guess anyone who wants to use iWARP over multihomed SCTP
>         will probably have to use some lower-level API.
> 
>         By the way, we may also have to have the option of passing in
>         a local netdev so that we can handle link-local IPv6
>         addresses.  There may be other cases I haven't thought of yet.
>         I just hope we can avoid going all the way to the horror of
>         the getaddrinfo() API.
> 
>         I also hope we can agree to use IPoIB ARP to resolve the
>         address in the IB case; having a flag or some other hack in
>         the API to expose the option of ATS seems unacceptably ugly.
> 
>     rdma_connect():
>         inputs: local QP, RDMA transport address, destination service,
>             private data, timeout, event callback, opaque context
> 
>         This function takes the resolved address and actually connects.
> 
>         I'm not sure how we want to abstract the IB service vs. iWARP
>         TCP port number difference.  I guess it's OK to have iWARP
>         consumers stick their (16-bit) port number in a 64-bit
>         parameter, even if it's not the prettiest API.
> 
> To head off the knee-jerk objection: this API does NOT require any
> transport-specific code in consumers (unless a particular consumer
> WANTS to look inside the RDMA transport address).  Code to connect
> would be as simple as:
> 
>     rdma_resolve_address(...);
>     /* wait for resolution */
>     ib_create_qp(...); /* use device pointer we got from
>                           rdma_resolve_address() */
>     rdma_connect(...); /* pass transport address we got from
>                           rdma_resolve_address() */
>     /* wait for connection to finish... */


Wouldn't it be simpler (for the consumer) to do:

        resolve_device_by_destip();
        /* don't wait */
        ib_create_qp(...) /* use device pointer we got */
        rdma_connect(dest_ip); /* cma resolution implementation for ib*/
        /* wait for at + connection to finish... */

I think this flow is also more "iwarp friendly": it saves them the
asynchronous rdma_resolve_address wait.
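
Spelled out a bit more, the active side could look roughly like the
sketch below. This is a userspace toy, not the proposed implementation:
all toy_ names are hypothetical stand-ins (toy_get_device for
ib_cma_get_device, toy_create_qp for ib_create_qp, toy_connect for the
connect call), and the event handler is invoked inline only to keep the
example runnable.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_ESTABLISHED, TOY_UNREACHABLE };
typedef void (*toy_event_handler)(enum toy_event event, void *context);

struct toy_device { int id; };
struct toy_qp { struct toy_device *dev; };

static struct toy_device toy_dev0 = { 0 };

/* Synchronous device lookup by destination address. */
static int toy_get_device(const char *dst_ip, struct toy_device **dev)
{
	if (!dst_ip)
		return -1;
	*dev = &toy_dev0;
	return 0;
}

/* QP creation only needs the device returned above. */
static void toy_create_qp(struct toy_device *dev, struct toy_qp *qp)
{
	qp->dev = dev;
}

/*
 * Connect by destination ip. A real cma would hide the ip -> gid and
 * gid -> path resolution here and report the result asynchronously.
 */
static int toy_connect(struct toy_qp *qp, const char *dst_ip,
		       toy_event_handler handler, void *context)
{
	assert(qp != NULL);
	if (!qp->dev || !dst_ip || !handler)
		return -1;
	handler(TOY_ESTABLISHED, context);
	return 0;
}

/* Consumer upcall: just record the event. */
static void toy_on_event(enum toy_event event, void *context)
{
	*(enum toy_event *)context = event;
}

/* The whole active-side flow: no wait between lookup and connect. */
int toy_active_side(const char *dst_ip, enum toy_event *result)
{
	struct toy_device *dev;
	struct toy_qp qp;

	if (toy_get_device(dst_ip, &dev))
		return -1;
	toy_create_qp(dev, &qp);
	return toy_connect(&qp, dst_ip, toy_on_event, result);
}
```

The consumer handles exactly one kind of upcall (the connection event),
which is the simplification I am arguing for.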

> 
> The listen side is even simpler:
> 
>     rdma_listen():
>         inputs: local service, event callback, consumer context
> 
>         Wait for connection requests and pass events to the consumer's
>         callback.  I'm not sure if/how we want to support binding to
>         a particular IP address.  The current IB CM in Linux doesn't
>         support binding a listen to a single device or port, and even
>         if it did it's not clear how to handle binding to one IP
>         address when a port has more than one IP.
>         I guess the event callback would receive a device pointer and
>         the same RDMA transport address union I talked about above
>         when discussing address resolution.
> 
>         It would be possible to have another function like
>         rdma_getpeername() that takes the transport address and
>         returns a source IP address.  In the IB case this would do an
>         ATS reverse lookup.  However, I hate this idea.  iSER already
>         uses the CM private data to pass the source IP in the IB case,
>         and I would much rather fix NFS/RDMA to do the same thing (so
>         we can just kill ATS as an address resolution method).

>  - R.



/*
 * Copyright (c) 2005 Voltaire Inc.  All rights reserved.
 *
 * This Software is licensed under one of the following licenses:
 *
 * 1) under the terms of the "Common Public License 1.0" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/cpl.php.
 *
 * 2) under the terms of the "The BSD License" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/bsd-license.php.
 *
 * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
 *    copy of which is available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/gpl-license.php.
 *
 * Licensee has the right to choose one of the above licenses.
 *
 * Redistributions of source code must retain the above copyright
 * notice and one of the license notices.
 *
 * Redistributions in binary form must reproduce both the above copyright
 * notice, one of the license notices in the documentation
 * and/or other materials provided with the distribution.
 *
 */

/*
 *  This header file is a preliminary proposal for a connection manager
 *  abstraction layer (cma) for IB and iwarp.
 *  - It assumes that iwarp uses the same openib qp terminology in the rest
 *    of the verbs, and that the only place that needs abstraction is the cm.
 *  - It assumes that the address translation is done in the cma layer.
 *  - The cma also modifies the qp states to init/rtr/rts and error as needed.
 *  - To call accept/reject or disconnect on the passive side, use the cma
 *    handle received in the ib_cma_listen cb.
 *  - A cma_id is created when calling connect or listen, and destroyed when
 *    receiving disconnected/rejected/unreachable events on either the active
 *    side (connect cb) or the passive side (accept cb).
 */

#ifndef IB_CMA_H
#define IB_CMA_H

#include <linux/socket.h>

enum ib_cma_event {
	IB_CMA_EVENT_ESTABLISHED,
	IB_CMA_EVENT_REJECTED,
	IB_CMA_EVENT_DISCONNECTED,
	IB_CMA_EVENT_UNREACHABLE
};

enum ib_qos {
	IB_QOS_BEST_EFFORT = 0,
	IB_QOS_HIGH_THROUGHPUT = (1 << 0),
	IB_QOS_LOW_LATENCY = (1 << 1),
	IB_QOS_ECONOMY = (1 << 2),
	IB_QOS_PREMIUM = (1 << 3)
};

enum ib_connect_flags {
	IB_CONNECT_DEFAULT_FLAG = 0x00,
	IB_CONNECT_MULTIPATH_FLAG = 0x01
};

/* 
 * for ib_cma_get_src_ip - ib_cma_id will have to include 
 * the path data received in the request handler
 */
union ib_cma_id {
	struct ib_cm_id *cm_id;
	u32 iwarp_id;
};

typedef void (*ib_cma_rarp_handler)(struct sockaddr *src_ip, void *context);
typedef void (*ib_cma_ac_handler)(enum ib_cma_event event, void *context);
typedef void (*ib_cma_event_handler)(enum ib_cma_event event, void *context,
				     void *private_data);
typedef void (*ib_cma_listen_handler)(union ib_cma_id *cma_id, 
				      void *private_data, void *context);

struct ib_cma_conn {
	struct ib_qp *qp;
	struct ib_qp_attr *qp_attr;
	struct sockaddr *dst_ip;
	__be64 service_id;
	void *context;
	ib_cma_event_handler cma_event_handler;
	const void *private_data;
	u8 private_data_len;
	u32 timeout;
	enum ib_qos qos;
	enum ib_connect_flags connect_flags;
};


/**
 * ib_cma_get_device - Returns the device to be used according to
 *   the destination ip address (this can be determined according
 *   to the local routing table). Call this function before
 *   creating the qp. If using link-local IPv6 addresses, this call
 *   may not be necessary.
 * @remote_address: The destination address for connection
 * @device: The device to use (returned by the function)
 */
int ib_cma_get_device(struct sockaddr *remote_address,
		      struct ib_device **device);


/**
 * ib_cma_connect - this is the connect request function, called by 
 *   the active side. The consumer registers an upcall that will be 
 *   initiated by the cma with an appropriate connection event 
 *   notification (established/rejected/disconnected etc)
 * @cma_conn: This structure contains the following connection parameters:
 *   @qp: qp for establishing the connection
 *   @qp_attr: only relevant attributes are used
 *   @dst_ip: destination ip address
 *   @service_id: destination service id (port)
 *   @context: context to be returned in the callback
 *   @cma_event_handler: the upcall function for the active side
 *   @private_data: private data to be received at the listener upcall
 *   @private_data_len: private data length (max 255)
 *   @timeout: 
 *   @qos: Quality of service for the rc
 *   @connect_flags: default or multipath connection
 * @cma_id: This returned handle is a union (different in ib and iwarp)
 *   in ib - it is the cm_id.
 */
int ib_cma_connect(struct ib_cma_conn *cma_conn,
		   union ib_cma_id *cma_id);


/**
 * ib_cma_disconnect - this function disconnects the rc. It can be
 *   called by either the passive or the active side
 * @qp: the connected qp to disconnect
 * @cma_id: On the active side, this handle is the one returned
 *   when ib_cma_connect was called.
 *   On the passive side, this handle was received in the cma_listen callback
 */
int ib_cma_disconnect(struct ib_qp *qp, union ib_cma_id *cma_id);


/**
 * ib_cma_sid_listen - this function is called by the passive side. It
 *   listens on the specified port (ib service id) for incoming
 *   connection requests
 * @device: ? need to resolve this issue
 * @service_id: service id (port) to listen on
 * @context: user context to be returned in the callback
 * @cm_listen_handler: the listen callback
 * @cma_id: cma handle for the passive side
 */
int ib_cma_sid_listen(struct ib_device *device, __be64 service_id,
		      void *context, ib_cma_listen_handler cm_listen_handler,
		      union ib_cma_id *cma_id);


/**
 * ib_cma_sid_destroy - this function is called on the passive side, to
 *   stop listening on a certain service id
 * @cma_id: the same cma handle received when ib_cma_sid_listen was called
 */
int ib_cma_sid_destroy(union ib_cma_id *cma_id);


/**
 * ib_cma_accept - call on the passive side to accept a connection request
 * @cma_id: this handle was received in the cma_listen callback
 * @qp: the connection's qp
 * @private_data: private data to send back to the initiator
 * @private_data_len: private data length
 * @context: user context to be returned in the callback
 * @cm_accept_handler: the cma accept callback, triggered when the RTU
 *   is received
 */
int ib_cma_accept(union ib_cma_id *cma_id, struct ib_qp *qp, 
		  const void *private_data, u8 private_data_len, 
		  void *context, ib_cma_ac_handler cm_accept_handler);

/**
 * ib_cma_reject - call on the passive side to reject a connection request.
 *   This call destroys the cma_id, hence when the active side receives
 *   the reject the cma_id is already destroyed.
 * @cma_id: this handle was received in the cma_listen callback
 * @private_data: private data to send back to the initiator
 * @private_data_len: private data length
 */
int ib_cma_reject(union ib_cma_id *cma_id, const void *private_data,
		  u8 private_data_len);


/**
 * ib_cma_get_src_ip - this function asynchronously performs a "rarp"
 *   from cma_id to src ip
 * @cma_id: the cma_id will have to include the path data received
 *   in the request handler
 * @rarp_handler: callback that receives the source ip of the initiator
 * @context: user context to be returned in the callback
 */
int ib_cma_get_src_ip(union ib_cma_id *cma_id,
		      ib_cma_rarp_handler rarp_handler,
		      void *context);

#endif /* IB_CMA_H */
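
For the passive side, a consumer's listen callback might look roughly
like the sketch below. It is a userspace toy under stated assumptions:
the toy_ types and functions are hypothetical stand-ins for the cma_id,
ib_cma_accept and ib_cma_reject, and matching on a magic prefix in the
private data is just an example policy, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_verdict { TOY_ACCEPT, TOY_REJECT };

/* Hypothetical stand-in for the cma_id handed to the listen callback. */
struct toy_cma_id {
	int in_use;
	enum toy_verdict verdict;
};

/* Stands in for ib_cma_accept(): the id stays alive after accepting. */
static int toy_accept(struct toy_cma_id *id)
{
	assert(id != NULL);
	id->verdict = TOY_ACCEPT;
	return 0;
}

/* Stands in for ib_cma_reject(): per the header notes, it destroys the id. */
static int toy_reject(struct toy_cma_id *id)
{
	assert(id != NULL);
	id->verdict = TOY_REJECT;
	id->in_use = 0;
	return 0;
}

/*
 * Listen callback in the shape of ib_cma_listen_handler. The context
 * is assumed to be a NUL-terminated magic string that the private
 * data of an acceptable request must start with.
 */
void toy_listen_handler(struct toy_cma_id *id, void *private_data,
			void *context)
{
	const char *magic = context;

	if (private_data &&
	    strncmp(private_data, magic, strlen(magic)) == 0)
		toy_accept(id);
	else
		toy_reject(id);
}
```

Note that the callback never touches the id again after rejecting,
which matches the rule that reject destroys the cma_id.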

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
