Re: [PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-08-17 Thread Ralph Campbell
BTW, I will be around today and tomorrow to reply to comments
but then I will be on vacation through August.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-08-17 Thread Ralph Campbell
Hopefully, this is the last update needed to this patch and
it can be applied. It has been in use within QLogic for several months
without problems.

Changes from v4 to v5:

Removed the "break;" in ipoib_cm_flush_path() so now the function
comment matches the code.

Updated the error case code in ipoib_cm_tx_start() so the address
handle and cached GID are reset so the next call to ipoib_start_xmit()
will create a new CM connection.

Added netif_tx_lock_bh() in ipoib_neigh_cleanup()
in order to call ipoib_cm_destroy_tx().
I don't think it is strictly needed, I'm just being paranoid and I
don't think it is a performance path when deleting struct neighbour.

=
These are my notes on the IPoIB locking and what I figured out
in the process of creating the patch that follows.

IPoIB connected mode uses a separate QP for transmit and receive.
I will only talk about the transmit side although the some of the
data structures are used by both.

The executive summary for locking is that most things are protected
by the struct ipoib_dev_priv.lock. The network device lock,
netif_tx_lock(), netif_tx_lock_bh(), or stopping network output via
netif_stop_queue() is used to prevent ipoib_start_xmit() and
ipoib_cm_send() from being called which can access struct ipoib_neigh
and struct ipoib_cm_tx without holding locks.

struct sk_buff {
  // This pointer may hold a reference (see SKB_DST_NOREF).
  // IPoIB doesn't change this pointer so the locking rules aren't important.
  unsigned long _skb_refdst;
}

struct dst_entry {
  // The neighbour pointer holds a reference.
  // IPoIB doesn't change this pointer so the locking rules aren't important.
  struct neighbour *neighbour;

  atomic_t refcnt;
}

struct neighbour {
  // stores the IPoIB "hardware" address:
  // (control flags (one byte), QPN (3 bytes), destination GID (16 bytes),
  // padding (0 or 4 bytes), and a pointer to struct ipoib_neigh (4 or 8 bytes)
  // which is not reference counted.
  // It is protected by calling netif_tx_lock(dev) or netif_stop_queue(dev).
  // The Linux network stack can free the ipoib_neigh pointer by calling
  // ipoib_neigh_cleanup()
  uchar ha[];

  struct neigh_ops *ops;
  atomic_t refcnt;
}

struct ipoib_neigh {
  // This is protected by priv->lock and *does not* hold a reference
  struct neighbour *neighbour; // back pointer to containing struct

  // This is protected by priv->lock although it is accessed w/o
  // holding the priv->lock in ipoib_start_xmit() which means that
  // to clear the pointer, ipoib_start_xmit() has to be prevented from
  // being called if there is a chance that
  // "to_ipoib_neigh(skb_dst(skb)->neighbour)" could point to this struct.
  struct ipoib_cm_tx *cm;

  // This is protected by priv->lock and holds a reference
  struct ipoib_ah *ah;

  // Link for path->neigh_list or mcast->neigh_list.
  // This is protected by priv->lock.
  struct list_head list;
}

struct ipoib_cm_tx {
  // This is protected by priv->lock.
  struct ipoib_neigh *neigh;
}

struct ipoib_path {
  struct ib_sa_query *query; // non-NULL if SA path record query is pending

  // This is protected by priv->lock and holds a reference
  struct ipoib_ah *ah;

  struct completion   done;  // True if query is finished

  // list of all struct ipoib_neigh.list with the same ah pointer
  struct list_head neigh_list;
}

struct ipoib_dev_priv {
  // This lock protects a number of things associated with this
  // IPoIB network device.
  spinlock_t lock;

  // Contains the struct ipoib_mcast nodes indexed by MGID.
  // It is protected by priv->lock.
  struct rb_root multicast_tree;
}


Before any nodes can send IPoIB packets, the SA creates the
"all HCA multicast group GID" which encodes , ,
, .  For example, transient, link local scope, IPv4,
pkey of 0x8001, limited broadcast address we have:
FF12:401B:8001:::, in compressed format.
The group also has a P_Key, Q_Key, and MTU associated with it in the SA
path record.
This multicast group is used for IPv4 address resolution (ARP).
The group is joined when the IPoIB device is brought up and the details
saved in priv->broadcast.

The transmit process starts with the Linux networking code calling
netif_tx_lock(dev) and then calling:

ipoib_hard_header()
  // This prepends the "hardware address" onto the sk_buff header if
  // the neighbor address hasn't been set in the sk_buff.

ipoib_start_xmit(skb, dev)
  // The first time through, skb_dst(skb)->neighbour won't be set and
  // ipoib_hard_header() will have prepended the IPv4 broadcast
  // "hardware address" which we strip off and call the following.
  ipoib_mcast_send(dev, mgid, skb)
spin_lock_irqsave(&priv->lock, flags)
// Search for the mgid and create an entry if not found.
mcast = __ipoib_mcast_find(dev, mgid)
// Since mgid is probably the "all HCA multicast group GID" which
// was initialized when the network interface was started, we call:
spin_unlock_irqrestore(&priv->lock, flags)
// XXX bug? u