Re: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
On Thu, 2010-02-25 at 12:15 -0800, Roland Dreier wrote: > > When using connected mode, ipoib_cm_create_tx() kmallocs a > > struct ipoib_cm_tx which contains pointers to ipoib_neigh and > > ipoib_path. If the paths are flushed or the struct neighbour is > > destroyed, the pointers held by struct ipoib_cm_tx can reference > > freed memory. The fix is to add reference counts to struct > > ipoib_neigh and ipoib_path and to add locking when getting > > new references. > > Good debugging. > > First look at this patch is that it ends up being rather invasive. I > wonder if we could fix this in the other direction by keeping a list of > the ipoib_cm_tx structures affected in the neigh and path structures, > and clean the cm_tx stuff up when flushing? > > Also I don't see any issues from a first read, but can you confirm that > you're not adding more locking/atomic ops (via kref) to the main data path? > > - R. I agree it is invasive. I thought it would be easier to discuss an actual patch than me trying to hand wave about a solution. Plus, now that I understand the problems better, I'm thinking of new ways to fix them. There is most definitely a new lock/unlock in the normal send path because ipoib_start_xmit() now calls neighbour_priv() which acquires the priv->lock() and does a kref_get(). I'm not really sure what things can change while ipoib_start_xmit() is active so I was being cautious. I guess at a minimum, ipoib_neigh_cleanup() won't be called by the network stack while ipoib_start_xmit() is active so the to_ipoib_neigh(neighbour) should be valid without my added locking. We could avoid adding a kref_t to struct ipoib_path by replacing the pointer to ipoib_path in struct ipoib_cm_tx with a struct ib_sa_path_rec. Otherwise, I think ipoib_flush_paths() could call into ipoib_cm.c to make sure no ipoib_cm_tx is queued on the priv->cm.start_list which points to the given struct ipoib_path (and remove it from the list if found). I will try these ideas out and send an updated patch based on the results. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
On Thu, 2010-02-25 at 12:03 -0800, Arthur Kepner wrote: > On Thu, Feb 25, 2010 at 11:29:02AM -0800, Ralph Campbell wrote: > > > > I haven't looked carefully at the whole patch, but this bit > looks wrong: > > > @@ -848,61 +823,112 @@ static void ipoib_neigh_cleanup(struct neighbour *n) > > struct ipoib_neigh *neigh; > > struct ipoib_dev_priv *priv = netdev_priv(n->dev); > > unsigned long flags; > > - struct ipoib_ah *ah = NULL; > > + > > + spin_lock_irqsave(&priv->lock, flags); > > > > neigh = *to_ipoib_neigh(n); > > - if (neigh) > > - priv = netdev_priv(neigh->dev); > > - else > > + if (neigh) { > > Should this be "if (!neigh)" ? > > > + spin_unlock_irqrestore(&priv->lock, flags); > > return; > > + } > > + *to_ipoib_neigh(n) = NULL; > > + neigh->neighbour = NULL; > > + You are correct.Homer Simpson Doh! :-) I will send an updated patch after resolving Roland's questions. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
On Thu, Feb 25, 2010 at 11:29:02AM -0800, Ralph Campbell wrote: > I haven't looked carefully at the whole patch, but this bit looks wrong: > @@ -848,61 +823,112 @@ static void ipoib_neigh_cleanup(struct neighbour *n) > struct ipoib_neigh *neigh; > struct ipoib_dev_priv *priv = netdev_priv(n->dev); > unsigned long flags; > - struct ipoib_ah *ah = NULL; > + > + spin_lock_irqsave(&priv->lock, flags); > > neigh = *to_ipoib_neigh(n); > - if (neigh) > - priv = netdev_priv(neigh->dev); > - else > + if (neigh) { Should this be "if (!neigh)" ? > + spin_unlock_irqrestore(&priv->lock, flags); > return; > + } > + *to_ipoib_neigh(n) = NULL; > + neigh->neighbour = NULL; > + -- Arthur -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
> When using connected mode, ipoib_cm_create_tx() kmallocs a > struct ipoib_cm_tx which contains pointers to ipoib_neigh and > ipoib_path. If the paths are flushed or the struct neighbour is > destroyed, the pointers held by struct ipoib_cm_tx can reference > freed memory. The fix is to add reference counts to struct > ipoib_neigh and ipoib_path and to add locking when getting > new references. Good debugging. First look at this patch is that it ends up being rather invasive. I wonder if we could fix this in the other direction by keeping a list of the ipoib_cm_tx structures affected in the neigh and path structures, and clean the cm_tx stuff up when flushing? Also I don't see any issues from a first read, but can you confirm that you're not adding more locking/atomic ops (via kref) to the main data path? - R. -- Roland Dreier For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
>From 4a2f3a9685fd82b57e75a31d04d6967d7d9b33c2 Mon Sep 17 00:00:00 2001 From: Ralph Campbell Date: Thu, 25 Feb 2010 11:22:02 -0800 Subject: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path When using connected mode, ipoib_cm_create_tx() kmallocs a struct ipoib_cm_tx which contains pointers to ipoib_neigh and ipoib_path. If the paths are flushed or the struct neighbour is destroyed, the pointers held by struct ipoib_cm_tx can reference freed memory. The fix is to add reference counts to struct ipoib_neigh and ipoib_path and to add locking when getting new references. Signed-off-by: Ralph Campbell --- drivers/infiniband/ulp/ipoib/ipoib.h | 42 +++- drivers/infiniband/ulp/ipoib/ipoib_cm.c| 91 +++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 322 +--- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 94 +++- 4 files changed, 280 insertions(+), 269 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 753a983..49c9097 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -379,6 +379,7 @@ struct ipoib_path { struct rb_noderb_node; struct list_head list; int valid; + struct kref ref; }; struct ipoib_neigh { @@ -393,6 +394,7 @@ struct ipoib_neigh { struct net_device *dev; struct list_headlist; + struct kref ref; }; #define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN) @@ -415,12 +417,33 @@ static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, - struct net_device *dev); -void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); +void ipoib_neigh_flush(struct ipoib_neigh *neigh); +void ipoib_neigh_free(struct kref *kref); +void ipoib_path_free(struct kref *kref); extern struct workqueue_struct *ipoib_workqueue; +static inline void ipoib_path_get(struct ipoib_path *path) +{ + kref_get(&path->ref); +} + +/* This should not be called while holding priv->lock */ +static inline void ipoib_path_put(struct ipoib_path *path) +{ + kref_put(&path->ref, ipoib_path_free); +} + +static inline void ipoib_neigh_get(struct ipoib_neigh *neigh) +{ + kref_get(&neigh->ref); +} + +static inline void ipoib_neigh_put(struct ipoib_neigh *neigh) +{ + kref_put(&neigh->ref, ipoib_neigh_free); +} + /* functions */ int ipoib_poll(struct napi_struct *napi, int budget); @@ -464,7 +487,8 @@ void ipoib_dev_cleanup(struct net_device *dev); void ipoib_mcast_join_task(struct work_struct *work); void ipoib_mcast_carrier_on_task(struct work_struct *work); -void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb); +void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb, + struct ipoib_neigh *neigh); void ipoib_mcast_restart_task(struct work_struct *work); int ipoib_mcast_start_thread(struct net_device *dev); @@ -567,8 +591,8 @@ void ipoib_cm_dev_stop(struct net_device *dev); int ipoib_cm_dev_init(struct net_device *dev); int ipoib_cm_add_mode_attr(struct net_device *dev); void ipoib_cm_dev_cleanup(struct net_device *dev); -struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, - struct ipoib_neigh *neigh); +void ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh); void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb, unsigned int mtu); @@ -646,10 +670,10 @@ void ipoib_cm_dev_cleanup(struct net_device *dev) } static inline -struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, - struct ipoib_neigh *neigh) +void ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, + struct ipoib_neigh *neigh) { - return NULL; + return; } static inline diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 30bdf42..0a7343e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -794,31 +794,14 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) { - struct ipoib_neigh *neigh; - ipoib_dbg(priv, "failed cm send event " "(status=%d, wrid=%d vend_err %x)\n