Re: [ewg] [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3
Stefan Roscher wrote:
> yes this problem does also exist in 2.6.25-rc1.
> It was introduced by a patch from roland:
> http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commitdiff;h=efcd99717f76c6d19dd81203c60fe198480de522
>
> In function ipoib_cm_dev_stop() the error, drain and flush lists are
> put into a local list after a timeout. In the past there was a
> list_for_each_entry loop that iterated over this local list and
> destroyed all added QPs. With the patch above the list_for_each_entry
> call is moved to function ipoib_cm_free_rx_reap_list(), which does not
> iterate the former local list, but the device's reap_list.
> Pradeep's patch now puts all QPs from the error, drain and flush lists
> into the reap_list after a timeout, so that they are all freed in
> ipoib_cm_free_rx_reap_list().

OK, so send the patch to Roland for review before you put it in ofed.

Or.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3
On Wednesday 13 February 2008 09:04:53 Or Gerlitz wrote:
> Pradeep Satyanarayana wrote:
> > This patch fixes -fail to destroy ipoib rx QP
> > (https://bugs.openfabrics.org/show_bug.cgi?id=906)
> > Hence the usecnt issue reported previously on ehca is solved and allows the
> > qp to be destroyed.
> >
> > As per Eli's request, I am splitting up the patches. This is first portion
> > of yesterday's patch.
> > Tested on ppc64 machines with ehca and mthca.
>
> Also here, does this problem exist in the 2.6.25-rc1 upstream code as
> well? from the change log I don't understand the source of the problem
> (only the symptom of failing to destroy ipoib/cm rx QP) and the solution.
>
> Or.

Hi,

yes this problem does also exist in 2.6.25-rc1.
It was introduced by a patch from roland:
http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commitdiff;h=efcd99717f76c6d19dd81203c60fe198480de522

In function ipoib_cm_dev_stop() the error, drain and flush lists are
put into a local list after a timeout. In the past there was a
list_for_each_entry loop that iterated over this local list and
destroyed all added QPs. With the patch above the list_for_each_entry
call is moved to function ipoib_cm_free_rx_reap_list(), which does not
iterate the former local list, but the device's reap_list.
Pradeep's patch now puts all QPs from the error, drain and flush lists
into the reap_list after a timeout, so that they are all freed in
ipoib_cm_free_rx_reap_list().

Stefan
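The mechanism Stefan describes can be modelled outside the kernel. The following is only a user-space sketch under stated assumptions, not the driver code: struct cm_dev, struct rx_conn, splice_init(), free_rx_reap_list(), dev_stop_timeout() and run_case() are hypothetical stand-ins invented for illustration. Only the list field names (rx_flush_list, rx_error_list, rx_drain_list, rx_reap_list) and the roles of ipoib_cm_dev_stop() and ipoib_cm_free_rx_reap_list() come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Same idea as the kernel's list_splice_init(): move every entry of
 * 'from' to the front of 'to', then leave 'from' empty. */
static void splice_init(struct list_head *from, struct list_head *to)
{
	assert(from != to);
	if (list_is_empty(from))
		return;
	struct list_head *first = from->next, *last = from->prev;
	first->prev = to;
	last->next = to->next;
	to->next->prev = last;
	to->next = first;
	init_list(from);
}

/* Hypothetical stand-in for an RX connection that owns a QP. */
struct rx_conn { struct list_head node; int qp_destroyed; };

/* Hypothetical stand-in for the four RX lists kept in priv->cm. */
struct cm_dev {
	struct list_head rx_flush_list, rx_error_list, rx_drain_list, rx_reap_list;
};

/* Models the role of ipoib_cm_free_rx_reap_list(): it only ever walks
 * rx_reap_list. Returns the number of QPs it "destroyed". */
static int free_rx_reap_list(struct cm_dev *cm)
{
	struct list_head local;
	int n = 0;

	init_list(&local);
	splice_init(&cm->rx_reap_list, &local);
	for (struct list_head *p = local.next; p != &local; p = p->next) {
		struct rx_conn *c = (struct rx_conn *)
			((char *)p - offsetof(struct rx_conn, node));
		c->qp_destroyed = 1;
		n++;
	}
	return n;
}

/* Models the timeout branch of ipoib_cm_dev_stop().
 * patched == 0: splice onto a local list that nobody walks (the bug).
 * patched != 0: splice onto rx_reap_list (the fix in this thread). */
static int dev_stop_timeout(struct cm_dev *cm, int patched)
{
	struct list_head list; /* the "former local list" */
	struct list_head *target;

	init_list(&list);
	target = patched ? &cm->rx_reap_list : &list;
	splice_init(&cm->rx_flush_list, target);
	splice_init(&cm->rx_error_list, target);
	splice_init(&cm->rx_drain_list, target);
	return free_rx_reap_list(cm);
}

/* Puts one connection on each of the flush, error and drain lists,
 * runs the timeout path, and reports how many QPs were destroyed. */
static int run_case(int patched)
{
	struct cm_dev cm;
	struct rx_conn c[3];

	init_list(&cm.rx_flush_list);
	init_list(&cm.rx_error_list);
	init_list(&cm.rx_drain_list);
	init_list(&cm.rx_reap_list);
	for (int i = 0; i < 3; i++)
		c[i].qp_destroyed = 0;
	list_add_tail_node(&c[0].node, &cm.rx_flush_list);
	list_add_tail_node(&c[1].node, &cm.rx_error_list);
	list_add_tail_node(&c[2].node, &cm.rx_drain_list);
	dev_stop_timeout(&cm, patched);
	return c[0].qp_destroyed + c[1].qp_destroyed + c[2].qp_destroyed;
}
```

In this model, run_case(0) leaves all three connections untouched because the reaper never sees the local list, while run_case(1) destroys all three. That matches the symptom in bug 906 (rx QPs that cannot be destroyed) and the effect of redirecting the splices to the reap list.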
Re: [ewg] [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3
On Wed, 2008-02-13 at 10:04 +0200, Or Gerlitz wrote:
> Also here, does this problem exist in the 2.6.25-rc1 upstream code as
> well? from the change log I don't understand the source of the problem
> (only the symptom of failing to destroy ipoib/cm rx QP) and the
> solution.
>
> Or.

I believe so. This is not a new problem in the OFED-1.3 release.

Thanks
Shirley
Re: [ewg] [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3
Pradeep Satyanarayana wrote:
> This patch fixes -fail to destroy ipoib rx QP
> (https://bugs.openfabrics.org/show_bug.cgi?id=906)
> Hence the usecnt issue reported previously on ehca is solved and allows the
> qp to be destroyed.
>
> As per Eli's request, I am splitting up the patches. This is first portion
> of yesterday's patch.
> Tested on ppc64 machines with ehca and mthca.

Also here, does this problem exist in the 2.6.25-rc1 upstream code as
well? from the change log I don't understand the source of the problem
(only the symptom of failing to destroy ipoib/cm rx QP) and the solution.

Or.

> Signed-off-by: Pradeep Satyanarayana <[EMAIL PROTECTED]>
> ---
>
> --- ofa_kernel-1.3_a/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-02-11 14:28:47.0 -0500
> +++ ofa_kernel-1.3_b/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-02-12 17:44:07.0 -0500
> @@ -883,9 +883,9 @@ void ipoib_cm_dev_stop(struct net_device
>  			/*
>  			 * assume the HW is wedged and just free up everything.
>  			 */
> -			list_splice_init(&priv->cm.rx_flush_list, &list);
> -			list_splice_init(&priv->cm.rx_error_list, &list);
> -			list_splice_init(&priv->cm.rx_drain_list, &list);
> +			list_splice_init(&priv->cm.rx_flush_list, &priv->cm.rx_reap_list);
> +			list_splice_init(&priv->cm.rx_error_list, &priv->cm.rx_reap_list);
> +			list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
>  			break;
>  		}
>  		spin_unlock_irq(&priv->lock);
[ewg] [PATCH]IPOIB/CM fix for bug# 906 -OFED-1.3
This patch fixes -fail to destroy ipoib rx QP
(https://bugs.openfabrics.org/show_bug.cgi?id=906)
Hence the usecnt issue reported previously on ehca is solved and allows the
qp to be destroyed.

As per Eli's request, I am splitting up the patches. This is first portion
of yesterday's patch.
Tested on ppc64 machines with ehca and mthca.

Signed-off-by: Pradeep Satyanarayana <[EMAIL PROTECTED]>
---

--- ofa_kernel-1.3_a/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-02-11 14:28:47.0 -0500
+++ ofa_kernel-1.3_b/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2008-02-12 17:44:07.0 -0500
@@ -883,9 +883,9 @@ void ipoib_cm_dev_stop(struct net_device
 			/*
 			 * assume the HW is wedged and just free up everything.
 			 */
-			list_splice_init(&priv->cm.rx_flush_list, &list);
-			list_splice_init(&priv->cm.rx_error_list, &list);
-			list_splice_init(&priv->cm.rx_drain_list, &list);
+			list_splice_init(&priv->cm.rx_flush_list, &priv->cm.rx_reap_list);
+			list_splice_init(&priv->cm.rx_error_list, &priv->cm.rx_reap_list);
+			list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
 			break;
 		}
 		spin_unlock_irq(&priv->lock);