[openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
Hello,

I did a short code review of the ipoib code concentrating on partitioning support and I noticed that the asynchronous events handler in the ipoib code does not take the port number reported in the event record into consideration. The effect of that is that all of the ib# devices related to that specific HCA are flushed when it seems to me that only the ones on the relevant port should be. Is that done on purpose, or am I missing something?

Thanks,
Moni

p.s. I'm working on a patch that should solve another issue, caused by ipoib's behavior on PKEY reordering, and the above issue further complicates things for me.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
This issue was found during partitioning SM failover testing. The fix was tested over the weekend with pkey reshuffling, removal and addition every few seconds, concurrent with OFED restart. The patch applies on Roland's git tree.

Changes from v1:
* added a flush flag to ipoib_ib_dev_stop(), like the one ipoib_ib_dev_down() already has
* fixed a bug in device extraction from the work struct
* removed some warnings in cases where they are caused by a missing PKEY, as this seems like a valid flow now.

SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP.

Signed-off-by: Moni Levy [EMAIL PROTECTED]
---
 ipoib.h           |    4 +++-
 ipoib_ib.c        |   51 +--
 ipoib_main.c      |    5 +++--
 ipoib_multicast.c |   11 ++-
 ipoib_verbs.c     |    8 +++-
 5 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..d08ecca 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -205,6 +205,7 @@ struct ipoib_dev_priv {
 	struct delayed_work pkey_task;
 	struct delayed_work mcast_task;
 	struct work_struct flush_task;
+	struct work_struct flush_restart_qp_task;
 	struct work_struct restart_task;
 	struct delayed_work ah_reap_task;
 
@@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
 
 int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_ib_dev_flush(struct work_struct *work);
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
 int ipoib_ib_dev_open(struct net_device *dev);
 int ipoib_ib_dev_up(struct net_device *dev);
 int ipoib_ib_dev_down(struct net_device *dev, int flush);
-int ipoib_ib_dev_stop(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev, int flush);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_dev_cleanup(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..b0287c1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device
 
 	ret = ipoib_init_qp(dev);
 	if (ret) {
-		ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
+		if (ret != -ENOENT)
+			ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
 		return -1;
 	}
 
 	ret = ipoib_ib_post_receives(dev);
 	if (ret) {
 		ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret);
-		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_stop(dev, 1);
 		return -1;
 	}
 
 	ret = ipoib_cm_dev_open(dev);
 	if (ret) {
 		ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret);
-		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_stop(dev, 1);
 		return -1;
 	}
 
@@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi
 	return pending;
 }
 
-int ipoib_ib_dev_stop(struct net_device *dev)
+int ipoib_ib_dev_stop(struct net_device *dev, int flush)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_attr qp_attr;
@@ -581,7 +582,8 @@ timeout:
 	/* Wait for all AHs to be reaped */
 	set_bit(IPOIB_STOP_REAPER, &priv->flags);
 	cancel_delayed_work(&priv->ah_reap_task);
-	flush_workqueue(ipoib_workqueue);
+	if (flush)
+		flush_workqueue(ipoib_workqueue);
 
 	begin = jiffies;
 
@@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device
 	return 0;
 }
 
-void ipoib_ib_dev_flush(struct work_struct *work)
+static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp)
 {
-	struct ipoib_dev_priv *cpriv, *priv =
-		container_of(work, struct ipoib_dev_priv, flush_task);
+	struct ipoib_dev_priv *cpriv;
 	struct net_device *dev = priv->dev;
 
-	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) {
+	/*
+	 * ipoib_ib_dev_stop() below may not find the PKey and leave the
+	 * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp
+	 * flag on is Ok.
+	 */
+	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) {
 		ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
 		return;
 	}
@@ -641,6 +647,13 @@ void ipoib_ib_dev_flush
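The scenario the patch description addresses (the P_Key value stays the same while its index in the port table moves) can be illustrated with a small stand-alone sketch. This is not the ipoib code: `struct pkey_table`, `struct iface`, `find_pkey_index()` and `on_pkey_change()` are hypothetical names used only to show why an index cached at QP creation time goes stale after a reshuffle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port's P_Key table (not the kernel's ib_cache). */
struct pkey_table {
	size_t len;
	uint16_t entry[8];
};

/* Hypothetical per-interface state: the P_Key value is fixed, its index is not. */
struct iface {
	uint16_t pkey;
	int pkey_index;	/* index programmed into the QP; -1 if the P_Key is absent */
};

/* Return the index currently holding pkey, or -1 if the table no longer has it. */
static int find_pkey_index(const struct pkey_table *t, uint16_t pkey)
{
	for (size_t i = 0; i < t->len; ++i)
		if (t->entry[i] == pkey)
			return (int)i;
	return -1;
}

/*
 * Called when a P_Key change is reported: look the index up again and
 * return 1 if the cached index was stale, i.e. the QP needs reconfiguring.
 */
static int on_pkey_change(struct iface *ifc, const struct pkey_table *t)
{
	int idx = find_pkey_index(t, ifc->pkey);

	if (idx == ifc->pkey_index)
		return 0;
	ifc->pkey_index = idx;
	return 1;
}
```

A lookup result of -1 corresponds to the "missing PKEY" flow that the v2 changelog describes as valid.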
Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
> > I just gave this a cursory glance.
>
> I haven't really read it except to think "why is this so complicated?"

Do you refer to the complication of the patch or of the issue?

> > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey change?
>
> Changing the P_Key index is not allowed for RTS->RTS. You would have to modify the QP RTS->SQD, wait for the SQ to drain, then modify the P_Key index with SQD->SQD, and finally go SQD->RTS.

Do you think that solving it that way would be a significant simplification? We would still have to reuse the handling for missed completions that is currently implemented in ipoib_ib_dev_stop, and we would still have an additional work element.

-- Moni

> - R.
Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
> > I did a short code review of the ipoib code concentrating on partitioning support and I noticed that the asynchronous events handler in the ipoib code does not take the port number reported in the event record into consideration. The effect of that is that all of the ib# devices related to that specific HCA are flushed when it seems to me that only the ones on the relevant port should be. Is that done on purpose, or am I missing something?
>
> I don't think there's any particular reason the code is that way except for the oversight never being corrected. But it looks trivial to fix, like the patch below. Does that look right to you?
>
> > p.s. I'm working on a patch that should solve another issue, caused by ipoib's behavior on PKEY reordering, and the above issue further complicates things for me.
>
> Why not fix the issue first then?
>
> commit a27cbe878203076247c1b5287f5ab59ed143b560
> Author: Roland Dreier [EMAIL PROTECTED]
> Date:   Tue Feb 27 07:37:49 2007 -0800
>
>     IPoIB: Only handle async events for one port
>
>     An asynchronous event carries the port number that the event occurred on, so there's no reason for an IPoIB interface to process an event associated with a different local HCA port.
>
>     Signed-off-by: Roland Dreier [EMAIL PROTECTED]
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index 3cb551b..7f3ec20 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
>  	struct ipoib_dev_priv *priv =
>  		container_of(handler, struct ipoib_dev_priv, event_handler);
>
> -	if (record->event == IB_EVENT_PORT_ERR    ||
> -	    record->event == IB_EVENT_PKEY_CHANGE ||
> -	    record->event == IB_EVENT_PORT_ACTIVE ||
> -	    record->event == IB_EVENT_LID_CHANGE  ||
> -	    record->event == IB_EVENT_SM_CHANGE   ||
> -	    record->event == IB_EVENT_CLIENT_REREGISTER) {
> +	if ((record->event == IB_EVENT_PORT_ERR    ||
> +	     record->event == IB_EVENT_PKEY_CHANGE ||
> +	     record->event == IB_EVENT_PORT_ACTIVE ||
> +	     record->event == IB_EVENT_LID_CHANGE  ||
> +	     record->event == IB_EVENT_SM_CHANGE   ||
> +	     record->event == IB_EVENT_CLIENT_REREGISTER) &&
> +	    record->element.port_num == priv->port) {
>  		ipoib_dbg(priv, "Port state change event\n");
>  		queue_work(ipoib_workqueue, &priv->flush_task);
>  	}

That's exactly what I intended to post.

--Moni
Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
> > > I haven't really read it except to think "why is this so complicated?"
> >
> > Do you refer to the complication of the patch or of the issue?
>
> the patch.

Please advise and I'll change it.

> > > Changing the P_Key index is not allowed for RTS->RTS. You would have to modify the QP RTS->SQD, wait for the SQ to drain, then modify the P_Key index with SQD->SQD, and finally go SQD->RTS.
> >
> > Do you think that solving it that way would be a significant simplification? We would still have to reuse the handling for missed completions that is currently implemented in ipoib_ib_dev_stop, and we would still have an additional work element.
>
> no, I don't think SQD is really useful in practice.
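The transition sequence described in this thread (RTS->SQD, drain, SQD->SQD with the new P_Key index, SQD->RTS) can be sketched as a toy state machine. Everything here is a hypothetical model: `enum qp_state`, `struct qp`, `modify_qp()` and `change_pkey_index_via_sqd()` are illustration-only names, not the verbs API, and the only rule encoded is the one quoted above.

```c
#include <stdbool.h>

/* Hypothetical subset of QP states, named after the states discussed above. */
enum qp_state { QP_RTS, QP_SQD };

struct qp {
	enum qp_state state;
	int pkey_index;
};

/*
 * Rule as quoted in the thread: the P_Key index may not be changed on
 * RTS->RTS, but may be changed on SQD->SQD.
 */
static bool pkey_index_change_allowed(enum qp_state from, enum qp_state to)
{
	return from == QP_SQD && to == QP_SQD;
}

/* Apply one transition; new_index < 0 means "leave the P_Key index alone". */
static int modify_qp(struct qp *qp, enum qp_state to, int new_index)
{
	if (new_index >= 0) {
		if (!pkey_index_change_allowed(qp->state, to))
			return -1;
		qp->pkey_index = new_index;
	}
	qp->state = to;
	return 0;
}

/* The three-step sequence: RTS->SQD, SQD->SQD (new index), SQD->RTS. */
static int change_pkey_index_via_sqd(struct qp *qp, int new_index)
{
	if (modify_qp(qp, QP_SQD, -1))
		return -1;
	/* A real driver would have to wait here for the send queue to drain. */
	if (modify_qp(qp, QP_SQD, new_index))
		return -1;
	return modify_qp(qp, QP_RTS, -1);
}
```

The sketch leaves out the drain wait and the missed-completion handling, which is the part Moni points out would still be needed with this approach.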
Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
On 2/27/07, Moni Levy [EMAIL PROTECTED] wrote:
> On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
> > > I did a short code review of the ipoib code concentrating on partitioning support and I noticed that the asynchronous events handler in the ipoib code does not take the port number reported in the event record into consideration. The effect of that is that all of the ib# devices related to that specific HCA are flushed when it seems to me that only the ones on the relevant port should be. Is that done on purpose, or am I missing something?
> >
> > I don't think there's any particular reason the code is that way except for the oversight never being corrected. But it looks trivial to fix, like the patch below. Does that look right to you?
> >
> > > p.s. I'm working on a patch that should solve another issue, caused by ipoib's behavior on PKEY reordering, and the above issue further complicates things for me.
> >
> > Why not fix the issue first then?
> >
> > commit a27cbe878203076247c1b5287f5ab59ed143b560
> > Author: Roland Dreier [EMAIL PROTECTED]
> > Date:   Tue Feb 27 07:37:49 2007 -0800
> >
> >     IPoIB: Only handle async events for one port
> >
> >     An asynchronous event carries the port number that the event occurred on, so there's no reason for an IPoIB interface to process an event associated with a different local HCA port.
> >
> >     Signed-off-by: Roland Dreier [EMAIL PROTECTED]
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> > index 3cb551b..7f3ec20 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> > @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
> >  	struct ipoib_dev_priv *priv =
> >  		container_of(handler, struct ipoib_dev_priv, event_handler);
> >
> > -	if (record->event == IB_EVENT_PORT_ERR    ||
> > -	    record->event == IB_EVENT_PKEY_CHANGE ||
> > -	    record->event == IB_EVENT_PORT_ACTIVE ||
> > -	    record->event == IB_EVENT_LID_CHANGE  ||
> > -	    record->event == IB_EVENT_SM_CHANGE   ||
> > -	    record->event == IB_EVENT_CLIENT_REREGISTER) {
> > +	if ((record->event == IB_EVENT_PORT_ERR    ||
> > +	     record->event == IB_EVENT_PKEY_CHANGE ||
> > +	     record->event == IB_EVENT_PORT_ACTIVE ||
> > +	     record->event == IB_EVENT_LID_CHANGE  ||
> > +	     record->event == IB_EVENT_SM_CHANGE   ||
> > +	     record->event == IB_EVENT_CLIENT_REREGISTER) &&
> > +	    record->element.port_num == priv->port) {
> >  		ipoib_dbg(priv, "Port state change event\n");
> >  		queue_work(ipoib_workqueue, &priv->flush_task);
> >  	}
>
> That's exactly what I intended to post.

On second thought, given that on a two-port HCA 50% of the events being delivered will be misses, I would move the new condition so that it is evaluated first. I apologize if this is too much of a micro-optimization. What do you think?

--Moni
Re: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey
Sean,

On 2/26/07, Sean Hefty [EMAIL PROTECTED] wrote:
> I think the following patch would make ipoib spec compliant. ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib. I'm not certain what this change would do to SRP, but the ib_cm and rdma_cm look okay, given that non-reversible paths aren't supported yet anyway.

Sorry for jumping into this thread, but although this patch will make things more spec compliant, it will break functionality we depend on. I suggest that we first find an alternate way to enable usage of partial partition membership before disabling that functionality altogether.

--Moni

> --
> ib_find_cached_pkey masks off the upper-bit of the PKey when searching for a match. The upper bit indicates partial or full membership. Ignoring the upper bit can result in a full membership PKey matching with a partial membership PKey. For ipoib, this can result in joining a multicast group that disallows communication between all members.
>
> Signed-off-by: Sean Hefty [EMAIL PROTECTED]
> ---
>  drivers/infiniband/core/cache.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 558c9a0..6f366c3 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -179,7 +179,7 @@ int ib_find_cached_pkey(struct ib_device *device,
>  	*index = -1;
>
>  	for (i = 0; i < cache->table_len; ++i)
> -		if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) {
> +		if (cache->table[i] == pkey) {
>  			*index = i;
>  			ret = 0;
>  			break;
> --
> 1.4.4.3
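The difference between the two comparisons in Sean's patch can be seen with a couple of sample values. The following is a minimal stand-alone sketch, not the kernel code: `pkey_match_masked()` and `pkey_match_exact()` are hypothetical helper names, and the values are plain host-order integers chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 15 bits of a P_Key; the upper bit carries full/partial membership. */
#define PKEY_BASE_MASK 0x7fff

/* Comparison before the patch: the membership bit is ignored. */
static bool pkey_match_masked(uint16_t a, uint16_t b)
{
	return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

/* Comparison after the patch: the membership bit must match too. */
static bool pkey_match_exact(uint16_t a, uint16_t b)
{
	return a == b;
}
```

With the masked form, a full-membership key such as 0x8001 matches a partial-membership table entry 0x0001; with the exact form it does not. The first behavior is what the partial-membership setups mentioned in the reply rely on, and the second is what the patch description calls spec compliant.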
Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
> > On second thought, given that on a two-port HCA 50% of the events being delivered will be misses, I would move the new condition so that it is evaluated first. I apologize if this is too much of a micro-optimization. What do you think?
>
> That wouldn't really be correct since element.port_num isn't valid unless we already know it's a port-related event.

You're perfectly right, sorry.

> And it's not worth worrying about this since it's not remotely a hot path.

Ok.

--Moni

> - R.
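Roland's point about ordering can be shown with a reduced model of an event record whose payload is a union. The types below (`enum event_type`, `struct event`) are hypothetical simplifications written for this sketch, not the definitions from the kernel headers; the idea is only that the port number may be read after the event type has established that the union holds a port number.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of event types. */
enum event_type { EV_PORT_ERR, EV_PKEY_CHANGE, EV_QP_FATAL };

struct event {
	enum event_type type;
	union {
		int port_num;	/* meaningful only for port-related events */
		void *qp;	/* meaningful only for QP-related events */
	} element;
};

static bool is_port_event(enum event_type t)
{
	return t == EV_PORT_ERR || t == EV_PKEY_CHANGE;
}

/*
 * The type check comes first; thanks to && short-circuiting,
 * element.port_num is never read for a non-port event.
 */
static bool should_flush(const struct event *ev, int my_port)
{
	return is_port_event(ev->type) && ev->element.port_num == my_port;
}
```

Swapping the two operands of `&&` would read `port_num` from a union that may currently hold a pointer, which is why the micro-optimization proposed earlier in the thread was dropped.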
Re: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey
On 2/27/07, Sean Hefty [EMAIL PROTECTED] wrote:
> > Sorry for jumping into this thread, but although this patch will make things more spec compliant, it will break functionality we depend on. I suggest that we first find an alternate way to enable usage of partial partition membership before disabling that functionality altogether.
>
> Can you clarify the functionality you depend on? Are you reliant on ipoib being able to join a multicast group from partial partition membership?

Exactly.

> If so, do all SA's and switches support this?

I can't commit on all the SA's and switches.

-- Moni

> - Sean
[openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey reordering
This issue was found during partitioning SM fail over testing. The fix was tested for 24 hours with pkey reshuffling every few seconds. The patch applies to Roland's master branch.

SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP.

Signed-off-by: Moni Levy [EMAIL PROTECTED]
---
 ipoib.h       |    2 ++
 ipoib_ib.c    |   22 --
 ipoib_main.c  |    1 +
 ipoib_verbs.c |    4 +++-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..ed854e8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -139,6 +139,7 @@ struct ipoib_dev_priv {
 	struct delayed_work pkey_task;
 	struct delayed_work mcast_task;
 	struct work_struct flush_task;
+	struct work_struct flush_restart_qp_task;
 	struct work_struct restart_task;
 	struct delayed_work ah_reap_task;
 
@@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
 
 int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_ib_dev_flush(struct work_struct *work);
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
 int ipoib_ib_dev_open(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 59d9594..5e2ada9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device
 	return 0;
 }
 
-void ipoib_ib_dev_flush(struct work_struct *work)
+static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp)
 {
 	struct ipoib_dev_priv *cpriv, *priv =
 		container_of(work, struct ipoib_dev_priv, flush_task);
@@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru
 	ipoib_dbg(priv, "flushing\n");
 
 	ipoib_ib_dev_down(dev, 0);
+
+	if (restart_qp) {
+		ipoib_dbg(priv, "restarting the device QP\n");
+		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_open(dev);
+	}
 
 	/*
 	 * The device could have been brought down between the start and when
@@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru
 
 	/* Flush any child interfaces too */
 	list_for_each_entry(cpriv, &priv->child_intfs, list)
-		ipoib_ib_dev_flush(&cpriv->flush_task);
+		__ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp);
 
 	mutex_unlock(&priv->vlan_mutex);
 }
 
+void ipoib_ib_dev_flush(struct work_struct *work)
+{
+	/* We only restart the QP in case of PKEY change event */
+	__ipoib_ib_dev_flush(work, 0);
+}
+
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work)
+{
+	/* We only restart the QP in case of PKEY change event */
+	__ipoib_ib_dev_flush(work, 1);
+}
+
 void ipoib_ib_dev_cleanup(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..da46b79 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic
 	INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
 	INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
 	INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
+	INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
 	INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
 	INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 7b717c6..c249915 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler
 		container_of(handler, struct ipoib_dev_priv, event_handler);
 
 	if (record->event == IB_EVENT_PORT_ERR    ||
-	    record->event == IB_EVENT_PKEY_CHANGE ||
 	    record->event == IB_EVENT_PORT_ACTIVE ||
 	    record->event == IB_EVENT_LID_CHANGE  ||
 	    record->event == IB_EVENT_SM_CHANGE   ||
 	    record->event == IB_EVENT_CLIENT_REREGISTER) {
 		ipoib_dbg(priv, "Port state change event\n");
 		queue_work(ipoib_workqueue, &priv->flush_task);
+	} else if (record->event == IB_EVENT_PKEY_CHANGE) {
+		ipoib_dbg(priv, "PKEY change event\n
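The v2 changelog mentions a fix for "device extraction from the work struct". In this first version the shared helper always recovers the private structure through the `flush_task` member, even when it is entered from the handler registered for `flush_restart_qp_task`, which appears to be the bug in question. The effect can be reproduced outside the kernel with a simplified model; `struct work`, `struct dev_priv` and the local `container_of` macro below are stand-ins written for this sketch, not the kernel definitions.

```c
#include <stddef.h>

/* Simplified user-space version of the container_of idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for work_struct and ipoib_dev_priv. */
struct work {
	int pending;
};

struct dev_priv {
	int id;
	struct work flush_task;
	struct work flush_restart_qp_task;
};

/* Correct only when w really is the flush_task member. */
static struct dev_priv *priv_from_flush(struct work *w)
{
	return container_of(w, struct dev_priv, flush_task);
}

/* Correct only when w really is the flush_restart_qp_task member. */
static struct dev_priv *priv_from_flush_restart(struct work *w)
{
	return container_of(w, struct dev_priv, flush_restart_qp_task);
}
```

Calling `priv_from_flush()` on a pointer to `flush_restart_qp_task` returns an address that is off by the distance between the two members. Passing the already-resolved private pointer to the shared helper, as the v2 signature does, avoids that.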
Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey
Or,

On 2/19/07, Or Gerlitz [EMAIL PROTECTED] wrote:
> Hi Sean,
>
> this fixes a bug which did not allow to run librdmacm apps over a node which is partial member of a partition. The patch takes the approach of the kernel ib_find_cached_pkey implementation.
>
> If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix.
>
> Or.
>
> --
> The pkey extracted by the RDMA CM from the IPoIB device hardware address always has the full membership bit set. However, when looking in the pkey table the search must mask out the full membership bit.
>
> Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
> Signed-off-by: Olga Shern [EMAIL PROTECTED]
>
> diff --git a/src/cma.c b/src/cma.c
> index c5f8cd9..9c24c6a 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev
>  	for (i = 0, ret = 0; !ret; i++) {
>  		ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
> -		if (!ret && pkey == chk_pkey) {
> +		if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) {

What about just using:

	if (!ret && pkey | 0x8000 == chk_pkey | 0x8000) {

Even if not, there is no need to check ret twice in the limited membership case.

-- Moni

>  			*pkey_index = (uint16_t) i;
>  			return 0;
>  		}
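Two details matter if the single-comparison form suggested above is written out in C. First, `==` binds more tightly than `|`, so both sides of the comparison need parentheses. Second, the `htons(ntohs(pkey) & 0x7fff)` in the patch suggests both values are held in network byte order, so the membership bit has to be converted the same way. A minimal sketch under those assumptions follows; `pkey_equal_ignoring_membership()` is a hypothetical helper name, not part of librdmacm.

```c
#include <arpa/inet.h>	/* htons */
#include <stdbool.h>
#include <stdint.h>

/*
 * Compare two P_Keys held in network byte order, ignoring the
 * membership bit. Forcing the bit on in both values makes a full and a
 * partial key with the same low 15 bits compare equal.
 */
static bool pkey_equal_ignoring_membership(uint16_t pkey_be, uint16_t chk_pkey_be)
{
	const uint16_t membership_be = htons(0x8000);

	return (pkey_be | membership_be) == (chk_pkey_be | membership_be);
}
```

The helper would be combined with the `!ret` check once, which also covers the remark about not testing `ret` twice.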
Re: [openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey reordering
On 2/19/07, Moni Levy [EMAIL PROTECTED] wrote:
> This issue was found during partitioning SM fail over testing. The fix was tested for 24 hours with pkey reshuffling every few seconds. The patch applies to Roland's master branch.

I found an issue with that patch, I'll post an updated one soon.

-- Moni

> SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP.
>
> Signed-off-by: Moni Levy [EMAIL PROTECTED]
> ---
>  ipoib.h       |    2 ++
>  ipoib_ib.c    |   22 --
>  ipoib_main.c  |    1 +
>  ipoib_verbs.c |    4 +++-
>  4 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 07deee8..ed854e8 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -139,6 +139,7 @@ struct ipoib_dev_priv {
>  	struct delayed_work pkey_task;
>  	struct delayed_work mcast_task;
>  	struct work_struct flush_task;
> +	struct work_struct flush_restart_qp_task;
>  	struct work_struct restart_task;
>  	struct delayed_work ah_reap_task;
>
> @@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
>
>  int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
>  void ipoib_ib_dev_flush(struct work_struct *work);
> +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
>  void ipoib_ib_dev_cleanup(struct net_device *dev);
>
>  int ipoib_ib_dev_open(struct net_device *dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index 59d9594..5e2ada9 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device
>  	return 0;
>  }
>
> -void ipoib_ib_dev_flush(struct work_struct *work)
> +static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp)
>  {
>  	struct ipoib_dev_priv *cpriv, *priv =
>  		container_of(work, struct ipoib_dev_priv, flush_task);
> @@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru
>  	ipoib_dbg(priv, "flushing\n");
>
>  	ipoib_ib_dev_down(dev, 0);
> +
> +	if (restart_qp) {
> +		ipoib_dbg(priv, "restarting the device QP\n");
> +		ipoib_ib_dev_stop(dev);
> +		ipoib_ib_dev_open(dev);
> +	}
>
>  	/*
>  	 * The device could have been brought down between the start and when
> @@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru
>
>  	/* Flush any child interfaces too */
>  	list_for_each_entry(cpriv, &priv->child_intfs, list)
> -		ipoib_ib_dev_flush(&cpriv->flush_task);
> +		__ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp);
>
>  	mutex_unlock(&priv->vlan_mutex);
>  }
>
> +void ipoib_ib_dev_flush(struct work_struct *work)
> +{
> +	/* We only restart the QP in case of PKEY change event */
> +	__ipoib_ib_dev_flush(work, 0);
> +}
> +
> +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work)
> +{
> +	/* We only restart the QP in case of PKEY change event */
> +	__ipoib_ib_dev_flush(work, 1);
> +}
> +
>  void ipoib_ib_dev_cleanup(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 705eb1d..da46b79 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic
>  	INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
>  	INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
>  	INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
> +	INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
>  	INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
>  	INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
>  }
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index 7b717c6..c249915 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler
>  		container_of(handler, struct ipoib_dev_priv, event_handler);
>
>  	if (record->event == IB_EVENT_PORT_ERR    ||
> -	    record->event == IB_EVENT_PKEY_CHANGE ||
>  	    record->event == IB_EVENT_PORT_ACTIVE ||
>  	    record->event == IB_EVENT_LID_CHANGE  ||
>  	    record->event == IB_EVENT_SM_CHANGE   ||
>  	    record->event == IB_EVENT_CLIENT_REREGISTER
Re: [openib-general] issues with compilation of ofed 1.2
Doug,

On 2/7/07, Yosef Etigin [EMAIL PROTECTED] wrote:
> 7. On RHAS5 beta 2, the setup requires the sysfsutils-devel RPM, which is not included in this distro.

Can you please help us with that?

-- Moni

> --
> Yosef Etigin
> Alex Tabachnik
Re: [openib-general] OFED-1.2 first release
Vlad,

# tail -10 /tmp/OFED.10899.log
Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ rm -rf ib-bonding-0.9.0
+ exit 0
/bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm': No such file or directory
ERROR: Failed executing /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1

I see that there is a small difference in the expected RPM name. Can you fix that in the script or should we change the name of the RPM?

-- Moni
Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today
Tziporet,

On 1/31/07, Tziporet Koren [EMAIL PROTECTED] wrote:
> Shaun Rowland wrote:
> > Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is supposed to go. I assume from previous meetings this is just a filesystem directory. Should it be a directory in my home directory on staging.openfabrics.org, in ~/public_html, or is there something else I need to do to put this into place? I think from the previous MPI specific meeting, this was supposed to be done in a web directory. Since I am unclear, I wanted to ask here.
>
> Please place your SRPM under your home directory at ofed_1_2 directory. Then you can make this directory accessible to the web in this way:
> 1. mkdir public_html
> 2. chmod 755 public_html
> Now you can put any stuff under public_html (also symbolic links) and it will be available via web www.openfabrics.org/~user name/

I have put the ib-bonding SRPM in ~monis/ofed_1_2

--Moni

> Tziporet
Re: [openib-general] Installation on openSUSE 10.2 Beta1 fails
On 11/10/06, Diego Guella [EMAIL PROTECTED] wrote: Hi Vladimir, Thanks for your answer. I have installed: compat-libstdc++ (version 5.0.7-35) libstdc++-32bit (version 4.1.2_20060705-2) libstdc++41 (version 4.1.2_20061024-3) libstdc++41-devel (version 4.1.2_20061024-3) libstdc++-devel (version 4.1.3-22) but remember that in the log file, first it says (line 6393): - checking for C compiler default output file name... a.out - and about 5000 lines below, it says my compiler can't create executables (of course this isn't true, because this is the machine on wich I compile all the programs I make) Have you got any other suggestion? Please try to install a 32 bit glibc-devel package. -- Moni Thanks, Diego - Original Message - From: Vladimir Sokolovsky [EMAIL PROTECTED] To: Diego Guella [EMAIL PROTECTED] Cc: Tziporet Koren [EMAIL PROTECTED]; openib-general@openib.org Sent: Thursday, November 09, 2006 4:48 PM Subject: Re: [openib-general] Installation on openSUSE 10.2 Beta1 fails Hello Diego, Check that you have libstdc++, libstdc++-devel and compat-libstdc++ RPMs installed. Regards, Vladimir Diego Guella wrote: From: Tziporet Koren The failing is utility is used for IPoIB high availability. If you don't need to use them you can just change this line in ofed.conf: ipoibtools=n Tziporet Thanks Tziporet for your answer. Tried just right now, i disabled ipoibtools. I get another, more strange error: (attached OFED.3816.log) - /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs Running: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib CPPFLAGS=-I../libibverbs/include configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... 
yes checking for gawk... gawk checking whether make sets $(MAKE)... yes checking build system type... x86_64-unknown-linux-gnu checking host system type... x86_64-unknown-linux-gnu checking for style of include used by make... GNU checking for gcc... gcc checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details. Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib CPPFLAGS=-I../libibverbs/include error: Bad exit status from /var/tmp/rpm-tmp.46102 (%install) - Am I right? It says my C compiler cannot create executables Is it joking me In the log file, line 6393, it says: - checking for C compiler default output file name... a.out - I don't understand! Is there something I can do to fix this? Thanks, Diego
Re: [openib-general] OFED 1.1 Build Issue
Vlad, On 10/31/06, Vladimir Sokolovsky [EMAIL PROTECTED] wrote: Ramachandra K wrote: Moni Shoua wrote: We already tried to go this way and found that a local Module.symvers is not always generated (but we might have missed something though). I suggest that you check that this alternative way works under all OSs compilation (SuSE and RedHat to be precise)... I think Module.symvers generation for external modules was added sometime around 2.6.16, so its not generated on the older kernels (for eg 2.6.9 kernels on RHEL) In this scenario, when there is no Module.symvers file, I guess the other option is to use a single Kbuild file to build both modules, as explained in section 7.3 of Documentation/kbuild/modules.txt. But this may not be feasible always. Come to think of it, why does the OFED installation procedure not update the kernel Module.symvers file when it replaces the old kernel modules present in /lib/modules/ with the new ones ? BTW, Why not updating the kernel Module.symvers when kernel-ib-devel is installed? This will free the developer from copying it to his/hers private directory. It might be a good idea to update the Module.symvers file as part of the normal installation and not only kernel-ib-devel. Because if the kernel modules are being replaced (or new modules are being added), shouldn't the Module.symvers file also be updated ? Regards, Ram Agree, Module.symvers should be updated by kernel-ib RPM. AFAIK Module.symvers is used in compile time only so the same logic that is used for .h files (the devel package) seems reasonable for it. --Moni So, need to implement Moni's suggestion with light changes: update kernel-ib RPM %post and %preun sections instead of kernel-ib-devel RPM %pre and %postun. 
Regards, Vladimir
Re: [openib-general] [openfabrics-ewg] We wish to do the 1.1 release next week
Sounds like a great idea. We don't have blocking issues, but would be happy to test the pre-release. Moni On 10/16/06, Tziporet Koren [EMAIL PROTECTED] wrote: This patch is already in. We will publish latest pre-release version tomorrow so everybody can do latest checks. Is this OK? Tziporet -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Sunday, October 15, 2006 10:16 PM To: Tziporet Koren; [EMAIL PROTECTED]; OPENIB Subject: Re: [openfabrics-ewg] [openib-general] We wish to do the 1.1 release next week Yes, bug 273 (http://openib.org/bugzilla/show_bug.cgi?id=273) is a blocking issue for Cisco. Roland sent a patch last Monday. I'm done testing the other parts of rc7, and am testing his patch later today. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren Sent: Thursday, October 12, 2006 7:44 AM To: [EMAIL PROTECTED]; OPENIB Subject: [openib-general] We wish to do the 1.1 release next week Hi all, I am back from vacation and found you waited with the release for me :-) From a quick look at status mails I think we can do the official release next week. Please reply if there are still any blocking issues you have. Also - please update all documents till end of Monday next week. Tziporet
Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc1 is available
Hi, Tziporet,

On 8/8/06, Tziporet Koren [EMAIL PROTECTED] wrote:
o iSER:
  - Stability
  - Testing more platforms (e.g. ppc64 and ia64)
  - Performance improvements

Only number two above is in the scope of OFED from our perspective, so we prefer to have it listed alone.

2. iSER support in install script for SLES 10 is missing

We have a fix for that and it will be part of RC2

-- Moni
Re: [openib-general] [openfabrics-ewg] Multicast traffic performace of OFED 1.0 ipoib
Mike,

On 8/2/06, Michael Krause [EMAIL PROTECTED] wrote: Is the performance being measured on an identical topology and hardware set as before? Multicast by its very nature is sensitive to topology, hardware components used (buffer depth, latency, etc.) and workload occurring within the fabric. Loss occurs as a function of congestion or lack of forward progress resulting in a timeout and thus a toss of a packet. If the hardware is different or the settings chosen are changed, then the results would be expected to change. It is not clear what you hope to achieve with such tests as there will be other workloads flowing over the fabric which will create random HOL blocking which can result in packet loss. Multicast workloads should be tolerant of such loss. Mike

I'm sorry about not being clear. My intention in the last sentence was that we got the better (120k-140k PPS) results with our proprietary IB stack and not with a previous openib snapshot. The tests were run on the same setup, which by the way was dedicated only to that traffic. I'm aware of the network implications of the test; I was looking for hints of improvements needed in the ipoib implementation.

-- Moni

At 04:30 AM 8/2/2006, Moni Levy wrote: Hi, we are doing some performance testing of multicast traffic over ipoib. The tests are performed by using iperf on dual 1.6G AMD PCI-X servers with PCI-X Tavor cards with 3.4.FW. Below are the commands that may be used to run the test.

Iperf server:
  route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
  /home/qa/testing-tools/iperf-2.0.2/iperf -us -B 224.4.4.4 -i 1

Iperf client:
  route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
  /home/qa/testing-tools/iperf-2.0.2/iperf -uc 224.4.4.4 -i 1 -b 100M -t 400 -l 100

We are looking for the max PPS rate (100-byte packet size) without losses, by changing the BW parameter and looking at the point where we get no losses reported. The best results we received were around 50k PPS. I remember that we got some 120k-140k packets of the same size running without losses. We are going to look into it and try to see where the time is spent, but any ideas are welcome.

Best regards, Moni
[openib-general] Multicast traffic performace of OFED 1.0 ipoib
Hi, we are doing some performance testing of multicast traffic over ipoib. The tests are performed by using iperf on dual 1.6G AMD PCI-X servers with PCI-X Tavor cards with 3.4.FW. Below are the commands that may be used to run the test.

Iperf server:
  route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
  /home/qa/testing-tools/iperf-2.0.2/iperf -us -B 224.4.4.4 -i 1

Iperf client:
  route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
  /home/qa/testing-tools/iperf-2.0.2/iperf -uc 224.4.4.4 -i 1 -b 100M -t 400 -l 100

We are looking for the max PPS rate (100-byte packet size) without losses, by changing the BW parameter and looking at the point where we get no losses reported. The best results we received were around 50k PPS. I remember that we got some 120k-140k packets of the same size running without losses. We are going to look into it and try to see where the time is spent, but any ideas are welcome.

Best regards, Moni
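
For readers following the numbers, here is a minimal sketch of how the iperf BW parameter relates to a packet rate. It assumes that "M" means 10^6 bits per second and that only the UDP payload given by -l is counted; both are assumptions about iperf's accounting, and the function names are made up for illustration.

```python
def pps_from_bandwidth(bits_per_second: float, payload_bytes: int) -> float:
    """Packets per second offered at a given bit rate and fixed payload size."""
    return bits_per_second / (payload_bytes * 8)


def bandwidth_from_pps(pps: float, payload_bytes: int) -> float:
    """Bit rate needed to offer a given packet rate at a fixed payload size."""
    return pps * payload_bytes * 8


if __name__ == "__main__":
    # -b 100M with -l 100 (assuming decimal megabits, payload only)
    print(pps_from_bandwidth(100e6, 100))
    # bit rate corresponding to the ~50k PPS loss-free result
    print(bandwidth_from_pps(50_000, 100))
```

Under those assumptions, -b 100M with 100-byte packets offers 125k PPS, and the 50k PPS loss-free point corresponds to roughly 40 Mbit/s.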
Re: [openib-general] [openfabrics-ewg] IPoIB bonding solution for OFED 1.1 (was re: [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release - schedule and features))
On 7/20/06, Tziporet Koren [EMAIL PROTECTED] wrote: Or Gerlitz wrote: Hi Tziporet, Do you have an initial drop of the bonding solution planned for OFED 1.1 that is ready to see the daylight? If not, when is this expected? As I mentioned to you, we are investigating a few possible ways to implement HA for IPoIB and want to examine your approach as well. Or. Vlad already answered. We will be happy for any help in this area.

Tziporet, in order to get as much cooperation as possible I think that we should post an RFC about that before implementing it and before getting the implementation into OFED 1.1. We looked into the more standard implementation that uses a bonding device and tried to find out what the issues are. More than that, I'm not sure that what you guys suggest will work if we have multicast applications running.

-- Moni

Tziporet
Re: [openib-general] RE: netperf for RDS needed
Ranjit, BTW, we sent all this information to Moni Levy couple of weeks back. I guess it's something with my mailbox, because I never received it. -- Moni
Re: [openib-general] IPoIB interface for unauthorized partition
On 4/23/06, Eitan Zahavi [EMAIL PROTECTED] wrote: Hi Moni, Sorry it took me a while to get back to you (was out on vacation ...) Moni Levy wrote: On 4/10/06, Eitan Zahavi [EMAIL PROTECTED] wrote: Hi Hal, -Original Message- From: Hal Rosenstock [mailto:[EMAIL PROTECTED] Sent: Monday, April 10, 2006 2:00 PM To: Eitan Zahavi Cc: Roland Dreier; openib-general@openib.org Subject: Re: [openib-general] IPoIB interface for unauthorized partition Hi Eitan, On Mon, 2006-04-10 at 02:35, Eitan Zahavi wrote: Hi Roland, Roland Dreier wrote: Eitan I thought the intent of the IB spec when defining P_Key Eitan index usage (and not P_Key value) was that the P_Key values Eitan would never need to be known above the driver level. To Eitan avoid exposing the P_Key values we could use P_Key index Eitan for creating the IPoIB interfaces. Eitan Does it make sense to work on a patch that would setup Eitan IPoIB interfaces by the P_Key index (and not by P_Key Eitan value)? I don't see how this is feasible. The index that a particular P_Key lands at is completely undetermined -- if two nodes wanted to talk on partition 0x8001 say, how does one know which interface to use without knowing the index of that P_Key? OK, I get it. Actually the way IPoIB defines the broadcast group MGID exposes P_Key anyway. Eitan Also I think the expected behavior for IPoIB should be that Eitan IPoIB child interfaces should be automatically Eitan initialized by the code that brings up the interface Eitan (ifconfig scripts). All valid IPoIB partitions (valid = Eitan have corresponding broadcast groups) should be Eitan initialized. By doing so we provide a centralized control Eitan of the partitions and their IPoIB interfaces through the Eitan SM. Not sure if this is so. I may want a partition strictly for storage traffic something like that, so it doesn't make sense to create an IPoIB interface for that partition. 
OpenSM provides this capability in the partition policy: Each partition is marked explicitly if to be used for IPoIB or not. So through one file one could actually control the IPoIB interfaces that will exist in the subnet.

The end node does not know the SM policy for that partition though.

My intent is to write some extension to ifup for IPoIB such that all sub interfaces will be automatically started (based on pre-availability of IPoIB broadcast MGID).

I'm not sure how ifup is related to that. From what I understand you'd like ipoib driver to behave as follows: 1. Get an event (or figure it out) when a new PKEY is added to the relevant port partition table.

I prefer not to rely on new events. Instead I would like to rely on existing IB Notices: If we register to multicast group create/delete events (traps 66/67) IPoIB can know about each new partition created.

I'm not sure that this is a good idea, because that way all of the IPoIB nodes will get that event and try to join every new MC group, and partitioning by definition is good for separating a fabric. I think that the right thing should be that only the relevant nodes try to join the specific MCG.

2. Try to join that new MC group with the MGID it created according to the PKEY and the spec (or maybe query for the MC group existence, but that's not atomic).

Simply join the group. We rely on these groups to be pre-created by the SM enforcing policy dictating which partitions should be used for IPoIB and which not.

If you let all the IPoIB nodes join every new group without checking their PKEY tables first, they may even get joined if the SM is not enforcing MCG to port policy. Is that your plan ?

3. In case it fails nothing is done (no relevant MC group was pre-created in the SM).

Exactly

4. In case it succeeds a new interface is created. Is that what you meant ?
- Moni If that were to be done, it would be cleanest if the child IPoIB interface was created only if that IPoIB broadcast group for that partition exists. [EZ] This is exactly what I had in mind. -- Hal - R.
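
The thread refers to the broadcast MGID that is derived from the P_Key. As a rough illustration, the sketch below builds that MGID for IPv4 following the layout described in the IPoIB specification (RFC 4391): ff1<scope>:401b:<P_Key>::ffff:ffff, with the full-membership bit set in the embedded P_Key. The function name, the default link-local scope and the string formatting are assumptions made for this example, not code from the ipoib driver.

```python
def ipoib_broadcast_mgid(pkey: int, scope: int = 0x2) -> str:
    """Return the IPv4 IPoIB broadcast MGID for a partition as a hex string.

    Assumed layout: ff1<scope>:401b:<P_Key>:0000:0000:0000:ffff:ffff,
    where the P_Key is embedded with its full-membership (top) bit set.
    """
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    if not 0 <= scope <= 0xF:
        raise ValueError("scope must fit in 4 bits")
    full_member_pkey = pkey | 0x8000  # force the full-membership bit
    groups = [0xFF10 | scope, 0x401B, full_member_pkey, 0, 0, 0, 0xFFFF, 0xFFFF]
    return ":".join(f"{g:04x}" for g in groups)


if __name__ == "__main__":
    # Partition 0x8001, the example value used earlier in the thread
    print(ipoib_broadcast_mgid(0x8001))
```

A node that wants to bring up a child interface for a partition could compute this value and attempt the join; whether the join succeeds then depends on the SM having pre-created the group, as discussed above.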
Re: [openib-general] IPoIB interface for unauthorized partition
On 4/10/06, Eitan Zahavi [EMAIL PROTECTED] wrote: Hi Hal, -Original Message- From: Hal Rosenstock [mailto:[EMAIL PROTECTED] Sent: Monday, April 10, 2006 2:00 PM To: Eitan Zahavi Cc: Roland Dreier; openib-general@openib.org Subject: Re: [openib-general] IPoIB interface for unauthorized partition Hi Eitan, On Mon, 2006-04-10 at 02:35, Eitan Zahavi wrote: Hi Roland, Roland Dreier wrote: Eitan I thought the intent of the IB spec when defining P_Key Eitan index usage (and not P_Key value) was that the P_Key values Eitan would never need to be known above the driver level. To Eitan avoid exposing the P_Key values we could use P_Key index Eitan for creating the IPoIB interfaces. Eitan Does it make sense to work on a patch that would setup Eitan IPoIB interfaces by the P_Key index (and not by P_Key Eitan value)? I don't see how this is feasible. The index that a particular P_Key lands at is completely undetermined -- if two nodes wanted to talk on partition 0x8001 say, how does one know which interface to use without knowing the index of that P_Key? OK, I get it. Actually the way IPoIB defines the broadcast group MGID exposes P_Key anyway. Eitan Also I think the expected behavior for IPoIB should be that Eitan IPoIB child interfaces should be automatically Eitan initialized by the code that brings up the interface Eitan (ifconfig scripts). All valid IPoIB partitions (valid = Eitan have corresponding broadcast groups) should be Eitan initialized. By doing so we provide a centralized control Eitan of the partitions and their IPoIB interfaces through the Eitan SM. Not sure if this is so. I may want a partition strictly for storage traffic something like that, so it doesn't make sense to create an IPoIB interface for that partition. OpenSM provides this capability in the partition policy: Each partition is marked explicitly if to be used for IPoIB or not. So through one file one could actually control the IPoIB interfaces that will exist in the subnet. 
The end node does not know the SM policy for that partition though.

My intent is to write some extension to ifup for IPoIB such that all sub interfaces will be automatically started (based on pre-availability of IPoIB broadcast MGID).

I'm not sure how ifup is related to that. From what I understand you'd like ipoib driver to behave as follows:
1. Get an event (or figure it out) when a new PKEY is added to the relevant port partition table.
2. Try to join that new MC group with the MGID it created according to the PKEY and the spec (or maybe query for the MC group existence, but that's not atomic).
3. In case it fails nothing is done (no relevant MC group was pre-created in the SM).
4. In case it succeeds a new interface is created.
Is that what you meant ?
- Moni

If that were to be done, it would be cleanest if the child IPoIB interface was created only if that IPoIB broadcast group for that partition exists. [EZ] This is exactly what I had in mind. -- Hal - R.
[openib-general] ib_local_sa testing and observations.
Hi Sean, we've thought about possible ways of testing the implementation of ib_local_sa and tried to estimate the load that it would cause to the fabric. We did some math about the number of packets that the SM should be able to handle in a test case of a 1k-node fabric, and it looks like this would be a pretty heavy load on the SM side.

The first bring-up storm will be approximately 1000 paths / 3 paths per packet = 333 RMPP packets. Let's say that the RMPP window is 20; that means 17 more ACKs (RX), so approx. 350 packets to handle per node. In case we have 1000 nodes, the SM will have to handle 350k packets in 1000 concurrent RMPP sessions.

Now we get to implementation details of the SMs. Do you know how many RMPP packets per second (maximum) the OSM can handle? Please keep in mind that in case of RMPP packets there is a lot of processing on the sender side like timers, window management and ACK/NACK processing; also the whole list of paths should be recreated for each session (CPU load on the SM machine). That probably means we'll have a period at the beginning of the fabric bring-up during which the SM will just not be able to process any queries. That's the exact period in which all of the IPoIB interfaces in the nodes would like to join the relevant MC groups, and they will probably not get processed in a reasonable time period (timeout). I'm not even thinking about retransmissions of lost RMPP packets, 2-3 partitions and lmc 0.

Did you do any tests or have any ideas of possible simulations that can help to verify the above?

Regards, Moni
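
The back-of-the-envelope estimate above can be written down as a small calculation. This is only a sketch of the arithmetic in the message: the 3 paths per packet and the window of 20 are the figures assumed there, the function and key names are invented for illustration, and the result ignores retransmissions, multiple partitions and anything else the message explicitly leaves out.

```python
import math


def sa_bringup_load(nodes: int, paths_per_node: int,
                    paths_per_packet: int = 3, rmpp_window: int = 20) -> dict:
    """Estimate the SM packet load when every node fetches all of its paths."""
    data_packets = math.ceil(paths_per_node / paths_per_packet)  # RMPP data segments sent by the SM
    acks = math.ceil(data_packets / rmpp_window)                 # one ACK received per window
    per_node = data_packets + acks
    return {
        "data_packets_per_node": data_packets,
        "acks_per_node": acks,
        "packets_per_node": per_node,
        "total_packets": per_node * nodes,
        "concurrent_sessions": nodes,
    }


if __name__ == "__main__":
    for key, value in sa_bringup_load(nodes=1000, paths_per_node=1000).items():
        print(f"{key}: {value}")
```

With rounding up, the per-node figure comes out at 351 packets rather than 350, which matches the approximation in the message; changing the window or the paths-per-packet assumption shows how sensitive the total is to those two values.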
Re: [openib-general] Please give 1.0 RC1 a whirl
On 3/13/06, James Lentini [EMAIL PROTECTED] wrote: On Wed, 8 Mar 2006, Hal Rosenstock wrote: On Tue, 2006-03-07 at 20:56, Bryan O'Sullivan wrote: On Tue, 2006-03-07 at 17:45 -0800, Sean Hefty wrote: Bryan O'Sullivan wrote: libibat libibat-debuginfo libibat-devel libibat-utils The kernel modules to support these are obsolete. We should remove them from the release. Fine by me :-) Are we killing off the kernel code, too, in parallel? I'm still waiting to hear all consumers have moved to addr/CMA. The kDAPL OpenIB provider still uses IBAT. I'd like to see a copy of this code available from OpenIB, but it does not have to be on the trunk. For my purposes a copy in https://openib.org/svn/gen2/branches/shaharf-ibat would be acceptable. Do you plan to continue using this code ? Any chance that this branch's at.c and at_priv.h could be updated to match the versions on the trunk? I don't know the state of the shaharf-ibat branch. The shaharf-ibat branch was not maintained. If the current IBAT code wouldn't be consistent with that branch, I don't mind keeping the current IBAT code at https://openib.org/svn/gen2/users/jlentini/ibat
Re: [openib-general] [PATCH] mthca - read byte count read request size
On 3/7/06, Roland Dreier [EMAIL PROTECTED] wrote: Grant Why is this an enum? +static int pcix_max_rbc = PCIX_MAX_RBC_INVALID; Grant It's declared an int and is user visible. I think the Grant user interface would be better served if the user could Grant just specify pcix_max_rbc=2048 instead of some magic Grant value. Yes, makes sense, and any invalid value (including a default value of say 0) would mean for the driver to ignore the module parameter. Maybe a message to the syslog can inform the user that his value was ignored in order to spare false assumptions. - Moni - R.
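
The behaviour suggested in this exchange can be sketched outside the kernel as follows. This is a Python illustration of the logic only, not the mthca patch: the function name and logger name are hypothetical, and the table assumes the PCI-X command register encoding of the maximum memory read byte count (512, 1024, 2048, 4096 mapped to 0 through 3).

```python
import logging

# Assumed PCI-X encoding: byte count -> 2-bit field value
PCIX_MAX_RBC_ENCODING = {512: 0, 1024: 1, 2048: 2, 4096: 3}


def resolve_pcix_max_rbc(requested: int,
                         log: logging.Logger = logging.getLogger("mthca-sketch")):
    """Map a user-supplied byte count to a register field value.

    Returns None when the parameter should be ignored: either it was left
    at the default of 0, or it is not a valid byte count. In the second
    case a warning is logged so the user knows the value was not applied.
    """
    if requested == 0:
        return None  # default: leave the hardware setting untouched
    encoded = PCIX_MAX_RBC_ENCODING.get(requested)
    if encoded is None:
        log.warning("ignoring invalid pcix_max_rbc=%d (valid: %s)",
                    requested, sorted(PCIX_MAX_RBC_ENCODING))
        return None
    return encoded


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(resolve_pcix_max_rbc(2048))
    print(resolve_pcix_max_rbc(3000))
```

The point of the design is that the user types the byte count directly, while the translation to a field value and the "ignored" message stay inside the driver.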
Re: [openib-general] Re: OpenIb 1.0 release components
On 2/24/06, Michael S. Tsirkin [EMAIL PROTECTED] wrote: Quoting Moni Levy [EMAIL PROTECTED]: While this might be a good idea for modules such as iSER which are not currently part of the mainline kernel tree, it is in my opinion clearly not a good idea to replace the modules which *are* distributed with the mainline kernel. I agree, for the most part. What I have in mind for non-upstream kernel support is this: * We have to ship out-of-tree drivers, simply because there's only one driver in the upstream kernel, and the others are not yet ready for submission. * Some kernel components are clearly not contenders for shipping. One example is kdapl, because it appears to be dead due to upstream veto. * Others might be reasonable, if they (a) see some testing and (b) don't intrusively patch the core kernel. I'm thinking here about iSER and, to a lesser extent, SDP. I would like to add another point also. It looks like that in this round of the major distribution releases they will just not be able to include the 1.0 release due to time constraints, so the only way to use 1.0 release (or newer) will be to replace them in the kernel. Moni I dont really understand this last point. What do you mean when you say replace them in kernel? Replace what? Is there an option that the distros would like to get more stable code that is not in kernel.org (yet) ? I understand it why you might want to add out of kernel modules such as iSER. My point is they must work with core components included in kernel, not with core out of the svn tree. Now I understand. I gather Brian here agrees. -- Michael S. 
Tsirkin Staff Engineer, Mellanox Technologies
Re: [openib-general] Towards a 1.0 release of OpenIB
On 2/22/06, Bryan O'Sullivan [EMAIL PROTECTED] wrote: * We would like everyone to be able to run the same tests, so someone must gather test suites and execution instructions together. How would you like to manage that list of tests ? Wiki ? Within the next week, I'd like to gain an understanding of the following things: * Which features users want to see tested again, do you expect that the tests will be listed in email or you prefer to start some kind of a document ? * Which distros users want binary packages for I guess that SLES 10 latest beta EL4 in my opinion will be ok to start with. * Who can sign up to build and test those packages I hope that the distros teams will be happy to do so together with the vendor companies. * Whether we need to be building binary kernel packages to make testing more consistent That might be a good idea for also simplifying the test setups bring up process. I think that we at least need to agree on a reference .config for the latest kernel to use for common ground. Moni Levy | +972-971-7670(o) Project Manager, Mainstream IB host stack Voltaire – The Grid Backbone http://www.voltaire.com/
Re: [openib-general] OpenIb 1.0 release components
On 2/23/06, Bryan O'Sullivan [EMAIL PROTECTED] wrote: On Thu, 2006-02-23 at 19:03 +0200, Michael S. Tsirkin wrote: It seems that the openib release 1.0 as planned will include not only userspace libraries but also some kernel level modules. Yes, I expect so. While this might be a good idea for modules such as iSER which are not currently part of the mainline kernel tree, it is in my opinion clearly not a good idea to replace the modules which *are* distributed with the mainline kernel. I agree, for the most part. What I have in mind for non-upstream kernel support is this: * We have to ship out-of-tree drivers, simply because there's only one driver in the upstream kernel, and the others are not yet ready for submission. * Some kernel components are clearly not contenders for shipping. One example is kdapl, because it appears to be dead due to upstream veto. * Others might be reasonable, if they (a) see some testing and (b) don't intrusively patch the core kernel. I'm thinking here about iSER and, to a lesser extent, SDP. I would like to add another point also. It looks like that in this round of the major distribution releases they will just not be able to include the 1.0 release due to time constraints, so the only way to use 1.0 release (or newer) will be to replace them in the kernel. Moni The problem with SDP in particular is that we need the socket family to be present in the upstream kernel, or we can't offer a stable ABI. But SDP seems to be quite flaky, so it's not obviously a candidate for pushing to the upstream kernel as it stands. b
Re: [openib-general] We have an OpenIB code release team
Hi Matt, I would be happy to join the release team as additional Voltaire representative. Moni Levy | +972-971-7670(o) Project Manager, Mainstream IB host stack Voltaire – The Grid Backbone http://www.voltaire.com/ On 2/14/06, Tziporet Koren [EMAIL PROTECTED] wrote: Hi Matt, Good that we start the release effort. I would like to join the release team as Mellanox representative. Tziporet