[openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Moni Levy
Hello,
I did a short code review of the ipoib code, concentrating on
partitioning support, and I noticed that the asynchronous event
handler in the ipoib code does not take the port number reported in
the event record into consideration. The effect is that all of
the ib# devices related to that specific HCA are flushed, when it seems
to me that only the device for the relevant port should be. Is that done on
purpose, or am I missing something?

Thanks,
Moni

p.s. I'm working on a patch that should solve another issue caused by
PKEY reordering and ipoib behavior, and the above issue further
complicates things for me.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Moni Levy
This issue was found during partitioning and SM failover testing. The fix was
tested over the weekend with pkey reshuffling, removal and addition every few
seconds, concurrent with OFED restarts. The patch applies on Roland's git tree.

Changes from v1: 
* added a flush flag to ipoib_ib_dev_stop(), as in ipoib_ib_dev_down()
* fixed a bug in device extraction from the work struct
* removed some warnings when they are caused by a missing PKEY, as
this now seems to be a valid flow.
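
The "device extraction from the work struct" item can be illustrated with a small user-space sketch. It assumes, without the thread confirming it, that the v1 problem was a shared helper that always named flush_task in container_of(), even when the work item being run was flush_restart_qp_task. The structures below are hypothetical, cut-down stand-ins for the kernel ones, and the macro is a simplified container_of() without type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down versions of the structures named in the patch. */
struct work_struct {
	int pending;
};

struct ipoib_dev_priv {
	int port;
	struct work_struct flush_task;
	struct work_struct flush_restart_qp_task;
};

/* Handler queued through flush_task: recover priv via flush_task. */
static struct ipoib_dev_priv *priv_from_flush(struct work_struct *work)
{
	assert(work != NULL);
	return container_of(work, struct ipoib_dev_priv, flush_task);
}

/* Handler queued through flush_restart_qp_task: must name that member. */
static struct ipoib_dev_priv *
priv_from_flush_restart_qp(struct work_struct *work)
{
	assert(work != NULL);
	return container_of(work, struct ipoib_dev_priv,
			    flush_restart_qp_task);
}
```

Because the two members sit at different offsets, each entry point has to do its own container_of() with its own member name and then hand the resulting priv pointer to any shared helper.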

SM reconfiguration or failover possibly causes a shuffling of the values in the 
port pkey table. The current implementation only queries for the index of the 
pkey once, when it creates the device QP and after that moves it into working 
state, and hence does not address this scenario. Fix this by using the 
PKEY_CHANGE event as a trigger to reconfigure the device QP.

Signed-off-by: Moni Levy [EMAIL PROTECTED]
---
 ipoib.h   |4 +++-
 ipoib_ib.c|   51 +--
 ipoib_main.c  |5 +++--
 ipoib_multicast.c |   11 ++-
 ipoib_verbs.c |8 +++-
 5 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..d08ecca 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -205,6 +205,7 @@ struct ipoib_dev_priv {
struct delayed_work pkey_task;
struct delayed_work mcast_task;
struct work_struct flush_task;
+   struct work_struct flush_restart_qp_task;
struct work_struct restart_task;
struct delayed_work ah_reap_task;
 
@@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
 
 int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_ib_dev_flush(struct work_struct *work);
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
 int ipoib_ib_dev_open(struct net_device *dev);
 int ipoib_ib_dev_up(struct net_device *dev);
 int ipoib_ib_dev_down(struct net_device *dev, int flush);
-int ipoib_ib_dev_stop(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev, int flush);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_dev_cleanup(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..b0287c1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device 
 
ret = ipoib_init_qp(dev);
if (ret) {
-   ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
+   if (ret != -ENOENT)
+   ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
return -1;
}
 
ret = ipoib_ib_post_receives(dev);
if (ret) {
ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret);
-   ipoib_ib_dev_stop(dev);
+   ipoib_ib_dev_stop(dev, 1);
return -1;
}
 
ret = ipoib_cm_dev_open(dev);
if (ret) {
ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret);
-   ipoib_ib_dev_stop(dev);
+   ipoib_ib_dev_stop(dev, 1);
return -1;
}
 
@@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi
return pending;
 }
 
-int ipoib_ib_dev_stop(struct net_device *dev)
+int ipoib_ib_dev_stop(struct net_device *dev, int flush)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ib_qp_attr qp_attr;
@@ -581,7 +582,8 @@ timeout:
/* Wait for all AHs to be reaped */
set_bit(IPOIB_STOP_REAPER, &priv->flags);
cancel_delayed_work(&priv->ah_reap_task);
-   flush_workqueue(ipoib_workqueue);
+   if (flush)
+   flush_workqueue(ipoib_workqueue);
 
begin = jiffies;
 
@@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device 
return 0;
 }
 
-void ipoib_ib_dev_flush(struct work_struct *work)
+static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp)
 {
-   struct ipoib_dev_priv *cpriv, *priv =
-   container_of(work, struct ipoib_dev_priv, flush_task);
+   struct ipoib_dev_priv *cpriv;
struct net_device *dev = priv->dev;
 
-   if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) {
+   /*
+* ipoib_ib_dev_stop() below may not find the PKey and leave the
+* IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp
+* flag on is Ok.
+*/
+   if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) {
ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
return;
}
@@ -641,6 +647,13 @@ void ipoib_ib_dev_flush

Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Moni Levy
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
   I just gave this a cursory glance.

 I haven't really read it except to think why is this so complicated?

Do you mean the complexity of the patch or of the issue?


   A suggestion: would it not be much simpler to modify the QP from RTS to
   RTS on pkey change?

 Changing the P_Key index is not allowed for RTS->RTS.  You would have
 to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
 P_Key index with SQD->SQD, and finally go SQD->RTS.

Do you think that solving it that way would be a significant
simplification? We'd still have to reuse the handling for missed
completions that is currently implemented in ipoib_ib_dev_stop, and
we'd still need the additional work element.

-- Moni


  - R.




Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Moni Levy
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
   I did a short code review of the ipoib code, concentrating on
   partitioning support, and I noticed that the asynchronous event
   handler in the ipoib code does not take the port number reported in
   the event record into consideration. The effect is that all of
   the ib# devices related to that specific HCA are flushed, when it seems
   to me that only the device for the relevant port should be. Is that done on
   purpose, or am I missing something?

 I don't think there's any particular reason the code is that way
 except for the oversight never being corrected.  But it looks trivial
 to fix, like the patch below.  Does that look right to you?

   p.s. I'm working on a patch that should solve another issue caused by
   PKEY reordering and ipoib behavior, and the above issue further
   complicates things for me.

 Why not fix the issue first then?

 commit a27cbe878203076247c1b5287f5ab59ed143b560
 Author: Roland Dreier [EMAIL PROTECTED]
 Date:   Tue Feb 27 07:37:49 2007 -0800

 IPoIB: Only handle async events for one port

 An asynchronous event carries the port number that the event occurred
 on, so there's no reason for an IPoIB interface to process an event
 associated with a different local HCA port.

 Signed-off-by: Roland Dreier [EMAIL PROTECTED]

 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 index 3cb551b..7f3ec20 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
 struct ipoib_dev_priv *priv =
 container_of(handler, struct ipoib_dev_priv, event_handler);

 -   if (record->event == IB_EVENT_PORT_ERR    ||
 -   record->event == IB_EVENT_PKEY_CHANGE ||
 -   record->event == IB_EVENT_PORT_ACTIVE ||
 -   record->event == IB_EVENT_LID_CHANGE  ||
 -   record->event == IB_EVENT_SM_CHANGE   ||
 -   record->event == IB_EVENT_CLIENT_REREGISTER) {
 +   if ((record->event == IB_EVENT_PORT_ERR    ||
 +        record->event == IB_EVENT_PKEY_CHANGE ||
 +        record->event == IB_EVENT_PORT_ACTIVE ||
 +        record->event == IB_EVENT_LID_CHANGE  ||
 +        record->event == IB_EVENT_SM_CHANGE   ||
 +        record->event == IB_EVENT_CLIENT_REREGISTER) &&
 +       record->element.port_num == priv->port) {
 ipoib_dbg(priv, "Port state change event\n");
 queue_work(ipoib_workqueue, &priv->flush_task);
 }


That's exactly what I intended to post.

--Moni




Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Moni Levy
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
I haven't really read it except to think why is this so complicated?
  
   Do you mean the complexity of the patch or of the issue?

 the patch.

Please advise and I'll change it.


Changing the P_Key index is not allowed for RTS->RTS.  You would have
to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
P_Key index with SQD->SQD, and finally go SQD->RTS.
  
   Do you think that solving it that way would be a significant
   simplification? We'd still have to reuse the handling for missed
   completions that is currently implemented in ipoib_ib_dev_stop, and
   we'd still need the additional work element.

 no, I don't think SQD is really useful in practice.





Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Moni Levy
On 2/27/07, Moni Levy [EMAIL PROTECTED] wrote:
 On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
 That's exactly what I intended to post.

On second thought: since on a two-port HCA half of the delivered
events will be for the other port, I would move the new
condition so that it is evaluated first. I apologize if this is too much of
a micro-optimization. What do you think?

--Moni


 --Moni





Re: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey

2007-02-27 Thread Moni Levy
Sean,
On 2/26/07, Sean Hefty [EMAIL PROTECTED] wrote:
 I think the following patch would make ipoib spec compliant.
 ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib.
 I'm not certain what this change would do to SRP, but the ib_cm and
 rdma_cm look okay, given that non-reversible paths aren't supported
 yet anyway.

Sorry for jumping into this thread, but although this patch will make
things more spec compliant, it will break functionality we depend on.
I suggest that we first find an alternate way to enable usage of
partial partition membership before disabling that functionality
altogether.

--Moni

 --

 ib_find_cached_pkey masks off the upper-bit of the PKey when searching
 for a match.  The upper bit indicates partial or full membership.  Ignoring
 the upper bit can result in a full membership PKey matching with a partial
 membership PKey.  For ipoib, this can result in joining a multicast group
 that disallows communication between all members.

 Signed-off-by: Sean Hefty [EMAIL PROTECTED]
 ---
  drivers/infiniband/core/cache.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
 index 558c9a0..6f366c3 100644
 --- a/drivers/infiniband/core/cache.c
 +++ b/drivers/infiniband/core/cache.c
 @@ -179,7 +179,7 @@ int ib_find_cached_pkey(struct ib_device *device,
 *index = -1;

 for (i = 0; i < cache->table_len; ++i)
 -   if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) {
 +   if (cache->table[i] == pkey) {
 *index = i;
 ret = 0;
 break;
 --
 1.4.4.3
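
The behavioural difference between the masked and the exact comparison in the patch above can be seen in a small stand-alone sketch. The helper find_pkey() and its ignore_membership flag are hypothetical and are not the kernel's ib_find_cached_pkey(); the table values are written in host byte order purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 15 bits: partition number. Top bit: full (1) or partial (0) member. */
#define PKEY_BASE_MASK 0x7fffu

/* Returns the index of the first matching table entry, or -1 if none. */
static int find_pkey(const uint16_t *table, size_t len, uint16_t pkey,
		     int ignore_membership)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < len; ++i) {
		uint16_t entry = table[i];

		if (ignore_membership ?
		    (entry & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK) :
		    entry == pkey)
			return (int)i;
	}
	return -1;
}
```

With a table holding only the partial-membership entry 0x7fff, a search for the full-membership value 0xffff succeeds when the membership bit is ignored and fails with the exact comparison. That is the compliance concern raised by the patch and also the partial-membership use case that would stop working.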






Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Moni Levy
On 2/27/07, Roland Dreier [EMAIL PROTECTED] wrote:
   On second thought: since on a two-port HCA half of the delivered
   events will be for the other port, I would move the new
   condition so that it is evaluated first. I apologize if this is too much of
   a micro-optimization. What do you think?

 That wouldn't really be correct since element.port_num isn't valid
 unless we already know it's a port-related event.

You're perfectly right, sorry.


 And it's not worth worrying about this since it's not remotely a hot path.

Ok.

--Moni


  - R.





Re: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when searching for a pkey

2007-02-27 Thread Moni Levy
On 2/27/07, Sean Hefty [EMAIL PROTECTED] wrote:
  Sorry for jumping into this thread, but although this patch will make
  things more spec compliant, it will break functionality we depend on.
  I suggest that we first find an alternate way to enable usage of
  partial partition membership before disabling that functionality
  altogether.

 Can you clarify the functionality you depend on?  Are you reliant on ipoib
 being able to join a multicast group from partial partition membership?

Exactly.

 If so, do all SA's and switches support this?

I can't speak for all the SAs and switches.

-- Moni


 - Sean




[openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-19 Thread Moni Levy
This issue was found during partitioning and SM failover testing. The fix was
tested for 24 hours with pkey reshuffling every few seconds. The patch applies
to Roland's master branch.

SM reconfiguration or failover possibly causes a shuffling of the values in the 
port pkey table. The current implementation only queries for the index of the 
pkey once, when it creates the device QP and after that moves it into working 
state, and hence does not address this scenario. Fix this by using the 
PKEY_CHANGE event as a trigger to reconfigure the device QP. 

Signed-off-by: Moni Levy [EMAIL PROTECTED]
---
 ipoib.h   |2 ++
 ipoib_ib.c|   22 --
 ipoib_main.c  |1 +
 ipoib_verbs.c |4 +++-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..ed854e8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -139,6 +139,7 @@ struct ipoib_dev_priv {
struct delayed_work pkey_task;
struct delayed_work mcast_task;
struct work_struct flush_task;
+   struct work_struct flush_restart_qp_task;
struct work_struct restart_task;
struct delayed_work ah_reap_task;
 
@@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
 
 int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_ib_dev_flush(struct work_struct *work);
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
 int ipoib_ib_dev_open(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 59d9594..5e2ada9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device 
return 0;
 }
 
-void ipoib_ib_dev_flush(struct work_struct *work)
+static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp)
 {
struct ipoib_dev_priv *cpriv, *priv =
container_of(work, struct ipoib_dev_priv, flush_task);
@@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru
ipoib_dbg(priv, "flushing\n");
 
ipoib_ib_dev_down(dev, 0);
+
+   if (restart_qp) {
+   ipoib_dbg(priv, "restarting the device QP\n");
+   ipoib_ib_dev_stop(dev);
+   ipoib_ib_dev_open(dev);
+   }
 
/*
 * The device could have been brought down between the start and when
@@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru
 
/* Flush any child interfaces too */
list_for_each_entry(cpriv, &priv->child_intfs, list)
-   ipoib_ib_dev_flush(&cpriv->flush_task);
+   __ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp);
 
mutex_unlock(&priv->vlan_mutex);
 }
 
+void ipoib_ib_dev_flush(struct work_struct *work)
+{
+   /* We only restart the QP in case of PKEY change event */ 
+   __ipoib_ib_dev_flush(work, 0);
+}
+
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work)
+{
+   /* We only restart the QP in case of PKEY change event */ 
+   __ipoib_ib_dev_flush(work, 1);
+}
+
 void ipoib_ib_dev_cleanup(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..da46b79 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic
INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
+   INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 
b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 7b717c6..c249915 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler
container_of(handler, struct ipoib_dev_priv, event_handler);
 
if (record->event == IB_EVENT_PORT_ERR    ||
-   record->event == IB_EVENT_PKEY_CHANGE ||
record->event == IB_EVENT_PORT_ACTIVE ||
record->event == IB_EVENT_LID_CHANGE  ||
record->event == IB_EVENT_SM_CHANGE   ||
record->event == IB_EVENT_CLIENT_REREGISTER) {
ipoib_dbg(priv, "Port state change event\n");
queue_work(ipoib_workqueue, &priv->flush_task);
+   } else if (record->event == IB_EVENT_PKEY_CHANGE) {
+   ipoib_dbg(priv, PKEY change event\n
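
The archived diff is cut off at this point. As a rough sketch of the dispatch it appears to introduce, the following maps an event type to one of two actions. The enum names and action_for_event() are hypothetical, and the assumption that a PKEY change ends up queueing flush_restart_qp_task is inferred from the new work item added earlier in the patch rather than from the truncated hunk itself.

```c
#include <assert.h>

/* Hypothetical event and action names, for illustration only. */
enum ev_type {
	EV_PORT_ERR,
	EV_PKEY_CHANGE,
	EV_PORT_ACTIVE,
	EV_LID_CHANGE,
	EV_SM_CHANGE,
	EV_CLIENT_REREGISTER,
	EV_OTHER
};

enum flush_action {
	ACTION_NONE,		 /* event is ignored */
	ACTION_FLUSH,		 /* plain flush, QP left as it is */
	ACTION_FLUSH_RESTART_QP	 /* flush and reconfigure the device QP */
};

static enum flush_action action_for_event(enum ev_type event)
{
	switch (event) {
	case EV_PKEY_CHANGE:
		/* The pkey table may have been reshuffled. */
		return ACTION_FLUSH_RESTART_QP;
	case EV_PORT_ERR:
	case EV_PORT_ACTIVE:
	case EV_LID_CHANGE:
	case EV_SM_CHANGE:
	case EV_CLIENT_REREGISTER:
		return ACTION_FLUSH;
	default:
		return ACTION_NONE;
	}
}
```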

Re: [openib-general] [PATCH] librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-19 Thread Moni Levy
Or,
On 2/19/07, Or Gerlitz [EMAIL PROTECTED] wrote:
 Hi Sean,

 This fixes a bug that prevented librdmacm apps from running on a node
 that is a partial member of a partition. The patch takes the approach of the
 kernel ib_find_cached_pkey implementation.

 If you approve this, I suggest pushing it also into OFED 1.2 as a bug fix.

 Or.

 --
 The pkey extracted by the RDMA CM from the IPoIB device hardware address
 always has the full membership bit set. However, when looking in the pkey
 table the search must mask out the full membership bit.

 Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
 Signed-off-by: Olga Shern [EMAIL PROTECTED]

 diff --git a/src/cma.c b/src/cma.c
 index c5f8cd9..9c24c6a 100644
 --- a/src/cma.c
 +++ b/src/cma.c
 @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev

 for (i = 0, ret = 0; !ret; i++) {
 ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
 -   if (!ret && pkey == chk_pkey) {
 +   if ((!ret && pkey == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff) == chk_pkey)) {

What about just using:
if (!ret && (pkey | 0x8000) == (chk_pkey | 0x8000)) {

Even if not, there is no need to check ret twice in the limited
membership case.
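
A stand-alone sketch of the suggested comparison follows. Two details are made explicit: in C, == binds more tightly than |, so each OR has to be parenthesised (or done in a separate statement, as here), and the quoted patch converts with ntohs()/htons(), which suggests both values are in network byte order, so the sketch converts to host order before setting the membership bit. The helper name pkey_same_partition() is hypothetical and not part of librdmacm.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u

/*
 * Both arguments are assumed to be in network byte order. Forcing the
 * membership bit on in both values compares the partition number only.
 */
static int pkey_same_partition(uint16_t pkey_be, uint16_t chk_pkey_be)
{
	uint16_t a = (uint16_t)(ntohs(pkey_be) | PKEY_FULL_MEMBER);
	uint16_t b = (uint16_t)(ntohs(chk_pkey_be) | PKEY_FULL_MEMBER);

	return a == b;
}
```

In the loop from the quoted patch this would be used as `if (!ret && pkey_same_partition(pkey, chk_pkey))`, which checks ret once.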

-- Moni
 *pkey_index = (uint16_t) i;
 return 0;
 }




Re: [openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-19 Thread Moni Levy
On 2/19/07, Moni Levy [EMAIL PROTECTED] wrote:
 This issue was found during partitioning and SM failover testing. The fix was
 tested for 24 hours with pkey reshuffling every few seconds. The patch applies
 to Roland's master branch.

I found an issue with that patch, I'll post an updated one soon.

-- Moni



Re: [openib-general] issues with compilation of ofed 1.2

2007-02-07 Thread Moni Levy
Doug,
On 2/7/07, Yosef Etigin [EMAIL PROTECTED] wrote:
 7. On RHAS5 beta 2, the setup requires the sysfsutils-devel RPM, which is not
 included in this distro.

Can you please help us with that?

-- Moni


 --
 Yosef Etigin
 Alex Tabachnik





Re: [openib-general] OFED-1.2 first release

2007-02-05 Thread Moni Levy
Vlad,

 # tail -10 /tmp/OFED.10899.log
 Wrote:
 /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
 Wrote:
 /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
 Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
 + umask 022
 + cd /var/tmp/OFEDRPM/BUILD
 + rm -rf ib-bonding-0.9.0
 + exit 0
 /bin/mv: cannot stat
 `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm

I see that there is a small difference in the expected RPM name. Can
you fix that in the script, or should we change the name of the RPM?

-- Moni

 ': No such file or directory
 ERROR: Failed executing /bin/mv -f
 /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1




Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-02-01 Thread Moni Levy
Tziporet,
On 1/31/07, Tziporet Koren [EMAIL PROTECTED] wrote:
 Shaun Rowland wrote:
 
  Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is
  supposed to go. I assume from previous meetings this is just a
  filesystem directory. Should it be a directory in my home directory on
  staging.openfabrics.org, in ~/public_html, or is there something else I
  need to do to put this into place? I think from the previous MPI
  specific meeting, this was supposed to be done in a web directory. Since
  I am unclear, I wanted to ask here.

 Please place your SRPM in the ofed_1_2 directory under your home directory.
 Then you can make this directory accessible from the web in this way:
 1. mkdir public_html
 2. chmod 755 public_html

 Now you can put anything under public_html (including symbolic links) and it
 will be available via the web at
 www.openfabrics.org/~<user name>/

I have put the ib-bonding SRPM in ~monis/ofed_1_2

--Moni


 Tziporet






Re: [openib-general] Installation on openSUSE 10.2 Beta1 fails

2006-11-12 Thread Moni Levy
On 11/10/06, Diego Guella [EMAIL PROTECTED] wrote:
 Hi Vladimir,
 Thanks for your answer.

 I have installed:

 compat-libstdc++ (version 5.0.7-35)
 libstdc++-32bit (version 4.1.2_20060705-2)
 libstdc++41 (version 4.1.2_20061024-3)
 libstdc++41-devel (version 4.1.2_20061024-3)
 libstdc++-devel (version 4.1.3-22)


 but remember that in the log file, first it says (line 6393):
 -
 checking for C compiler default output file name... a.out
 -

 and about 5000 lines below, it says my compiler can't create executables
 (of course this isn't true, because this is the machine on which I compile
 all the programs I make)
 Have you got any other suggestion?

Please try installing a 32-bit glibc-devel package.

-- Moni



 Thanks,
 Diego


 - Original Message -
 From: Vladimir Sokolovsky [EMAIL PROTECTED]
 To: Diego Guella [EMAIL PROTECTED]
 Cc: Tziporet Koren [EMAIL PROTECTED];
 openib-general@openib.org
 Sent: Thursday, November 09, 2006 4:48 PM
 Subject: Re: [openib-general] Installation on openSUSE 10.2 Beta1 fails


  Hello Diego,
  Check that you have libstdc++, libstdc++-devel and compat-libstdc++ RPMs
  installed.
 
  Regards,
  Vladimir
 
  Diego Guella wrote:
 
  From: Tziporet Koren
  The failing utility is used for IPoIB high availability. If you don't
  need to use them you can just change this line in ofed.conf:
  ipoibtools=n
 
  Tziporet
 
  Thanks Tziporet for your answer.
 
 
  Tried just right now, i disabled ipoibtools. I get another, more strange
  error:
  (attached OFED.3816.log)
  -
  /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
  cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
  cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
  Running:
  ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
   --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib
  CPPFLAGS=-I../libibverbs/include
  configure: creating cache
  /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
  checking for a BSD-compatible install... /usr/bin/install -c
  checking whether build environment is sane... yes
  checking for gawk... gawk
  checking whether make sets $(MAKE)... yes
  checking build system type... x86_64-unknown-linux-gnu
  checking host system type... x86_64-unknown-linux-gnu
  checking for style of include used by make... GNU
  checking for gcc... gcc
  checking for C compiler default output file name... configure: error: C
  compiler cannot create executables
  See `config.log' for more details.
  Failed to execute:
  ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
   --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib
  CPPFLAGS=-I../libibverbs/include
  error: Bad exit status from /var/tmp/rpm-tmp.46102 (%install)
  -
 
  Am I right? It says my C compiler cannot create executables. Is it
  joking with me?
  In the log file, line 6393, it says:
  -
  checking for C compiler default output file name... a.out
  -
 
  I don't understand!
  Is there something I can do to fix this?
 
 
  Thanks,
  Diego
  
 
 





Re: [openib-general] OFED 1.1 Build Issue

2006-11-02 Thread Moni Levy
Vlad,
On 10/31/06, Vladimir Sokolovsky [EMAIL PROTECTED] wrote:

 Ramachandra K wrote:
  Moni Shoua wrote:
 
  We already tried to go this way and found that a local Module.symvers
  is not always generated (but we might have missed something though).
  I suggest that you check that this alternative way works under all
  OSs compilation (SuSE and RedHat to be precise)...
 
 
  I think Module.symvers generation for external modules was added sometime
  around 2.6.16, so it's not generated on older kernels (e.g. the 2.6.9
  kernels on RHEL).
 
  In this scenario, when there is no Module.symvers file, I guess the other
  option is to use a single Kbuild file to build both modules,
  as explained in section 7.3 of Documentation/kbuild/modules.txt.
 
  But this may not be feasible always. Come to think of it, why does the
  OFED installation procedure not update the kernel Module.symvers file
  when it replaces the old kernel modules present in /lib/modules/
  with the new ones ?
 
  BTW, why not update the kernel Module.symvers when kernel-ib-devel
  is installed? This would free the developer from copying it to
  his/her private directory.
 
 
  It might be a good idea to update the Module.symvers file as part of the
  normal installation and not only kernel-ib-devel. Because if the kernel
  modules are being replaced (or new modules are being added), shouldn't
  the Module.symvers file also be updated ?
  Regards,
  Ram
 Agree,
 Module.symvers should be updated by kernel-ib RPM.

AFAIK Module.symvers is used at compile time only, so the same logic
that is used for .h files (the devel package) seems reasonable for it.

--Moni

 So, need to implement Moni's suggestion with light changes: update
 kernel-ib RPM %post and %preun sections instead of kernel-ib-devel RPM
 %pre and %postun.

 Regards,
 Vladimir




Re: [openib-general] [openfabrics-ewg] We wish to do the 1.1 release next week

2006-10-17 Thread Moni Levy
Sounds like a great idea. We don't have blocking issues, but would be
happy to test the pre-release.

Moni
On 10/16/06, Tziporet Koren [EMAIL PROTECTED] wrote:
 This patch is already in.
 We will publish latest pre-release version tomorrow so everybody can do
 latest checks.

 Is this OK?
 Tziporet

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Scott
 Weitzenkamp (sweitzen)
 Sent: Sunday, October 15, 2006 10:16 PM
 To: Tziporet Koren; [EMAIL PROTECTED]; OPENIB
 Subject: Re: [openfabrics-ewg] [openib-general] We wish to do the 1.1
 release next week

 Yes, bug 273 (http://openib.org/bugzilla/show_bug.cgi?id=273) is a
 blocking issue for Cisco.  Roland sent a patch last Monday.  I'm done
 testing the other parts of rc7, and am testing his patch later today.

 Scott Weitzenkamp
 SQA and Release Manager
 Server Virtualization Business Unit
 Cisco Systems


  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren
  Sent: Thursday, October 12, 2006 7:44 AM
  To: [EMAIL PROTECTED]; OPENIB
  Subject: [openib-general] We wish to do the 1.1 release next week
 
  Hi all,
 
  I am back from vacation and found you waited with the release
  for me :-)
 
   From a quick look at status mails I think we can do the official
  release next week.
 
  Please reply if there are still any blocking issues you have.
 
  Also - please update all documents till end of Monday next week.
 
  Tziporet
 
 
 

 ___
 openfabrics-ewg mailing list
 [EMAIL PROTECTED]
 http://openib.org/mailman/listinfo/openfabrics-ewg




Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc1 is available

2006-08-09 Thread Moni Levy
Hi, Tziporet,

On 8/8/06, Tziporet Koren [EMAIL PROTECTED] wrote:
 o iSER:
- Stability
- Testing more platforms (e.g. ppc64 and ia64)
- Performance improvements

Only number two above is in the scope of OFED from our perspective, so
we prefer to have it listed alone.

 2. iSER support in install script for SLES 10 is missing

We have a fix for that and it will be part of RC2


-- Moni




Re: [openib-general] [openfabrics-ewg] Multicast traffic performace of OFED 1.0 ipoib

2006-08-03 Thread Moni Levy
Mike,
On 8/2/06, Michael Krause [EMAIL PROTECTED] wrote:


 Is the performance being measured on an identical topology and hardware set
 as before?  Multicast by its very nature is sensitive to topology, hardware
 components used (buffer depth, latency, etc.) and workload occurring within
 the fabric.  Loss occurs as a function of congestion or lack of forward
 progress resulting in a timeout and thus a toss of a packet.   If the
 hardware is different or the settings chosen are changed, then the results
 would be expected to change.

 It is not clear what you hope to achieve with such tests as there will be
 other workloads flowing over the fabric which will create random HOL
 blocking which can result in packet loss.  Multicast workloads should be
 tolerant of such loss.

 Mike

I'm sorry for not being clear. What I meant in the last sentence was
that we got the better (120k-140k PPS) results with our proprietary IB
stack, not with a previous openib snapshot. The tests were run on the
same setup, which, by the way, was dedicated only to that traffic. I'm
aware of the network implications of the test; I was looking for hints
about improvements needed in the ipoib implementation.

-- Moni






 At 04:30 AM 8/2/2006, Moni Levy wrote:

 Hi,
 we are doing some performance testing of multicast traffic over
 ipoib. The tests are performed by using iperf on dual 1.6G AMD PCI-X
 servers with PCI-X Tavor cards with 3.4.FW.  Below are the commands that
 may be used to run the test.

 Iperf server:
 route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
 /home/qa/testing-tools/iperf-2.0.2/iperf -us -B 224.4.4.4 -i 1

 Iperf client:
 route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
 /home/qa/testing-tools/iperf-2.0.2/iperf -uc 224.4.4.4 -i 1 -b 100M -t
 400 -l 100

 We are looking for the max PPS rate (100-byte packet size) without
 losses, by changing the BW parameter and looking at the point where we
 get no losses reported. The best results we received were around 50k
 PPS. I remember that we got some 120k-140k packets of the same size
 running without losses.

 We are going to look into it and try to see where the time is spent,
 but any ideas are welcome.

 Best regards,
 Moni









[openib-general] Multicast traffic performace of OFED 1.0 ipoib

2006-08-02 Thread Moni Levy
Hi,
we are doing some performance testing of multicast traffic over
ipoib. The tests are performed by using iperf on dual 1.6G AMD PCI-X
servers with PCI-X Tavor cards with 3.4.FW.  Below are the commands that
may be used to run the test.

Iperf server:
route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
/home/qa/testing-tools/iperf-2.0.2/iperf -us -B 224.4.4.4 -i 1

Iperf client:
route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
/home/qa/testing-tools/iperf-2.0.2/iperf -uc 224.4.4.4 -i 1 -b 100M -t
400 -l 100

We are looking for the max PPS rate (100-byte packet size) without
losses, by changing the BW parameter and looking at the point where we
get no losses reported. The best results we received were around 50k
PPS. I remember that we got some 120k-140k packets of the same size
running without losses.
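
For reference, a minimal sketch of the arithmetic that relates the iperf
bandwidth setting to a packet rate. It assumes that -b counts UDP payload
bits only and that "M" means 10^6; both are assumptions about iperf, not
something verified here, and the function names are made up for the
illustration.

```python
def offered_pps(bandwidth_bps: float, payload_bytes: int) -> float:
    """Datagrams per second offered for a target bit rate and payload size."""
    return bandwidth_bps / (payload_bytes * 8)


def bandwidth_for_pps(pps: float, payload_bytes: int) -> float:
    """Bit rate needed to offer `pps` datagrams of `payload_bytes` each."""
    return pps * payload_bytes * 8


if __name__ == "__main__":
    # -b 100M with -l 100 (assumed: 100e6 bits/s, 100-byte payloads)
    print(offered_pps(100e6, 100))
    # Bit rate that corresponds to the ~50k PPS loss-free point, in Mbit/s
    print(bandwidth_for_pps(50_000, 100) / 1e6)
```

Under these assumptions, -b 100M with -l 100 offers 125k datagrams per
second, and a 50k PPS loss-free point corresponds to about 40 Mbit/s.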

We are going to look into it and try to see where the time is spent,
but any ideas are welcome.

Best regards,
Moni




Re: [openib-general] [openfabrics-ewg] IPoIB bonding solution for OFED 1.1 (was re: [PATCH] ipoib: fix address update handling (was Re: OFED 1.1 release - schedule and features))

2006-07-25 Thread Moni Levy
On 7/20/06, Tziporet Koren [EMAIL PROTECTED] wrote:
 Or Gerlitz wrote:
  Hi Tziporet,
 
  Do you have an initial drop of the bonding solution planned for OFED 1.1
  that is ready to see the daylight? if not, when is this expected?
 
  As i mentioned to you, we are investigating few possible ways to
  implement HA for IPoIB and want to examine your approach as well.
 
  Or.
 
 
 
 Vlad already answered. We will be happy for any help in this area.

Tziporet,
In order to get as much cooperation as possible, I think that we
should post an RFC about that before implementing it and before getting
the implementation into OFED 1.1. We looked into the more standard
implementation that uses a bonding device and tried to find out what the
issues are. More than that, I'm not sure that what you guys suggest
will work if we have multicast applications running.

-- Moni

 Tziporet




Re: [openib-general] RE: netperf for RDS needed

2006-04-26 Thread Moni Levy
Ranjit,


 BTW, we sent all this information to Moni Levy couple of weeks back.


I guess it's something with my mailbox, because I never received it.

-- Moni

Re: [openib-general] IPoIB interface for unauthorized partition

2006-04-23 Thread Moni Levy
On 4/23/06, Eitan Zahavi [EMAIL PROTECTED] wrote:
 Hi Moni,

 Sorry it took me a while to get back to you (was out on vacation ...)

 Moni Levy wrote:
  On 4/10/06, Eitan Zahavi [EMAIL PROTECTED] wrote:
 
   Hi Hal,
  
    -Original Message-
    From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
    Sent: Monday, April 10, 2006 2:00 PM
    To: Eitan Zahavi
    Cc: Roland Dreier; openib-general@openib.org
    Subject: Re: [openib-general] IPoIB interface for unauthorized partition
  
    Hi Eitan,
  
    On Mon, 2006-04-10 at 02:35, Eitan Zahavi wrote:
     Hi Roland,
  
     Roland Dreier wrote:
      Eitan I thought the intent of the IB spec when defining P_Key
      Eitan index usage (and not P_Key value) was that the P_Key values
      Eitan would never need to be known above the driver level. To
      Eitan avoid exposing the P_Key values we could use P_Key index
      Eitan for creating the IPoIB interfaces.
  
      Eitan Does it make sense to work on a patch that would setup
      Eitan IPoIB interfaces by the P_Key index (and not by P_Key
      Eitan value)?
  
      I don't see how this is feasible.  The index that a particular P_Key
      lands at is completely undetermined -- if two nodes wanted to talk on
      partition 0x8001 say, how does one know which interface to use without
      knowing the index of that P_Key?
  
     OK, I get it. Actually the way IPoIB defines the broadcast group MGID
     exposes P_Key anyway.
  
      Eitan Also I think the expected behavior for IPoIB should be that
      Eitan IPoIB child interfaces should be automatically
      Eitan initialized by the code that brings up the interface
      Eitan (ifconfig scripts). All valid IPoIB partitions (valid =
      Eitan have corresponding broadcast groups) should be
      Eitan initialized. By doing so we provide a centralized control
      Eitan of the partitions and their IPoIB interfaces through the
      Eitan SM.
  
      Not sure if this is so.  I may want a partition strictly for storage
      traffic something like that, so it doesn't make sense to create an
      IPoIB interface for that partition.
  
     OpenSM provides this capability in the partition policy:
     Each partition is marked explicitly if to be used for IPoIB or not.
     So through one file one could actually control the IPoIB interfaces
     that will exist in the subnet.
  
    The end node does not know the SM policy for that partition though.
  
     My intent is to write some extension to ifup for IPoIB such that all sub
     interfaces will be automatically started (based on pre-availability of
     IPoIB broadcast MGID).
  
 
  I'm not sure how ifup is related to that. From what I understand you'd
  like ipoib driver to behave as follows:
 
  1. Get an event ( or figure it out) when a new PKEY is added to the
  relevant port partition table.
 I prefer not to rely on new events. Instead I would like to rely on existing 
 IB Notices:
 If we register to multicast group create/delete events (traps 66/67) IPoIB 
 can know about each new partition created.

I'm not sure that this is a good idea, because that way all of the
IPoIB nodes will get that event and try to join every new MC group,
while partitioning is by definition meant to separate a fabric. I think
the right behavior is that only the relevant nodes try to join the
specific MCG.


  2. Try to join that new MC group with the MGID it created according to
  the PKEY and the spec.  (or maybe query for the MC group existence but
  that's not atomic)
 Simply join the group. We rely on these groups to be pre-created by the SM
 enforcing policy dictating which partitions should
 be used for IPoIB and which not.

If you let all the IPoIB nodes join every new group without checking
their PKEY tables first, they may even get joined if the SM is not
enforcing an MCG-to-port policy.
Is that your plan?
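
To make the "check the PKEY table first" idea concrete, here is a minimal
sketch of the filtering a node could do when it sees a notice for a new
multicast group. It assumes the IPoIB IPv4 broadcast MGID layout
ff1x:401b:<P_Key>:0000:0000:0000:ffff:ffff, and that comparing the low 15
bits (ignoring the membership bit) is the right match; the function names
are hypothetical and nothing here is the actual ipoib driver code.

```python
from typing import Iterable, Optional

IPOIB_IPV4_SIGNATURE = 0x401B  # signature field assumed for IPoIB over IPv4


def pkey_from_ipoib_mgid(mgid: bytes) -> Optional[int]:
    """Extract the P_Key from an IPoIB IPv4 MGID, or None if it is not one."""
    if len(mgid) != 16 or mgid[0] != 0xFF:
        return None
    if int.from_bytes(mgid[2:4], "big") != IPOIB_IPV4_SIGNATURE:
        return None
    return int.from_bytes(mgid[4:6], "big")


def should_join(mgid: bytes, local_pkey_table: Iterable[int]) -> bool:
    """Join only if the group's P_Key is present in the local port table."""
    pkey = pkey_from_ipoib_mgid(mgid)
    if pkey is None:
        return False
    base = pkey & 0x7FFF  # drop the membership bit before comparing
    return any(p != 0 and (p & 0x7FFF) == base for p in local_pkey_table)


if __name__ == "__main__":
    new_group = bytes.fromhex("ff12401b8001000000000000ffffffff")
    print(should_join(new_group, [0xFFFF, 0x8001]))  # relevant node
    print(should_join(new_group, [0xFFFF]))          # unrelated node
```

With a check like this, a node that is not a member of partition 0x8001
would ignore the new group instead of sending a join request to the SM.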


  3. In case it fails nothing is done (no relevant MC group was
  pre-created in the SM).
 Exactly

  4. In case it succeeds a new interface is created.
 
  Is that what you meant ?
 
  - Moni
 
 
 If that were to be done, it would be cleanest if the child IPoIB
 interface was created only if that IPoIB broadcast group for that
 partition exists.
 
 [EZ] This is exactly what I had in mind.
 
 -- Hal
 
 
  - R.
 
 

Re: [openib-general] IPoIB interface for unauthorized partition

2006-04-10 Thread Moni Levy
On 4/10/06, Eitan Zahavi [EMAIL PROTECTED] wrote:
 Hi Hal,

  -Original Message-
  From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
  Sent: Monday, April 10, 2006 2:00 PM
  To: Eitan Zahavi
  Cc: Roland Dreier; openib-general@openib.org
  Subject: Re: [openib-general] IPoIB interface for unauthorized
 partition
 
  Hi Eitan,
 
  On Mon, 2006-04-10 at 02:35, Eitan Zahavi wrote:
   Hi Roland,
  
   Roland Dreier wrote:
    Eitan I thought the intent of the IB spec when defining P_Key
    Eitan index usage (and not P_Key value) was that the P_Key values
    Eitan would never need to be known above the driver level. To
    Eitan avoid exposing the P_Key values we could use P_Key index
    Eitan for creating the IPoIB interfaces.
  
    Eitan Does it make sense to work on a patch that would setup
    Eitan IPoIB interfaces by the P_Key index (and not by P_Key
    Eitan value)?
  
    I don't see how this is feasible.  The index that a particular P_Key
    lands at is completely undetermined -- if two nodes wanted to talk on
    partition 0x8001 say, how does one know which interface to use without
    knowing the index of that P_Key?
  
   OK, I get it. Actually the way IPoIB defines the broadcast group MGID
   exposes P_Key anyway.
  
    Eitan Also I think the expected behavior for IPoIB should be that
    Eitan IPoIB child interfaces should be automatically
    Eitan initialized by the code that brings up the interface
    Eitan (ifconfig scripts). All valid IPoIB partitions (valid =
    Eitan have corresponding broadcast groups) should be
    Eitan initialized. By doing so we provide a centralized control
    Eitan of the partitions and their IPoIB interfaces through the
    Eitan SM.
  
    Not sure if this is so.  I may want a partition strictly for storage
    traffic something like that, so it doesn't make sense to create an
    IPoIB interface for that partition.
  
   OpenSM provides this capability in the partition policy:
   Each partition is marked explicitly if to be used for IPoIB or not.
   So through one file one could actually control the IPoIB interfaces
   that will exist in the subnet.
  
  The end node does not know the SM policy for that partition though.
  
   My intent is to write some extension to ifup for IPoIB such that all sub
   interfaces will be automatically started (based on pre-availability of
   IPoIB broadcast MGID).

I'm not sure how ifup is related to that. From what I understand you'd
like ipoib driver to behave as follows:

1. Get an event ( or figure it out) when a new PKEY is added to the
relevant port partition table.
2. Try to join that new MC group with the MGID it created according to
the PKEY and the spec (or maybe query for the MC group existence, but
that's not atomic).
3. In case it fails nothing is done (no relevant MC group was
pre-created in the SM).
4. In case it succeeds a new interface is created.
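
The four steps above can be sketched roughly as follows. This is only an
illustration of the flow, not driver code: try_join and create_interface
are hypothetical callbacks standing in for the real SA join and netdevice
creation, and the MGID layout (ff12:401b:<P_Key>:...:ffff:ffff, link-local
scope, full-membership bit set) is an assumption taken from the IPoIB
broadcast group convention.

```python
from typing import Callable, Iterable, List


def ipoib_broadcast_mgid(pkey: int, scope: int = 0x2) -> bytes:
    """Build the assumed IPoIB IPv4 broadcast MGID for a P_Key."""
    pkey |= 0x8000  # assumed: full-membership bit is set in the MGID
    return (
        bytes([0xFF, 0x10 | scope])
        + (0x401B).to_bytes(2, "big")   # IPv4 signature
        + pkey.to_bytes(2, "big")
        + bytes(6)
        + b"\xff\xff\xff\xff"           # broadcast group ID
    )


def handle_pkey_table_change(
    old_table: Iterable[int],
    new_table: Iterable[int],
    try_join: Callable[[bytes], bool],
    create_interface: Callable[[int], None],
) -> List[int]:
    """Steps 1-4: for each newly added P_Key, join and create on success."""
    known = set(old_table)
    created = []
    for pkey in new_table:
        if pkey == 0 or pkey in known:
            continue                      # step 1: only new, valid entries
        mgid = ipoib_broadcast_mgid(pkey)
        if try_join(mgid):                # step 2: join, no query first
            create_interface(pkey)        # step 4: child interface
            created.append(pkey)
        # step 3: on failure nothing is done
    return created


if __name__ == "__main__":
    precreated = {ipoib_broadcast_mgid(0x8001)}  # groups the SM set up
    made = handle_pkey_table_change(
        [0xFFFF], [0xFFFF, 0x8001, 0x8002],
        try_join=lambda mgid: mgid in precreated,
        create_interface=lambda pkey: print("create child for", hex(pkey)),
    )
    print([hex(p) for p in made])
```

Joining directly rather than querying first avoids the non-atomic
query-then-join window mentioned in step 2.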

Is that what you meant ?

- Moni

 
  If that were to be done, it would be cleanest if the child IPoIB
  interface was created only if that IPoIB broadcast group for that
  partition exists.
 [EZ] This is exactly what I had in mind.
 
  -- Hal
 
   
 - R.
   
  

[openib-general] ib_local_sa testing and observations.

2006-03-29 Thread Moni Levy
Hi Sean,
 we've thought about possible ways of testing the implementation
of ib_local_sa and tried to estimate the load that it would put on
the fabric. We did some math on the number of packets that the SM
would have to handle in a test case of a 1k-node fabric, and it looks
like this would be a pretty heavy load on the SM side.

The first bring-up storm will be approximately 1000 paths / 3 paths
per packet = 333 RMPP packets per node. Let's say that the RMPP window
is 20; that means 17 more ACKs (RX), so approx 350 packets to handle
per node. With 1000 nodes, the SM will have to handle 350k packets in
1000 concurrent RMPP sessions.

Now we get to implementation details of the SMs. Do you know how many
RMPP packets per second (maximum) the OSM can handle? Please keep in
mind that with RMPP packets there is a lot of processing on the sender
side, like timers, window management and ACK/NACK processing; also,
the whole list of paths has to be recreated for each session (CPU load
on the SM machine).

That probably means we'll have a period at the beginning of the fabric
bring-up during which the SM will just not be able to process any
other queries. That's exactly the period in which all of the IPoIB
interfaces in the nodes would like to join the relevant MC groups, and
those joins will probably not get processed in a reasonable time
(timeout). I'm not even thinking about retransmissions of lost RMPP
packets, 2-3 partitions and lmc > 0. Did you do any tests, or do you
have any ideas for possible simulations that could help verify the
above?
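
The estimate above can be reproduced with a few lines of arithmetic. This
is just a sketch under the same assumptions as the text (one path record
per peer, 3 paths per RMPP packet, one ACK per window of 20); the function
name is made up, and it rounds up where the text rounds down, so it gives
334/351 rather than 333/350.

```python
import math


def sm_bringup_load(nodes: int = 1000, paths_per_packet: int = 3,
                    rmpp_window: int = 20):
    """Return (data packets, ACKs, packets per node, total packets at SM)."""
    paths_per_node = nodes  # assumed: roughly one path record per peer
    data_packets = math.ceil(paths_per_node / paths_per_packet)
    acks = math.ceil(data_packets / rmpp_window)  # one ACK per window
    per_node = data_packets + acks
    return data_packets, acks, per_node, per_node * nodes


if __name__ == "__main__":
    data, acks, per_node, total = sm_bringup_load()
    print(data, acks, per_node, total)
```

Multiplying the per-node figure by a partition count or by 2^lmc gives a
feel for how fast the total grows in the cases mentioned at the end.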

Regards,
Moni

Re: [openib-general] Please give 1.0 RC1 a whirl

2006-03-13 Thread Moni Levy
On 3/13/06, James Lentini [EMAIL PROTECTED] wrote:


 On Wed, 8 Mar 2006, Hal Rosenstock wrote:

  On Tue, 2006-03-07 at 20:56, Bryan O'Sullivan wrote:
   On Tue, 2006-03-07 at 17:45 -0800, Sean Hefty wrote:
Bryan O'Sullivan wrote:
 libibat
 libibat-debuginfo
 libibat-devel
 libibat-utils
   
The kernel modules to support these are obsolete.  We should
remove them from the release.
  
   Fine by me :-)  Are we killing off the kernel code, too, in
   parallel?
 
  I'm still waiting to hear all consumers have moved to addr/CMA.

 The kDAPL OpenIB provider still uses IBAT.

 I'd like to see a copy of this code available from OpenIB, but it does
 not have to be on the trunk. For my purposes a copy in

 https://openib.org/svn/gen2/branches/shaharf-ibat

 would be acceptable.

Do you plan to continue using this code ?

 Any chance that this branch's at.c and at_priv.h
 could be updated to match the versions on the trunk?

 I don't know the state of the shaharf-ibat branch.

The shaharf-ibat branch was not maintained.

 If the current IBAT
 code wouldn't be consistent with that branch, I don't mind keeping the
 current IBAT code at

 https://openib.org/svn/gen2/users/jlentini/ibat

Re: [openib-general] [PATCH] mthca - read byte count read request size

2006-03-07 Thread Moni Levy
On 3/7/06, Roland Dreier [EMAIL PROTECTED] wrote:
Grant Why is this an enum?

 +static int pcix_max_rbc = PCIX_MAX_RBC_INVALID;

Grant It's declared an int and is user visible.  I think the
Grant user interface would be better served if the user could
Grant just specify pcix_max_rbc=2048 instead of some magic
Grant value.

 Yes, makes sense, and any invalid value (including a default value of
 say 0) would mean for the driver to ignore the module parameter.

Maybe a message in the syslog could inform the user that the value was
ignored, to prevent false assumptions.
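
As an illustration of the behaviour suggested here (not the mthca code,
which is C), a minimal sketch of validating the parameter and warning when
it is ignored. The accepted sizes are assumed to be the PCI-X maximum
memory read byte counts of 512, 1024, 2048 and 4096, with 0 taken to mean
"not set"; the function and logger names are hypothetical.

```python
import logging
from typing import Optional

VALID_RBC = (512, 1024, 2048, 4096)  # assumed valid PCI-X read byte counts

log = logging.getLogger("mthca-sketch")


def effective_max_rbc(requested: int) -> Optional[int]:
    """Return the value to program, or None to leave the setting alone."""
    if requested == 0:
        return None  # parameter not given: keep the default silently
    if requested in VALID_RBC:
        return requested
    # Tell the user the value was dropped so they do not assume it applied.
    log.warning("pcix_max_rbc=%d is not one of %s; ignoring it",
                requested, VALID_RBC)
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(effective_max_rbc(2048))
    print(effective_max_rbc(3000))
```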

- Moni

  - R.

Re: [openib-general] Re: OpenIb 1.0 release components

2006-02-26 Thread Moni Levy
On 2/24/06, Michael S. Tsirkin [EMAIL PROTECTED] wrote:
 Quoting Moni Levy [EMAIL PROTECTED]:
While this might be a good idea for modules such as iSER
which are not currently part of the mainline kernel tree,
it is in my opinion clearly not a good idea to replace the
modules which *are* distributed with the mainline kernel.
  
   I agree, for the most part.
  
   What I have in mind for non-upstream kernel support is this:
  
* We have to ship out-of-tree drivers, simply because there's only
  one driver in the upstream kernel, and the others are not yet
  ready for submission.
* Some kernel components are clearly not contenders for shipping.
  One example is kdapl, because it appears to be dead due to
  upstream veto.
* Others might be reasonable, if they (a) see some testing and (b)
  don't intrusively patch the core kernel.  I'm thinking here
  about iSER and, to a lesser extent, SDP.
 
  I would like to add another point also. It looks like that in this
  round of the major distribution releases they will just not be able to
  include the 1.0 release due to time constraints, so the only way to
  use 1.0 release (or newer) will be to replace them in the kernel.
 
  Moni

 I don't really understand this last point. What do you mean when you say
 replace them in kernel? Replace what?

Is it possible that the distros would like to get more stable code
that is not in kernel.org (yet)?


 I understand why you might want to add out of kernel modules such as iSER.
 My point is they must work with core components included in kernel, not
 with core out of the svn tree.

Now I understand.


 I gather Brian here agrees.

 --
 Michael S. Tsirkin
 Staff Engineer, Mellanox Technologies

Re: [openib-general] Towards a 1.0 release of OpenIB

2006-02-23 Thread Moni Levy
On 2/22/06, Bryan O'Sullivan [EMAIL PROTECTED] wrote:
  * We would like everyone to be able to run the same tests, so
someone must gather test suites and execution instructions
together.

How would you like to manage that list of tests ? Wiki ?

 Within the next week, I'd like to gain an understanding of the following
 things:

  * Which features users want to see tested

again, do you expect that the tests will be listed in email or you
prefer to start some kind of a document ?

  * Which distros users want binary packages for

I guess that the latest SLES 10 beta and EL4 will be OK to start with.

  * Who can sign up to build and test those packages

I hope that the distros teams will be happy to do so together with the
vendor companies.

  * Whether we need to be building binary kernel packages to make
testing more consistent

That might also be a good idea for simplifying the test setup bring-up process.
I think that we at least need to agree on a reference .config for the
latest kernel to use for common ground.

Moni Levy   |  +972-971-7670(o)
Project Manager, Mainstream IB host stack
Voltaire – The Grid Backbone
http://www.voltaire.com/

Re: [openib-general] OpenIb 1.0 release components

2006-02-23 Thread Moni Levy
On 2/23/06, Bryan O'Sullivan [EMAIL PROTECTED] wrote:
 On Thu, 2006-02-23 at 19:03 +0200, Michael S. Tsirkin wrote:

  It seems that the openib release 1.0 as planned will include not only
  userspace libraries but also some kernel level modules.

 Yes, I expect so.

  While this might be a good idea for modules such as iSER
  which are not currently part of the mainline kernel tree,
  it is in my opinion clearly not a good idea to replace the
  modules which *are* distributed with the mainline kernel.

 I agree, for the most part.

 What I have in mind for non-upstream kernel support is this:

  * We have to ship out-of-tree drivers, simply because there's only
one driver in the upstream kernel, and the others are not yet
ready for submission.
  * Some kernel components are clearly not contenders for shipping.
One example is kdapl, because it appears to be dead due to
upstream veto.
  * Others might be reasonable, if they (a) see some testing and (b)
don't intrusively patch the core kernel.  I'm thinking here
about iSER and, to a lesser extent, SDP.

I would also like to add another point. It looks like this round of
major distribution releases will not be able to include the 1.0 release
due to time constraints, so the only way to use the 1.0 release (or
newer) will be to replace the modules shipped in the kernel.

Moni


 The problem with SDP in particular is that we need the socket family to
 be present in the upstream kernel, or we can't offer a stable ABI.  But
 SDP seems to be quite flaky, so it's not obviously a candidate for
 pushing to the upstream kernel as it stands.

b


Re: [openib-general] We have an OpenIB code release team

2006-02-14 Thread Moni Levy
Hi Matt,
  I would be happy to join the release team as an additional Voltaire
representative.

Moni Levy   |  +972-971-7670(o)
Project Manager, Mainstream IB host stack
Voltaire – The Grid Backbone
http://www.voltaire.com/



On 2/14/06, Tziporet Koren [EMAIL PROTECTED] wrote:
 Hi Matt,

 Good that we are starting the release effort. I would like to join the
 release team as the Mellanox representative.

 Tziporet