Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Hal Rosenstock
On Thu, 2005-10-13 at 18:46, Troy Benjegerdes wrote:
> I'm also attaching part of an opensm log file.
> 
> (the full copy is at http://scl.ameslab.gov/~troy/osm-ehca.log )
> 
> The IBM galaxy adapters are at:
>   Initial path: [0][1][16]
>   Initial path: [0][1][13]
> 

The OpenSM is just saying that a SMP transaction it issued (in this
case, SM Get P_KeyTable) is timing out (no response made it back to
OpenSM).

BTW, what svn rev is OpenSM up to ?

-- Hal

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> Not in realtime.  My observations were made after the fact.
Helen> I suppose I can launch another test and watch the counter in
Helen> realtime if you believe that is necessary?

That might be interesting.

Assuming the HCA continues to work fine, and IPoIB recovers, the only
theory I can come up with is that something is causing interrupts to be
held off for a long time, so the IPoIB driver doesn't get to see sends
completing.  But I don't know what such a workload might be.  Perhaps
something else you're running (Lustre?, iSCSI?) holds a lock for a
long time and causes the timeout.  But it's not clear to me why the TX
watchdog would get to run if the interrupt handler doesn't get to run.

 - R.


Re: [openib-general] Re: [PATCH] [SA Query] Change sa_query MAD allocation

2005-10-13 Thread Sean Hefty

Roland Dreier wrote:

Thanks, I'll read this over.

What's the motivation here?  To shift over to ib_create_send_mad() so
that all the MAD-related DMA mapping stuff is in one place, to make it
easier to fix?


Yes - the motivation is to fix the DMA mapping issue that you pointed out by 
changing ib_post_send_mad() to take an ib_mad_send_buf as input.


There are three places that I see where ib_post_send_mad() is called without
using ib_create_send_mad(): sa_query, mthca_mad, and agent.  (Their
implementations pre-date the call.)  My intent was to patch each of these
separately to use ib_create_send_mad(), then apply a patch to convert the API.
If the API does not change to take an ib_mad_send_buf, then it's your call 
whether to apply the patch.


- Sean


Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland,

>From [EMAIL PROTECTED] Thu Oct 13 16:19:30 2005
>
>Helen> BTW, the state of the IPoIB network seemed fine after the
>Helen> failed test, and the mthca counters are moving up nicely.
>
>Even on the server on3-ib?

Yes, even on the server on3-ib.

>
>Helen> Do you still think this is a crash of the HCA firmware?
>Helen> Should I call Mellanox?
>
>Not if IPoIB is working on the systems printing the TX time out
>messages.  However, if everything stops working on one of your
>systems, then yes, an HCA crash is likely.
>
>I'm still unclear on what is happening.  Do you see TX time
>out messages on a particular server, but IPoIB and mthca counters
>still work fine on that same server?  Or is it just the rest of the
>fabric that continues working?
>

Not in realtime.  My observations were made after the fact.  I suppose
I can launch another test and watch the counter in realtime if you
believe that is necessary?

>Thanks,
>  Roland

Thank you so much for the speedy fix.  I will apply the patch and 
stress test it as soon as possible.

Helen :-)



[openib-general] Re: [PATCH] [SA Query] Change sa_query MAD allocation

2005-10-13 Thread Roland Dreier
Thanks, I'll read this over.

What's the motivation here?  To shift over to ib_create_send_mad() so
that all the MAD-related DMA mapping stuff is in one place, to make it
easier to fix?

 - R.


[openib-general] [PATCH] [SA Query] Change sa_query MAD allocation

2005-10-13 Thread Sean Hefty
This patch changes sa_query to allocate MADs using the ib_create_send_mad()
routine.

The intent behind this change was to eventually change ib_post_send_mad() to
take an ib_mad_send_buf as input, but see the "DMA mapping abuses in MAD layer"
thread.  We may want to go with an alternate solution.  However, I'm posting
the patch since it's usable even without changes to ib_post_send_mad().

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>


Index: sa_query.c
===
--- sa_query.c  (revision 3692)
+++ sa_query.c  (working copy)
@@ -74,9 +74,8 @@ struct ib_sa_query {
void (*callback)(struct ib_sa_query *, int, struct ib_sa_mad *);
void (*release)(struct ib_sa_query *);
struct ib_sa_port  *port;
-   struct ib_sa_mad   *mad;
+   struct ib_mad_send_buf *mad_buf;
struct ib_sa_sm_ah *sm_ah;
-   DECLARE_PCI_UNMAP_ADDR(mapping)
int id;
 };
 
@@ -426,6 +425,7 @@ void ib_sa_cancel_query(int id, struct i
 {
unsigned long flags;
struct ib_mad_agent *agent;
+   u64 wr_id;
 
spin_lock_irqsave(&idr_lock, flags);
if (idr_find(&query_idr, id) != query) {
@@ -433,9 +433,10 @@ void ib_sa_cancel_query(int id, struct i
return;
}
agent = query->port->agent;
+   wr_id = (unsigned long) query->mad_buf;
spin_unlock_irqrestore(&idr_lock, flags);
 
-   ib_cancel_mad(agent, id);
+   ib_cancel_mad(agent, wr_id);
 }
 EXPORT_SYMBOL(ib_sa_cancel_query);
 
@@ -455,73 +456,51 @@ static void init_mad(struct ib_sa_mad *m
spin_unlock_irqrestore(&tid_lock, flags);
 }
 
+static void acquire_ah(struct ib_sa_port *port, struct ib_sa_query *query)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&port->ah_lock, flags);
+   kref_get(&port->sm_ah->ref);
+   query->sm_ah = port->sm_ah;
+   spin_unlock_irqrestore(&port->ah_lock, flags);
+}
+
 static int send_mad(struct ib_sa_query *query, int timeout_ms)
 {
struct ib_sa_port *port = query->port;
+   struct ib_send_wr *bad_wr;
unsigned long flags;
-   int ret;
-   struct ib_sge  gather_list;
-   struct ib_send_wr *bad_wr, wr = {
-   .opcode  = IB_WR_SEND,
-   .sg_list = &gather_list,
-   .num_sge = 1,
-   .send_flags  = IB_SEND_SIGNALED,
-   .wr  = {
-.ud = {
-.mad_hdr = &query->mad->mad_hdr,
-.remote_qpn  = 1,
-.remote_qkey = IB_QP1_QKEY,
-.timeout_ms  = timeout_ms,
-}
-}
-   };
+   int ret, id;
 
 retry:
if (!idr_pre_get(&query_idr, GFP_ATOMIC))
return -ENOMEM;
spin_lock_irqsave(&idr_lock, flags);
-   ret = idr_get_new(&query_idr, query, &query->id);
+   ret = idr_get_new(&query_idr, query, &id);
spin_unlock_irqrestore(&idr_lock, flags);
if (ret == -EAGAIN)
goto retry;
if (ret)
return ret;
 
-   wr.wr_id = query->id;
-
-   spin_lock_irqsave(&port->ah_lock, flags);
-   kref_get(&port->sm_ah->ref);
-   query->sm_ah = port->sm_ah;
-   wr.wr.ud.ah  = port->sm_ah->ah;
-   spin_unlock_irqrestore(&port->ah_lock, flags);
-
-   gather_list.addr   = dma_map_single(port->agent->device->dma_device,
-   query->mad,
-   sizeof (struct ib_sa_mad),
-   DMA_TO_DEVICE);
-   gather_list.length = sizeof (struct ib_sa_mad);
-   gather_list.lkey   = port->agent->mr->lkey;
-   pci_unmap_addr_set(query, mapping, gather_list.addr);
+   query->mad_buf->send_wr.wr.ud.timeout_ms  = timeout_ms;
+   query->mad_buf->context[0] = query;
+   query->id = id;
 
-   ret = ib_post_send_mad(port->agent, &wr, &bad_wr);
+   ret = ib_post_send_mad(port->agent, &query->mad_buf->send_wr, &bad_wr);
if (ret) {
-   dma_unmap_single(port->agent->device->dma_device,
-pci_unmap_addr(query, mapping),
-sizeof (struct ib_sa_mad),
-DMA_TO_DEVICE);
-   kref_put(&query->sm_ah->ref, free_sm_ah);
spin_lock_irqsave(&idr_lock, flags);
-   idr_remove(&query_idr, query->id);
+   idr_remove(&query_idr, id);
spin_unlock_irqrestore(&idr_lock, flags);
}
 
/*
 * It's not safe to dereference query any more, because the
 * send may already have completed and freed the query in
-* another context.  So use wr.wr_id, which has a copy of the
-* query's id.
+* another context.
 */
-   return ret ? ret 

Re: [openib-general] DMA mapping abuses in MAD layer

2005-10-13 Thread Roland Dreier
Sean> Any preference to pursuing this change or modifying
Sean> ib_post_send_mad to take an ib_mad_send_buf?

I think it's going to be confusing to cast a virtual address to a long
and then ignore the lkey field.  So I would go with a new interface
not built on ib_sge.

On the other hand, maybe struct sg_list is what we should be using??
(Just thinking out loud here, so to speak)

 - R.


Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> BTW, the state of the IPoIB network seemed fine after the
Helen> failed test, and the mthca counters are moving up nicely.

Even on the server on3-ib?

Helen> Do you still think this is a crash of the HCA firmware?
Helen> Should I call Mellanox?

Not if IPoIB is working on the systems printing the TX time out
messages.  However, if everything stops working on one of your
systems, then yes, an HCA crash is likely.

I'm still unclear on what is happening.  Do you see TX time
out messages on a particular server, but IPoIB and mthca counters
still work fine on that same server?  Or is it just the rest of the
fabric that continues working?

Thanks,
  Roland


Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Shirley Ma

Thanks. It's strange the copy-paste
gave an extra 1.

Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] DMA mapping abuses in MAD layer

2005-10-13 Thread Sean Hefty

Sean Hefty wrote:

Does anyone else have any other ideas on how to fix this issue?


The current MAD interface requires the user to have code similar to this:

send_buf->sge.addr = dma_map_single(mad_agent->device->dma_device,
buf, buf_size, DMA_TO_DEVICE);
pci_unmap_addr_set(send_buf, mapping, send_buf->sge.addr);

This is consistent with how an ib_send_wr would be formatted for other QPs. 
Another possibility, however, is to let the user do:


send_buf->sge.addr = (unsigned long) buf;

And then have the MAD layer perform the mapping/unmapping immediately before and 
after posting to the QP.  This keeps the syntax of the current interface, but 
still requires user changes.
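
A minimal userspace sketch of that second option, for illustration only. Everything prefixed `mock_` is a hypothetical stand-in (not the kernel DMA or verbs API), and the choice to unmap on completion or on a failed post is an assumption about how the MAD layer might pair the two calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel DMA API so this builds in userspace. */
static int live_mappings;               /* mappings currently outstanding */

static uint64_t mock_dma_map(void *buf, size_t len)
{
	(void) len;
	live_mappings++;
	return (uint64_t) (uintptr_t) buf;  /* identity "bus address" */
}

static void mock_dma_unmap(uint64_t addr, size_t len)
{
	(void) addr;
	(void) len;
	live_mappings--;
}

struct mock_send_buf {
	void     *buf;      /* caller supplies only a virtual address */
	size_t    len;
	uint64_t  mapping;  /* filled in by the MAD layer, never by the caller */
};

/* Hypothetical QP post; fails on request so the error path can be exercised. */
static int mock_post_to_qp(const struct mock_send_buf *sb, int fail)
{
	(void) sb;
	return fail ? -1 : 0;
}

/* The MAD layer owns the mapping: map right before the post, undo on error. */
static int mad_post_send(struct mock_send_buf *sb, int fail_post)
{
	int ret;

	sb->mapping = mock_dma_map(sb->buf, sb->len);
	ret = mock_post_to_qp(sb, fail_post);
	if (ret)
		mock_dma_unmap(sb->mapping, sb->len);
	return ret;
}

/* Called from the send completion path: the mapping is released here. */
static void mad_send_done(struct mock_send_buf *sb)
{
	mock_dma_unmap(sb->mapping, sb->len);
}
```

The point of the sketch is that callers never see a DMA address, so every map has exactly one matching unmap inside the MAD layer.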


Any preference to pursuing this change or modifying ib_post_send_mad to take an 
ib_mad_send_buf?


- Sean


Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland,

So you are right, it is not a moving target.  After repeating 
the IOZONE tests several times, I narrowed down the culprit,
server on3-ib.  Parallel I/O had made it a bit difficult to 
chase it down :-(  

BTW, the state of the IPoIB network seemed fine after the failed
test, and the mthca counters are moving up nicely.  Do you still
think this is a crash of the HCA firmware?  Should I call Mellanox? 

Thanks,
Helen


-- Original Message -
>From [EMAIL PROTECTED] Thu Oct 13 15:13:16 2005
>
>Helen> It doesn't seem like shrinking the TCP window had helped.
>Helen> I captured the Dmesg log from Lustre server and associated
>Helen> client reporting IOZONE error.
>
>What is the state of the system after you start seeing the ib0
>transmit time out messages?  Does IPoIB work at all?  Is the HCA
>responsive at all -- for example what do you see if you do
>
>  cat /sys/class/infiniband/mthca0/ports/1/state
>
>or
>
>  cat /sys/class/infiniband/mthca0/ports/1/counters/*
>
>Helen> BTW, this problem is a moving target so it is hard to
>Helen> believe that it is hardware related(?)  BTW, I am using the
>Helen> mellanox DDR switch and HCA.
>
>Not sure what you mean by a moving target... the symptoms really look
>like a crash of the HCA firmware to me.
>
>Thanks,
>  Roland
>


[PATCH, please test] IPoIB: recycle RX bufs (was: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer)

2005-10-13 Thread Roland Dreier
Roland> My plan is to change the receive handling of IPoIB
Roland> slightly, so that if it can't allocate a new receive
Roland> buffer, it reposts the old buffer and drops the packet it
Roland> just received.

Here's a patch that changes IPoIB to use this scheme.  This should be
much more robust when the system gets low on GFP_ATOMIC memory.
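
To make the scheme concrete, here is a small userspace sketch of the recycle logic, not the patch itself. The names `rx_ring`, `rx_slot`, `rx_complete` and `alloc_fn` are hypothetical, the ring size is arbitrary, and plain heap buffers stand in for skbs and DMA mappings:

```c
#include <assert.h>
#include <stdlib.h>

#define RX_RING_SIZE 4      /* hypothetical; kept small for illustration */
#define RX_BUF_SIZE  2048   /* hypothetical buffer size */

struct rx_slot {
	void *buf;              /* buffer currently posted in this slot */
};

struct rx_ring {
	struct rx_slot slot[RX_RING_SIZE];
	unsigned long  rx_packets;
	unsigned long  rx_dropped;
};

/* Allocator hook so the out-of-memory path can be exercised. */
typedef void *(*alloc_fn)(size_t);

static void *failing_alloc(size_t n)
{
	(void) n;
	return NULL;
}

/*
 * Handle a receive completion for slot 'id'.
 *
 * Returns the filled buffer to hand up the stack (the caller frees it),
 * or NULL if no replacement could be allocated.  In the NULL case the
 * packet is dropped and the old buffer stays in the slot to be reposted,
 * so the ring never loses an entry.
 */
static void *rx_complete(struct rx_ring *ring, int id, alloc_fn alloc)
{
	void *fresh = alloc(RX_BUF_SIZE);
	void *full;

	if (!fresh) {
		ring->rx_dropped++;
		return NULL;
	}

	full = ring->slot[id].buf;
	ring->slot[id].buf = fresh;
	ring->rx_packets++;
	return full;
}
```

With this ordering an allocation failure costs one packet rather than one ring entry, which is why the drop counter is the interesting number to watch.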

I'd appreciate it if people could stress test and benchmark this.  It
works well for me, but I'm wondering if this patch has any effect on
performance (either better or worse).

Helen, it would be especially interesting if you could run your test
with this patch and without increasing min_free_kbytes, since you are
able to reproduce GFP_ATOMIC failures.  I'd be curious to know what
you see in /sys/class/net/ib0/statistics/rx_dropped after running the test.

Thanks,
  Roland

--- infiniband/ulp/ipoib/ipoib_main.c   (revision 3707)
+++ infiniband/ulp/ipoib/ipoib_main.c   (working copy)
@@ -729,7 +729,7 @@ int ipoib_dev_init(struct net_device *de
 
/* Allocate RX/TX "rings" to hold queued skbs */
 
-   priv->rx_ring = kmalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_buf),
+   priv->rx_ring = kmalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf),
GFP_KERNEL);
if (!priv->rx_ring) {
printk(KERN_WARNING "%s: failed to allocate RX ring (%d entries)\n",
@@ -737,9 +737,9 @@ int ipoib_dev_init(struct net_device *de
goto out;
}
memset(priv->rx_ring, 0,
-  IPOIB_RX_RING_SIZE * sizeof (struct ipoib_buf));
+  IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf));
 
-   priv->tx_ring = kmalloc(IPOIB_TX_RING_SIZE * sizeof (struct ipoib_buf),
+   priv->tx_ring = kmalloc(IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf),
GFP_KERNEL);
if (!priv->tx_ring) {
printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n",
@@ -747,7 +747,7 @@ int ipoib_dev_init(struct net_device *de
goto out_rx_ring_cleanup;
}
memset(priv->tx_ring, 0,
-  IPOIB_TX_RING_SIZE * sizeof (struct ipoib_buf));
+  IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf));
 
/* priv->tx_head & tx_tail are already 0 */
 
--- infiniband/ulp/ipoib/ipoib.h(revision 3726)
+++ infiniband/ulp/ipoib/ipoib.h(working copy)
@@ -100,7 +100,12 @@ struct ipoib_pseudoheader {
 
 struct ipoib_mcast;
 
-struct ipoib_buf {
+struct ipoib_rx_buf {
+   struct sk_buff *skb;
+   dma_addr_t  mapping;
+};
+
+struct ipoib_tx_buf {
struct sk_buff *skb;
DECLARE_PCI_UNMAP_ADDR(mapping)
 };
@@ -150,14 +155,14 @@ struct ipoib_dev_priv {
unsigned int admin_mtu;
unsigned int mcast_mtu;
 
-   struct ipoib_buf *rx_ring;
+   struct ipoib_rx_buf *rx_ring;
 
-   spinlock_t tx_lock;
-   struct ipoib_buf *tx_ring;
-   unsigned  tx_head;
-   unsigned  tx_tail;
-   struct ib_sge tx_sge;
-   struct ib_send_wr tx_wr;
+   spinlock_t   tx_lock;
+   struct ipoib_tx_buf *tx_ring;
+   unsigned tx_head;
+   unsigned tx_tail;
+   struct ib_sge        tx_sge;
+   struct ib_send_wr    tx_wr;
 
struct ib_wc ibwc[IPOIB_NUM_WC];
 
--- infiniband/ulp/ipoib/ipoib_ib.c (revision 3726)
+++ infiniband/ulp/ipoib/ipoib_ib.c (working copy)
@@ -95,57 +95,65 @@ void ipoib_free_ah(struct kref *kref)
}
 }
 
-static inline int ipoib_ib_receive(struct ipoib_dev_priv *priv,
-  unsigned int wr_id,
-  dma_addr_t addr)
-{
-   struct ib_sge list = {
-   .addr= addr,
-   .length  = IPOIB_BUF_SIZE,
-   .lkey= priv->mr->lkey,
-   };
-   struct ib_recv_wr param = {
-   .wr_id  = wr_id | IPOIB_OP_RECV,
-   .sg_list= &list,
-   .num_sge= 1,
-   };
+static int ipoib_ib_post_receive(struct net_device *dev, int id)
+{
+   struct ipoib_dev_priv *priv = netdev_priv(dev);
+   struct ib_sge list;
+   struct ib_recv_wr param;
struct ib_recv_wr *bad_wr;
+   int ret;
+
+   list.addr = priv->rx_ring[id].mapping;
+   list.length   = IPOIB_BUF_SIZE;
+   list.lkey = priv->mr->lkey;
+
+   param.next= NULL;
+   param.wr_id   = id | IPOIB_OP_RECV;
+   param.sg_list = &list;
+   param.num_sge = 1;
+
+   ret = ib_post_recv(priv->qp, &param, &bad_wr);
+   if (unlikely(ret)) {
+   ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret);
+   dma_unmap_single(priv->ca->dma_device,
+priv->rx_ring[id].mapping,
+IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
+   dev_kfree_skb_any(priv->rx_ring[id].skb);
+   priv->rx_ri

Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Roland Dreier
>  http://ozlabs.org/pipermail/linuxppc64-dev/2005-July/004662.html1

delete the '1' from the end of the URL...

 - R.


Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Robert> Since the rest of the patch needed to get this working
Robert> isn't applied to either the trunk or the ipath branch yet
Robert> (and since the branch will be going away shortly), can you
Robert> just apply this patch to the trunk when you do the merge?

Sure, no problem.

 - R.


Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Shirley Ma

I am not sure whether this is something related
to dma_addr_t. Could you please try the patch below?

>  http://ozlabs.org/pipermail/linuxppc64-dev/2005-July/004662.html1

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Robert Walsh
> And here's a patch to ipath to make it work with the uverbs command mask...

Roland,

Since the rest of the patch needed to get this working isn't applied to
either the trunk or the ipath branch yet (and since the branch will be
going away shortly), can you just apply this patch to the trunk when you
do the merge?

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043


signature.asc
Description: This is a digitally signed message part

Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Troy Benjegerdes
On Wed, Oct 12, 2005 at 01:04:37PM +0200, IBMEHCA DD wrote:
> I just released the ehca2_0028 which uses svn 3615 on 
> https://sourceforge.net/projects/ibmehcad/
> As you might notice the license already has changed to the openib.org 
> license.
> 
> With 2.6.13 we had the non-issue that our main focus was on 2.6.5-7.191 
> and we're only now moving to the latest kernel.

I just built against svn 3774, and 2.6.13.3, with the timeout set to 120
seconds. There's some bad interaction going on with OpenSM.

p5l2:~# modprobe hcad_mod ehca_nr_ports=1
[ 6186.855237] eBus Device Driver
[ 6186.907578] eHCA Infiniband Device Driver (Rel.: EHCA2_0028)
[ 6186.912203] xics_enable_irq: irq=36868: ibm_int_on returned fffd
p5l2:~# modprobe ib_ipoib
hang for awhile.. entries appear in osm.log ***
[ 6309.683651] PU0003 00060103:ehca_parse_ec  EHCA port 1 is available.
[ 6310.253303] kernel BUG in dma_map_single at arch/ppc64/kernel/dma.c:86!
[ 6310.253320] Oops: Exception in kernel mode, sig: 5 [#1]
[ 6310.253339] SMP NR_CPUS=8 NUMA PSERIES LPAR
[ 6310.253364] Modules linked in: ib_mad hcad_mod ib_core ebus
[ 6310.253383] NIP: C000FA10 XER: 0020 LR: C000F9B0 CTR: 
C000F980
[ 6310.253400] REGS: cf3bb770 TRAP: 0700   Not tainted (2.6.13.3-power5)
[ 6310.253421] MSR: 80029032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 
24002444
[ 6310.253436] DAR:  DSISR: 
[ 6310.253471] TASK: c209f060[1874] 'modprobe' THREAD: 
cf3b8000CPU: 7
[ 6310.253492] GPR00: C04B3660 CF3BB9F0 C05EE948 
C001DBEC5C18
[ 6310.253513] GPR04: C003CB5B1D0C 0128 0002 
0008
[ 6310.253532] GPR08: C003CBD5EEE8  CF67FC00 
C000F980
[ 6310.253553] GPR12: D00621D0 C04B7800 10017078 

[ 6310.253609] GPR16:   0001 
0001
[ 6310.253665] GPR20: C8DE7800 0002 0001 
CF67FDC8
[ 6310.253688] GPR24: CF67FD40 0002 C001DBEC5C18 
0002
[ 6310.253708] GPR28: 0128 C003CB5B1D0C D006EB00 
C003CB5B1C80
[ 6310.253731] NIP [c000fa10] .dma_map_single+0x90/0xc0
[ 6310.253753] LR [c000f9b0] .dma_map_single+0x30/0xc0
[ 6310.253778] Call Trace:
[ 6310.253797] [cf3bb9f0] [c8de7800] 0xc8de7800 (unreliable)
[ 6310.253838] [cf3bba90] [d005aee8] .ib_mad_post_receive_mads+0xb8/0x270 [ib_mad]
[ 6310.253880] [cf3bbb80] [d005c840] .ib_mad_init_device+0x350/0x660 [ib_mad]
[ 6310.253905] [cf3bbc70] [d004d0bc] .ib_register_client+0xdc/0x150 [ib_core]
[ 6310.253936] [cf3bbd00] [d0061e6c] .ib_mad_init_module+0x8c/0xf0 [ib_mad]
[ 6310.253999] [cf3bbd90] [c0070720] .sys_init_module+0x1e0/0x4d0
[ 6310.254030] [cf3bbe30] [c000d300] syscall_exit+0x0/0x18
[ 6310.254045] Instruction dump:
[ 6310.254053] 4e800421 e8410028 382100a0 e8010010 eb41ffd0 eb61ffd8 eb81ffe0 
eba1ffe8
[ 6310.254089] 7c0803a6 4e800020 6000 6000 <0fe0> 382100a0 
3860e8010010
[ 6310.254206]  Segmentation fault

I'm also attaching part of an opensm log file.

(the full copy is at http://scl.ameslab.gov/~troy/osm-ehca.log )

The IBM galaxy adapters are at:
Initial path: [0][1][16]
Initial path: [0][1][13]

00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 
00

00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 
00

Oct 13 10:42:05 978875 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=16) -- dropping.
Oct 13 10:42:05 978883 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 2 DR SLID 0x0 DR DLID 0x0
Oct 13 10:42:05 978892 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Oct 13 10:42:05 978925 [42FFF970] -> SMP dump:
base_ver0x1
mgmt_class..0x81
class_ver...0x1
method..0x1 (SubnGet)
D bit...0x0
status..0x0
hop_ptr.0x0
hop_count...0x2
trans_id0x1810
attr_id.0x16 (P_KeyTable)
resv0x0
attr_mod0x3E
m_key...0x
dr_slid.0x
dr_dlid.

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> It doesn't seem like shrinking the TCP window had helped.
Helen> I captured the Dmesg log from Lustre server and associated
Helen> client reporting IOZONE error.

What is the state of the system after you start seeing the ib0
transmit time out messages?  Does IPoIB work at all?  Is the HCA
responsive at all -- for example what do you see if you do

  cat /sys/class/infiniband/mthca0/ports/1/state

or

  cat /sys/class/infiniband/mthca0/ports/1/counters/*
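
If it is easier to poll these from a test harness than by hand, something along these lines would do. This is only a sketch: `read_attr` is a hypothetical helper name, and it simply reads the first line of whatever file it is given, such as the state attribute above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style attribute file into 'out',
 * stripping the trailing newline.  Returns 0 on success and -1 if the
 * file cannot be opened or read (for example, if the device is gone).
 */
static int read_attr(const char *path, char *out, size_t outlen)
{
	FILE *f;

	if (outlen == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(out, (int) outlen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	out[strcspn(out, "\n")] = '\0';
	return 0;
}
```

Calling it in a loop on the port state and on individual counter files during the test run would show whether the values stop changing while the timeouts are being logged.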

Helen> BTW, this problem is a moving target so it is hard to
Helen> believe that it is hardware related(?)  BTW, I am using the
Helen> mellanox DDR switch and HCA.

Not sure what you mean by a moving target... the symptoms really look
like a crash of the HCA firmware to me.

Thanks,
  Roland


Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland,

It doesn't seem like shrinking the TCP window had helped.  I captured the
Dmesg log from Lustre server and associated client reporting IOZONE error.
BTW, this problem is a moving target so it is hard to believe that it
is hardware related(?)  BTW, I am using the mellanox DDR switch and HCA.

Thanks,
Helen

--- Dmesg from Lustre server --
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 1638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 2638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 3638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 4638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 5638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 6638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 7638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 8638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 9638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 10638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 11638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 12638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 13638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 14638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 15638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 16638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 17638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 18638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 19638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 20638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 21638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 22638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 23638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 24638
LustreError: 12471:0:(ost_handler.c:735:ost_brw_write()) @@@ timeout on bulk GET [EMAIL PROTECTED] x20249/t0 o4->@:-1 lens 328/288 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 12485:0:(ost_handler.c:822:ost_brw_write()) on3-ost2: bulk IO comm error evicting [EMAIL PROTECTED] id 192.168.2.73-12345
LustreError: 12468:0:(ost_handler.c:735:ost_brw_write()) @@@ timeout on bulk GET [EMAIL PROTECTED] x20359/t0 o4->@:-1 lens 328/288 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 12468:0:(ost_handler.c:735:ost_brw_write()) previously skipped 1 similar messages
LustreError: 12477:0:(ost_handler.c:822:ost_brw_write()) on3-ost2: bulk IO comm error evicting [EMAIL PROTECTED] id 192.168.2.78-12345
LustreError: 12477:0:(filter.c:1728:filter_grant_sanity_check()) filter_disconnect: tot_granted 48570368 != fo_tot_granted 49618944
LustreError: 12477:0:(filter.c:1731:filter_grant_sanity_check()) filter_disconnect: tot_pending 7340032 != fo_tot_pending 8388608
Lustre: A connection with 192.168.2.80 timed out; the network or that node may be down.
LustreError: 12189:0:(socknal_cb.c:2264:ksocknal_check_peer_timeouts()) Timeout out conn->0xc0a80250 ip 192.168.2.80:1022
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 25638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 26638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 27638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 28638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 29638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 30638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 31638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 32638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 33638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 34638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 35638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 36638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 37638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 38638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 39638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 40638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 41638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 42638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 43638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 44638
NETDEV WATCHDOG: ib0: transmit timed out
ib0: tra

Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
And here's a patch to ipath to make it work with the uverbs command mask...
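
For anyone following along, the idea of the mask can be sketched as below. This is an illustration under assumptions, not the uverbs code: the command numbers, `mock_device`, `dispatch_cmd` and the specific error codes are all hypothetical, and the real command values live in ib_user_verbs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical command numbers, for illustration only. */
enum {
	CMD_GET_CONTEXT,
	CMD_QUERY_DEVICE,
	CMD_CREATE_SRQ,
	CMD_MAX
};

struct mock_device {
	uint64_t uverbs_cmd_mask;   /* one bit per command the driver supports */
};

/*
 * Gate a userspace command on the mask set by the low-level driver.
 * Returns 0 if the command may proceed; the error codes chosen here
 * are assumptions for the sketch.
 */
static int dispatch_cmd(const struct mock_device *dev, unsigned int cmd)
{
	if (cmd >= CMD_MAX)
		return -EINVAL;

	if (!(dev->uverbs_cmd_mask & (1ull << cmd)))
		return -ENOSYS;

	return 0;
}
```

A driver that leaves a bit clear never has the corresponding method called from userspace, which is why each driver has to list every command it implements.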

Index: infiniband/hw/ipath/ib_ipath/ipath_openib.c
===
--- infiniband/hw/ipath/ib_ipath/ipath_openib.c (revision 3758)
+++ infiniband/hw/ipath/ib_ipath/ipath_openib.c (working copy)
@@ -5733,6 +5733,32 @@ static int ipath_register_ib_device(cons
 
strlcpy(dev->name, "infinipath_ib%d", IB_DEVICE_NAME_MAX);
dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION;
+   dev->uverbs_cmd_mask =
+   (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) |
+   (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE)|
+   (1ull << IB_USER_VERBS_CMD_QUERY_PORT)  |
+   (1ull << IB_USER_VERBS_CMD_ALLOC_PD)|
+   (1ull << IB_USER_VERBS_CMD_DEALLOC_PD)  |
+   (1ull << IB_USER_VERBS_CMD_CREATE_AH)   |
+   (1ull << IB_USER_VERBS_CMD_DESTROY_AH)  |
+   (1ull << IB_USER_VERBS_CMD_REG_MR)  |
+   (1ull << IB_USER_VERBS_CMD_DEREG_MR)|
+   (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) |
+   (1ull << IB_USER_VERBS_CMD_CREATE_CQ)   |
+   (1ull << IB_USER_VERBS_CMD_DESTROY_CQ)  |
+   (1ull << IB_USER_VERBS_CMD_POLL_CQ) |
+   (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ)   |
+   (1ull << IB_USER_VERBS_CMD_CREATE_QP)   |
+   (1ull << IB_USER_VERBS_CMD_MODIFY_QP)   |
+   (1ull << IB_USER_VERBS_CMD_DESTROY_QP)  |
+   (1ull << IB_USER_VERBS_CMD_POST_SEND)   |
+   (1ull << IB_USER_VERBS_CMD_POST_RECV)   |
+   (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)|
+   (1ull << IB_USER_VERBS_CMD_DETACH_MCAST)|
+   (1ull << IB_USER_VERBS_CMD_CREATE_SRQ)  |
+   (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ)  |
+   (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) |
+   (1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV);
dev->node_type = IB_NODE_CA;
dev->phys_port_cnt = 1;
dev->dma_device = ipath_layer_get_pcidev(t);


Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
OK, here's a new patch that adds a mask of allowed userspace commands
set by the kernel low-level driver.

Thanks, good catch Michael...

 - R.

--- include/rdma/ib_user_verbs.h (revision 3707)
+++ include/rdma/ib_user_verbs.h (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -88,8 +89,11 @@ enum {
  * Make sure that all structs defined in this file remain laid out so
  * that they pack the same way on 32-bit and 64-bit architectures (to
  * avoid incompatibility between 32-bit userspace and 64-bit kernels).
- * In particular do not use pointer types -- pass pointers in __u64
- * instead.
+ * Specifically:
+ *  - Do not use pointer types -- pass pointers in __u64 instead.
+ *  - Make sure that any structure larger than 4 bytes is padded to a
+ *    multiple of 8 bytes.  Otherwise the structure size will be
+ *    different between 32-bit and 64-bit architectures.
  */
 
 struct ib_uverbs_async_event_desc {
@@ -261,6 +265,42 @@ struct ib_uverbs_create_cq_resp {
__u32 cqe;
 };
 
+struct ib_uverbs_poll_cq {
+   __u64 response;
+   __u32 cq_handle;
+   __u32 ne;
+   __u64 wc;
+};
+
+struct ib_uverbs_wc {
+   __u64 wr_id;
+   __u32 status;
+   __u32 opcode;
+   __u32 vendor_err;
+   __u32 byte_len;
+   __u32 imm_data;
+   __u32 qp_num;
+   __u32 src_qp;
+   __u32 wc_flags;
+   __u16 pkey_index;
+   __u16 slid;
+   __u8 sl;
+   __u8 dlid_path_bits;
+   __u8 port_num;
+   __u8 reserved;
+};
+
+struct ib_uverbs_poll_cq_resp {
+   __u32 count;
+   __u32 reserved;
+   struct ib_uverbs_wc wc[0];
+};
+
+struct ib_uverbs_req_notify_cq {
+   __u32 cq_handle;
+   __u32 solicited_only;
+};
+
 struct ib_uverbs_destroy_cq {
__u64 response;
__u32 cq_handle;
@@ -358,6 +398,127 @@ struct ib_uverbs_destroy_qp_resp {
__u32 events_reported;
 };
 
+/*
+ * Note: the ib_uverbs_sge structure isn't used anywhere, as the ib_sge
+ * structure is packed the same way on 32-bit and 64-bit architectures
+ * in both kernel and user space.  It's just here to document the ABI.
+ */
+
+struct ib_uverbs_sge {
+   __u64 addr;
+   __u32 length;
+   __u32 lkey;
+};
+
+struct ib_uverbs_send_wr {
+   __u64 wr_id; 
+   __u32 num_sge;
+   __u32 opcode;
+   __u32 send_flags;
+   __u32 imm_data;
+   union {
+   struct {
+   __u64 remote_addr;
+   __u32 rkey;
+   __u32 reserved;
+   } rdma;
+   struct {
+   __u64 remote_addr;
+   __u64 compare_add;
+   __u64 swap;
+   __u32 rkey;
+   __u32 reserved;
+   } atomic;
+   struct {
+   __u32 ah;
+   __u32 remote_qpn;
+   __u32 remote_qkey;
+   __u32 reserved;
+   } ud;
+   } wr;
+};
+
+struct ib_uverbs_post_send {
+   __u64 response;
+   __u32 qp_handle;
+   __u32 wr_count;
+   __u32 sge_count;
+   __u32 wqe_size;
+   struct ib_uverbs_send_wr send_wr[0];
+};
+
+struct ib_uverbs_post_send_resp {
+   __u32 bad_wr;
+};
+
+struct ib_uverbs_recv_wr {
+   __u64 wr_id;
+   __u32 num_sge;
+   __u32 reserved;
+};
+
+struct ib_uverbs_post_recv {
+   __u64 response;
+   __u32 qp_handle;
+   __u32 wr_count;
+   __u32 sge_count;
+   __u32 wqe_size;
+   struct ib_uverbs_recv_wr recv_wr[0];
+};
+
+struct ib_uverbs_post_recv_resp {
+   __u32 bad_wr;
+};
+
+struct ib_uverbs_post_srq_recv {
+   __u64 response;
+   __u32 srq_handle;
+   __u32 wr_count;
+   __u32 sge_count;
+   __u32 wqe_size;
+   struct ib_uverbs_recv_wr recv[0];
+};
+
+struct ib_uverbs_post_srq_recv_resp {
+   __u32 bad_wr;
+};
+
+struct ib_uverbs_global_route {
+   __u8  dgid[16];
+   __u32 flow_label;
+   __u8  sgid_index;
+   __u8  hop_limit;
+   __u8  traffic_class;
+   __u8  reserved;
+};
+
+struct ib_uverbs_ah_attr {
+   struct ib_uverbs_global_route grh;
+   __u16 dlid;
+   __u8  sl;
+   __u8  src_path_bits;
+   __u8  static_rate;
+   __u8  is_global;
+   __u8  port_num;
+   __u8  reserved;
+};
+
+struct ib_uverbs_create_ah {
+   __u64 response;
+   __u64 user_handle;
+   __u32 pd_handle;
+   __u32 reserved;
+   struct ib_uverbs_ah_attr attr;
+};
+
+struct ib_uverbs_create_ah_resp {
+   __u32 ah_handle;
+};
+
+struct ib_uverbs_destroy_ah {
+   __u32 ah_handle;
+};
+
 struct i

[openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Michael> What prevents the user from passing e.g. poll cq command
Michael> on mthca device? If that happens, it seems that
Michael> ib_poll_cq will then crash.

Michael> Is there a mask somewhere that lets the device specify
Michael> which uverbs commands are allowed for it?

Hmm, excellent point.  A mask would be one way to avoid this -- let me
think about whether there's a better way to handle this.

Thanks,
  Roland



Re: [openib-general] [RFC] libibverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Robert> Since qp_type is now in ibv_qp, it probably no longer
Robert> needs to be in mthca_qp.  This is just a minor
Robert> optimization.

Yep, I'll make that change too.

 - R.


Re: [openib-general] [RFC] libibverbs changes for PathScale merge

2005-10-13 Thread Robert Walsh
> @@ -488,6 +489,7 @@ struct ibv_qp {
>   uint32_t            handle;
>   uint32_t            qp_num;
>   enum ibv_qp_state   state;
> + enum ibv_qp_type    qp_type;
>  
>   pthread_mutex_t mutex;
>   pthread_cond_t  cond;

Since qp_type is now in ibv_qp, it probably no longer needs to be in
mthca_qp.  This is just a minor optimization.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043



[openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: [RFC] Kernel uverbs changes for PathScale merge
> 
> Here are the changes to the kernel part of userspace verbs required to
> support PathScale's driver.  I'm now happy with them and ready to
> commit them to the svn trunk and queue them for 2.6.15.  This will
> allow the PathScale hardware-specific driver to be moved to the trunk
> as well, although quite a bit of cleanup is necessary before merging
> the driver upstream.
> 
> Does anyone have any comments on these changes before I commit?

What prevents the user from passing e.g. poll cq command on
mthca device? If that happens, it seems that ib_poll_cq will
then crash.

Is there a mask somewhere that lets the device specify which
uverbs commands are allowed for it?


> --- infiniband/core/uverbs_cmd.c  (revision 3707)
> +++ infiniband/core/uverbs_cmd.c  (working copy)
> @@ -665,6 +665,93 @@ err:
>   return ret;
>  }
>  
> +ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
> +   const char __user *buf, int in_len,
> +   int out_len)
> +{
> + struct ib_uverbs_poll_cq   cmd;
> + struct ib_uverbs_poll_cq_resp *resp;
> + struct ib_cq  *cq;
> + struct ib_wc  *wc;
> + int                            ret = 0;
> + int                            i;
> + int                            rsize;
> +
> + if (copy_from_user(&cmd, buf, sizeof cmd))
> + return -EFAULT;
> +
> + wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
> + if (!wc)
> + return -ENOMEM;
> +
> + rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
> + resp = kmalloc(rsize, GFP_KERNEL);
> + if (!resp) {
> + ret = -ENOMEM;
> + goto out_wc;
> + }
> +
> + down(&ib_uverbs_idr_mutex);
> + cq = idr_find(&ib_uverbs_cq_idr, cmd.cq_handle);
> + if (!cq || cq->uobject->context != file->ucontext) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + resp->count = ib_poll_cq(cq, cmd.ne, wc);
> +
> + for (i = 0; i < resp->count; i++) {
> + resp->wc[i].wr_id  = wc[i].wr_id;
> + resp->wc[i].status = wc[i].status;
> + resp->wc[i].opcode = wc[i].opcode;
> + resp->wc[i].vendor_err = wc[i].vendor_err;
> + resp->wc[i].byte_len   = wc[i].byte_len;
> + resp->wc[i].imm_data   = wc[i].imm_data;
> + resp->wc[i].qp_num = wc[i].qp_num;
> + resp->wc[i].src_qp = wc[i].src_qp;
> + resp->wc[i].wc_flags   = wc[i].wc_flags;
> + resp->wc[i].pkey_index = wc[i].pkey_index;
> + resp->wc[i].slid   = wc[i].slid;
> + resp->wc[i].sl = wc[i].sl;
> + resp->wc[i].dlid_path_bits = wc[i].dlid_path_bits;
> + resp->wc[i].port_num   = wc[i].port_num;
> + }
> +
> + if (copy_to_user((void __user *) (unsigned long) cmd.response, resp, rsize))
> + ret = -EFAULT;
> +
> +out:
> + up(&ib_uverbs_idr_mutex);
> + kfree(resp);
> +
> +out_wc:
> + kfree(wc);
> + return ret ? ret : in_len;
> +}

-- 
MST
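
As a side note on the quoted ib_uverbs_poll_cq code, the response is a fixed header followed by a variable number of completion entries, and its size is computed as header plus ne entries. Below is a minimal userspace sketch of that layout. The demo_* names are hypothetical, the struct is cut down to a few fields, it uses a C99 flexible array member where the patch uses wc[0], and the overflow guard is an addition of this sketch that is not present in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for a per-completion entry. */
struct demo_wc {
	uint64_t wr_id;
	uint32_t status;
	uint32_t reserved;
};

/* Hypothetical stand-in for the response: header plus trailing entries. */
struct demo_resp {
	uint32_t count;
	uint32_t reserved;
	struct demo_wc wc[];	/* C99 flexible array member */
};

/* Compute header + ne entries.  Returns 0 and sets *out on success,
 * or -1 if the multiplication or addition would overflow size_t. */
int demo_resp_size(uint32_t ne, size_t *out)
{
	if (ne > (SIZE_MAX - sizeof(struct demo_resp)) / sizeof(struct demo_wc))
		return -1;
	*out = sizeof(struct demo_resp) + (size_t)ne * sizeof(struct demo_wc);
	return 0;
}

/* Allocate a zeroed response with room for ne entries, or NULL on failure. */
struct demo_resp *demo_resp_alloc(uint32_t ne)
{
	size_t size;

	if (demo_resp_size(ne, &size))
		return NULL;
	return calloc(1, size);
}
```

The caller fills in count and the first count entries, then copies the whole buffer out in one operation, as the quoted function does with copy_to_user().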


[openib-general] [RFC] libibverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Here are the changes to libibverbs required to support PathScale's
driver.  Again, I'm happy with them and would just like to get
comments on them before I commit them to svn.

Thanks,
  Roland

--- libibverbs/include/infiniband/driver.h  (revision 3774)
+++ libibverbs/include/infiniband/driver.h  (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2004, 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -92,6 +93,8 @@ extern int ibv_cmd_create_cq(struct ibv_
 int comp_vector, struct ibv_cq *cq,
 struct ibv_create_cq *cmd, size_t cmd_size,
 struct ibv_create_cq_resp *resp, size_t resp_size);
+extern int ibv_cmd_poll_cq(struct ibv_cq *cq, int ne, struct ibv_wc *wc);
+extern int ibv_cmd_req_notify_cq(struct ibv_cq *cq, int solicited);
 extern int ibv_cmd_destroy_cq(struct ibv_cq *cq);
 
 extern int ibv_cmd_create_srq(struct ibv_pd *pd,
@@ -111,6 +114,15 @@ extern int ibv_cmd_modify_qp(struct ibv_
 enum ibv_qp_attr_mask attr_mask,
 struct ibv_modify_qp *cmd, size_t cmd_size);
 extern int ibv_cmd_destroy_qp(struct ibv_qp *qp);
+extern int ibv_cmd_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
+struct ibv_send_wr **bad_wr);
+extern int ibv_cmd_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
+struct ibv_recv_wr **bad_wr);
+extern int ibv_cmd_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *wr,
+struct ibv_recv_wr **bad_wr);
+extern int ibv_cmd_create_ah(struct ibv_pd *pd, struct ibv_ah *ah,
+struct ibv_ah_attr *attr);
+extern int ibv_cmd_destroy_ah(struct ibv_ah *ah);
 extern int ibv_cmd_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
 extern int ibv_cmd_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
 
--- libibverbs/include/infiniband/verbs.h   (revision 3774)
+++ libibverbs/include/infiniband/verbs.h   (working copy)
@@ -2,6 +2,7 @@
  * Copyright (c) 2004, 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2004 Intel Corporation.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -488,6 +489,7 @@ struct ibv_qp {
        uint32_t                handle;
        uint32_t                qp_num;
        enum ibv_qp_state       state;
+       enum ibv_qp_type        qp_type;
 
pthread_mutex_t mutex;
pthread_cond_t  cond;
@@ -513,6 +515,7 @@ struct ibv_cq {
 struct ibv_ah {
struct ibv_context *context;
struct ibv_pd  *pd;
+       uint32_t                handle;
 };
 
 struct ibv_device;
--- libibverbs/include/infiniband/kern-abi.h (revision 3774)
+++ libibverbs/include/infiniband/kern-abi.h (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -93,8 +94,11 @@ enum {
  * Make sure that all structs defined in this file remain laid out so
  * that they pack the same way on 32-bit and 64-bit architectures (to
  * avoid incompatibility between 32-bit userspace and 64-bit kernels).
- * In particular do not use pointer types -- pass pointers in __u64
- * instead.
+ * Specifically:
+ *  - Do not use pointer types -- pass pointers in __u64 instead.
+ *  - Make sure that any structure larger than 4 bytes is padded to a
+ *    multiple of 8 bytes.  Otherwise the structure size will be
+ *    different between 32-bit and 64-bit architectures.
  */
 
 struct ibv_kern_async_event {
@@ -298,6 +302,47 @@ struct ibv_create_cq_resp {
__u32 cqe;
 };
 
+struct ibv_kern_wc {
+__u64  wr_id;
+__u32  status;
+__u32  opcode;
+__u32  vendor_err;
+__u32  byte_len;
+__u32  imm_data;
+__u32  qp_num;
+__u32  src_qp;
+__u32  wc_flags;
+__u16  pkey_index;
+__u16  slid;
+__u8   sl;
+__u8   dlid_path_bits;
+   __u8   port_num;
+   __u8   reserved;
+};
+
+struct ibv_poll_cq {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u64 response;
+   __u32 cq_handle;
+   __u32 ne;
+};
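
The padding rule stated in the kern-abi.h comment of this patch can be checked at compile time. The sketch below is only an illustration under assumptions: the demo_* structs are hypothetical and are not part of the patch, and it uses uint64_t/uint32_t in place of __u64/__u32 so that it builds outside the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command struct with 12 bytes of fields.  Its sizeof
 * depends on the ABI: 12 where a 64-bit integer is 4-byte aligned
 * (e.g. i386), 16 where it is 8-byte aligned (e.g. x86_64). */
struct demo_unpadded {
	uint64_t response;
	uint32_t handle;
};

/* Same struct with an explicit reserved field, so the fields themselves
 * add up to a multiple of 8 bytes and no hidden tail padding is needed. */
struct demo_padded {
	uint64_t response;
	uint32_t handle;
	uint32_t reserved;
};

/* Fail the build if the padded struct ever stops being 16 bytes. */
_Static_assert(sizeof(struct demo_padded) == 16,
	       "demo_padded must have the same size on every ABI");
_Static_assert(sizeof(struct demo_padded) % 8 == 0,
	       "demo_padded must be a multiple of 8 bytes");

/* Sum of the field sizes of demo_padded; when this equals sizeof,
 * the compiler has inserted no padding of its own. */
size_t demo_padded_field_bytes(void)
{
	return sizeof(uint64_t) + 2 * sizeof(uint32_t);
}
```

No assertion is placed on demo_unpadded, since its size is exactly the thing that varies between 32-bit and 64-bit builds.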

[openib-general] [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Here are the changes to the kernel part of userspace verbs required to
support PathScale's driver.  I'm now happy with them and ready to
commit them to the svn trunk and queue them for 2.6.15.  This will
allow the PathScale hardware-specific driver to be moved to the trunk
as well, although quite a bit of cleanup is necessary before merging
the driver upstream.

Does anyone have any comments on these changes before I commit?

Thanks,
  Roland

--- infiniband/include/rdma/ib_user_verbs.h (revision 3707)
+++ infiniband/include/rdma/ib_user_verbs.h (working copy)
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -88,8 +89,11 @@ enum {
  * Make sure that all structs defined in this file remain laid out so
  * that they pack the same way on 32-bit and 64-bit architectures (to
  * avoid incompatibility between 32-bit userspace and 64-bit kernels).
- * In particular do not use pointer types -- pass pointers in __u64
- * instead.
+ * Specifically:
+ *  - Do not use pointer types -- pass pointers in __u64 instead.
+ *  - Make sure that any structure larger than 4 bytes is padded to a
+ *    multiple of 8 bytes.  Otherwise the structure size will be
+ *    different between 32-bit and 64-bit architectures.
  */
 
 struct ib_uverbs_async_event_desc {
@@ -261,6 +265,42 @@ struct ib_uverbs_create_cq_resp {
__u32 cqe;
 };
 
+struct ib_uverbs_poll_cq {
+   __u64 response;
+   __u32 cq_handle;
+   __u32 ne;
+   __u64 wc;
+};
+
+struct ib_uverbs_wc {
+   __u64 wr_id;
+   __u32 status;
+   __u32 opcode;
+   __u32 vendor_err;
+   __u32 byte_len;
+   __u32 imm_data;
+   __u32 qp_num;
+   __u32 src_qp;
+   __u32 wc_flags;
+   __u16 pkey_index;
+   __u16 slid;
+   __u8 sl;
+   __u8 dlid_path_bits;
+   __u8 port_num;
+   __u8 reserved;
+};
+
+struct ib_uverbs_poll_cq_resp {
+   __u32 count;
+   __u32 reserved;
+   struct ib_uverbs_wc wc[0];
+};
+
+struct ib_uverbs_req_notify_cq {
+   __u32 cq_handle;
+   __u32 solicited_only;
+};
+
 struct ib_uverbs_destroy_cq {
__u64 response;
__u32 cq_handle;
@@ -358,6 +398,127 @@ struct ib_uverbs_destroy_qp_resp {
__u32 events_reported;
 };
 
+/*
+ * Note: the ib_uverbs_sge structure isn't used anywhere, as the ib_sge
+ * structure is packed the same way on 32-bit and 64-bit architectures
+ * in both kernel and user space.  It's just here to document the ABI.
+ */
+
+struct ib_uverbs_sge {
+   __u64 addr;
+   __u32 length;
+   __u32 lkey;
+};
+
+struct ib_uverbs_send_wr {
+   __u64 wr_id; 
+   __u32 num_sge;
+   __u32 opcode;
+   __u32 send_flags;
+   __u32 imm_data;
+   union {
+   struct {
+   __u64 remote_addr;
+   __u32 rkey;
+   __u32 reserved;
+   } rdma;
+   struct {
+   __u64 remote_addr;
+   __u64 compare_add;
+   __u64 swap;
+   __u32 rkey;
+   __u32 reserved;
+   } atomic;
+   struct {
+   __u32 ah;
+   __u32 remote_qpn;
+   __u32 remote_qkey;
+   __u32 reserved;
+   } ud;
+   } wr;
+};
+
+struct ib_uverbs_post_send {
+   __u64 response;
+   __u32 qp_handle;
+   __u32 wr_count;
+   __u32 sge_count;
+   __u32 wqe_size;
+   struct ib_uverbs_send_wr send_wr[0];
+};
+
+struct ib_uverbs_post_send_resp {
+   __u32 bad_wr;
+};
+
+struct ib_uverbs_recv_wr {
+   __u64 wr_id;
+   __u32 num_sge;
+   __u32 reserved;
+};
+
+struct ib_uverbs_post_recv {
+   __u64 response;
+   __u32 qp_handle;
+   __u32 wr_count;
+   __u32 sge_count;
+   __u32 wqe_size;
+   struct ib_uverbs_recv_wr recv_wr[0];
+};
+
+struct ib_uverbs_post_recv_resp {
+   __u32 bad_wr;
+};
+
+struct ib_uverbs_post_srq_recv {
+   __u64 response;
+   __u32 srq_handle;
+   __u32 wr_count;
+   __u32 sge_count;
+   __u32 wqe_size;
+   struct ib_uverbs_recv_wr recv[0];
+};
+
+struct ib_uverbs_post_srq_recv_resp {
+   __u32 bad_wr;
+};
+
+struct ib_uverbs_global_route {
+   __u8  dgid[16];
+   __u32 flow_label;
+   __u8  sgid_index;
+   __u8  hop_limit;
+   __u8  traffic_class;
+   __u8  reserved;
+};
+
+struct ib_uverbs_ah_attr {
+   struct ib_uverbs_global_route grh;
+   __u16 dlid;
+   __u8  sl;
+   __u8  src_path_bits;
+   __u8  static_rate;
+   __u8  is_global;
+   __u8  port_num;
+   __u8  reserved;

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland,

>From [EMAIL PROTECTED] Thu Oct 13 13:53:05 2005
>
>Helen> Roland, Thank you for your response.  That fixed my initial
>Helen> buffer allocation failure.  After we tuned the Lustre and
>Helen> reran same IOZONE tests again, we got the following
>Helen> problem.  Was there an actual network interrupt? If so, the
>Helen> problem is not obvious now; the two nodes are pinging over
>Helen> IPoIB.  Please advise.
>
>That's very odd.  This message:
>
>Helen> NETDEV WATCHDOG: ib0: transmit timed out
>Helen> ib0: transmit timeout: latency 1846
>
>says that we are not seeing send completions from the HCA.  However,
>are you saying that even when you are seeing this message, ping over
>IPoIB is working?
>

No, I didn't know there was any problem until IOZONE reported a read
error from the Lustre client.

BTW, the backend storage is iSCSI over 10 GbE using jumbo frames.  This
problem only appeared after our tuning effort: we increased the iSCSI
payload to 1 MB, and increased the TCP window from 256 KB to 512 KB.  I
will shrink my TCP window and see if the problem goes away.

Thanks,
Helen


Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> Roland, Thank you for your response.  That fixed my initial
Helen> buffer allocation failure.  After we tuned the Lustre and
Helen> reran same IOZONE tests again, we got the following
Helen> problem.  Was there an actual network interrupt? If so, the
Helen> problem is not obvious now; the two nodes are pinging over
Helen> IPoIB.  Please advise.

That's very odd.  This message:

Helen> NETDEV WATCHDOG: ib0: transmit timed out
Helen> ib0: transmit timeout: latency 1846

says that we are not seeing send completions from the HCA.  However,
are you saying that even when you are seeing this message, ping over
IPoIB is working?

 - R.
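
To make the reasoning about missing send completions concrete, here is a small userspace model of a transmit watchdog. It is a simplified sketch, not the IPoIB or netdev code: the demo_* names are hypothetical, and it assumes the reported latency is the time elapsed since the last transmit was started.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a transmit queue watched by a timer. */
struct demo_txq {
	uint64_t trans_start;	/* tick at which the last transmit was started */
	unsigned int pending;	/* sends posted but not yet completed */
	uint64_t timeout;	/* ticks of inactivity allowed before reporting */
};

/* Called when a send is posted. */
void demo_tx_post(struct demo_txq *q, uint64_t now)
{
	q->trans_start = now;
	q->pending++;
}

/* Called when a send completion is seen. */
void demo_tx_complete(struct demo_txq *q)
{
	if (q->pending)
		q->pending--;
}

/* Watchdog run: returns the latency to report, or 0 if there is no
 * timeout (nothing pending, or the last transmit is recent enough). */
uint64_t demo_watchdog(const struct demo_txq *q, uint64_t now)
{
	if (q->pending && now - q->trans_start > q->timeout)
		return now - q->trans_start;
	return 0;
}
```

In this model, if no completion ever arrives, each later watchdog run reports a larger latency than the one before, and the reports stop as soon as the pending send completes.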


Re: [openib-general] Re: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN

2005-10-13 Thread Arlin Davis

Michael S. Tsirkin wrote:

> Quoting r. Arlin Davis <[EMAIL PROTECTED]>:
>
>> Subject: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN
>>
>> Michael,
>>
>> The patch adds command line options for RDMA reads and starting PSN. I
>> used these modifications to
>> help isolate the RDMA read performance degradation with 4.6.2 firmware.
>>
>> -arlin
>
> Thanks Arlin. I plan to look into integrating this.
> One question: for which psn values do you see performance drop on 4.6.0 FW?

A quick run at 1 and then 0x10 dropped from 682MB/s to 49MB/s for 32KB
buffers.  What is really strange is that it takes a couple runs to start
seeing the drop in performance.


PSN=1 no problems...

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x1 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0x20406, PSN 0x0001 RKey 0x0c0032 VAddr 
0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0x20406, PSN 0x0001 RKey 0x0c0032 VAddr 
0x513000 RDMA_READ

Bandwidth peak (#0 to #999): 682.504 MB/sec
Bandwidth average: 682.501 MB/sec
Service Demand peak (#0 to #999): 5138 cycles/KB
Service Demand Avg  : 5138 cycles/KB

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x1 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0x30406, PSN 0x0001 RKey 0x120032 VAddr 
0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0x30406, PSN 0x0001 RKey 0x120032 VAddr 
0x513000 RDMA_READ

Bandwidth peak (#0 to #990): 682.496 MB/sec
Bandwidth average: 682.496 MB/sec
Service Demand peak (#0 to #990): 5138 cycles/KB
Service Demand Avg  : 5138 cycles/KB

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x1 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0x40406, PSN 0x0001 RKey 0x180032 VAddr 
0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0x40406, PSN 0x0001 RKey 0x180032 VAddr 
0x513000 RDMA_READ

Bandwidth peak (#0 to #990): 682.5 MB/sec
Bandwidth average: 682.499 MB/sec
Service Demand peak (#0 to #990): 5138 cycles/KB
Service Demand Avg  : 5138 cycles/KB

PSN=0x10  (start to see problems after first run)

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x10 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0xb0406, PSN 0x10 RKey 0x420032 
VAddr 0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0x90406, PSN 0x10 RKey 0x360032 
VAddr 0x513000 RDMA_READ

Bandwidth peak (#0 to #996): 682.5 MB/sec
Bandwidth average: 682.499 MB/sec
Service Demand peak (#0 to #996): 5138 cycles/KB
Service Demand Avg  : 5138 cycles/KB

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x10 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0xc0406, PSN 0x10 RKey 0x480032 
VAddr 0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0xa0406, PSN 0x10 RKey 0x3c0032 
VAddr 0x513000 RDMA_READ

Bandwidth peak (#0 to #0): 48.5441 MB/sec
Bandwidth average: 47.4502 MB/sec
Service Demand peak (#0 to #0): 72244 cycles/KB
Service Demand Avg  : 73909 cycles/KB

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x10 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0xd0406, PSN 0x10 RKey 0x4e0032 
VAddr 0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0xb0406, PSN 0x10 RKey 0x420032 
VAddr 0x513000 RDMA_READ

Bandwidth peak (#0 to #0): 48.4803 MB/sec
Bandwidth average: 47.4501 MB/sec
Service Demand peak (#0 to #0): 72339 cycles/KB
Service Demand Avg  : 73909 cycles/KB

PSN = 1 (first run is bad, and then it is back to normal)

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x1 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0xe0406, PSN 0x0001 RKey 0x540032 VAddr 
0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0xc0406, PSN 0x0001 RKey 0x480032 VAddr 
0x513000 RDMA_READ

Bandwidth peak (#0 to #0): 48.5798 MB/sec
Bandwidth average: 47.4502 MB/sec
Service Demand peak (#0 to #0): 72190 cycles/KB
Service Demand Avg  : 73909 cycles/KB

[EMAIL PROTECTED] perftest]$  ./rdma_bw -P 0x1 -s 32768 -r iclust-19
 local address:  LID 0x02, QPN 0xf0406, PSN 0x0001 RKey 0x5a0032 VAddr 
0x514000 RDMA_READ
 remote address: LID 0x05, QPN 0xd0406, PSN 0x0001 RKey 0x4e0032 VAddr 
0x513000 RDMA_READ

Bandwidth peak (#0 to #990): 682.492 MB/sec
Bandwidth average: 682.49 MB/sec
Service Demand peak (#0 to #990): 5138 cycles/KB
Service Demand Avg  : 5138 cycles/KB

-arlin





Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland,

Thank you for your response.  That fixed my initial buffer
allocation failure.  After we tuned the Lustre and reran 
same IOZONE tests again, we got the following problem.
Was there an actual network interrupt? If so, the problem
is not obvious now; the two nodes are pinging over IPoIB.
Please advise.

Thanks,
Helen

----- Dmesg Report from Lustre server -----
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 1846
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 2846
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 3846
Lustre: A connection with 192.168.2.79 timed out; the network or that node may 
be down.
LustreError: 10501:0:(socknal_cb.c:2264:ksocknal_check_peer_timeouts()) Timeout 
out conn->0xc0a8024f ip 192.168.2.79:1021
LustreError: 10793:0:(ldlm_lib.c:506:target_handle_reconnect()) 
460e5_lov2_7d3910bb5c reconnecting

----- Dmesg from Lustre client (192.168.2.79) -----
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 1965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 2965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 3965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 4965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 5965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 6965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 7965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 8965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 9965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 10965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 11965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 12965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 13965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 14965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 15965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 16965
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 17965
Lustre: 10035:0:(socknal_cb.c:1326:ksocknal_process_receive()) [f6256000] EOF 
from 0xc0a80253 ip 192.168.2.83:988
LustreError: 10169:0:(client.c:568:ptlrpc_check_status()) @@@ type == 
PTL_RPC_MSG_ERR, err == -107 [EMAIL PROTECTED] x13853/t0
o400->[EMAIL PROTECTED]:6 lens 64/64 ref 1 fl Rpc:RN/0/0 rc 0/-107
LustreError: Connection to service on5-ost2 via nid 192.168.2.76 was lost; in 
progress operations using this service will wait for recovery to
complete.
Lustre: 10169:0:(import.c:142:ptlrpc_set_import_discon()) 
OSC_on8_on5-ost2_MNT_on8-ib_2: connection lost to [EMAIL PROTECTED]
LustreError: This client was evicted by on5-ost2; in progress operations using 
this service will fail.
LustreError: 10413:0:(rw.c:1253:ll_readpage()) page c1538cc0 map f6193328 index 
825344 flags 20001023 count 3 priv e91da940: lock match failed: rc -5
LustreError: 10169:0:(client.c:502:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
[EMAIL PROTECTED] x13862/t0 o3->[EMAIL PROTECTED]:6 lens 328/280
ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 10169:0:(client.c:502:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
[EMAIL PROTECTED] x13868/t0 o3->[EMAIL PROTECTED]:6 lens 328/280
ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 10169:0:(client.c:502:ptlrpc_import_delay_req()) previously 
skipped 4 similar messages
LustreError: 10169:0:(client.c:502:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
[EMAIL PROTECTED] x13880/t0 o3->[EMAIL PROTECTED]:6 lens 328/280
ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 10169:0:(client.c:502:ptlrpc_import_delay_req()) previously 
skipped 11 similar messages
Lustre: A connection with 192.168.2.75 timed out; the network or that node may 
be down.
LustreError: 10041:0:(socknal_cb.c:2264:ksocknal_check_peer_timeouts()) Timeout 
out conn->0xc0a8024b ip 192.168.2.75:988
Lustre: Connection restored to service on5-ost2 using nid 192.168.2.76.
Lustre: 10496:0:(import.c:714:ptlrpc_import_recovery_state_machine()) 
OSC_on8_on5-ost2_MNT_on8-ib_2: connection restored to
[EMAIL PROTECTED]
LustreError: 10169:0:(client.c:945:ptlrpc_expire_one_request()) @@@ timeout 
(sent at 1129234515, 101s ago) [EMAIL PROTECTED] x13850/t0
o400->[EMAIL PROTECTED]:12 lens 64/64 ref 1 fl Rpc:N/0/0 rc 0/0
LustreError: Connection to service on12-mds2 via nid 192.168.2.83 was lost; in 
progress operations using this service will wait for recovery to
complete.
Lustre: 10169:0:(import.c:142:ptlrpc_set_import_discon()) 
MDC_on8_on12-mds2_MNT_on8-ib_2: connection lost to [EMAIL PROTECTED]
Lustre: Connection restored to service on3-ost2 using nid 192.168.2.74.
Lustre: 10170:0:(import.c:714:ptlrpc_import_recovery_state_machine()) 
OSC_on8_on3-ost2_MNT_on8-ib_2: connection restored to
[EMAIL PROTECTED]

[openib-general] Re: [PATCH] uDAPL async QP/CQ error handling fixed

2005-10-13 Thread James Lentini


On Thu, 13 Oct 2005, Arlin Davis wrote:

> James,
> 
> Patch will fix the async error handling and callback mappings. QP/CQ 
> error mappings were totally screwed up. Updated TODO list.
> 
> -arlin

Committed in revision 3774.


[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to 
> allocate receive buffer
> 
> Michael> Yes, it seems that if such an allocation fails IPoIB may
> Michael> never repost the receive buffer. Is that right?
> 
> I think so.
> 
> My plan is to change the receive handling of IPoIB slightly, so that
> if it can't allocate a new receive buffer, it reposts the old buffer
> and drops the packet it just received.

Sounds like a good idea.

-- 
MST


[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Michael> Yes, it seems that if such an allocation fails IPoIB may
Michael> never repost the receive buffer. Is that right?

I think so.

My plan is to change the receive handling of IPoIB slightly, so that
if it can't allocate a new receive buffer, it reposts the old buffer
and drops the packet it just received.

 - R.
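
The repost-and-drop plan described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct rx_ring`, `rx_complete` and `rx_alloc_fail` are hypothetical names, not the IPoIB driver's own code, and "posting" a buffer to the HCA is reduced to storing a pointer in a ring slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 4
#define RX_BUF_SIZE  2048

/* Hypothetical stand-in for a receive ring; not the IPoIB structures. */
struct rx_ring {
    void *buf[RX_RING_SIZE];   /* buffer currently posted in each slot */
    unsigned long dropped;     /* packets dropped because allocation failed */
};

/* Allocator hook so a caller can simulate allocation failure. */
typedef void *(*rx_alloc_fn)(size_t size);

/*
 * Handle a receive completion for slot `id`.
 * Returns the buffer holding the received packet (the caller now owns it),
 * or NULL if the packet was dropped. Either way the slot still has a
 * buffer posted when this function returns.
 */
void *rx_complete(struct rx_ring *ring, unsigned id, rx_alloc_fn alloc)
{
    void *fresh;
    void *full;

    assert(ring != NULL && id < RX_RING_SIZE);

    fresh = alloc(RX_BUF_SIZE);
    if (fresh == NULL) {
        /* No replacement: leave the old buffer posted, drop the packet. */
        ring->dropped++;
        return NULL;
    }

    /* Replacement available: hand the filled buffer up, post the new one. */
    full = ring->buf[id];
    ring->buf[id] = fresh;
    return full;
}

/* Allocator that always fails, for exercising the drop path. */
void *rx_alloc_fail(size_t size)
{
    (void)size;
    return NULL;
}
```

The point of the design is that the ring never shrinks: an allocation failure costs one packet rather than permanently losing a receive slot.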


[openib-general] Re: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Arlin Davis <[EMAIL PROTECTED]>:
> Subject: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN
> 
> Michael,
> 
> The patch adds command line options for RDMA reads and starting PSN. I
> used these modifications to help isolate the RDMA read performance
> degradation with 4.6.2 firmware.
> 
> -arlin

Thanks Arlin. I plan to look into integrating this.
One question: for which psn values do you see performance drop on 4.6.0 FW?


-- 
MST


[openib-general] Re: Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> IPoIB's handling of these allocation errors can definitely be improved

Yes, it seems that if such an allocation fails IPoIB may never repost
the receive buffer. Is that right?

-- 
MST


[openib-general] [PATCH] uDAPL async QP/CQ error handling fixed

2005-10-13 Thread Arlin Davis
James,

Patch will fix the async error handling and callback mappings. QP/CQ error
mappings were totally screwed up. Updated TODO list.

-arlin

Signed-off-by: Arlin Davis <[EMAIL PROTECTED]>


Index: dapl/openib/TODO
===
--- dapl/openib/TODO(revision 3768)
+++ dapl/openib/TODO(working copy)
@@ -1,12 +1,10 @@
 
 IB Verbs:
 - CQ resize
-- mulitple CQ event support
 - memory window support
 
 DAPL:
 - reinit EP needs a QP timewait completion notification
-- direct cq_wait_object when multi-CQ verbs event support arrives
 - shared receive queue support
 
 Under discussion:
Index: dapl/openib/dapl_ib_util.c
===
--- dapl/openib/dapl_ib_util.c  (revision 3768)
+++ dapl/openib/dapl_ib_util.c  (working copy)
@@ -214,8 +214,11 @@ DAT_RETURN dapls_ib_open_hca (
/* Get list of all IB devices, find match, open */
dev_list = ibv_get_devices();
dlist_start(dev_list);
-   dlist_for_each_data(dev_list,hca_ptr->ib_trans.ib_dev,struct ibv_device) {
-   if (!strcmp(ibv_get_device_name(hca_ptr->ib_trans.ib_dev),hca_name))
+   dlist_for_each_data(dev_list,
+   hca_ptr->ib_trans.ib_dev,
+   struct ibv_device) {
+   if (!strcmp(ibv_get_device_name(hca_ptr->ib_trans.ib_dev),
+   hca_name))
break;
}
 
@@ -226,20 +229,22 @@ DAT_RETURN dapls_ib_open_hca (
return DAT_INTERNAL_ERROR;
}

-   dapl_dbg_log (DAPL_DBG_TYPE_UTIL," open_hca: Found dev %s %016llx\n", 
-   ibv_get_device_name(hca_ptr->ib_trans.ib_dev),
-   (unsigned long long)bswap_64(ibv_get_device_guid(hca_ptr->ib_trans.ib_dev)));
+   dapl_dbg_log (
+   DAPL_DBG_TYPE_UTIL," open_hca: Found dev %s %016llx\n", 
+   ibv_get_device_name(hca_ptr->ib_trans.ib_dev),
+   (unsigned long long)
+   bswap_64(ibv_get_device_guid(hca_ptr->ib_trans.ib_dev)));
 
hca_ptr->ib_hca_handle = ibv_open_device(hca_ptr->ib_trans.ib_dev);
if (!hca_ptr->ib_hca_handle) {
dapl_dbg_log (DAPL_DBG_TYPE_ERR, 
  " open_hca: IB dev open failed for %s\n", 
- ibv_get_device_name(hca_ptr->ib_trans.ib_dev) );
+ ibv_get_device_name(hca_ptr->ib_trans.ib_dev));
return DAT_INTERNAL_ERROR;
}
hca_ptr->ib_trans.ib_ctx = hca_ptr->ib_hca_handle;
 
-   /* set inline max with enviromment or default, get local lid and gid 0 */
+   /* set inline max with env or default, get local lid and gid 0 */
hca_ptr->ib_trans.max_inline_send = 
dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_DEFAULT);
 
@@ -253,15 +258,17 @@ DAT_RETURN dapls_ib_open_hca (
}

dapl_dbg_log(DAPL_DBG_TYPE_UTIL,
-" open_hca: GID subnet %016llx id %016llx\n",
-(unsigned long long)bswap_64(hca_ptr->ib_trans.gid.global.subnet_prefix),
-(unsigned long long)bswap_64(hca_ptr->ib_trans.gid.global.interface_id) );
+   " open_hca: GID subnet %016llx id %016llx\n",
+   (unsigned long long)
+   bswap_64(hca_ptr->ib_trans.gid.global.subnet_prefix),
+   (unsigned long long)
+   bswap_64(hca_ptr->ib_trans.gid.global.interface_id));
 
/* get the IP address of the device using GID */
if (dapli_get_hca_addr(hca_ptr)) {
dapl_dbg_log (DAPL_DBG_TYPE_ERR, 
  " open_hca: ERR ib_at_ips_by_gid for %s\n", 
- ibv_get_device_name(hca_ptr->ib_trans.ib_dev) );
+ ibv_get_device_name(hca_ptr->ib_trans.ib_dev));
goto bail;
}
 
@@ -310,15 +317,23 @@ DAT_RETURN dapls_ib_open_hca (
write(g_ib_pipe[1], "w", sizeof "w");
dapl_os_unlock(&g_hca_lock);

-   dapl_dbg_log (DAPL_DBG_TYPE_UTIL, 
- " open_hca: %s, port %d, %s  %d.%d.%d.%d INLINE_MAX=%d\n", 
- ibv_get_device_name(hca_ptr->ib_trans.ib_dev), hca_ptr->port_num,
- ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_family == AF_INET ? "AF_INET":"AF_INET6",
- ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 0 & 0xff,
- ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 8 & 0xff,
- ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 16 & 0xff,
- ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 24 & 0xff,
- hca_ptr->ib_trans.max_inline_send );
+   dapl_dbg_log (
+ 

RE: [openib-general] [RFC] IB address translation using ARP

2005-10-13 Thread Caitlin Bestler



I agree with Mike's analysis. But I'd also like to point out that even
when source compatibility is not a requirement, source familiarity
is. That is, even when recoding is feasible the API should only
introduce new concepts as required to improve efficiency. The
shift from socket model to QP/CQ is challenging enough as is.
It's also where the benefit is. Changing how the application
requests and accepts connections is just piling on more things
for the developers to learn onto an already very full plate, and
with nowhere near the same benefit.

The simple, IP/DNS-centric methods that Mike outlined will
work on either iWARP or IB, and are very easily understood
by those familiar with existing sockets/IP network development.
The more complex models provide minor enhancements for
corner cases at the very heavy cost of requiring
the developer to understand a lot more about network topology.
 

RE: [openib-general] [RFC] IB address translation using ARP

2005-10-13 Thread Michael Krause


At 03:14 PM 10/12/2005, Caitlin Bestler wrote:

> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sean Hefty
> > Sent: Wednesday, October 12, 2005 2:36 PM
> > To: Michael Krause
> > Cc: openib-general@openib.org
> > Subject: Re: [openib-general] [RFC] IB address translation using ARP
> >
> > Michael Krause wrote:
> > > 1. Applications want to use existing API to identify remote
> > > endnodes / services.
> >
> > To clarify, the applications want to use IP based addressing
> > to identify remote endnodes.  The connection API is under development.
>
> No, I think Mike's comment was dead on. Applications want to
> use the existing API. They want to use the existing API even
> when the API is clearly defective. Note that there are several
> generations of host-resolution APIs for the IP world, with the
> earlier ones clearly being heavily inferior (not thread safe,
> not IPv4/IPv6 neutral, etc). But they have not been eliminated.
> Why? Because applications want to use the existing API.
>
> If application developers were rational and totally open to
> adopting new ideas instantly then the active side would ask to
> make a connection to a *service*, not to a host with a service
> qualifier.
>
> A new API may be under development to meet new needs. But keep in
> mind that the application developers expect it to be as close to
> what they are used to as possible, and will grumble that it is
> not 100% compatible.

This all comes down to economics, which is why some ULPs such as SDP are
created.  Let's examine SDP for a moment.  The purpose of SDP is to
enable synchronous and asynchronous Sockets applications to
transparently run unmodified over an RDMA capable interconnect.
Unmodified means no source code changes and no recompile required (this
is possible if the Sockets library is a shared library and dynamically
linked).  The first part of unmodified means that the existing address /
service resolution API calls work (further, no change to the address
family, etc. is required to make this work either).  Hence, pick any of
the get* API calls that are in use today and they should just work.

How does this work?  The SDP implementation takes on the burden for
the application developer.  For iWARP, there really isn't anything
special that has to be done as these calls all should provide the
necessary information.  The port mapper protocol would be invoked
which would map to the actual RDMA listen QP and target RNIC.  For
IB, there is some additional work both in using SID as well as resolving
the IP address to the IB address vector, but the work isn't that hard
to implement (we know this because this has all been implemented on
various OSes within the industry).  The same will be true for NFS/RDMA
and iSER - again all use the existing interfaces to identify the
address / service and map to an address vector (and again, all of this
has been implemented on various OSes within the industry).

The above makes ISVs and customers very happy as they can take advantage
of RDMA technologies without having to go through the lengthy and
expensive qualification process that comes when any application is
modified / recompiled.  This keeps costs low and improves TTM.

As for the RDMA connection API, that is simply attempting to abstract
to a common interface that any ULP implementation can use to access
either iWARP or IB.  The RDMA connection API should not be viewed as
something end application developers will use but as something aimed at
middleware developers.  This allows everyone to use IP addresses, port
spaces, etc. through the existing application API while allowing RDMA
to transparently add some intelligence to the process and eventually
enable new capabilities like policy management (e.g. how best to map ULP
QoS needs to a given path, service rate, etc.) without permuting
everything above.

Keeping things transparent is best for all.  Attempting to require end
application developers to modify their code will result in slower
adoption and reduced utilization of RDMA technologies within the
industry.  It really is all about economics and re-using the existing
ecosystem / infrastructure.

Mike
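
To make the point about the existing resolution calls concrete, here is a minimal sketch using POSIX `getaddrinfo()`, one of the newer, thread-safe and IPv4/IPv6-neutral members of that family. The helper names `resolve_first` and `sockaddr_port` are hypothetical, and nothing here is SDP- or RDMA-specific; the sketch only shows the host/service lookup an unmodified Sockets application already performs.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve host/service with getaddrinfo() and copy the first result
 * into *out. Returns 0 on success, or the non-zero getaddrinfo()
 * error code on failure.
 */
int resolve_first(const char *host, const char *service, int flags,
                  struct sockaddr_storage *out, socklen_t *out_len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    assert(out != NULL && out_len != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = flags;

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0)
        return rc;

    memcpy(out, res->ai_addr, res->ai_addrlen);
    *out_len = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}

/* Extract the port, in host byte order, from a resolved address. */
int sockaddr_port(const struct sockaddr_storage *ss)
{
    if (ss->ss_family == AF_INET)
        return ntohs(((const struct sockaddr_in *)ss)->sin_port);
    if (ss->ss_family == AF_INET6)
        return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
    return -1;
}
```

Passing `AI_NUMERICHOST | AI_NUMERICSERV` as `flags` keeps the lookup local (no DNS query), which is convenient for experimenting; passing 0 gives the ordinary name-based lookup.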



[openib-general] [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN

2005-10-13 Thread Arlin Davis
Michael,

The patch adds command line options for RDMA reads and starting PSN. I used
these modifications to help isolate the RDMA read performance degradation
with 4.6.2 firmware.

-arlin


Signed-off-by: Arlin Davis <[EMAIL PROTECTED]>

Index: rdma_bw.c
===
--- rdma_bw.c   (revision 3768)
+++ rdma_bw.c   (working copy)
@@ -304,7 +304,9 @@ static struct pingpong_context *pp_init_
  * The Consumer is not allowed to assign Remote Write or Remote Atomic to
  * a Memory Region that has not been assigned Local Write. */
ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size * 2,
-IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE);
+IBV_ACCESS_REMOTE_WRITE | 
+IBV_ACCESS_REMOTE_READ |
+IBV_ACCESS_LOCAL_WRITE);
if (!ctx->mr) {
fprintf(stderr, "Couldn't allocate MR\n");
return NULL;
@@ -345,7 +347,9 @@ static struct pingpong_context *pp_init_
attr.qp_state= IBV_QPS_INIT;
attr.pkey_index  = 0;
attr.port_num= port;
-   attr.qp_access_flags = IBV_ACCESS_REMOTE_WRITE;
+   attr.qp_access_flags = IBV_ACCESS_REMOTE_WRITE |
+  IBV_ACCESS_REMOTE_READ |
+  IBV_ACCESS_LOCAL_WRITE;
 
if (ibv_modify_qp(ctx->qp, &attr,
  IBV_QP_STATE  |
@@ -370,7 +374,7 @@ static int pp_connect_ctx(struct pingpon
attr.path_mtu   = IBV_MTU_2048;
attr.dest_qp_num= dest->qpn;
attr.rq_psn = dest->psn;
-   attr.max_dest_rd_atomic = 1;
+   attr.max_dest_rd_atomic = 4;
attr.min_rnr_timer  = 12;
attr.ah_attr.is_global  = 0;
attr.ah_attr.dlid   = dest->lid;
@@ -394,7 +398,7 @@ static int pp_connect_ctx(struct pingpon
attr.retry_cnt  = 7;
attr.rnr_retry  = 7;
attr.sq_psn = my_psn;
-   attr.max_rd_atomic  = 1;
+   attr.max_rd_atomic  = 4;
if (ibv_modify_qp(ctx->qp, &attr,
  IBV_QP_STATE  |
  IBV_QP_TIMEOUT|
@@ -417,6 +421,7 @@ static void usage(const char *argv0)
printf("\n");
printf("Options:\n");
printf("  -p, --port=<port>  listen on/connect to port <port> (default 18515)\n");
+   printf("  -P, --starting_psn starting sequence on QP (default random)\n");
printf("  -d, --ib-dev=<dev> use IB device <dev> (default first device found)\n");
printf("  -i, --ib-port=<port>   use port <port> of IB device (default 1)\n");
printf("  -s, --size=<size>  size of message to exchange (default 65536)\n");
@@ -487,6 +492,8 @@ int main(int argc, char *argv[])
int  scnt, ccnt;
int  sockfd;
int  duplex = 0;
+   int  rdma_read = 0;
+   int  starting_psn = 0;
struct ibv_qp   *qp;
 
cycles_t*tposted;
@@ -498,16 +505,18 @@ int main(int argc, char *argv[])
 
static struct option long_options[] = {
{ .name = "port",   .has_arg = 1, .val = 'p' },
+   { .name = "starting_psn",   .has_arg = 1, .val = 'P' },
{ .name = "ib-dev", .has_arg = 1, .val = 'd' },
{ .name = "ib-port",.has_arg = 1, .val = 'i' },
{ .name = "size",   .has_arg = 1, .val = 's' },
{ .name = "iters",  .has_arg = 1, .val = 'n' },
{ .name = "tx-depth",   .has_arg = 1, .val = 't' },
{ .name = "bidirectional",  .has_arg = 0, .val = 'b' },
+   { .name = "rdma_read",  .has_arg = 0, .val = 'r' },
{ 0 }
};
 
-   c = getopt_long(argc, argv, "p:d:i:s:n:t:b", long_options, NULL);
+   c = getopt_long(argc, argv, "p:P:d:i:s:n:t:br", long_options, NULL);
if (c == -1)
break;
 
@@ -520,6 +529,14 @@ int main(int argc, char *argv[])
}
break;
 
+   case 'P':
+   starting_psn = strtol(optarg, NULL, 0);
+   if (port <= 0) {
+   usage(argv[0]);
+   return 1;
+   }
+   break;
+
case 'd':
ib_devname = strdupa(optarg);
break;
@@ -567,6 +584,10 @@ int main(int argc, char *argv[])
duplex = 1;
break;
 
+
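
A note on the 'P' case in the hunk above: after `strtol()` stores into `starting_psn`, the range check tests `port`, which looks like a copy-and-paste slip. As a hedged sketch of what a stricter parser could look like, the helper below validates the argument itself and enforces the 24-bit width of an InfiniBand PSN. `parse_starting_psn` and `PSN_MAX` are hypothetical names and are not part of the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PSN_MAX 0xffffffUL   /* InfiniBand PSNs are 24 bits wide */

/*
 * Parse a starting PSN from a command-line argument.
 * Accepts decimal, octal (leading 0) or hex (leading 0x), as strtoul()
 * does with base 0. Returns 0 and stores the value in *psn on success,
 * -1 on malformed or out-of-range input (*psn is then left untouched).
 */
int parse_starting_psn(const char *arg, unsigned long *psn)
{
    char *end;
    unsigned long val;

    assert(arg != NULL && psn != NULL);

    errno = 0;
    val = strtoul(arg, &end, 0);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;             /* overflow, empty string, or trailing junk */
    if (val > PSN_MAX)
        return -1;             /* does not fit in 24 bits */

    *psn = val;
    return 0;
}
```

In `main()` this would replace the bare `strtol()` call, with `usage(argv[0])` invoked when the helper returns -1.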

RE: [openib-general] QP with large starting sequence adds latencyto RDMA READ???

2005-10-13 Thread Fab Tillier
> From: Arlin Davis [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 13, 2005 9:42 AM
> 
> Sean Hefty wrote:
> 
> > Arlin Davis wrote:
> >
> >> I just noticed some RDMA read performance issues that seem to be
> >> related to the QP starting sequence number. If I set the starting
> >> sequence to 1 then all is fine but if I set it to 0x1 then it
> >> seems to add ~40us to my 32KB RDMA read operation (polling for
> >> completions). Has anyone seen anything like this?
> >
> >
> > Has anyone else noticed this issue?  You could try to reproduce this
> > by using the rdma_bw test and changing the PSN.
> >
> > - Sean
> >
> 
> I added a starting PSN and RDMA READ option to the rdma_bw test and was
> able to reproduce on a PCI-E adapter with 4.6.2 firmware. I retried on a
> system with 4.7.0 and it looks like the problem is fixed. However,  I
> see nothing about this problem in the "bug fix" list in the release
> notes. Can someone at Mellanox confirm this problem with RDMA reads and
> add to release notes as a fix so it is documented somewhere?
> 
> http://www.mellanox.com/products/fw_images/fw-25208-4_7_0-release_notes.pdf

Note that I have seen similar behavior (drop in bandwidth) correlated to
starting PSN using Winsock Direct under Windows, so this doesn't seem to be a
uDAPL or Linux issue.  As for Arlin, the issue disappeared in firmware 4.7.0,
and I too would like to see some confirmation that there was an issue and that
it was fixed.

Thanks,

- Fab



Re: [openib-general] QP with large starting sequence adds latency to RDMA READ???

2005-10-13 Thread Arlin Davis

Sean Hefty wrote:

> Arlin Davis wrote:
>
> > I just noticed some RDMA read performance issues that seem to be
> > related to the QP starting sequence number. If I set the starting
> > sequence to 1 then all is fine but if I set it to 0x1 then it
> > seems to add ~40us to my 32KB RDMA read operation (polling for
> > completions). Has anyone seen anything like this?
>
> Has anyone else noticed this issue?  You could try to reproduce this
> by using the rdma_bw test and changing the PSN.
>
> - Sean

I added a starting PSN and RDMA READ option to the rdma_bw test and was 
able to reproduce on a PCI-E adapter with 4.6.2 firmware. I retried on a 
system with 4.7.0 and it looks like the problem is fixed. However,  I 
see nothing about this problem in the "bug fix" list in the release 
notes. Can someone at Mellanox confirm this problem with RDMA reads and 
add to release notes as a fix so it is documented somewhere?


http://www.mellanox.com/products/fw_images/fw-25208-4_7_0-release_notes.pdf

-arlin



Re: [openib-general] mvapich-gen2 IA64 compile problem

2005-10-13 Thread John Partridge

Sayantan,

Thanks for the reply. I was just using make in the mvapich-gen2 directory,
which may call the script; I don't know. I'll take a look at the doc you
suggested and go through the troubleshooting in there.

John

Sayantan Sur wrote:

> Hi John,
>
> * On Oct,6 John Partridge<[EMAIL PROTECTED]> wrote :
>
> > Roland,
> >
> > Actually, I just checked (and reinstalled in case there was a problem)
> > and libibverbs is installed OK and I still get the problem.
>
> The mvapich.make.[gcc,icc,pgi] script in the top level directory of
> MVAPICH-Gen2 includes all the library paths and appropriate -l's.
>
> Can you please tell us if you are using this script? There is a user
> guide in the distribution too (called: mvapich.user_guide.pdf), which
> lists some common troubleshooting issues when installing/using MVAPICH.
>
> Thanks,
> Sayantan.



--
John Partridge

Silicon Graphics Inc
Tel:  651-683-3428
Vnet: 233-3428
E-Mail: [EMAIL PROTECTED]


Re: [openib-general] Migration Solution

2005-10-13 Thread Hal Rosenstock
On Thu, 2005-10-13 at 03:10, Mohit Katiyar, Noida wrote:
> Hi all,
> Can anyone suggest a good solution for migrating from
> Clients -> FC Switch -> SAN connection
> to
> Clients -> IB network -> SAN connection

It depends on your storage. There are two choices here: iSER based IB
storage and SRP based IB storage.

> The most economical I can think of is
> Clients -> IB Switch -> IB FC gateway -> FC Switch -> SAN
> But performance enhancement is doubtful

> The expensive but high-performance option would be
> Clients -> IB Switch -> SAN

Yes, this is more direct and is higher performance but is this more
expensive ? The tradeoff is the cost of the IB FC gateway versus the
cost delta of the native IB v. FC based storage. The main issue is the
availability of the native IB storage solutions (I think several are
emerging) and the initiator side (there are iSER and SRP initiators
available for OpenIB).

> Does anyone have any other ideas or a middle way?

Not that I am aware of.

-- Hal

> Thanks
> Mohit



[openib-general] Re: [PATCH] [ADDR] return gateway GID for non-local IP addresses

2005-10-13 Thread Hal Rosenstock
On Wed, 2005-10-12 at 19:39, Sean Hefty wrote:
> The following patch returns the GID of the IP gateway for non-local
> subnet IP addresses.
> 
> Hal, does this change look correct to you?  I don't have an easy way
> to test this fully.

Yes, this looks right. 

I think the address resolution part can be tested without a real gateway
for the connection by just adding a route off the IPoIB subnet to some
other endnode and trying to connect to something on that remote
destination subnet. You should at least see the ARP complete for that
next hop and the connect (perhaps) fail depending on the discrimination
in the passive side on the IP address passed in the private data.

-- Hal
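
The local versus non-local decision discussed above can be illustrated with a small C sketch. It assumes a single interface with one netmask and one default gateway, which is a simplification of a real routing table lookup, and the names `next_hop_ipv4` and `ipv4` are hypothetical rather than taken from Sean's patch. The address it returns is the one whose hardware address (a GID, over IPoIB) would then be resolved by ARP.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from dotted-quad components. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/*
 * Pick the IPv4 address to resolve for reaching `dst`.
 * If `dst` is on the local subnet, resolve it directly; otherwise
 * resolve the gateway, which is the next hop toward `dst`.
 * All addresses are in host byte order.
 */
uint32_t next_hop_ipv4(uint32_t dst, uint32_t local,
                       uint32_t netmask, uint32_t gateway)
{
    if ((dst & netmask) == (local & netmask))
        return dst;        /* on-link: talk to the destination itself */
    return gateway;        /* off-link: go through the IP gateway */
}
```

This mirrors the test Hal suggests: adding a route off the IPoIB subnet makes a remote destination take the second branch, so the ARP that completes is for the next hop.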



[openib-general] Migration Solution

2005-10-13 Thread Mohit Katiyar, Noida
Hi all,
Can anyone suggest a good solution for migrating from
Clients -> FC Switch -> SAN connection
to
Clients -> IB network -> SAN connection

The most economical I can think of is
Clients -> IB Switch -> IB FC gateway -> FC Switch -> SAN
But performance enhancement is doubtful

The expensive but high-performance option would be
Clients -> IB Switch -> SAN
Does anyone have any other ideas or a middle way?

Thanks
Mohit