RE: [openib-general] segmentation fault in ibv_modify_srq

2005-10-11 Thread Tziporet Koren
The SRQ limit event will also be supported on cards with memory (both Infinihost and Infinihost III).
If someone needs it now, we can provide a FW drop that supports it.
It will be officially released in Q4.


Tziporet


-Original Message-
From: Roland Dreier [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 05, 2005 9:42 PM
To: Sayantan Sur
Cc: openib-general@openib.org
Subject: Re: [openib-general] segmentation fault in ibv_modify_srq



 Sayantan Hello, This is in regard to the use of `ibv_modify_srq'
 Sayantan call. When I use this call, I get a segmentation
 Sayantan fault.


This is because the modify SRQ operation is not implemented at all in
libmthca. Do you just want to set the SRQ limit? That's not so hard
for me to implement. However, you should be aware that as far as I
know, only mem-free HCAs generate the SRQ limit reached event.


- R.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general


To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general




Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-09 Thread Sayantan Sur
Roland,

* On Oct,13 Roland Dreier[EMAIL PROTECTED] wrote :
 Sayantan I noticed that the test re-posts buffers only when the
 Sayantan outstanding recv count is <= 1. I set a SRQ limit as
 Sayantan max_recv - 5. So, I should get the event when 5 WQEs are
 Sayantan consumed from the SRQ, right?
 
 Yes, your code is correct.  The problem was that the mthca kernel
 driver was dispatching SRQ events incorrectly, so the event never
 reached userspace.  I've checked in a fix for that, and I'm going to
 queue the SRQ limit event stuff for 2.6.15 (now that I've seen it
 working).

I did some further testing with this. Apparently, when the asynchronous
thread is first started, it gets the limit event (since no receives are
posted yet ...). But after that when the number of posted receives
actually drops below max_recv - 5, I am not able to see another limit
event.

Do you think that this could happen in the current implementation?

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-09 Thread Jack Morgenstein
Sayantan,
The Limit Event must be re-armed after an event has occurred (it is a
one-shot); i.e., modify-srq/set-limit must be re-invoked. This is compliant
with the IB Spec (see section 10.2.9.3, first paragraph). Note that after each
SRQ LWM event, the limit for the SRQ gets reset back to zero, i.e., disabled.

Therefore, proper use of this feature is as follows (after creating the SRQ):
  a. Post the SRQ WQEs.
  b. Arm the limit to a non-zero value (less than the number of WQEs posted,
     or the arming is useless: you will immediately get the event).
  c. If the number of posted WQEs falls below your limit, you will get an
     event.
  d. Handling the event:
     1) FIRST, post more WQEs to the SRQ so that the number of posted WQEs
        is greater than your desired limit.
     2) THEN, re-arm the event (i.e., modify the SRQ limit again to
        a non-zero value).
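
The one-shot behaviour described above can be sketched with a small toy model.
This is only an illustration, not libmthca or libibverbs code: the struct and
every toy_* function are hypothetical names, and the model assumes the event
fires when the posted count falls below the armed limit, after which the limit
is reset to zero. In a real application the arming step would be an
ibv_modify_srq() call with IBV_SRQ_LIMIT.

```c
#include <assert.h>

/* Toy model of an SRQ with a one-shot limit event (all names hypothetical). */
struct toy_srq {
    unsigned posted;  /* receive WQEs currently posted */
    unsigned limit;   /* armed limit; 0 means disarmed */
    unsigned events;  /* limit events delivered so far */
};

/* Fire the event if armed and below the limit, then disarm (one-shot). */
static void toy_check(struct toy_srq *q)
{
    if (q->limit != 0 && q->posted < q->limit) {
        q->events++;
        q->limit = 0;  /* the limit is reset to zero after each event */
    }
}

/* Step a: post n receive WQEs. */
static void toy_post(struct toy_srq *q, unsigned n)
{
    q->posted += n;
}

/* Step b: arm the limit. Arming above the posted count fires at once. */
static void toy_arm(struct toy_srq *q, unsigned limit)
{
    q->limit = limit;
    toy_check(q);
}

/* Step c: an incoming message consumes one WQE and may trigger the event. */
static void toy_consume(struct toy_srq *q)
{
    if (q->posted > 0)
        q->posted--;
    toy_check(q);
}

/* Step d: FIRST refill, THEN re-arm with a non-zero limit. */
static void toy_handle_limit_event(struct toy_srq *q, unsigned refill,
                                   unsigned limit)
{
    assert(limit != 0);  /* re-arming with 0 would leave the event disabled */
    toy_post(q, refill);
    toy_arm(q, limit);
}
```

Arming before any receives are posted makes the model fire immediately and
disarm itself, so no second event is seen until the limit is armed again.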

Jack

-Original Message-

On Sun, Oct 09, 2005 at 05:18:53PM +0200, Sayantan Sur wrote:
 Roland,
 
 * On Oct,13 Roland Dreier[EMAIL PROTECTED] wrote :
  Sayantan I noticed that the test re-posts buffers only when the
  Sayantan outstanding recv count is <= 1. I set a SRQ limit as
  Sayantan max_recv - 5. So, I should get the event when 5 WQEs are
  Sayantan consumed from the SRQ, right?
  
  Yes, your code is correct.  The problem was that the mthca kernel
  driver was dispatching SRQ events incorrectly, so the event never
  reached userspace.  I've checked in a fix for that, and I'm going to
  queue the SRQ limit event stuff for 2.6.15 (now that I've seen it
  working).
 
 I did some further testing with this. Apparently, when the asynchronous
 thread is first started, it gets the limit event (since no receives are
 posted yet ...). But after that when the number of posted receives
 actually drop below max_recv - 5, I am not able to see another limit
 event.
 
 Do you think that this could happen in the current implementation?
 
 Thanks,
 Sayantan.
 
 -- 
 http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-09 Thread Sayantan Sur
Jack,

* On Oct,16 Jack Morgenstein[EMAIL PROTECTED] wrote :
 Sayantan,
 The Limit Event must be re-armed after an event has occurred (it is a 
 one-shot).
 (i.e., modify-srq/set-limit must be re-invoked).This is compliant with the 
 IB Spec (see section 10.2.9.3, first paragraph). (Note that after each SRQ LWM
 event, the limit for the SRQ gets reset back to zero -- i.e., disabled).
 
 Therefore, proper use of this feature is as follows (after creating the SRQ):
   a. Post the SRQ WQEs
   b. Arm the Limit to a non-zero value (less than the number of WQEs posted,
   or the arming is useless -- you will immediately get the event).
   c. If the number of posted WQEs falls below your limit, you will get an
   event.
   d. Handling the event:
   1) FIRST, post more WQEs to the SRQ to get the number of posted wqe's 
 to be 
   greater than your desired limit.
   2) THEN, re-arm the event (i.e., modify the SRQ limit again to
   be a non-zero value).

Thanks for the detailed instructions. I am able to see the limit event
exactly when the buffer count goes down.

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-06 Thread Sayantan Sur
* On Oct,10 Roland Dreier[EMAIL PROTECTED] wrote :
 Sayantan I am getting a segmentation fault after a couple of
 Sayantan thousand messages are sent over SRQ (using ping-pong
 Sayantan latency test). Here is a snippet from the core
 Sayantan generated.
 
 Is it possible that you are posting one more receive to the SRQ than
 the max capacity you requested when creating the SRQ?
 
 What happens with the patch below applied to libmthca?

Upon inspection of my code, I found that there _is_ a possibility of
posting more receives than the SRQ was configured for. I fixed that and
the ping-pong test works.

The patch you sent is good, it prevents the application from posting
more than max.
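
An application-side guard against this kind of over-posting might look like
the sketch below. It is not taken from srq_pingpong.c or libmthca; the
recv_budget struct and both functions are hypothetical, and the sketch assumes
the application counts a receive as outstanding from the time it is posted
until its completion is polled.

```c
#include <assert.h>

/* Hypothetical application-side bookkeeping for one SRQ. */
struct recv_budget {
    unsigned max_wr;       /* capacity requested when the SRQ was created */
    unsigned outstanding;  /* receives posted and not yet completed */
};

/* Reserve room for up to `want` receives; returns how many may be posted
 * without exceeding max_wr. The caller posts exactly that many. */
static unsigned budget_reserve(struct recv_budget *b, unsigned want)
{
    unsigned room = b->max_wr - b->outstanding;
    unsigned n = want < room ? want : room;

    b->outstanding += n;
    return n;
}

/* Call once for every receive completion that has been polled. */
static void budget_complete(struct recv_budget *b)
{
    assert(b->outstanding > 0);
    b->outstanding--;
}
```

The return value of budget_reserve() would bound the number of work requests
handed to the post-receive call.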

I will test out the limit event generation next.

Thanks,
Sayantan.

 
 Thanks,
   Roland
 
 
 --- libmthca/src/srq.c(revision 3664)
 +++ libmthca/src/srq.c(working copy)
 @@ -110,6 +110,13 @@ int mthca_tavor_post_srq_recv(struct ibv
  
   wqe   = get_wqe(srq, ind);
   next_ind  = *wqe_to_link(wqe);
 +
 + if (next_ind < 0) {
 + err = -1;
 + *bad_wr = wr;
 + break;
 + }
 +
   prev_wqe  = srq->last;
   srq->last = wqe;
  
 @@ -197,6 +204,12 @@ int mthca_arbel_post_srq_recv(struct ibv
   wqe   = get_wqe(srq, ind);
   next_ind  = *wqe_to_link(wqe);
  
 + if (next_ind < 0) {
 + err = -1;
 + *bad_wr = wr;
 + break;
 + }
 +
   ((struct mthca_next_seg *) wqe)->nda_op =
   htonl((next_ind << srq->wqe_shift) | 1);
   ((struct mthca_next_seg *) wqe)->ee_nds = 0;

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-06 Thread Sayantan Sur
Roland,

* On Oct,11 Sayantan Sur[EMAIL PROTECTED] wrote :
 I will test out the limit event generation next.

I made some simple modifications to srq_pingpong.c to see if I am able
to generate the IBV_EVENT_SRQ_LIMIT_REACHED event. I have attached my
changes as a patch and the full file (for easy execution).

I noticed that the test re-posts buffers only when the outstanding recv
count is <= 1. I set a SRQ limit as max_recv - 5. So, I should get the
event when 5 WQEs are consumed from the SRQ, right?

As of now, I am not able to see the event happening. I'd be glad if you
could see if this issue can be resolved.

Thanks for your prompt help.

Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs
Index: srq_pingpong.c
===
--- srq_pingpong.c  (revision 3676)
+++ srq_pingpong.c  (working copy)
@@ -36,6 +36,8 @@
 #  include <config.h>
 #endif /* HAVE_CONFIG_H */
 
+#define _GNU_SOURCE
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -62,6 +64,8 @@
 
 static int page_size;
 
+static pthread_t limit_thread;
+
 struct pingpong_context {
struct ibv_context  *context;
struct ibv_comp_channel *channel;
@@ -82,6 +86,25 @@
int psn;
 };
 
+
+static void asyncwatch(struct ibv_context *context)
+{
+   struct ibv_async_event event;
+
+   while (1) {
+
+   if (ibv_get_async_event(context, &event)) {
+fprintf(stderr,"Error getting event!\n");
+}
+
+   fprintf(stderr,"  event_type %d, port %d\n", event.event_type,
+  event.element.port_num);
+fflush(stderr);
+
+   ibv_ack_async_event(&event);
+   }
+}
+
 static uint16_t pp_get_local_lid(struct pingpong_context *ctx, int port)
 {
struct ibv_port_attr attr;
@@ -382,7 +405,11 @@
return NULL;
}
 
+pthread_create(&limit_thread, NULL, (void *) asyncwatch, (void *)ctx->context);
+
{
+struct ibv_srq_attr srq_attr;
+
struct ibv_srq_init_attr attr = {
.attr = {
.max_wr  = rx_depth,
@@ -395,6 +422,15 @@
fprintf(stderr, "Couldn't create SRQ\n");
return NULL;
}
+
+srq_attr.max_wr = rx_depth;
+srq_attr.max_sge = 1;
+srq_attr.srq_limit = rx_depth-5;
+
+if(ibv_modify_srq(ctx->srq, &srq_attr, IBV_SRQ_LIMIT)) {
+fprintf(stderr,"Error modifying SRQ\n");
+exit(-1);
+}
}
 
for (i = 0; i < num_qp; ++i) {
@@ -434,6 +470,7 @@
}
}
 
+
return ctx;
 }
 
@@ -742,6 +779,8 @@
}
}
 
+fprintf(stderr,"routs %d\n", routs);
+
if (scnt < iters) {
j = find_qp(wc[i].qp_num, ctx, num_qp);
if (j < 0) {
@@ -784,5 +823,7 @@
   iters, usec / 100., usec / iters);
}
 
+sleep(3);
+
return 0;
 }
/*
 * Copyright (c) 2005 Topspin Communications.  All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses.  You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * OpenIB.org BSD license below:
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 *  - Redistributions of source code must retain the above
 *copyright notice, this list of conditions and the following
 *disclaimer.
 *
 *  - Redistributions in binary form must reproduce the above
 *copyright notice, this list of conditions and the following
 *disclaimer in the documentation and/or other materials
 *provided with the distribution.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 *
 * $Id: srq_pingpong.c 3551 2005-09-26 21:07:33Z roland $
 */

#if HAVE_CONFIG_H
#  include <config.h>
#endif /* HAVE_CONFIG_H */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netdb.h>
#include <malloc.h>
#include <getopt.h>
#include <arpa/inet.h>
#include 

Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-06 Thread Roland Dreier
Sayantan I noticed that the test re-posts buffers only when the
Sayantan outstanding recv count is <= 1. I set a SRQ limit as
Sayantan max_recv - 5. So, I should get the event when 5 WQEs are
Sayantan consumed from the SRQ, right?

Yes, your code is correct.  The problem was that the mthca kernel
driver was dispatching SRQ events incorrectly, so the event never
reached userspace.  I've checked in a fix for that, and I'm going to
queue the SRQ limit event stuff for 2.6.15 (now that I've seen it
working).

BTW, in your code, you have:

fprintf(stderr, "  event_type %d, port %d\n", event.event_type,
   event.element.port_num);

it would be more sensible to print event.element.srq here, since
you're expecting an SRQ event.
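
A sketch of that suggestion follows. To stay self-contained it does not
include <infiniband/verbs.h>; the toy_* types are hypothetical stand-ins that
mirror only the two union members discussed here (element.srq and
element.port_num), and the enum values are made up. Real code would switch on
event.event_type from struct ibv_async_event and test for
IBV_EVENT_SRQ_LIMIT_REACHED.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without libibverbs (hypothetical names). */
enum toy_event_type { TOY_EVENT_PORT_ERR, TOY_EVENT_SRQ_LIMIT_REACHED };

struct toy_async_event {
    union {
        void *srq;       /* meaningful only for SRQ events */
        int   port_num;  /* meaningful only for port events */
    } element;
    enum toy_event_type event_type;
};

/* Format a description using the union member that matches the event type.
 * Returns the snprintf() result. */
static int toy_describe_event(const struct toy_async_event *ev,
                              char *buf, size_t len)
{
    switch (ev->event_type) {
    case TOY_EVENT_SRQ_LIMIT_REACHED:
        return snprintf(buf, len, "SRQ limit reached, srq %p",
                        ev->element.srq);
    case TOY_EVENT_PORT_ERR:
        return snprintf(buf, len, "port error, port %d",
                        ev->element.port_num);
    }
    return snprintf(buf, len, "unknown event %d", (int) ev->event_type);
}
```

Because element is a union, reading port_num for an SRQ event only
reinterprets part of the SRQ pointer, which is why the printed port number is
not meaningful in that case.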

 - R.


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-06 Thread Sayantan Sur
Roland,

* On Oct,13 Roland Dreier[EMAIL PROTECTED] wrote :
 Sayantan I noticed that the test re-posts buffers only when the
 Sayantan outstanding recv count is <= 1. I set a SRQ limit as
 Sayantan max_recv - 5. So, I should get the event when 5 WQEs are
 Sayantan consumed from the SRQ, right?
 
 Yes, your code is correct.  The problem was that the mthca kernel
 driver was dispatching SRQ events incorrectly, so the event never
 reached userspace.  I've checked in a fix for that, and I'm going to
 queue the SRQ limit event stuff for 2.6.15 (now that I've seen it
 working).
 
 BTW, in your code, you have:
 
   fprintf(stderr, "  event_type %d, port %d\n", event.event_type,
  event.element.port_num);
 
 it would be more sensible to print event.element.srq here, since
 you're expecting an SRQ event.

Thanks for the fix!! I have updated our systems, and am able to see the
event. Thanks for the tip too. My async function was a quick copy from
the example asyncwatch.c :-)

Thanks,
Sayantan.

 
  - R.

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Sayantan Hello, This is in regard to the use of `ibv_modify_srq'
Sayantan call. When I use this call, I get a segmentation
Sayantan fault.

This is because the modify SRQ operation is not implemented at all in
libmthca.  Do you just want to set the SRQ limit?  That's not so hard
for me to implement.  However, you should be aware that as far as I
know, only mem-free HCAs generate the SRQ limit reached event.

 - R.


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Sayantan Sur
Roland,

* On Oct,2 Roland Dreier[EMAIL PROTECTED] wrote :
 Sayantan Hello, This is in regard to the use of `ibv_modify_srq'
 Sayantan call. When I use this call, I get a segmentation
 Sayantan fault.
 
 This is because the modify SRQ operation is not implemented at all in
 libmthca.  Do you just want to set the SRQ limit?  That's not so hard
 for me to implement.  However, you should be aware that as far as I
 know, only mem-free HCAs generate the SRQ limit reached event.

Thanks for your reply. Yes, I want to set an SRQ limit. Yes, I am aware
that only mem-free HCAs generate the SRQ limit reached event. I am trying
this on a mem-free HCA.

If you could implement this feature, that would be really great!

Thanks,
Sayantan.

 
  - R.

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Sayantan If you could implement this feature, that would be
Sayantan really great!

OK, there's not much left to do.  I should have something to check in
today.  I'll let you know when it's ready.

 - R.


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
OK, I just checked in an initial implementation of both setting the
SRQ limit with the modify SRQ verb, and also getting SRQ limit reached
events when they occur.  You will need to update your kernel drivers,
libibverbs and libmthca to get this.

I've done zero testing, so please let me know how it works.  You
should at least get an interesting new failure.

 - R.


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Sayantan Sur
Roland,

* On Oct,5 Roland Dreier[EMAIL PROTECTED] wrote :
 OK, I just checked in an initial implementation of both setting the
 SRQ limit with the modify SRQ verb, and also getting SRQ limit reached
 events when they occur.  You will need to update your kernel drivers,
 libibverbs and libmthca to get this.

Thanks a lot for checking this in so quickly! I got the changes and
updated our systems.

 
 I've done zero testing, so please let me know how it works.  You
 should at least get an interesting new failure.

With your changes, `ibv_modify_srq' works. I will have the message
passing part done sometime soon. If I see any failure, I'll report it
to this reflector.

Thanks,
Sayantan.

 
  - R.

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Matt L. Leininger
On Wed, 2005-10-05 at 15:09 -0400, Sayantan Sur wrote:

  This is because the modify SRQ operation is not implemented at all in
  libmthca.  Do you just want to set the SRQ limit?  That's not so hard
  for me to implement.  However, you should be aware that as far as I
  know, only mem-free HCAs generate the SRQ limit reached event.
 
 Thanks for your reply. Yes, I want to set a SRQ limit. Yes, I am aware
 that only mem-free HCAs generate SRQ limit reached event. I am trying
 this on a Mem-free HCA.

Is this due to a hardware or a firmware difference between memfree and
memfull? If you flash the memfull HCA with the memfree firmware (which I
was told you can do), will the HCA generate an SRQ limit reached event?

 
 Thanks,

- Matt




Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Matt Is this due to memfree vs. memfull hardware or firmware
Matt difference?  If you flash the memfull HCA with the memfree
Matt firmware (which I was told you can do) will the HCA generate
Matt an SRQ limit reached event?

I believe it's a firmware difference.  There are basically three
Mellanox HCA chips:

    MT23108 - PCI-X - memfull only (FW 3.x.y)
    MT25208 - 2 port PCI Express - memfull (FW 4.x.y) or memfree (FW 5.x.y)
              memfree FW will work even if HCA board
              has memory on it.  Obviously memfree FW
              is required if the HCA board has no memory.
    MT25204 - 1 port PCI Express - memfree only (FW 1.x.y)

Any HCA that works with memfree FW (ie any PCI Express HCA) should be
able to generate SRQ limit events.  In the current FW release, memfull
HCAs do not generate SRQ limit events.

 - R.