[openib-general] Re: [PATCH] mad.c::ib_register_mad_agent: Fix RMPP version check during agent registration

2006-04-13 Thread Hal Rosenstock
On Thu, 2006-04-13 at 01:44, Roland Dreier wrote:
> OK, I applied this by hand ... your mailer turned all your tabs into
> spaces somewhere along the way, so the patch wouldn't apply.

Wow. That hasn't happened in a while. I used preformat in Evolution, the
same as for the other patches, so I'm not sure what's up.

Thanks for applying it.

-- Hal

>  - R.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] thanks and a question

2006-04-13 Thread Hal Rosenstock
Hi again Ron,

On Wed, 2006-04-12 at 23:46, Ronald G Minnich wrote:
> Hal Rosenstock wrote:
>
> > hoq is HOQLife. Is slv the switch LifeTimeValue ?
>
> I believe so.
>
> > Does that have anything to do with those settings ?
>
> it would not work until hoq and slv were 17.
>
> > Truly hanging ?
>
> yes, and it was the only real connection at that point, from the bproc
> daemon on the slave node to the bproc daemon on the master. There was
> only 1 host powered up at that point. It was very repeatable -- we tried
> to get it to boot many times. And, weirdly, it always hung at that same
> point.
>
> > Switches might drop 64 bytes at a time based on those parameters.
>
> But why does the sender think the segment has been acked, when the
> receiver has never seen that last 64 bytes? Where did the sender get
> that TCP-level ack?

I don't know. It doesn't make sense.

Dropping a buffer (64 bytes) in a packet should cause a CRC error which
should mean the TCP packet is not valid. In any case, you should be able
to see the drops in the various Port (error) counters.

> > That effectively doubles the time before the drops would occur which
> > probably eliminated the drops so you didn't see this.
> >
> > 16 = 268.435 msec
> > 17 = 536.871 msec
>
> which leads to another question. This is 1/2 second. Does it really mean
> that you could end up buffering 1/2 second's worth of flow on each port
> for all 256 ports?

It is limited by the number of buffers (per VL per port), which is
nowhere near this, so that could not occur.

The credits advertised on the link are reduced by the buffers in use so
the throughput would slow down on a congested port (meaning either
congestion or a slow receiver). 
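
For reference, the two figures quoted above are consistent with the usual
InfiniBand timeout encoding of 4.096 usec * 2^value. A minimal sketch of
that arithmetic follows. The function name is made up for illustration,
and the encoding is an assumption to check against the IB specification
(which, to my understanding, also treats the largest values as infinite,
a case not handled here).

```c
/*
 * Assumed encoding: lifetime = 4.096 usec * 2^value.
 * Returns the lifetime in milliseconds. Only meaningful for small
 * exponents (value < 32 here, so the shift stays well defined).
 */
double ib_lifetime_msec(unsigned int value)
{
	return 4.096e-3 * (double) (1UL << value);
}
```

With this, 16 gives about 268.435 msec and 17 about 536.871 msec, so each
increment of the setting doubles the lifetime.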

  
> > What doesn't make sense to me is the one flow. Are you sure there's no
> > other data traffic ? If so, that doesn't make sense to me and doesn't
> > hang together with the rest of this scenario.
>
> no other traffic that we could see, but there had been traffic prior to
> this.

I would recommend putting an IB analyzer on the last link towards that
slave node and capturing the data traffic.

-- Hal

> Thanks hal!
>
> ron



Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-13 Thread Roger Heflin

Sayantan Sur wrote:

> Hello Roger,
>
> > With mvapich-0.9.7 it errors out in the build stage with
> > ibv_free_device_list/ibv_get_device_list missing; I cannot find any
> > of the IB libraries on RHEL4U3 that appear to contain those functions.
>
> Thanks for trying out MVAPICH-0.9.7. Currently, we don't have any
> machine with RHEL4U3. We are installing two machines with RHEL4U3 and we
> will try out MVAPICH on that as soon as possible.
>
> The verb `ibv_get_device_list' was introduced before the 1.0 branch.
> So, if you have either OpenIB installed from the trunk or from the 1.0
> branch, you _should_ be able to see this verb in the library.
>
> I am wondering if you are trying out the default versions of the OpenIB
> rpms on RHEL4U3?

Yes, I am trying the default version of RHEL4U3; a lot of our
customers would much rather use unmodified RHEL, though I can probably
talk them out of it with a bit of work.   They have some strange
ideas that RHEL is somehow guaranteed to work right, and from
what I can tell it won't completely work just because RH did not
include an IB MPI variant, at least not one that I can find.

> > Using the mvapich-gen2-1.src.rpm from openib.org results in
> > these errors (on the first thing it tries to compile).
> > viainit.c: In function `create_cq':
> > viainit.c:118: error: too few arguments to function `ibv_create_cq'
>
> This is also due to a verb change made a while back to
> ibv_create_cq. I believe this version of mvapich-gen2 source rpm was
> created against the version of userspace support which is present in the
> very same .src.rpm (you may install those if you want, though they are a
> little old now). The userspace verbs changed after this src rpm was
> created.
>
> > I have verified that the include file prototype has more arguments than
> > are contained in viainit.c.
>
> Yes, it seems that the RPM you have installed is from somewhere in
> between the ibv_create_cq verb change and the later introduction of the
> ibv_get_device_list verb.
>
> I'm wondering if you could try it out with the latest 1.0 branch of
> OpenIB? In addition, we will get back to you asap with our testing on
> RHEL4U3.
>
> Thanks,
> Sayantan.

Do you know if it would be possible to just replace the userspace
section and not mess with the kernel part of OpenIB?   I am guessing
from what I have read that this is very possible, and only requires
me to remove the already existing RHEL rpms for OpenIB userspace
support.

Thank you very much.

If you guys need access I have 2 test machines that I can give
access to for whatever testing is needed.

  Roger


Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-13 Thread Sayantan Sur

Hello Roger,


> Do you know if it would be possible to just replace the userspace
> section and not mess with the kernel part of OpenIB?   I am guessing
> from what I have read that this is very possible, and only requires
> me to remove the already existing RHEL rpms for OpenIB userspace
> support.

IMHO, it should be possible. However, the OpenIB userspace and kernel
module authors should be able to answer this question exactly.

Roland, any thoughts on which SVN version of userspace support may work
with the RHEL default RPMs?

> Thank you very much.
>
> If you guys need access I have 2 test machines that I can give
> access to for whatever testing is needed.


That's great! You can send the login information to me.

Thanks,
Sayantan.

--
http://www.cse.ohio-state.edu/~surs



Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-13 Thread Roland Dreier
    Sayantan> Roland, any thoughts on which SVN version of userspace
    Sayantan> support may work with the RHEL default RPMs?

Any version should work.  It might be simpler to use stable releases
such as libibverbs-1.0.2 and libmthca-1.0.1.

 - R.


RE: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-13 Thread Scott Weitzenkamp (sweitzen)
 
> Yes, I am trying the default version of RHEL4U3; a lot of our
> customers would much rather use unmodified RHEL, though I can probably
> talk them out of it with a bit of work.   They have some strange
> ideas that RHEL is somehow guaranteed to work right, and from
> what I can tell it won't completely work just because RH did not
> include an IB MPI variant, at least not one that I can find.

I didn't try MVAPICH, but I had no luck getting Open MPI 1.0.1 to work
with the RHEL4 U3 OpenIB code.

The RHEL4 U3 relnotes are pretty clear that its included OpenIB is a
technology preview not for production environments, and that the APIs are
subject to change (which they already have, comparing RHEL4 U3 to OF 1.0).

I think you are much better off trying the OF 1.0 code.

Scott Weitzenkamp
SQA Manager
Cisco Systems


RE: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-13 Thread Bob Woodruff
Scott wrote,
> I didn't try MVAPICH, but I had no luck getting Open MPI 1.0.1 to work
> with the RHEL4 U3 OpenIB code.

Not sure if you are interested in a commercial MPI or not, but we
did test Intel MPI with the RHEL4-U3 code and it worked fine, except
on Mellanox DDR cards.

woody


Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-13 Thread Sayantan Sur
Hello Roger,

I'm just CC-ing this to openib-general for the community.

Thanks for giving us access. I have verified that the
`ibv_get_device_list' verb is indeed *missing* from the OpenIB install.
I'm afraid that given this Redhat rpm, it is difficult to get mvapich to
work (without patching it).

As Roland and others have indicated, perhaps the best way is for you to
upgrade to at least the 1.0 branch. That should be the most stable OpenIB
release yet.

https://openib.org/svn/gen2/branches/1.0/src/userspace/

You should be able to keep the kernel stuff intact and just upgrade the
user level support (management, libibverbs, libmthca). You may skip
upgrading management, however it'll be best to upgrade it too, lest you
face any OpenSM issues.

Thanks,
Sayantan.


-- 
http://www.cse.ohio-state.edu/~surs


[openib-general] [PATCH] IB/ipath: Fix whitespace

2006-04-13 Thread Roland Dreier
Signed-off-by: Roland Dreier [EMAIL PROTECTED]
---
Nothing but replacing spaces with tabs.  Please apply to svn and let
me know if it's OK to queue for upstream.

BTW, any progress on reviewing the static function cleanups I sent earlier?

 drivers/infiniband/hw/ipath/ipath_intr.c  |4 +
 drivers/infiniband/hw/ipath/ipath_verbs.c |  114 +++--
 2 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index 60f5f41..0bcb428 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -172,8 +172,8 @@ static void handle_e_ibstatuschanged(str
   "was %s\n", dd->ipath_unit,
   ib_linkstate(lstate),
   ib_linkstate((unsigned)
-   dd->ipath_lastibcstat
-& IPATH_IBSTATE_MASK));
+   dd->ipath_lastibcstat
+& IPATH_IBSTATE_MASK));
}
else {
lstate = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index e3be492..8d2558a 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1125,26 +1125,26 @@ static void __exit ipath_verbs_cleanup(v
 
 static ssize_t show_rev(struct class_device *cdev, char *buf)
 {
-struct ipath_ibdev *dev =
-container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-int vendor, boardrev, majrev, minrev;
-
-ipath_layer_query_device(dev->dd, &vendor, &boardrev,
- &majrev, &minrev);
-return sprintf(buf, "%d.%d\n", majrev, minrev);
+   struct ipath_ibdev *dev =
+   container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+   int vendor, boardrev, majrev, minrev;
+
+   ipath_layer_query_device(dev->dd, &vendor, &boardrev,
+&majrev, &minrev);
+   return sprintf(buf, "%d.%d\n", majrev, minrev);
 }
 
 static ssize_t show_hca(struct class_device *cdev, char *buf)
 {
-struct ipath_ibdev *dev =
-container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-int ret;
-
-ret = ipath_layer_get_boardname(dev->dd, buf, 128);
-if (ret < 0)
-goto bail;
-strcat(buf, "\n");
-ret = strlen(buf);
+   struct ipath_ibdev *dev =
+   container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+   int ret;
+
+   ret = ipath_layer_get_boardname(dev->dd, buf, 128);
+   if (ret < 0)
+   goto bail;
+   strcat(buf, "\n");
+   ret = strlen(buf);
 
 bail:
return ret;
@@ -1152,40 +1152,40 @@ bail:
 
 static ssize_t show_stats(struct class_device *cdev, char *buf)
 {
-struct ipath_ibdev *dev =
-container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-int i;
-int len;
-
-len = sprintf(buf,
-  "RC resends  %d\n"
-  "RC QACKs%d\n"
-  "RC ACKs %d\n"
-  "RC SEQ NAKs %d\n"
-  "RC RDMA seq %d\n"
-  "RC RNR NAKs %d\n"
-  "RC OTH NAKs %d\n"
-  "RC timeouts %d\n"
-  "RC RDMA dup %d\n"
-  "piobuf wait %d\n"
-  "no piobuf   %d\n"
-  "PKT drops   %d\n"
-  "WQE errs%d\n",
-  dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
-  dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
-  dev->n_other_naks, dev->n_timeouts,
-  dev->n_rdma_dup_busy, dev->n_piowait,
-  dev->n_no_piobuf, dev->n_pkt_drops, dev->n_wqe_errs);
-for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) {
+   struct ipath_ibdev *dev =
+   container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+   int i;
+   int len;
+
+   len = sprintf(buf,
+ "RC resends  %d\n"
+ "RC QACKs%d\n"
+ "RC ACKs %d\n"
+ "RC SEQ NAKs %d\n"
+ "RC RDMA seq %d\n"
+ "RC RNR NAKs %d\n"
+ "RC OTH NAKs %d\n"
+ "RC timeouts %d\n"
+ "RC RDMA dup %d\n"
+ "piobuf wait %d\n"
+ "no piobuf   %d\n"
+ "PKT drops   %d\n"
+ "WQE errs%d\n",
+ dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
+ dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
+ dev->n_other_naks, dev->n_timeouts,
+ 

[openib-general][PATCH] srp: tuned parameters,

2006-04-13 Thread Vu Pham

Hi Roland,
Please review this patch:
+ introducing srp_sg_tablesize as a module parameter
+ deriving SRP_MAX_IU_LEN and SRP_MAX_INDIRECT from srp_sg_tablesize
+ throttling commands per LUN, i.e. max_cmd_per_lun can be passed in when
adding a target (same as max_sect)
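
The relationship between the two macros in the patch can be sketched as
plain arithmetic. The sketch below assumes a 16-byte descriptor per
scatter/gather entry (the constant used in the patch). The struct sizes
are passed in as parameters because sizeof (struct srp_cmd) and
sizeof (struct srp_indirect_buf) are not stated in this mail, and the
function names are illustrative only.

```c
#include <stddef.h>

enum { SRP_DESC_SIZE = 16 };	/* bytes per descriptor, as in the patch */

/* Old direction: a fixed IU length determines how many entries fit. */
size_t srp_max_indirect(size_t max_iu_len, size_t cmd_size,
			size_t indirect_hdr_size)
{
	return (max_iu_len - cmd_size - indirect_hdr_size) / SRP_DESC_SIZE;
}

/* New direction: the table size determines the IU length required. */
size_t srp_max_iu_len(size_t sg_tablesize, size_t cmd_size,
		      size_t indirect_hdr_size)
{
	return sg_tablesize * SRP_DESC_SIZE + cmd_size + indirect_hdr_size;
}
```

For example, if the two headers were 48 and 20 bytes (assumed values), the
old fixed 256-byte IU would allow 11 entries, while 32 entries would need a
580-byte IU.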


Signed-off-by: Vu Pham [EMAIL PROTECTED]
Index: infiniband/ulp/srp/ib_srp.c
===================================================================
--- infiniband/ulp/srp/ib_srp.c	(revision 6455)
+++ infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -62,6 +62,12 @@ MODULE_DESCRIPTION("InfiniBand SCSI RDMA
 		   "v" DRV_VERSION " (" DRV_RELDATE ")");
 MODULE_LICENSE("Dual BSD/GPL");
 
+int srp_sg_tablesize = SRP_MAX_SG_TABLESIZE;
+
+module_param(srp_sg_tablesize, int, 0444);
+MODULE_PARM_DESC(srp_sg_tablesize,
+		 "Max number of scatter lists supported per IO - default is 32");
+
 static int topspin_workarounds = 1;
 
 module_param(topspin_workarounds, int, 0444);
@@ -1325,7 +1331,6 @@ static struct scsi_host_template srp_tem
 	.eh_host_reset_handler		= srp_reset_host,
 	.can_queue			= SRP_SQ_SIZE,
 	.this_id			= -1,
-	.sg_tablesize			= SRP_MAX_INDIRECT,
 	.cmd_per_lun			= SRP_SQ_SIZE,
 	.use_clustering			= ENABLE_CLUSTERING,
 	.shost_attrs			= srp_host_attrs
@@ -1381,6 +1386,7 @@ enum {
 	SRP_OPT_PKEY		= 1 << 3,
 	SRP_OPT_SERVICE_ID	= 1 << 4,
 	SRP_OPT_MAX_SECT	= 1 << 5,
+	SRP_OPT_MAX_CMD_PER_LUN	= 1 << 6,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
    SRP_OPT_IOC_GUID	|
    SRP_OPT_DGID		|
@@ -1389,13 +1395,14 @@ enum {
 };
 
 static match_table_t srp_opt_tokens = {
-	{ SRP_OPT_ID_EXT,	"id_ext=%s" 	},
-	{ SRP_OPT_IOC_GUID,	"ioc_guid=%s" 	},
-	{ SRP_OPT_DGID,		"dgid=%s" 	},
-	{ SRP_OPT_PKEY,		"pkey=%x" 	},
-	{ SRP_OPT_SERVICE_ID,	"service_id=%s" },
-	{ SRP_OPT_MAX_SECT, "max_sect=%d" 	},
-	{ SRP_OPT_ERR,		NULL 		}
+	{ SRP_OPT_ID_EXT,		"id_ext=%s" 		},
+	{ SRP_OPT_IOC_GUID,		"ioc_guid=%s" 		},
+	{ SRP_OPT_DGID,			"dgid=%s" 		},
+	{ SRP_OPT_PKEY,			"pkey=%x" 		},
+	{ SRP_OPT_SERVICE_ID,		"service_id=%s"		},
+	{ SRP_OPT_MAX_SECT,		"max_sect=%d" 		},
+	{ SRP_OPT_MAX_CMD_PER_LUN,	"max_cmd_per_lun=%d" 	},
+	{ SRP_OPT_ERR,			NULL 			}
 };
 
 static int srp_parse_options(const char *buf, struct srp_target_port *target)
@@ -1471,6 +1478,14 @@ static int srp_parse_options(const char 
 			target->scsi_host->max_sectors = token;
 			break;
 
+		case SRP_OPT_MAX_CMD_PER_LUN:
+			if (match_int(args, &token)) {
+				printk(KERN_WARNING PFX "bad max cmd_per_lun parameter '%s'\n", p);
+				goto out;
+			}
+			target->scsi_host->cmd_per_lun = token;
+			break;
+
 		default:
 			printk(KERN_WARNING PFX "unknown parameter or missing value "
 			   "'%s' in target creation request\n", p);
@@ -1509,6 +1524,7 @@ static ssize_t srp_create_target(struct 
 		return -ENOMEM;
 
 	target_host->max_lun = SRP_MAX_LUN;
+	target_host->sg_tablesize = srp_sg_tablesize;
 
 	target = host_to_target(target_host);
 	memset(target, 0, sizeof *target);
Index: infiniband/ulp/srp/ib_srp.h
===================================================================
--- infiniband/ulp/srp/ib_srp.h	(revision 6455)
+++ infiniband/ulp/srp/ib_srp.h	(working copy)
@@ -47,6 +47,8 @@
 #include <rdma/ib_sa.h>
 #include <rdma/ib_cm.h>
 
+extern int srp_sg_tablesize;
+
 enum {
 	SRP_PATH_REC_TIMEOUT_MS	= 1000,
 	SRP_ABORT_TIMEOUT_MS	= 5000,
@@ -55,7 +57,7 @@ enum {
 	SRP_DLID_REDIRECT	= 2,
 
 	SRP_MAX_LUN		= 512,
-	SRP_MAX_IU_LEN		= 256,
+	SRP_MAX_SG_TABLESIZE	= 32,
 
 	SRP_RQ_SHIFT	= 6,
 	SRP_RQ_SIZE		= 1 << SRP_RQ_SHIFT,
@@ -66,9 +68,10 @@ enum {
 };
 
 #define SRP_OP_RECV		(1 << 31)
-#define SRP_MAX_INDIRECT	((SRP_MAX_IU_LEN -			\
-  sizeof (struct srp_cmd) -		\
-  sizeof (struct srp_indirect_buf)) / 16)
+#define SRP_MAX_INDIRECT	srp_sg_tablesize
+#define SRP_MAX_IU_LEN		(srp_sg_tablesize * 16 +		\
+ sizeof (struct srp_cmd) +		\
+ sizeof (struct srp_indirect_buf))
 
 enum srp_target_state {
 	SRP_TARGET_LIVE,
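
A userspace sketch of how a max_cmd_per_lun=N token could be picked out of
a comma-separated target string, in the spirit of the
SRP_OPT_MAX_CMD_PER_LUN case above. This is not the kernel's
match_token()/match_int() code; parse_int_option is a hypothetical helper
written only to illustrate the option format.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "key=<int>" in a comma-separated option
 * string. Returns 0 and stores the value on success, -1 if the key is
 * absent or its value is not a plain decimal integer.
 */
int parse_int_option(const char *opts, const char *key, int *out)
{
	size_t klen = strlen(key);
	const char *p = opts;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t) (end - p) : strlen(p);

		if (len > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			char *stop;
			long v = strtol(p + klen + 1, &stop, 10);

			/* reject an empty value or trailing junk in the token */
			if (stop == p + klen + 1 || stop != p + len)
				return -1;
			*out = (int) v;
			return 0;
		}

		p += len;		/* skip this token ... */
		if (*p == ',')
			p++;		/* ... and its separator */
	}

	return -1;
}
```

A caller would pass the same string it intends to write when adding a
target, plus the key name, and treat -1 as "not given, keep the default".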

Re: [openib-general] Fix for ibping

2006-04-13 Thread Viswanath Krishnamurthy
Works like a charm...

-Viswa

On 12 Apr 2006 21:32:33 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
> On Wed, 2006-04-12 at 20:46, Hal Rosenstock wrote:
> > On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:
> > > The RMPP version needs to be 1.
> >
> > Thanks. I'm not sure what changed here to require this. I need to do
> > some more digging.
>
> I figured it out. The fix is in r6448. Can you update and try it ?
>
> Thanks.
>
> -- Hal
>
> > -- Hal
> >
> > > [EMAIL PROTECTED] src]# svn diff ibping.c
> > > Index: ibping.c
> > > ===================================================================
> > > --- ibping.c	(revision 6446)
> > > +++ ibping.c	(working copy)
> > > @@ -336,7 +336,7 @@
> > >  		exit(0);
> > >  	}
> > >
> > > -	if (mad_register_client(ping_class, 0) < 0)
> > > +	if (mad_register_client(ping_class, 1) < 0)
> > >  		IBERROR("can't register to ping class %d", ping_class);
> > >
> > >  	if (ib_resolve_portid_str(&portid, argv[0], dest_type, sm_id) < 0)

Re: [openib-general] Re: [PATCH] git: updates to rdma_cm branch

2006-04-13 Thread Sean Hefty

Roland Dreier wrote:

> OK, I updated my rdma_cm branch with all of this.
>
> In addition I put the following in -- it's idiomatic in the kernel to
> let the compiler handle htons(A_CONSTANT) in code.  Should I commit
> this to svn too?


This change is fine.  Please commit to svn too.  Thanks.
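
As a small illustration of the idiom Roland describes, the sketch below
compares a network-order field against htons() of a constant, so the
conversion applies to the constant (which the compiler can evaluate at
build time) instead of byte-swapping the field on every call. It is written
against userspace <arpa/inet.h>; EXAMPLE_PORT and port_matches are made-up
names, not taken from the rdma_cm patch.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define EXAMPLE_PORT 0x1234	/* hypothetical constant, host byte order */

/*
 * port_be is in network byte order, as read from a packet or sockaddr.
 * The field is compared as-is; only the constant goes through htons().
 */
int port_matches(uint16_t port_be)
{
	return port_be == htons(EXAMPLE_PORT);
}
```

The alternative, ntohs(port_be) == EXAMPLE_PORT, gives the same answer but
converts the variable rather than the constant.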

- Sean


Re: [openib-general][patch review] srp: fmr implementation,

2006-04-13 Thread Vu Pham

Hi Roland,


> > Apr  7 18:17:17 lab105 kernel: Unable to handle kernel paging request at
> > virtual address 6b6b6b6b6b6b6b6b
>
> I think I fixed the bug causing this oops (I was able to reproduce it,
> and I don't see it any more).  I checked the following patch in and
> queued it for kernel 2.6.17:



My ia64 system still crashes with the patch applied. Please see log below


Apr 13 13:10:21 lab105 kernel: Abort for req_index 1
Apr 13 13:10:26 lab105 kernel: ib_srp: SRP reset_host called
Apr 13 13:10:28 lab105 kernel: ib_srp: connection closed
Apr 13 13:10:28 lab105 kernel: Unable to handle kernel paging request at 
virtual address 6b6b6b6b6b6b6b6b

Apr 13 13:10:28 lab105 kernel: scsi_eh_2[13324]: Oops 11012296146944 [1]
Apr 13 13:10:28 lab105 kernel: Modules linked in: ib_srp ib_cm ib_sa 
evdev joydev sg st sr_mod ide_cd cdrom usbserial parport_pc lp parport 
ipv6 thermal processor fan button binfmt_misc usbhid ib_mthca ib_mad 
ib_core ehci_hcd uhci_hcd usbcore i2c_i801 i2c_core e1000 nls_iso8859_1 
nls_cp437 dm_mod reiserfs mptspi scsi_transport_spi mptscsih mptbase 
sd_mod scsi_mod

Apr 13 13:10:28 lab105 kernel:
Apr 13 13:10:28 lab105 kernel: Pid: 13324, CPU 1, comm:scsi_eh_2
Apr 13 13:10:28 lab105 kernel: psr : 121008026018 ifs : 
850d ip  : [a0020235a0f1]Not tainted
Apr 13 13:10:28 lab105 kernel: ip is at srp_reconnect_target+0x2b1/0x5c0 
[ib_srp]
Apr 13 13:10:28 lab105 kernel: unat:  pfs : 
050d rsc : 0003
Apr 13 13:10:28 lab105 kernel: rnat:  bsps: 
 pr  : 9541
Apr 13 13:10:28 lab105 kernel: ldrs:  ccv : 
 fpsr: 0009804c8a70433f

Apr 13 13:10:28 lab105 kernel: csd :  ssd : 
Apr 13 13:10:28 lab105 kernel: b0  : a0020235a060 b6  : 
a0013320 b7  : a002023ddd80
Apr 13 13:10:28 lab105 kernel: f6  : 1003e6b6b6b6b6b6b6b6b f7  : 
0ffdd8000
Apr 13 13:10:28 lab105 kernel: f8  : 1003e3598 f9  : 
1003e0118
Apr 13 13:10:28 lab105 kernel: f10 : 1003e f11 : 
1003e
Apr 13 13:10:28 lab105 kernel: r1  : a0020235c200 r2  : 
e001e58f8b58 r3  : e0018d748a40
Apr 13 13:10:28 lab105 kernel: r8  : e001e58f8ba8 r9  : 
e001e58f89f8 r10 : a00100931338
Apr 13 13:10:28 lab105 kernel: r11 : 0001 r12 : 
e001ea8f7d00 r13 : e001ea8f
Apr 13 13:10:28 lab105 kernel: r14 : a00100931340 r15 : 
e001ea8f r16 : 0001
Apr 13 13:10:28 lab105 kernel: r17 : 0001 r18 : 
e001ea8f0f84 r19 : a00100931348
Apr 13 13:10:28 lab105 kernel: r20 :  r21 : 
0008 r22 : e479c980
Apr 13 13:10:28 lab105 kernel: r23 : e001f5e7a920 r24 : 
0080 r25 : e479c99f
Apr 13 13:10:28 lab105 kernel: r26 : a002023ddd80 r27 : 
e00187d4c1e0 r28 : e00187d4c000
Apr 13 13:10:28 lab105 kernel: r29 : e001f5e7a880 r30 : 
e0018d748ab8 r31 : e0018d748a20

Apr 13 13:10:28 lab105 kernel:
Apr 13 13:10:28 lab105 kernel: Call Trace:
Apr 13 13:10:28 lab105 kernel:  [a00100013000] show_stack+0x80/0xa0
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7880 bsp=e001ea8f1308

Apr 13 13:10:28 lab105 kernel:  [a00100013860] show_regs+0x840/0x880
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7a50 bsp=e001ea8f12a8

Apr 13 13:10:28 lab105 kernel:  [a00100035a10] die+0x1b0/0x2e0
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7a60 bsp=e001ea8f1260
Apr 13 13:10:28 lab105 kernel:  [a00100057840] 
ia64_do_page_fault+0x9a0/0xb20
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7a80 bsp=e001ea8f11f0
Apr 13 13:10:28 lab105 kernel:  [a001bc80] 
ia64_leave_kernel+0x0/0x280
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7b30 bsp=e001ea8f11f0
Apr 13 13:10:28 lab105 kernel:  [a0020235a0f0] 
srp_reconnect_target+0x2b0/0x5c0 [ib_srp]
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7d00 bsp=e001ea8f1188
Apr 13 13:10:28 lab105 kernel:  [a0020235a460] 
srp_reset_host+0x60/0xa0 [ib_srp]
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7dc0 bsp=e001ea8f1160
Apr 13 13:10:28 lab105 kernel:  [a00201b2f4d0] 
scsi_try_host_reset+0xd0/0x240 [scsi_mod]
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7dc0 bsp=e001ea8f1130
Apr 13 13:10:28 lab105 kernel:  [a00201b320a0] 
scsi_error_handler+0x1860/0x2000 [scsi_mod]
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7dc0 bsp=e001ea8f1040

Apr 13 13:10:28 lab105 kernel:  [a001000b98e0] kthread+0x220/0x280
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7e10 bsp=e001ea8f1000
Apr 13 13:10:28 lab105 kernel:  [a00100011440] 
kernel_thread_helper+0xe0/0x100
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7e30 bsp=e001ea8f0fd0
Apr 13 13:10:28 lab105 kernel:  [a0019140] 
start_kernel_thread+0x20/0x40
Apr 13 13:10:28 lab105 kernel: 
sp=e001ea8f7e30 bsp=e001ea8f0fd0
Apr 

Re: [openib-general] RDMA RC QP returning RNR Retry Counter Exceeded Error

2006-04-13 Thread Sean Hefty

Ira Weiny wrote:

> I have started writing a simple RDMA app which uses the rdmacm.  I have gotten
> the connection established, QP's and MR's set up, and have sent the RDMA ETH.
> However, more and more I am getting the RNR Retry Counter Exceeded error back
> from the client's post send of the RDMA ETH.  About 1/10 times it will work
> but most of the time it does not.  I have figured out that you can't set the
> IBV_QP_RNR_RETRY attribute unless you go from RTR to RTS.  The state of the QP
> is RTS and the IBV_QP_RNR_RETRY value is 0 as set by the rdmacm.  Do I have to,
> or can I, transition the QP from RTS to RTR and then back again to set the
> IBV_QP_RNR_RETRY?


You cannot transition a QP from RTS to RTR.

Did you post receive buffers before you complete the connection?  Also, what's 
RDMA ETH?


- Sean


[openib-general] New diags tool available

2006-04-13 Thread Hal Rosenstock
Hi,

With svn r6460, a new diags tool is now available on the trunk. It is
Ira Weiny's saquery. (Thanks for bearing with me on this).

The saquery tool obtains information based on node name:

saquery -h
Usage: saquery [-h -d -P -N -L -G][name]
   Queries node records by default
   -d enable debugging
   -P get PathRecord info
   -N get NodeRecord info
   -L Return just the Lid of the name specified
   -G Return just the Guid of the name specified

-- Hal



[openib-general] opensm issues on 64 node RHEL4 cluster?

2006-04-13 Thread Troy Benjegerdes
We just moved a cluster over to the latest redhat release, and opensm
seems to be having issues.

This is running the redhat provided kernel and opensm packages

[EMAIL PROTECTED] troy]# uname -r
2.6.9-34.ELsmp
[EMAIL PROTECTED] troy]# cat /etc/redhat-release
Red Hat Enterprise Linux WS release 4 (Nahant Update 3)

[EMAIL PROTECTED] troy]# rpm -qi opensm
Name        : opensm                   Relocations: (not relocatable)
Version     : 1.0                      Vendor     : Red Hat, Inc.
Release     : 0.4265.2.EL4             Build Date : Thu 02 Feb 2006 02:24:15 PM CST
Install Date: Tue 14 Mar 2006 12:35:09 PM CST
Build Host  : hs20-bc1-7.build.redhat.com
Group       : System Environment/Base  Source RPM : opensm-1.0-0.4265.2.EL4.src.rpm
Size        : 1122289                  License    : GPL/BSD
Signature   : DSA/SHA1, Thu 16 Feb 2006 01:45:15 PM CST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. http://bugzilla.redhat.com/bugzilla
URL         : https://openib.org/svn/gen2/trunk

The opensm log file is at:

http://scl.ameslab.gov/~troy/64-node-RHEL4-osm.log.gz


Should I go ahead and grab the opensm from the latest subversion and see
if it's any better?


Re: [openib-general] opensm issues on 64 node RHEL4 cluster?

2006-04-13 Thread Hal Rosenstock
Hi Troy,

On Thu, 2006-04-13 at 15:35, Troy Benjegerdes wrote:
> We just moved a cluster over to the latest redhat release, and opensm
> seems to be having issues.
>
> This is running the redhat provided kernel and opensm packages
>
> Should I go ahead and grab the opensm from the latest subversion and see
> if it's any better?

If that is the technology preview, then using OpenSM from either OF 1.0
rc2 or from the trunk _should_ be much better, especially in your
environment. Note that if you do this, you would need the management
libraries as well as OpenSM.

-- Hal



[openib-general] [PATCH] RFC: start weaning userspace drivers from sysfs

2006-04-13 Thread Roland Dreier
As part of the libibverbs 1.1 release, I would like to remove the
dependency on libsysfs, since libsysfs is not very well maintained,
not consistent across distros, and the simple sysfs stuff we need is
easy to do directly.

In this direction, I've already made some changes to libibverbs to
reduce its internal use of sysfs.  However, sysfs is embedded in the
ABI between libibverbs and low-level drivers: libibverbs looks for a
function in each driver with the name openib_driver_init and calls
it with a struct sysfs_class_device *.

To fix this in libibverbs 1.1 (which will break ABI from libibverbs
1.0), I propose to replace the driver entry point with a new entry
point that looks like

struct ibv_device *ibv_driver_init(const char *uverbs_sys_path,
   int abi_version);

where uverbs_sys_path will be a string like
"/sys/class/infiniband_verbs/uverbs0"
and abi_version will be the contents of the file "abi_version" under
that path, or 0 if the file is not present.  (This last parameter is
just to save every low-level driver from implementing the same code to
read the standard abi_version sysfs attribute).
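
To make the abi_version convention above concrete, here is a minimal sketch
of the lookup in plain C: read the abi_version file under the given sysfs
directory and fall back to 0 when it is missing. It uses stdio instead of
any libibverbs helper, and read_abi_version is a hypothetical name, not an
existing libibverbs function.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the integer stored in
 * <uverbs_sys_path>/abi_version, or 0 if the file is absent or does
 * not start with a number.
 */
int read_abi_version(const char *uverbs_sys_path)
{
	char path[256];
	FILE *f;
	int version = 0;

	snprintf(path, sizeof path, "%s/abi_version", uverbs_sys_path);

	f = fopen(path, "r");
	if (!f)
		return 0;	/* file not present: report ABI version 0 */

	if (fscanf(f, "%d", &version) != 1)
		version = 0;	/* unreadable contents: treat as missing */

	fclose(f);
	return version;
}
```

A caller would pass a path such as /sys/class/infiniband_verbs/uverbs0 and
hand the result to the driver entry point as abi_version.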

However, we can move low-level drivers in this direction in a
piecemeal, forwards and backwards compatible way: just add a new
ibv_driver_init entry point, but leave the old openib_driver_init
entry point there and make it a simple wrapper around the new
function.  As an example, here's a patch to libmthca that does that.

Thoughts?

Thanks,
  Roland

--- src/userspace/libmthca/configure.in (revision 6431)
+++ src/userspace/libmthca/configure.in (working copy)
@@ -12,16 +12,21 @@ dnl Checks for programs
 AC_PROG_CC
 
 dnl Checks for libraries
+AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
+    AC_MSG_ERROR([ibv_get_device_list() not found.  libmthca requires libibverbs.]))
 
 dnl Checks for header files.
 AC_CHECK_HEADER(infiniband/driver.h, [],
-    AC_MSG_ERROR([infiniband/driver.h not found.  Is libibverbs installed?]))
+    AC_MSG_ERROR([infiniband/driver.h not found.  libmthca requires libibverbs.]))
 AC_HEADER_STDC
 
 dnl Checks for typedefs, structures, and compiler characteristics.
 AC_C_CONST
 AC_CHECK_SIZEOF(long)
 
+dnl Checks for library functions
+AC_CHECK_FUNCS(ibv_read_sysfs_file)
+
 AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
     if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
         ac_cv_version_script=yes
--- src/userspace/libmthca/src/mthca.map	(revision 6431)
+++ src/userspace/libmthca/src/mthca.map	(working copy)
@@ -1,4 +1,6 @@
 {
-	global: openib_driver_init;
+	global:
+		ibv_driver_init;
+		openib_driver_init;
 	local: *;
 };
--- src/userspace/libmthca/src/mthca.c  (revision 6431)
+++ src/userspace/libmthca/src/mthca.c  (working copy)
@@ -217,29 +217,53 @@ static struct ibv_device_ops mthca_dev_o
 	.free_context  = mthca_free_context
 };
 
-struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev)
+/*
+ * Keep a private implementation of HAVE_IBV_READ_SYSFS_FILE to handle
+ * old versions of libibverbs that didn't implement it.  This can be
+ * removed when libibverbs 1.0.3 or newer is available everywhere.
+ */
+#ifndef HAVE_IBV_READ_SYSFS_FILE
+static int ibv_read_sysfs_file(const char *dir, const char *file,
+			       char *buf, size_t size)
+{
+	char path[256];
+	int fd;
+	int len;
+
+	snprintf(path, sizeof path, "%s/%s", dir, file);
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	len = read(fd, buf, size);
+
+	close(fd);
+
+	if (len > 0 && buf[len - 1] == '\n')
+		buf[--len] = '\0';
+
+	return len;
+}
+#endif /* HAVE_IBV_READ_SYSFS_FILE */
+
+struct ibv_device *ibv_driver_init(const char *uverbs_sys_path,
+				   int abi_version)
 {
-	struct sysfs_device    *pcidev;
-	struct sysfs_attribute *attr;
+	char			value[8];
 	struct mthca_device    *dev;
 	unsigned		vendor, device;
 	int			i;
 
-	pcidev = sysfs_get_classdev_device(sysdev);
-	if (!pcidev)
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
+				value, sizeof value) < 0)
 		return NULL;
+	sscanf(value, "%i", &vendor);
 
-	attr = sysfs_get_device_attr(pcidev, "vendor");
-	if (!attr)
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/device",
+				value, sizeof value) < 0)
 		return NULL;
-	sscanf(attr->value, "%i", &vendor);
-	sysfs_close_attribute(attr);
-
-	attr = sysfs_get_device_attr(pcidev, "device");
-	if (!attr)
-		return NULL;
-	sscanf(attr->value, "%i", &device);
-	sysfs_close_attribute(attr);
+	sscanf(value, "%i", &device);
 
 	for (i = 0; 

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-13 Thread Roland Dreier
Hmm, it's clearly a use-after-free bug.  Based on

ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]

can you guess where it is in the SRP driver or what it's accessing?

Also this is happening because the connection is being reconnected,
because SCSI commands are timing out.  Do you have any idea why this
is happening?  What does the target see when this happens?

 - R.


Re: [openib-general][patch review] srp: fmr implementation,

2006-04-13 Thread Roland Dreier
    Roland> Hmm, it's clearly a use-after-free bug.

(...because 6b is the slab poisoning free value, and the oops is at
6b6b6b6b6b6b6b6b...)
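
To make the reasoning concrete, here is a small userspace sketch of why a poisoned, freed object produces an address made entirely of 0x6b bytes. It is only an illustration: sketch_obj, sketch_poison and sketch_looks_poisoned are hypothetical names, and the sketch fills the whole object with the poison byte rather than reproducing the kernel's exact slab debugging layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte value used to fill freed objects in this sketch (0x6b, as in the oops). */
#define SKETCH_POISON_FREE 0x6b

/* Hypothetical object with an embedded pointer, standing in for a freed structure. */
struct sketch_obj {
	void *next;
	int   state;
};

/* Overwrite an object the way a poisoning allocator would when it is freed. */
void sketch_poison(struct sketch_obj *obj)
{
	memset(obj, SKETCH_POISON_FREE, sizeof *obj);
}

/* Return nonzero if every byte of a pointer-sized value is the poison byte. */
int sketch_looks_poisoned(uintptr_t value)
{
	unsigned char bytes[sizeof value];
	size_t i;

	memcpy(bytes, &value, sizeof value);
	for (i = 0; i < sizeof bytes; i++)
		if (bytes[i] != SKETCH_POISON_FREE)
			return 0;
	return 1;
}
```

Any pointer field read back from such an object is the poison byte repeated across the pointer width, so a faulting address of that form suggests the code followed a pointer stored in memory that had already been freed.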


Re: [openib-general][patch review] srp: fmr implementation,

2006-04-13 Thread Roland Dreier
One stupid but useful way to narrow this down would be to reproduce
the crash with the following patch applied on top...

Index: linux-kernel/infiniband/ulp/srp/ib_srp.c
===
--- linux-kernel.orig/infiniband/ulp/srp/ib_srp.c	2006-04-12 12:24:37.398566000 -0700
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	2006-04-13 13:57:45.793412000 -0700
@@ -428,7 +428,12 @@
 	target->state = SRP_TARGET_CONNECTING;
 	spin_unlock_irq(target->scsi_host->host_lock);
 
+	printk(KERN_ERR "%s/%d: about to disconnect...\n", __func__, __LINE__);
+
 	srp_disconnect_target(target);
+
+	printk(KERN_ERR "%s/%d: disconnected...\n", __func__, __LINE__);
+
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the
 	 * target in case things are really fouled up.
@@ -442,23 +447,33 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
+	printk(KERN_ERR "%s/%d: got a new CM ID...\n", __func__, __LINE__);
+
 	qp_attr.qp_state = IB_QPS_RESET;
 	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
 	if (ret)
 		goto err;
 
+	printk(KERN_ERR "%s/%d: Reset QP...\n", __func__, __LINE__);
+
 	ret = srp_init_qp(target, target->qp);
 	if (ret)
 		goto err;
 
+	printk(KERN_ERR "%s/%d: Init QP...\n", __func__, __LINE__);
+
 	while (ib_poll_cq(target->cq, 1, &wc) > 0)
 		; /* nothing */
 
+	printk(KERN_ERR "%s/%d: cleared CQ...\n", __func__, __LINE__);
+
 	list_for_each_entry(req, &target->req_queue, list) {
 		req->scmnd->result = DID_RESET << 16;
 		req->scmnd->scsi_done(req->scmnd);
 	}
 
+	printk(KERN_ERR "%s/%d: cleared request queue...\n", __func__, __LINE__);
+
 	target->rx_head	 = 0;
 	target->tx_head	 = 0;
 	target->tx_tail	 = 0;
@@ -468,10 +483,14 @@
 	target->req_ring[SRP_SQ_SIZE - 1].next = -1;
 	INIT_LIST_HEAD(&target->req_queue);
 
+	printk(KERN_ERR "%s/%d: reinited req ring...\n", __func__, __LINE__);
+
 	ret = srp_connect_target(target);
 	if (ret)
 		goto err;
 
+	printk(KERN_ERR "%s/%d: connected target...\n", __func__, __LINE__);
+
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state == SRP_TARGET_CONNECTING) {
 		ret = 0;


Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1

2006-04-13 Thread Arlin Davis

Dotan Barak wrote:

> Hi.
>
> Thanks for the quick response.
>
> I executed dtest with the -v parameter, and here is the output of both sides.
> I added a '-l' parameter to the test so that the dapl provider can be changed
> on the command line (if you wish, I can post a patch).
>
> full server output:
> ---
> sw043:/tmp/tsscr/svn.mlx_tp/gen2/userspace/ulps/udapl/dtest # ./dtest -l OpenIB-scm2 -v
> 23996 DAPL_PROVIDER is OpenIB-scm2
> 23996 Verbose
> 23996 Running as server
> 23996 Allocated RDMA buffers (r:0x8052390,s:0x8052618) len 64
> 23996 Opened Interface Adaptor
> ...
> 23996 waiting for message receive event
> 23996 inbound message; message arrived!
> 23996 SERVER: RCV buffer 0x80525d0 contains: 0x55 len=64
> 23996 SERVER: SND buffer 0x8052858 contains: 0xffaa len=64
> 23996 calling post_send
> 23996 send_msg completed
> 23996 do_ping_pong_msg complete
> 23996 Disconnect and Free EP 0x805f518


Hmm, not sure what this thread is waiting on. I would expect to see the 
dat_ep_disconnect messages before the wait complete or at least the 
dat_ep_disconnect message indicating a blocking disconnect call. The 
next 3 messages expected are as follows:


dat_ep_disconnect
dat_ep_disconnect completed
dat_evd_wait for h_conn_evd completed

Can you attach to the server process with gdb and get me a back trace from each 
of the threads?

What does driver IBED-1.0-rc3 consist of? 


Thanks,

-arlin




Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1

2006-04-13 Thread Sean Hefty

Arlin Davis wrote:

> What does driver IBED-1.0-rc3 consist of?


I think that we want all IBED release issues to go directly to the IBED release 
team.


- Sean


[openib-general] Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Matt Leininger
I'm trying to compile the svn 6462 snapshot with linux-2.6.17-rc1 on a
RHEL4 based system.

I get the following error for addr.c:

  CC [M]  drivers/infiniband/core/index.o
  CC [M]  drivers/infiniband/core/addr.o
In file included from drivers/infiniband/core/addr.c:38:
drivers/infiniband/include/rdma/ib_addr.h:43: error: field `dev_type'
has incomplete type
drivers/infiniband/core/addr.c: In function `copy_addr':
drivers/infiniband/core/addr.c:95: error: `RDMA_NODE_IB_CA' undeclared
(first use in this function)
drivers/infiniband/core/addr.c:95: error: (Each undeclared identifier is
reported only once
drivers/infiniband/core/addr.c:95: error: for each function it appears
in.)
drivers/infiniband/core/addr.c:98: error: `RDMA_NODE_RNIC' undeclared
(first use in this function)
make[3]: *** [drivers/infiniband/core/addr.o] Error 1
make[2]: *** [drivers/infiniband/core] Error 2
make[1]: *** [drivers/infiniband] Error 2


If I remove include/rdma (which I had to do in the past) then some of
the pathscale code fails to compile.  Here is the error:

  LD [M]  drivers/infiniband/core/rdma_ucm.o
  CC [M]  drivers/infiniband/hw/ipath/ipath_cq.o
In file included from drivers/infiniband/hw/ipath/ipath_cq.c:36:
drivers/infiniband/hw/ipath/ipath_verbs.h:40:26: rdma/ib_pack.h: No such
file or directory
In file included from drivers/infiniband/hw/ipath/ipath_cq.c:36:
drivers/infiniband/hw/ipath/ipath_verbs.h:128: error: field `grh' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:147: error: field `mgid' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:155: error: field `ibmr' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:161: error: field `ibfmr' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:168: error: field `ibpd' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:174: error: field `ibah' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:175: error: field `attr' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:223: error: field `ibcq' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:239: error: field `wr' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:269: error: field `ibsrq' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:284: error: field `ibqp' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:288: error: field
`remote_ah_attr' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:331: error: field `path_mtu'
has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:412: error: field `ibdev' has
incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:485: error: field `ibucontext'
has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_imr':
drivers/infiniband/hw/ipath/ipath_verbs.h:490: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:490: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_ifmr':
drivers/infiniband/hw/ipath/ipath_verbs.h:495: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:495: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_ipd':
drivers/infiniband/hw/ipath/ipath_verbs.h:500: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:500: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_iah':
drivers/infiniband/hw/ipath/ipath_verbs.h:505: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:505: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_icq':
drivers/infiniband/hw/ipath/ipath_verbs.h:510: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:510: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_isrq':
drivers/infiniband/hw/ipath/ipath_verbs.h:515: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:515: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_iqp':
drivers/infiniband/hw/ipath/ipath_verbs.h:520: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:520: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_idev':
drivers/infiniband/hw/ipath/ipath_verbs.h:525: warning: type defaults to
`int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:525: warning: initialization
from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: At top level:
drivers/infiniband/hw/ipath/ipath_verbs.h:533: warning: 

[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Bryan O'Sullivan
On Thursday 13 April 2006 16:32, Matt Leininger wrote:
> I'm trying to compile the svn 6462 snapshot with linux-2.6.17-rc1 on a
> RHEL4 based system.

Are you building the ipath driver out of the kernel.org tree, or out of svn?  
If the latter, you have to patch the kernel and rebuild it first.

b


[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Matt Leininger
On Thu, 2006-04-13 at 16:40 -0700, Bryan O'Sullivan wrote:
> On Thursday 13 April 2006 16:32, Matt Leininger wrote:
> > I'm trying to compile the svn 6462 snapshot with linux-2.6.17-rc1 on a
> > RHEL4 based system.
>
> Are you building the ipath driver out of the kernel.org tree, or out of svn?
> If the latter, you have to patch the kernel and rebuild it first.

  Out of svn.  I have the drivers/infiniband pointing to the svn tree.  

  I'll try using the drivers in the kernel.org tree.

  - Matt



[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Bryan O'Sullivan
On Thursday 13 April 2006 16:51, Matt Leininger wrote:

> > Are you building the ipath driver out of the kernel.org tree, or out of
> > svn? If the latter, you have to patch the kernel and rebuild it first.
>
> Out of svn.  I have the drivers/infiniband pointing to the svn tree.

Yes, that won't work, because the svn include directory has a bunch of stuff 
that's not upstream.

b


[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Matt Leininger
On Thu, 2006-04-13 at 16:54 -0700, Bryan O'Sullivan wrote:
> On Thursday 13 April 2006 16:51, Matt Leininger wrote:
>
> > > Are you building the ipath driver out of the kernel.org tree, or out of
> > > svn? If the latter, you have to patch the kernel and rebuild it first.
> >
> > Out of svn.  I have the drivers/infiniband pointing to the svn tree.
>
> Yes, that won't work, because the svn include directory has a bunch of stuff
> that's not upstream.
 
  Ok.  So the current state is that the mainline devel branch will be
broken for a while?

  BTW, the linux-2.6.17-rc1 in-kernel IB compiled fine.

  - Matt



[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Bryan O'Sullivan
On Thursday 13 April 2006 16:56, Matt Leininger wrote:

> Ok.  So the current state is that the mainline devel branch will be
> broken for a while?

I have no idea.  The current situation is fairly annoying, though.

b


Re: [openib-general] [PATCH] RFC: start weaning userspace drivers from sysfs

2006-04-13 Thread Johann George
> As part of the libibverbs 1.1 release, I would like to remove the
> dependency on libsysfs

I highly approve of this move.

> the simple sysfs stuff we need is easy to do directly.

I was looking at it earlier this week and came to the same conclusion.

Johann


Re: [openib-general] Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-13 Thread Roland Dreier
    Matt> If I remove include/rdma (which I had to do in the past)
    Matt> then some of the pathscale code fails to compile.  Here is
    Matt> the error:

Yes, you need the patch below for the ipath directory.  I sent this to
pathscale a while ago but it seems to take a while for patches to make
it from their internal repository to svn...

--- infiniband/hw/ipath/Makefile	(revision 6462)
+++ infiniband/hw/ipath/Makefile	(working copy)
@@ -1,5 +1,6 @@
 EXTRA_CFLAGS += -DIPATH_IDSTR='"PathScale kernel.org driver"' \
-	-DIPATH_KERN_TYPE=0
+	-DIPATH_KERN_TYPE=0 \
+	-Idrivers/infiniband/include
 
 obj-$(CONFIG_IPATH_CORE) += ipath_core.o
 obj-$(CONFIG_INFINIBAND_IPATH) += ib_ipath.o


Re: [openib-general] [PATCH] RFC: start weaning userspace drivers from sysfs

2006-04-13 Thread Roland Dreier
    Bryan> Is the goal of this to make sure that new hardware-specific
    Bryan> libraries will work with old libibverbs?  How likely do you
    Bryan> think that is to happen?  I don't see much of a problem
    Bryan> with simply breaking backwards compatibility here, since it
    Bryan> seems unlikely that someone would update one, but not the
    Bryan> other.

I just want to decouple things as much as possible, so there doesn't
have to be a flag-day cutover from the old world to the new.  This
way we can get low-level drivers out everywhere and then change libibverbs.

 - R.


[openib-general] 2.6.17-rc1 IPoIB netperf results

2006-04-13 Thread Matt Leininger
Here are the latest IPoIB results:

For mthca I saw a range of 380-424 MB/s.  In the 380 MB/s runs, the local
CPU utilization on the send side dropped from 98% to 70%.

For ipath it was 310 MB/s.  The local CPU utilization on the send side
was always around 30%.  

  - Matt

Mellanox benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)
patch 1 - remove changeset 314324121f9b94b2ca657a494cf2b9cb0e4a28cc
patch 2 - remove changeset b8259d9ad1d0f8d0c5ea0e37bb15080b0bd395b5
msi_x=1 for all tests
PathScale benchmarks are with RHEL4 x86_64 with HTX HCA
dual-socket dual-core Opteron 2.4 GHz 



netperf -f -M -c -C -H IP_ADDRESS

Kernel                 OpenIB      netperf (MB/s)
2.6.17-rc1             in-kernel   424 (mthca ipoib)
2.6.17-rc1             in-kernel   310 (ipath ipoib)
2.6.16                 svn 6307    367 (mthca ipoib)
2.6.16                 svn 6307    319 (ipath ipoib)
2.6.16                 svn 6083    371 (mthca ipoib)
2.6.16                 svn 6083    304 (ipath ipoib)
2.6.16                 svn 5938    380 (mthca ipoib)
2.6.16                 svn 5938    300 (ipath ipoib)
2.6.16                 in-kernel   364
2.6.16-rc5             in-kernel   367
2.6.15                 in-kernel   382
2.6.14-rc4 patch 1&2   in-kernel   436
2.6.14-rc4 patch 1     in-kernel   434
2.6.14-rc4             in-kernel   385
2.6.14-rc3             in-kernel   374
2.6.13.2               svn3627     386
2.6.13.2 patch 1       svn3627     446
2.6.13.2               in-kernel   394
2.6.13-rc3 patch 1&2   in-kernel   442
2.6.13-rc3 patch 1     in-kernel   450
2.6.13-rc3             in-kernel   395
2.6.13-rc2 patch 1     in-kernel   409
2.6.13-rc1 patch 1     in-kernel   408
2.6.12.5-lustre        in-kernel   399
2.6.12.5 patch 1       in-kernel   464
2.6.12.5               in-kernel   402
2.6.12                 in-kernel   406
2.6.12-rc6 patch 1     in-kernel   470
2.6.12-rc6             in-kernel   407
2.6.12-rc5             in-kernel   405
2.6.12-rc5 patch 1     in-kernel   474
2.6.12-rc4             in-kernel   470
2.6.12-rc3             in-kernel   466
2.6.12-rc2             in-kernel   469
2.6.12-rc1             in-kernel   466
2.6.11                 in-kernel   464
2.6.11                 svn3687     464
2.6.9-11.ELsmp         svn3513     425  (Woody's results, 3.6Ghz EM64T)

