[openib-general] Re: [PATCH] mad.c::ib_register_mad_agent: Fix RMPP version check during agent registration
On Thu, 2006-04-13 at 01:44, Roland Dreier wrote: OK, I applied this by hand ... your mailer turned all your tabs into spaces somewhere along the way, so the patch wouldn't apply. Wow. That hasn't happened in a while. I used preformat on evolution the same as the other patches so I'm not sure what's up. Thanks for applying it. -- Hal - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] thanks and a question
Hi again Ron, On Wed, 2006-04-12 at 23:46, Ronald G Minnich wrote: Hal Rosenstock wrote: hoq is HOQLife. Is slv the switch LifeTimeValue? I believe so. Does that have anything to do with those settings? It would not work until hoq and slv were 17. Truly hanging? Yes, and it was the only real connection at that point, from the bproc daemon on the slave node to the bproc daemon on the master. There was only 1 host powered up at that point. It was very repeatable -- we tried to get it to boot many times. And, weirdly, it always hung at that same point. Switches might drop 64 bytes at a time based on those parameters. But why does the sender think the segment has been acked, when the receiver has never seen that last 64 bytes? Where did the sender get that TCP-level ack? I don't know. It doesn't make sense. Dropping a buffer (64 bytes) in a packet should cause a CRC error which should mean the TCP packet is not valid. In any case, you should be able to see the drops in the various Port (error) counters. That effectively doubles the time before the drops would occur which probably eliminated the drops so you didn't see this. 16 = 268.435 msec 17 = 536.871 msec which leads to another question. This is 1/2 second. Does it really mean that you could end up buffering 1/2 second worth of flow on each port for all 256 ports? It is limited by the number of buffers (per VL per port) which is nowhere near this so that could not occur. The credits advertised on the link are reduced by the buffers in use so the throughput would slow down on a congested port (meaning either congestion or a slow receiver). What doesn't make sense to me is the one flow. Are you sure there's no other data traffic? If so, that doesn't make sense to me, nor does it hang together with the rest of this scenario. No other traffic that we could see, but there had been traffic prior to this. I would recommend putting an IB analyzer on the last link towards that slave node and capturing the data traffic.
-- Hal Thanks hal! ron
Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Sayantan Sur wrote: Hello Roger, With mvapich-0.9.7 it errors out in the building stage with an error ibv_free_device_list/ibv_get_device_list missing, I cannot find any of the ib libraries on RHEL4U3 that appear to contain that library. Thanks for trying out MVAPICH-0.9.7. Currently, we don't have any machine with RHEL4U3. We are installing two machines with RHEL4U3 and we will try out MVAPICH on that as soon as possible. The verb `ibv_get_device_list' was introduced before the 1.0 branch. So, if you have either OpenIB installed from the trunk or from the 1.0 branch, you _should_ be able to see this verb in the library. I am wondering if you are trying out the default versions of the OpenIB rpms on RHEL4U3? Yes, I am trying the default version of RHEL4U3, a lot of our customers would much rather use unmodified RHEL, though I can probably talk them out of it with a bit of work. They have some strange ideas that RHEL is somehow guaranteed to work right, and from what I can tell it won't completely work just because RH did not include an IB MPI variant, at least not one that I can find. Using the mvapich-gen2-1.src.rpm from openib.org results in these errors (on the first thing it tries to compile). viainit.c: In function `create_cq': viainit.c:118: error: too few arguments to function `ibv_create_cq' This is also due to a verb change made a while back to the ibv_create_cq. I believe this version of mvapich-gen2 source rpm was created against the version of userspace support which is present in the very same .src.rpm (you may install those if you want, though they are a little old now). The userspace verbs changed after this src rpm was created. I have verified that the include file prototype has more arguments than are contained in viainit.c. Yes, it seems that the RPM you have installed is from somewhere in between the ibv_create_cq verb change and the later introduction of the ibv_get_device_list verb.
I'm wondering if you could try it out with the latest 1.0 branch of OpenIB? In addition, we will get back to you asap with our testing on RHEL4U3. Thanks, Sayantan. Do you know if it would be possible to just replace the userspace section and not mess with the kernel part of OpenIB? I am guessing from what I have read that this is very possible, and only requires me to remove the already existing RHEL rpms for OpenIB userspace support. Thank you very much. If you guys need access I have 2 test machines that I can give access to to do whatever testing is needed. Roger
Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Hello Roger, Do you know if it would be possible to just replace the userspace section and not mess with the kernel part of OpenIB? I am guessing from what I have read that this is very possible, and only requires me to remove the already existing RHEL rpms for OpenIB userspace support. IMHO, it should be possible. However, OpenIB userspace and kernel module authors should be able to exactly answer this question. Roland, any thoughts on which SVN version of userspace support may work with the RHEL default RPMs? Thank you very much. If you guys need access I have 2 test machines that I can give access to to do whatever testing is needed. That's great! You can send the login information to me. Thanks, Sayantan. -- http://www.cse.ohio-state.edu/~surs
Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Sayantan> Roland, any thoughts on which SVN version of userspace support may work with the RHEL default RPMs?

Any version should work. It might be simpler to use stable releases such as libibverbs-1.0.2 and libmthca-1.0.1.

- R.
RE: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Yes, I am trying the default version of RHEL4U3, a lot of our customers would much rather use unmodified RHEL, though I can probably talk them out of it with a bit of work. They have some strange ideas that RHEL is somehow guaranteed to work right, and from what I can tell it won't completely work just because RH did not include an IB MPI variant, at least not one that I can find. I didn't try MVAPICH, but I had no luck getting Open MPI 1.0.1 to work with the RHEL4 U3 OpenIB code. The RHEL4 U3 relnotes are pretty clear that its included OpenIB is a technology preview not for production environments, and the APIs are subject to change (which they already did comparing RHEL4 U3 to OF 1.0). I think you are much better off trying the OF 1.0 code. Scott Weitzenkamp SQA Manager Cisco Systems
RE: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Scott wrote, I didn't try MVAPICH, but I had no luck getting Open MPI 1.0.1 to work with the RHEL4 U3 OpenIB code. Not sure if you are interested in a commercial MPI or not, but we did test Intel MPI with the RHEL4-U3 code and it worked fine, except on Mellanox DDR cards. woody
Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Hello Roger, I'm just CC-ing this to openib-general for the community. Thanks for giving us access. I have verified that the `ibv_get_device_list' verb is indeed *missing* from the OpenIB install. I'm afraid that given this Redhat rpm, it is difficult to get mvapich to work (without patching it). As Roland and others have indicated, perhaps the best way is for you to upgrade to at least the 1.0 branch. That should be the most stable OpenIB release yet. https://openib.org/svn/gen2/branches/1.0/src/userspace/ You should be able to keep the kernel stuff intact and just upgrade the user level support (management, libibverbs, libmthca). You may skip upgrading management, however it'll be best to upgrade it too, lest you face any OpenSM issues. Thanks, Sayantan. * On Apr,4 Sayantan Sur[EMAIL PROTECTED] wrote : Hello Roger, Do you know if it would be possible to just replace the userspace section and not mess with the kernel part of OpenIB? I am guessing from what I have read that this is very possible, and only requires me to remove the already existing RHEL rpms for OpenIB userspace support. IMHO, it should be possible. However, OpenIB userspace and kernel module authors should be able to exactly answer this question. Roland, any thoughts on which SVN version of userspace support may work with the RHEL default RPMs? Thank you very much. If you guys need access I have 2 test machines that I can give access to to do whatever testing is needed. That's great! You can send the login information to me. Thanks, Sayantan. -- http://www.cse.ohio-state.edu/~surs
[openib-general] [PATCH] IB/ipath: Fix whitespace
Signed-off-by: Roland Dreier [EMAIL PROTECTED]
---
Nothing but replacing spaces with tabs. Please apply to svn and let me know if it's OK to queue for upstream.

BTW, any progress on reviewing the static function cleanups I sent earlier?

 drivers/infiniband/hw/ipath/ipath_intr.c |4 +
 drivers/infiniband/hw/ipath/ipath_verbs.c | 114 +++--
 2 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index 60f5f41..0bcb428 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -172,8 +172,8 @@ static void handle_e_ibstatuschanged(str
 			   "was %s\n", dd->ipath_unit,
 			   ib_linkstate(lstate),
 			   ib_linkstate((unsigned)
-                                        dd->ipath_lastibcstat &
-                                        IPATH_IBSTATE_MASK));
+					dd->ipath_lastibcstat &
+					IPATH_IBSTATE_MASK));
 	}
 	else {
 		lstate = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index e3be492..8d2558a 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1125,26 +1125,26 @@ static void __exit ipath_verbs_cleanup(v

 static ssize_t show_rev(struct class_device *cdev, char *buf)
 {
-        struct ipath_ibdev *dev =
-                container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-        int vendor, boardrev, majrev, minrev;
-
-        ipath_layer_query_device(dev->dd, &vendor, &boardrev,
-                                 &majrev, &minrev);
-        return sprintf(buf, "%d.%d\n", majrev, minrev);
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	int vendor, boardrev, majrev, minrev;
+
+	ipath_layer_query_device(dev->dd, &vendor, &boardrev,
+				 &majrev, &minrev);
+	return sprintf(buf, "%d.%d\n", majrev, minrev);
 }

 static ssize_t show_hca(struct class_device *cdev, char *buf)
 {
-        struct ipath_ibdev *dev =
-                container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-        int ret;
-
-        ret = ipath_layer_get_boardname(dev->dd, buf, 128);
-        if (ret < 0)
-                goto bail;
-        strcat(buf, "\n");
-        ret = strlen(buf);
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	int ret;
+
+	ret = ipath_layer_get_boardname(dev->dd, buf, 128);
+	if (ret < 0)
+		goto bail;
+	strcat(buf, "\n");
+	ret = strlen(buf);

 bail:
 	return ret;
@@ -1152,40 +1152,40 @@ bail:

 static ssize_t show_stats(struct class_device *cdev, char *buf)
 {
-        struct ipath_ibdev *dev =
-                container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-        int i;
-        int len;
-
-        len = sprintf(buf,
-                      "RC resends  %d\n"
-                      "RC QACKs    %d\n"
-                      "RC ACKs     %d\n"
-                      "RC SEQ NAKs %d\n"
-                      "RC RDMA seq %d\n"
-                      "RC RNR NAKs %d\n"
-                      "RC OTH NAKs %d\n"
-                      "RC timeouts %d\n"
-                      "RC RDMA dup %d\n"
-                      "piobuf wait %d\n"
-                      "no piobuf   %d\n"
-                      "PKT drops   %d\n"
-                      "WQE errs    %d\n",
-                      dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
-                      dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
-                      dev->n_other_naks, dev->n_timeouts,
-                      dev->n_rdma_dup_busy, dev->n_piowait,
-                      dev->n_no_piobuf, dev->n_pkt_drops, dev->n_wqe_errs);
-        for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) {
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	int i;
+	int len;
+
+	len = sprintf(buf,
+		      "RC resends  %d\n"
+		      "RC QACKs    %d\n"
+		      "RC ACKs     %d\n"
+		      "RC SEQ NAKs %d\n"
+		      "RC RDMA seq %d\n"
+		      "RC RNR NAKs %d\n"
+		      "RC OTH NAKs %d\n"
+		      "RC timeouts %d\n"
+		      "RC RDMA dup %d\n"
+		      "piobuf wait %d\n"
+		      "no piobuf   %d\n"
+		      "PKT drops   %d\n"
+		      "WQE errs    %d\n",
+		      dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
+		      dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
+		      dev->n_other_naks, dev->n_timeouts,
+
[openib-general][PATCH] srp: tuned parameters,
Hi Roland,

Please review this patch:
 + introducing srp_sg_tablesize as a module parameter
 + adjusting SRP_MAX_IU_LEN and SRP_MAX_INDIRECT from srp_sg_tablesize
 + throttling commands per LUN, i.e. max_cmd_per_lun can be passed in when adding a target (same as max_sect)

Signed-off-by: Vu Pham [EMAIL PROTECTED]

Index: infiniband/ulp/srp/ib_srp.c
===================================================================
--- infiniband/ulp/srp/ib_srp.c	(revision 6455)
+++ infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -62,6 +62,12 @@ MODULE_DESCRIPTION("InfiniBand SCSI RDMA
 		   "v" DRV_VERSION " (" DRV_RELDATE ")");
 MODULE_LICENSE("Dual BSD/GPL");

+int srp_sg_tablesize = SRP_MAX_SG_TABLESIZE;
+
+module_param(srp_sg_tablesize, int, 0444);
+MODULE_PARM_DESC(srp_sg_tablesize,
+		 "Max number of scatter lists supported per IO - default is 32");
+
 static int topspin_workarounds = 1;

 module_param(topspin_workarounds, int, 0444);
@@ -1325,7 +1331,6 @@ static struct scsi_host_template srp_tem
 	.eh_host_reset_handler		= srp_reset_host,
 	.can_queue			= SRP_SQ_SIZE,
 	.this_id			= -1,
-	.sg_tablesize			= SRP_MAX_INDIRECT,
 	.cmd_per_lun			= SRP_SQ_SIZE,
 	.use_clustering			= ENABLE_CLUSTERING,
 	.shost_attrs			= srp_host_attrs
@@ -1381,6 +1386,7 @@ enum {
 	SRP_OPT_PKEY		= 1 << 3,
 	SRP_OPT_SERVICE_ID	= 1 << 4,
 	SRP_OPT_MAX_SECT	= 1 << 5,
+	SRP_OPT_MAX_CMD_PER_LUN	= 1 << 6,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
 				   SRP_OPT_IOC_GUID	|
 				   SRP_OPT_DGID		|
@@ -1389,13 +1395,14 @@ enum {
 };

 static match_table_t srp_opt_tokens = {
-	{ SRP_OPT_ID_EXT,	"id_ext=%s"	},
-	{ SRP_OPT_IOC_GUID,	"ioc_guid=%s"	},
-	{ SRP_OPT_DGID,		"dgid=%s"	},
-	{ SRP_OPT_PKEY,		"pkey=%x"	},
-	{ SRP_OPT_SERVICE_ID,	"service_id=%s"	},
-	{ SRP_OPT_MAX_SECT,	"max_sect=%d"	},
-	{ SRP_OPT_ERR,		NULL		}
+	{ SRP_OPT_ID_EXT,		"id_ext=%s"		},
+	{ SRP_OPT_IOC_GUID,		"ioc_guid=%s"		},
+	{ SRP_OPT_DGID,			"dgid=%s"		},
+	{ SRP_OPT_PKEY,			"pkey=%x"		},
+	{ SRP_OPT_SERVICE_ID,		"service_id=%s"		},
+	{ SRP_OPT_MAX_SECT,		"max_sect=%d"		},
+	{ SRP_OPT_MAX_CMD_PER_LUN,	"max_cmd_per_lun=%d"	},
+	{ SRP_OPT_ERR,			NULL			}
 };

 static int srp_parse_options(const char *buf, struct srp_target_port *target)
@@ -1471,6 +1478,14 @@ static int srp_parse_options(const char
 			target->scsi_host->max_sectors = token;
 			break;

+		case SRP_OPT_MAX_CMD_PER_LUN:
+			if (match_int(args, &token)) {
+				printk(KERN_WARNING PFX "bad max cmd_per_lun parameter '%s'\n", p);
+				goto out;
+			}
+			target->scsi_host->cmd_per_lun = token;
+			break;
+
 		default:
 			printk(KERN_WARNING PFX "unknown parameter or missing value "
 			       "'%s' in target creation request\n", p);
@@ -1509,6 +1524,7 @@ static ssize_t srp_create_target(struct
 		return -ENOMEM;

 	target_host->max_lun = SRP_MAX_LUN;
+	target_host->sg_tablesize = srp_sg_tablesize;

 	target = host_to_target(target_host);
 	memset(target, 0, sizeof *target);
Index: infiniband/ulp/srp/ib_srp.h
===================================================================
--- infiniband/ulp/srp/ib_srp.h	(revision 6455)
+++ infiniband/ulp/srp/ib_srp.h	(working copy)
@@ -47,6 +47,8 @@
 #include <rdma/ib_sa.h>
 #include <rdma/ib_cm.h>

+extern int srp_sg_tablesize;
+
 enum {
 	SRP_PATH_REC_TIMEOUT_MS	= 1000,
 	SRP_ABORT_TIMEOUT_MS	= 5000,
@@ -55,7 +57,7 @@ enum {
 	SRP_DLID_REDIRECT	= 2,

 	SRP_MAX_LUN		= 512,
-	SRP_MAX_IU_LEN		= 256,
+	SRP_MAX_SG_TABLESIZE	= 32,

 	SRP_RQ_SHIFT		= 6,
 	SRP_RQ_SIZE		= 1 << SRP_RQ_SHIFT,
@@ -66,9 +68,10 @@ enum {
 };

 #define SRP_OP_RECV		(1 << 31)
-#define SRP_MAX_INDIRECT	((SRP_MAX_IU_LEN - \
-				  sizeof (struct srp_cmd) - \
-				  sizeof (struct srp_indirect_buf)) / 16)
+#define SRP_MAX_INDIRECT	srp_sg_tablesize
+#define SRP_MAX_IU_LEN		(srp_sg_tablesize * 16 + \
+				 sizeof (struct srp_cmd) + \
+				 sizeof (struct srp_indirect_buf)) \

 enum srp_target_state {
 	SRP_TARGET_LIVE,
Re: [openib-general] Fix for ibping
Works like a charm...

-Viswa

On 12 Apr 2006 21:32:33 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
On Wed, 2006-04-12 at 20:46, Hal Rosenstock wrote:
On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:
The RMPP version needs to be 1.

Thanks. I'm not sure what changed here to require this. I need to do some more digging.
I figured it out. The fix is in r6448. Can you update and try it? Thanks.
-- Hal
-- Hal

[EMAIL PROTECTED] src]# svn diff ibping.c
Index: ibping.c
===================================================================
--- ibping.c	(revision 6446)
+++ ibping.c	(working copy)
@@ -336,7 +336,7 @@
 		exit(0);
 	}

-	if (mad_register_client(ping_class, 0) < 0)
+	if (mad_register_client(ping_class, 1) < 0)
 		IBERROR("can't register to ping class %d", ping_class);

 	if (ib_resolve_portid_str(&portid, argv[0], dest_type, sm_id) < 0)
Re: [openib-general] Re: [PATCH] git: updates to rdma_cm branch
Roland Dreier wrote: OK, I updated my rdma_cm branch with all of this. In addition I put the following in -- it's idiomatic in the kernel to let the compiler handle htons(A_CONSTANT) in code. Should I commit this to svn too? This change is fine. Please commit to svn too. Thanks. - Sean
Re: [openib-general][patch review] srp: fmr implementation,
Hi Roland,

Apr 7 18:17:17 lab105 kernel: Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b

I think I fixed the bug causing this oops (I was able to reproduce it, and I don't see it any more). I checked the following patch in and queued it for kernel 2.6.17:

My ia64 system still crashes with the patch applied. Please see log below

Apr 13 13:10:21 lab105 kernel: Abort for req_index 1
Apr 13 13:10:26 lab105 kernel: ib_srp: SRP reset_host called
Apr 13 13:10:28 lab105 kernel: ib_srp: connection closed
Apr 13 13:10:28 lab105 kernel: Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
Apr 13 13:10:28 lab105 kernel: scsi_eh_2[13324]: Oops 11012296146944 [1]
Apr 13 13:10:28 lab105 kernel: Modules linked in: ib_srp ib_cm ib_sa evdev joydev sg st sr_mod ide_cd cdrom usbserial parport_pc lp parport ipv6 thermal processor fan button binfmt_misc usbhid ib_mthca ib_mad ib_core ehci_hcd uhci_hcd usbcore i2c_i801 i2c_core e1000 nls_iso8859_1 nls_cp437 dm_mod reiserfs mptspi scsi_transport_spi mptscsih mptbase sd_mod scsi_mod
Apr 13 13:10:28 lab105 kernel:
Apr 13 13:10:28 lab105 kernel: Pid: 13324, CPU 1, comm:scsi_eh_2
Apr 13 13:10:28 lab105 kernel: psr : 121008026018 ifs : 850d ip : [a0020235a0f1]Not tainted
Apr 13 13:10:28 lab105 kernel: ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]
Apr 13 13:10:28 lab105 kernel: unat: pfs : 050d rsc : 0003
Apr 13 13:10:28 lab105 kernel: rnat: bsps: pr : 9541
Apr 13 13:10:28 lab105 kernel: ldrs: ccv : fpsr: 0009804c8a70433f
Apr 13 13:10:28 lab105 kernel: csd : ssd :
Apr 13 13:10:28 lab105 kernel: b0 : a0020235a060 b6 : a0013320 b7 : a002023ddd80
Apr 13 13:10:28 lab105 kernel: f6 : 1003e6b6b6b6b6b6b6b6b f7 : 0ffdd8000
Apr 13 13:10:28 lab105 kernel: f8 : 1003e3598 f9 : 1003e0118
Apr 13 13:10:28 lab105 kernel: f10 : 1003e f11 : 1003e
Apr 13 13:10:28 lab105 kernel: r1 : a0020235c200 r2 : e001e58f8b58 r3 : e0018d748a40
Apr 13 13:10:28 lab105 kernel: r8 : e001e58f8ba8 r9 : e001e58f89f8 r10 : a00100931338
Apr 13 13:10:28 lab105 kernel: r11 : 0001 r12 : e001ea8f7d00 r13 : e001ea8f
Apr 13 13:10:28 lab105 kernel: r14 : a00100931340 r15 : e001ea8f r16 : 0001
Apr 13 13:10:28 lab105 kernel: r17 : 0001 r18 : e001ea8f0f84 r19 : a00100931348
Apr 13 13:10:28 lab105 kernel: r20 : r21 : 0008 r22 : e479c980
Apr 13 13:10:28 lab105 kernel: r23 : e001f5e7a920 r24 : 0080 r25 : e479c99f
Apr 13 13:10:28 lab105 kernel: r26 : a002023ddd80 r27 : e00187d4c1e0 r28 : e00187d4c000
Apr 13 13:10:28 lab105 kernel: r29 : e001f5e7a880 r30 : e0018d748ab8 r31 : e0018d748a20
Apr 13 13:10:28 lab105 kernel:
Apr 13 13:10:28 lab105 kernel: Call Trace:
Apr 13 13:10:28 lab105 kernel: [a00100013000] show_stack+0x80/0xa0
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7880 bsp=e001ea8f1308
Apr 13 13:10:28 lab105 kernel: [a00100013860] show_regs+0x840/0x880
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7a50 bsp=e001ea8f12a8
Apr 13 13:10:28 lab105 kernel: [a00100035a10] die+0x1b0/0x2e0
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7a60 bsp=e001ea8f1260
Apr 13 13:10:28 lab105 kernel: [a00100057840] ia64_do_page_fault+0x9a0/0xb20
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7a80 bsp=e001ea8f11f0
Apr 13 13:10:28 lab105 kernel: [a001bc80] ia64_leave_kernel+0x0/0x280
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7b30 bsp=e001ea8f11f0
Apr 13 13:10:28 lab105 kernel: [a0020235a0f0] srp_reconnect_target+0x2b0/0x5c0 [ib_srp]
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7d00 bsp=e001ea8f1188
Apr 13 13:10:28 lab105 kernel: [a0020235a460] srp_reset_host+0x60/0xa0 [ib_srp]
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7dc0 bsp=e001ea8f1160
Apr 13 13:10:28 lab105 kernel: [a00201b2f4d0] scsi_try_host_reset+0xd0/0x240 [scsi_mod]
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7dc0 bsp=e001ea8f1130
Apr 13 13:10:28 lab105 kernel: [a00201b320a0] scsi_error_handler+0x1860/0x2000 [scsi_mod]
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7dc0 bsp=e001ea8f1040
Apr 13 13:10:28 lab105 kernel: [a001000b98e0] kthread+0x220/0x280
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7e10 bsp=e001ea8f1000
Apr 13 13:10:28 lab105 kernel: [a00100011440] kernel_thread_helper+0xe0/0x100
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7e30 bsp=e001ea8f0fd0
Apr 13 13:10:28 lab105 kernel: [a0019140] start_kernel_thread+0x20/0x40
Apr 13 13:10:28 lab105 kernel: sp=e001ea8f7e30 bsp=e001ea8f0fd0
Apr
Re: [openib-general] RDMA RC QP returning RNR Retry Counter Exceeded Error
Ira Weiny wrote: I have started writing a simple RDMA app which uses the rdmacm. I have gotten the connection established, QPs and MRs set up, and have sent the RDMA ETH. However, more and more I am getting the RNR Retry Counter Exceeded error back from the client's post send of the RDMA ETH. About 1/10 times it will work but most of the time it does not. I have figured out that you can't set the IBV_QP_RNR_RETRY attribute unless you go from RTR to RTS. The state of the QP is RTS and the IBV_QP_RNR_RETRY value is 0 as set by the rdmacm. Do I have to, or can I, transition the QP from RTS to RTR and then back again to set the IBV_QP_RNR_RETRY? You cannot transition a QP from RTS to RTR. Did you post receive buffers before you complete the connection? Also, what's RDMA ETH? - Sean
[openib-general] New diags tool available
Hi,

With svn r6460, a new diags tool is now available on the trunk. It is Ira Weiny's saquery. (Thanks for bearing with me on this).

The saquery tool obtains information based on node name:

saquery -h
Usage: saquery [-h -d -P -N -L -G][name]
   Queries node records by default
   -d enable debugging
   -P get PathRecord info
   -N get NodeRecord info
   -L Return just the Lid of the name specified
   -G Return just the Guid of the name specified

-- Hal
[openib-general] opensm issues on 64 node RHEL4 cluster?
We just moved a cluster over to the latest redhat release, and opensm seems to be having issues. This is running the redhat provided kernel and opensm packages

[EMAIL PROTECTED] troy]# uname -r
2.6.9-34.ELsmp
[EMAIL PROTECTED] troy]# cat /etc/redhat-release
Red Hat Enterprise Linux WS release 4 (Nahant Update 3)
[EMAIL PROTECTED] troy]# rpm -qi opensm
Name        : opensm                            Relocations: (not relocatable)
Version     : 1.0                               Vendor: Red Hat, Inc.
Release     : 0.4265.2.EL4                      Build Date: Thu 02 Feb 2006 02:24:15 PM CST
Install Date: Tue 14 Mar 2006 12:35:09 PM CST   Build Host: hs20-bc1-7.build.redhat.com
Group       : System Environment/Base           Source RPM: opensm-1.0-0.4265.2.EL4.src.rpm
Size        : 1122289                           License: GPL/BSD
Signature   : DSA/SHA1, Thu 16 Feb 2006 01:45:15 PM CST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. http://bugzilla.redhat.com/bugzilla
URL         : https://openib.org/svn/gen2/trunk

The opensm log file is at: http://scl.ameslab.gov/~troy/64-node-RHEL4-osm.log.gz

Should I go ahead and grab the opensm from the latest subversion and see if it's any better?
Re: [openib-general] opensm issues on 64 node RHEL4 cluster?
Hi Troy, On Thu, 2006-04-13 at 15:35, Troy Benjegerdes wrote: We just moved a cluster over to the latest redhat release, and opensm seems to be having issues. This is running the redhat provided kernel and opensm packages [EMAIL PROTECTED] troy]# uname -r 2.6.9-34.ELsmp [EMAIL PROTECTED] troy]# cat /etc/redhat-release Red Hat Enterprise Linux WS release 4 (Nahant Update 3) [EMAIL PROTECTED] troy]# rpm -qi opensm Name: opensm Relocations: (not relocatable) Version : 1.0 Vendor: Red Hat, Inc. Release : 0.4265.2.EL4 Build Date: Thu 02 Feb 2006 02:24:15 PM CST Install Date: Tue 14 Mar 2006 12:35:09 PM CST Build Host: hs20-bc1-7.build.redhat.com Group : System Environment/Base Source RPM: opensm-1.0-0.4265.2.EL4.src.rpm Size: 1122289 License: GPL/BSD Signature : DSA/SHA1, Thu 16 Feb 2006 01:45:15 PM CST, Key ID 219180cddb42a60e Packager: Red Hat, Inc. http://bugzilla.redhat.com/bugzilla URL : https://openib.org/svn/gen2/trunk The opensm log file is at: http://scl.ameslab.gov/~troy/64-node-RHEL4-osm.log.gz Should I go ahead and grab the opensm from the latest subversion and see if it's any better? If that is the technology preview, then using OpenSM from either OF 1.0 rc2 or from the trunk _should_ be much better, especially in your environment. Note that if you do this, you would also need the management libraries as well as OpenSM. -- Hal
[openib-general] [PATCH] RFC: start weaning userspace drivers from sysfs
As part of the libibverbs 1.1 release, I would like to remove the dependency on libsysfs, since libsysfs is not very well maintained, not consistent across distros, and the simple sysfs stuff we need is easy to do directly. In this direction, I've already made some changes to libibverbs to reduce its internal use of sysfs.

However, sysfs is embedded in the ABI between libibverbs and low-level drivers: libibverbs looks for a function in each driver with the name openib_driver_init and calls it with a struct sysfs_class_device *.

To fix this in libibverbs 1.1 (which will break ABI from libibverbs 1.0), I propose to replace the driver entry point with a new entry point that looks like

	struct ibv_device *ibv_driver_init(const char *uverbs_sys_path,
					   int abi_version);

where uverbs_sys_path will be a string like "/sys/class/infiniband_verbs/uverbs0" and abi_version will be the contents of the file abi_version under that path, or 0 if the file is not present. (This last parameter is just to save every low-level driver from implementing the same code to read the standard abi_version sysfs attribute).

However, we can move low-level drivers in this direction in a piecemeal, forwards and backwards compatible way: just add a new ibv_driver_init entry point, but leave the old openib_driver_init entry point there and make it a simple wrapper around the new function.

As an example, here's a patch to libmthca that does that.

Thoughts?

Thanks,
  Roland

--- src/userspace/libmthca/configure.in	(revision 6431)
+++ src/userspace/libmthca/configure.in	(working copy)
@@ -12,16 +12,21 @@ dnl Checks for programs
 AC_PROG_CC
 
 dnl Checks for libraries
+AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
+    AC_MSG_ERROR([ibv_get_device_list() not found.  libmthca requires libibverbs.]))
 
 dnl Checks for header files.
 AC_CHECK_HEADER(infiniband/driver.h, [],
-    AC_MSG_ERROR([infiniband/driver.h not found.  Is libibverbs installed?]))
+    AC_MSG_ERROR([infiniband/driver.h not found.  libmthca requires libibverbs.]))
 AC_HEADER_STDC
 
 dnl Checks for typedefs, structures, and compiler characteristics.
 AC_C_CONST
 AC_CHECK_SIZEOF(long)
 
+dnl Checks for library functions
+AC_CHECK_FUNCS(ibv_read_sysfs_file)
+
 AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
     if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
         ac_cv_version_script=yes
--- src/userspace/libmthca/src/mthca.map	(revision 6431)
+++ src/userspace/libmthca/src/mthca.map	(working copy)
@@ -1,4 +1,6 @@
 {
-	global: openib_driver_init;
+	global:
+		ibv_driver_init;
+		openib_driver_init;
 	local: *;
 };
--- src/userspace/libmthca/src/mthca.c	(revision 6431)
+++ src/userspace/libmthca/src/mthca.c	(working copy)
@@ -217,29 +217,53 @@ static struct ibv_device_ops mthca_dev_o
 	.free_context	= mthca_free_context
 };
 
-struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev)
+/*
+ * Keep a private implementation of HAVE_IBV_READ_SYSFS_FILE to handle
+ * old versions of libibverbs that didn't implement it.  This can be
+ * removed when libibverbs 1.0.3 or newer is available everywhere.
+ */
+#ifndef HAVE_IBV_READ_SYSFS_FILE
+static int ibv_read_sysfs_file(const char *dir, const char *file,
+			       char *buf, size_t size)
+{
+	char path[256];
+	int fd;
+	int len;
+
+	snprintf(path, sizeof path, "%s/%s", dir, file);
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	len = read(fd, buf, size);
+
+	close(fd);
+
+	if (len > 0 && buf[len - 1] == '\n')
+		buf[--len] = '\0';
+
+	return len;
+}
+#endif /* HAVE_IBV_READ_SYSFS_FILE */
+
+struct ibv_device *ibv_driver_init(const char *uverbs_sys_path,
+				   int abi_version)
 {
-	struct sysfs_device    *pcidev;
-	struct sysfs_attribute *attr;
+	char			value[8];
 	struct mthca_device    *dev;
 	unsigned		vendor, device;
 	int			i;
 
-	pcidev = sysfs_get_classdev_device(sysdev);
-	if (!pcidev)
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
+				value, sizeof value) < 0)
 		return NULL;
+	sscanf(value, "%i", &vendor);
 
-	attr = sysfs_get_device_attr(pcidev, "vendor");
-	if (!attr)
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/device",
+				value, sizeof value) < 0)
 		return NULL;
-	sscanf(attr->value, "%i", &vendor);
-	sysfs_close_attribute(attr);
-
-	attr = sysfs_get_device_attr(pcidev, "device");
-	if (!attr)
-		return NULL;
-	sscanf(attr->value, "%i", &device);
-	sysfs_close_attribute(attr);
+	sscanf(value, "%i", &device);
 
 	for (i = 0;
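The patch as archived is cut off before the compatibility wrapper itself appears, so here is a minimal sketch of what "leave the old openib_driver_init entry point there and make it a simple wrapper" could look like. The two structs are hypothetical stand-ins so the sketch compiles without libibverbs or libsysfs, the helper names read_sysfs_file and read_abi_version are made up for illustration, and the body of ibv_driver_init is only a placeholder. Unlike the helper in the patch, this one reads at most size - 1 bytes and always NUL-terminates the buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins; the real types come from libibverbs and libsysfs. */
struct ibv_device { int abi_version; };
struct sysfs_class_device { char path[256]; };

/* Read <dir>/<file> into buf and strip one trailing newline.
 * Returns the string length, or -1 if the file cannot be read. */
static int read_sysfs_file(const char *dir, const char *file,
			   char *buf, size_t size)
{
	char path[512];
	ssize_t len;
	int fd;

	assert(size > 0);
	snprintf(path, sizeof path, "%s/%s", dir, file);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	len = read(fd, buf, size - 1);	/* leave room for the terminator */
	close(fd);
	if (len < 0)
		return -1;

	if (len > 0 && buf[len - 1] == '\n')
		--len;
	buf[len] = '\0';

	return (int) len;
}

/* Contents of the abi_version attribute, or 0 if the file is not present. */
static int read_abi_version(const char *uverbs_sys_path)
{
	char value[16];

	if (read_sysfs_file(uverbs_sys_path, "abi_version",
			    value, sizeof value) < 0)
		return 0;

	return (int) strtol(value, NULL, 10);
}

/* New-style entry point; a real driver would probe the device here. */
struct ibv_device *ibv_driver_init(const char *uverbs_sys_path,
				   int abi_version)
{
	static struct ibv_device dev;

	(void) uverbs_sys_path;
	dev.abi_version = abi_version;
	return &dev;
}

/* Old entry point kept as a thin wrapper around the new one. */
struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev)
{
	return ibv_driver_init(sysdev->path, read_abi_version(sysdev->path));
}
```

With both symbols exported, an old libibverbs that only knows openib_driver_init ends up in the same code path as a new one that calls ibv_driver_init directly, which is the point of doing the transition piecemeal.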
Re: [openib-general][patch review] srp: fmr implementation,
Hmm, it's clearly a use-after-free bug. Based on

    ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]

can you guess where it is in the SRP driver or what it's accessing?

Also this is happening because the connection is being reconnected, because SCSI commands are timing out. Do you have any idea why this is happening? What does the target see when this happens?

 - R.
Re: [openib-general][patch review] srp: fmr implementation,
    Roland> Hmm, it's clearly a use-after-free bug.

(...because 6b is the slab poisoning free value, and the oops is at 6b6b6b6b6b6b6b6b...)
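For anyone who hasn't run into this before, a small userspace sketch of why an oops address of 6b6b6b6b6b6b6b6b points at use-after-free: with slab poisoning enabled, the kernel fills freed objects with the byte 0x6b (POISON_FREE), so any pointer later read out of a freed object comes back as that repeated pattern. The function names below are made up for illustration and this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte the Linux slab allocator writes over freed objects when poisoning
 * is enabled. */
#define POISON_FREE 0x6b

/* Illustration only: overwrite an object the way poisoning treats a
 * freed slab object. */
static void poison_freed_object(void *obj, size_t size)
{
	assert(obj != NULL);
	memset(obj, POISON_FREE, size);
}

/* Returns 1 if a 64-bit value read from memory consists entirely of
 * poison bytes, i.e. it was most likely loaded from a freed object. */
static int looks_like_freed_pointer(uint64_t value)
{
	return value == UINT64_C(0x6b6b6b6b6b6b6b6b);
}
```

Poisoning a struct that holds a 64-bit pointer field and then reading that field gives 0x6b6b6b6b6b6b6b6b, which is the address the oops reports when the stale pointer is dereferenced.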
Re: [openib-general][patch review] srp: fmr implementation,
One stupid but useful way to narrow this down would be to reproduce the crash with the following patch applied on top...

Index: linux-kernel/infiniband/ulp/srp/ib_srp.c
===================================================================
--- linux-kernel.orig/infiniband/ulp/srp/ib_srp.c	2006-04-12 12:24:37.398566000 -0700
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	2006-04-13 13:57:45.793412000 -0700
@@ -428,7 +428,12 @@
 	target->state = SRP_TARGET_CONNECTING;
 	spin_unlock_irq(target->scsi_host->host_lock);
 
+	printk(KERN_ERR "%s/%d: about to disconnect...\n", __func__, __LINE__);
+
 	srp_disconnect_target(target);
+
+	printk(KERN_ERR "%s/%d: disconnected...\n", __func__, __LINE__);
+
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the
 	 * target in case things are really fouled up.
@@ -442,23 +447,33 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
+	printk(KERN_ERR "%s/%d: got a new CM ID...\n", __func__, __LINE__);
+
 	qp_attr.qp_state = IB_QPS_RESET;
 	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
 	if (ret)
 		goto err;
 
+	printk(KERN_ERR "%s/%d: Reset QP...\n", __func__, __LINE__);
+
 	ret = srp_init_qp(target, target->qp);
 	if (ret)
 		goto err;
 
+	printk(KERN_ERR "%s/%d: Init QP...\n", __func__, __LINE__);
+
 	while (ib_poll_cq(target->cq, 1, &wc) > 0)
 		; /* nothing */
 
+	printk(KERN_ERR "%s/%d: cleared CQ...\n", __func__, __LINE__);
+
 	list_for_each_entry(req, &target->req_queue, list) {
 		req->scmnd->result = DID_RESET << 16;
 		req->scmnd->scsi_done(req->scmnd);
 	}
 
+	printk(KERN_ERR "%s/%d: cleared request queue...\n", __func__, __LINE__);
+
 	target->rx_head	 = 0;
 	target->tx_head	 = 0;
 	target->tx_tail  = 0;
@@ -468,10 +483,14 @@
 	target->req_ring[SRP_SQ_SIZE - 1].next = -1;
 	INIT_LIST_HEAD(&target->req_queue);
 
+	printk(KERN_ERR "%s/%d: reinited req ring...\n", __func__, __LINE__);
+
 	ret = srp_connect_target(target);
 	if (ret)
 		goto err;
 
+	printk(KERN_ERR "%s/%d: connected target...\n", __func__, __LINE__);
+
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state == SRP_TARGET_CONNECTING) {
 		ret = 0;
Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1
Dotan Barak wrote:
> Hi.
>
> thanks for the quick response.
>
> I executed the dtest with the -v parameter and here is the output of both sides.
> I added the test the '-l' parameter to be able to change to dapl provider in command line (if you wish i can post you a patch).
>
> full server output:
> ---
> sw043:/tmp/tsscr/svn.mlx_tp/gen2/userspace/ulps/udapl/dtest # ./dtest -l OpenIB-scm2 -v
> 23996 DAPL_PROVIDER is OpenIB-scm2
> 23996 Verbose
> 23996 Running as server
> 23996 Allocated RDMA buffers (r:0x8052390,s:0x8052618) len 64
> 23996 Opened Interface Adaptor
> ...
> 23996 waiting for message receive event
> 23996 inbound message; message arrived!
> 23996 SERVER: RCV buffer 0x80525d0 contains: 0x55 len=64
> 23996 SERVER: SND buffer 0x8052858 contains: 0xffaa len=64
> 23996 calling post_send
> 23996 send_msg completed
> 23996 do_ping_pong_msg complete
> 23996 Disconnect and Free EP 0x805f518

Hmm, not sure what this thread is waiting on. I would expect to see the dat_ep_disconnect messages before the wait complete, or at least the dat_ep_disconnect message indicating a blocking disconnect call. The next 3 messages expected are as follows:

    dat_ep_disconnect
    dat_ep_disconnect completed
    dat_evd_wait for h_conn_evd completed

Can you attach to the server process with gdb and get me a back trace from each of the threads?

What does driver IBED-1.0-rc3 consist of?

Thanks,

-arlin
Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1
Arlin Davis wrote:
> What does driver IBED-1.0-rc3 consist of?

I think that we want all IBED release issues to go directly to the IBED release team.

- Sean
[openib-general] Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
I'm trying to compile the svn 6462 snapshot with linux-2.6.17-rc1 on a RHEL4 based system. I get the following error for addr.c:

  CC [M]  drivers/infiniband/core/index.o
  CC [M]  drivers/infiniband/core/addr.o
In file included from drivers/infiniband/core/addr.c:38:
drivers/infiniband/include/rdma/ib_addr.h:43: error: field `dev_type' has incomplete type
drivers/infiniband/core/addr.c: In function `copy_addr':
drivers/infiniband/core/addr.c:95: error: `RDMA_NODE_IB_CA' undeclared (first use in this function)
drivers/infiniband/core/addr.c:95: error: (Each undeclared identifier is reported only once
drivers/infiniband/core/addr.c:95: error: for each function it appears in.)
drivers/infiniband/core/addr.c:98: error: `RDMA_NODE_RNIC' undeclared (first use in this function)
make[3]: *** [drivers/infiniband/core/addr.o] Error 1
make[2]: *** [drivers/infiniband/core] Error 2
make[1]: *** [drivers/infiniband] Error 2

If I remove include/rdma (which I had to do in the past) then some of the pathscale code fails to compile. Here is the error:

  LD [M]  drivers/infiniband/core/rdma_ucm.o
  CC [M]  drivers/infiniband/hw/ipath/ipath_cq.o
In file included from drivers/infiniband/hw/ipath/ipath_cq.c:36:
drivers/infiniband/hw/ipath/ipath_verbs.h:40:26: rdma/ib_pack.h: No such file or directory
In file included from drivers/infiniband/hw/ipath/ipath_cq.c:36:
drivers/infiniband/hw/ipath/ipath_verbs.h:128: error: field `grh' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:147: error: field `mgid' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:155: error: field `ibmr' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:161: error: field `ibfmr' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:168: error: field `ibpd' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:174: error: field `ibah' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:175: error: field `attr' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:223: error: field `ibcq' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:239: error: field `wr' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:269: error: field `ibsrq' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:284: error: field `ibqp' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:288: error: field `remote_ah_attr' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:331: error: field `path_mtu' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:412: error: field `ibdev' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h:485: error: field `ibucontext' has incomplete type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_imr':
drivers/infiniband/hw/ipath/ipath_verbs.h:490: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:490: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_ifmr':
drivers/infiniband/hw/ipath/ipath_verbs.h:495: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:495: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_ipd':
drivers/infiniband/hw/ipath/ipath_verbs.h:500: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:500: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_iah':
drivers/infiniband/hw/ipath/ipath_verbs.h:505: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:505: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_icq':
drivers/infiniband/hw/ipath/ipath_verbs.h:510: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:510: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_isrq':
drivers/infiniband/hw/ipath/ipath_verbs.h:515: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:515: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_iqp':
drivers/infiniband/hw/ipath/ipath_verbs.h:520: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:520: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: In function `to_idev':
drivers/infiniband/hw/ipath/ipath_verbs.h:525: warning: type defaults to `int' in declaration of `__mptr'
drivers/infiniband/hw/ipath/ipath_verbs.h:525: warning: initialization from incompatible pointer type
drivers/infiniband/hw/ipath/ipath_verbs.h: At top level:
drivers/infiniband/hw/ipath/ipath_verbs.h:533: warning:
[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
On Thursday 13 April 2006 16:32, Matt Leininger wrote:
> I'm trying to compile the svn 6462 snapshot with linux-2.6.17-rc1 on a RHEL4 based system.

Are you building the ipath driver out of the kernel.org tree, or out of svn? If the latter, you have to patch the kernel and rebuild it first.

b
[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
On Thu, 2006-04-13 at 16:40 -0700, Bryan O'Sullivan wrote:
> On Thursday 13 April 2006 16:32, Matt Leininger wrote:
> > I'm trying to compile the svn 6462 snapshot with linux-2.6.17-rc1 on a RHEL4 based system.
>
> Are you building the ipath driver out of the kernel.org tree, or out of svn? If the latter, you have to patch the kernel and rebuild it first.

Out of svn. I have the drivers/infiniband pointing to the svn tree. I'll try using the drivers in the kernel.org tree.

 - Matt
[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
On Thursday 13 April 2006 16:51, Matt Leininger wrote:
> > Are you building the ipath driver out of the kernel.org tree, or out of svn? If the latter, you have to patch the kernel and rebuild it first.
>
> Out of svn. I have the drivers/infiniband pointing to the svn tree.

Yes, that won't work, because the svn include directory has a bunch of stuff that's not upstream.

b
[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
On Thu, 2006-04-13 at 16:54 -0700, Bryan O'Sullivan wrote:
> On Thursday 13 April 2006 16:51, Matt Leininger wrote:
> > > Are you building the ipath driver out of the kernel.org tree, or out of svn? If the latter, you have to patch the kernel and rebuild it first.
> >
> > Out of svn. I have the drivers/infiniband pointing to the svn tree.
>
> Yes, that won't work, because the svn include directory has a bunch of stuff that's not upstream.

Ok. So the current state is that the mainline devel branch will be broken for a while?

BTW, the linux-2.6.17-rc1 in-kernel IB compiled fine.

 - Matt
[openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
On Thursday 13 April 2006 16:56, Matt Leininger wrote:
> Ok. So the current state is that the mainline devel branch will be broken for a while?

I have no idea. The current situation is fairly annoying, though.

b
Re: [openib-general] [PATCH] RFC: start weaning userspace drivers from sysfs
> As part of the libibverbs 1.1 release, I would like to remove the dependency on libsysfs

I highly approve of this move.

> the simple sysfs stuff we need is easy to do directly.

I was looking at it earlier this week and came to the same conclusion.

Johann
Re: [openib-general] Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
    Matt> If I remove include/rdma (which I had to do in the past)
    Matt> then some of the pathscale code fails to compile. Here is
    Matt> the error:

Yes, you need the patch below for the ipath directory. I sent this to pathscale a while ago but it seems to take a while for patches to make it from their internal repository to svn...

--- infiniband/hw/ipath/Makefile	(revision 6462)
+++ infiniband/hw/ipath/Makefile	(working copy)
@@ -1,5 +1,6 @@
 EXTRA_CFLAGS += -DIPATH_IDSTR='"PathScale kernel.org driver"' \
-	-DIPATH_KERN_TYPE=0
+	-DIPATH_KERN_TYPE=0 \
+	-Idrivers/infiniband/include
 
 obj-$(CONFIG_IPATH_CORE) += ipath_core.o
 obj-$(CONFIG_INFINIBAND_IPATH) += ib_ipath.o
Re: [openib-general] [PATCH] RFC: start weaning userspace drivers from sysfs
    Bryan> Is the goal of this to make sure that new hardware-specific
    Bryan> libraries will work with old libibverbs? How likely do you
    Bryan> think that is to happen? I don't see much of a problem
    Bryan> with simply breaking backwards compatibility here, since it
    Bryan> seems unlikely that someone would update one, but not the
    Bryan> other.

I just want to decouple things as much as possible, so there doesn't have to be a flag day cut over from the new world to the old. This way we can get low-level drivers out everywhere and then change libibverbs.

 - R.
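To make the "no flag day" point concrete, here is a rough sketch of the loader side. It assumes a transitional libibverbs that prefers ibv_driver_init and falls back to openib_driver_init, which is only one possible way to stage the change and is not taken from any libibverbs source. The lookup_fn callback is a hypothetical stand-in for dlsym(), and fake_lookup exists only so the sketch can be exercised without building a driver library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol lookup; a real loader would pass something like
 * dlsym() together with the handle returned by dlopen(). */
typedef void *(*lookup_fn)(void *handle, const char *symbol);

enum entry_kind { ENTRY_NONE, ENTRY_NEW, ENTRY_OLD };

struct driver_entry {
	enum entry_kind kind;	/* which entry point was found */
	void *fn;		/* its address, or NULL */
};

/* Prefer the new entry point; fall back to the old one if it is absent. */
static struct driver_entry find_driver_entry(lookup_fn lookup, void *handle)
{
	struct driver_entry e = { ENTRY_NONE, NULL };

	assert(lookup != NULL);

	e.fn = lookup(handle, "ibv_driver_init");
	if (e.fn) {
		e.kind = ENTRY_NEW;
		return e;
	}

	e.fn = lookup(handle, "openib_driver_init");
	if (e.fn)
		e.kind = ENTRY_OLD;

	return e;
}

/* Test double: handle is a NULL-terminated array of exported names. */
static void *fake_lookup(void *handle, const char *symbol)
{
	const char **names = handle;

	for (; *names; ++names)
		if (strcmp(*names, symbol) == 0)
			return (void *) *names;

	return NULL;
}
```

A driver exporting both names resolves to the new entry point, an older driver exporting only openib_driver_init still loads through the fallback, and an old libibverbs that only ever asks for openib_driver_init keeps working with updated drivers because the wrapper is still exported.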
[openib-general] 2.6.17-rc1 IPoIB netperf results
Here are the latest IPoIB results:

For mthca I saw a range of 380-424 MB/s. For the 380 MB/s runs, the local CPU utilization on the send side dropped from 98% to 70%.

For ipath it was 310 MB/s. The local CPU utilization on the send side was always around 30%.

 - Matt

Mellanox benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)
patch 1 - remove changeset 314324121f9b94b2ca657a494cf2b9cb0e4a28cc
patch 2 - remove changeset b8259d9ad1d0f8d0c5ea0e37bb15080b0bd395b5
msi_x=1 for all tests

PathScale benchmarks are with RHEL4 x86_64 with HTX HCA
dual-socket dual-core Opteron 2.4 GHz

netperf -f -M -c -C -H IP_ADDRESS

Kernel                  OpenIB      netperf (MB/s)
2.6.17-rc1              in-kernel   424 (mthca ipoib)
2.6.17-rc1              in-kernel   310 (ipath ipoib)
2.6.16                  svn 6307    367 (mthca ipoib)
2.6.16                  svn 6307    319 (ipath ipoib)
2.6.16                  svn 6083    371 (mthca ipoib)
2.6.16                  svn 6083    304 (ipath ipoib)
2.6.16                  svn 5938    380 (mthca ipoib)
2.6.16                  svn 5938    300 (ipath ipoib)
2.6.16                  in-kernel   364
2.6.16-rc5              in-kernel   367
2.6.15                  in-kernel   382
2.6.14-rc4 patch 1&2    in-kernel   436
2.6.14-rc4 patch 1      in-kernel   434
2.6.14-rc4              in-kernel   385
2.6.14-rc3              in-kernel   374
2.6.13.2                svn3627     386
2.6.13.2 patch 1        svn3627     446
2.6.13.2                in-kernel   394
2.6.13-rc3 patch 1&2    in-kernel   442
2.6.13-rc3 patch 1      in-kernel   450
2.6.13-rc3              in-kernel   395
2.6.13-rc2 patch 1      in-kernel   409
2.6.13-rc1 patch 1      in-kernel   408
2.6.12.5-lustre         in-kernel   399
2.6.12.5 patch 1        in-kernel   464
2.6.12.5                in-kernel   402
2.6.12                  in-kernel   406
2.6.12-rc6 patch 1      in-kernel   470
2.6.12-rc6              in-kernel   407
2.6.12-rc5              in-kernel   405
2.6.12-rc5 patch 1      in-kernel   474
2.6.12-rc4              in-kernel   470
2.6.12-rc3              in-kernel   466
2.6.12-rc2              in-kernel   469
2.6.12-rc1              in-kernel   466
2.6.11                  in-kernel   464
2.6.11                  svn3687     464
2.6.9-11.ELsmp          svn3513     425 (Woody's results, 3.6Ghz EM64T)