Re: [openib-general] OpenSM causes kernel trap
Thanks, I committed just the packet->msg => packet->msg->mad fix as one changeset, and the rest of this patch (along with some kmalloc()+memset() => kzalloc() cleanups now that 2.6.14 is out) as a second changeset. - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] OpenSM crash with today's trunk
I believe that this is in r3889. - R.
[openib-general] OpenSM crash with today's trunk
Hello, I updated the OpenIB stack today and I get the following error on starting OpenSM. The verbose log is available at http://www.cs.rutgers.edu/~bohra/osm-v.log

# opensm -V -d10 -r
-
OpenSM Rev:openib-1.1.0
Command Line Arguments:
Big V selected
d level = 0xa
Reassign LIDs
Log File: /var/log/osm.log
-
OpenSM Rev:openib-1.1.0
Using default guid 0x2c901081e7471
Error from osm_opensm_bind (0x2A)
Exiting SM
Segmentation fault

Please let me know what I can do to debug this.

Thanks
Aniruddha
Re: [openib-general] SRQ limit reached async event.
Galen> Does anyone know if openib supports the SRQ limit
Galen> asynchronous event?

Yes, openib verbs and the mthca driver support this. However, with current firmware, you will only receive this event for mem-free HCAs (firmware versions 5.x and 1.x). For mem-ful HCAs (firmware versions 3.x and 4.x), you will need to use as-yet-unreleased firmware for the event to be generated. - R.
RE: [openib-general] OpenSM causes kernel trap
>OK, I think I found it. The problem was that ib_umad_write() wrote >through packet->msg in a few places where it should have used >packet->msg->mad, and therefore corrupted the address of the buffer. Yep - that appears to be the issue. I've attached another patch that includes your fixes, plus adds some additional code cleanup. Signed-off-by: Sean Hefty <[EMAIL PROTECTED]> Index: user_mad.c === --- user_mad.c (revision 3861) +++ user_mad.c (working copy) @@ -99,7 +99,6 @@ struct ib_mad_send_buf *msg; struct list_head list; intlength; - DECLARE_PCI_UNMAP_ADDR(mapping) struct ib_user_mad mad; }; @@ -138,24 +137,23 @@ struct ib_mad_send_wc *send_wc) { struct ib_umad_file *file = agent->context; - struct ib_umad_packet *timeout, *packet = send_wc->send_buf->context[0]; + struct ib_umad_packet *timeout; + struct ib_umad_packet *packet = send_wc->send_buf->context[0]; ib_destroy_ah(packet->msg->ah); ib_free_send_mad(packet->msg); if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) { - timeout = kmalloc(sizeof *timeout + sizeof (struct ib_mad_hdr), - GFP_KERNEL); + timeout = kmalloc(sizeof *timeout + IB_MGMT_MAD_HDR, GFP_KERNEL); if (!timeout) goto out; - memset(timeout, 0, sizeof *timeout + sizeof (struct ib_mad_hdr)); + memset(timeout, 0, sizeof *timeout + IB_MGMT_MAD_HDR); - timeout->length = sizeof (struct ib_mad_hdr); + timeout->length = IB_MGMT_MAD_HDR; timeout->mad.hdr.id = packet->mad.hdr.id; timeout->mad.hdr.status = ETIMEDOUT; - memcpy(timeout->mad.data, packet->mad.data, - sizeof (struct ib_mad_hdr)); + memcpy(timeout->mad.data, packet->mad.data, IB_MGMT_MAD_HDR); if (!queue_packet(file, agent, timeout)) return; @@ -245,7 +243,7 @@ else ret = -ENOSPC; } else if (copy_to_user(buf, &packet->mad, - packet->length + sizeof (struct ib_user_mad))) + packet->length + sizeof (struct ib_user_mad))) ret = -EFAULT; else ret = packet->length + sizeof (struct ib_user_mad); @@ -270,22 +268,19 @@ struct ib_rmpp_mad *rmpp_mad; u8 method; __be64 *tid; - int ret, length, hdr_len, 
rmpp_hdr_size; + int ret, length, hdr_len, copy_offset; int rmpp_active = 0; if (count < sizeof (struct ib_user_mad)) return -EINVAL; length = count - sizeof (struct ib_user_mad); - packet = kmalloc(sizeof *packet + sizeof(struct ib_mad_hdr) + -sizeof (struct ib_rmpp_hdr), GFP_KERNEL); + packet = kmalloc(sizeof *packet + IB_MGMT_RMPP_HDR, GFP_KERNEL); if (!packet) return -ENOMEM; if (copy_from_user(&packet->mad, buf, - sizeof (struct ib_user_mad) + - sizeof (struct ib_mad_hdr) + - sizeof (struct ib_rmpp_hdr))) { + sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR)) { ret = -EFAULT; goto err; } @@ -296,8 +291,6 @@ goto err; } - packet->length = length; - down_read(&file->agent_mutex); agent = file->agent[packet->mad.hdr.id]; @@ -344,12 +337,10 @@ goto err_ah; } rmpp_active = 1; + copy_offset = IB_MGMT_RMPP_HDR; } else { - if (length > sizeof (struct ib_mad)) { - ret = -EINVAL; - goto err_ah; - } hdr_len = IB_MGMT_MAD_HDR; + copy_offset = IB_MGMT_MAD_HDR; } packet->msg = ib_create_send_mad(agent, @@ -363,32 +354,18 @@ } packet->msg->ah = ah; - packet->msg->timeout_ms = packet->mad.hdr.timeout_ms; + packet->msg->timeout_ms = packet->mad.hdr.timeout_ms; packet->msg->retries = packet->mad.hdr.retries; packet->msg->context[0] = packet; - if (!rmpp_active) { - /* Copy message from user into send buffer */ - if (copy_from_user(packet->msg->mad, - buf + sizeof (struct ib_user_mad), length)) { - ret = -EFAULT; - goto err_msg; - } - } else { - rmpp_hdr_size = sizeof (struct ib_mad_hdr) + - sizeof (struct ib_rmpp_hdr); - - /* Only copy MAD headers (RMPP header
[openib-general] SRQ limit reached async event.
Hello, Does anyone know if openib supports the SRQ limit asynchronous event? I am working with Mellanox verbs right now and it doesn't seem to support this. I say this because I have to set the srq_limit attribute via VAPI_modify_srq in order to get the event. Unfortunately, when I call VAPI_modify_srq I get: error in VAPI_modify_srq: Not implemented Any insight is appreciated. Thanks, Galen
[openib-general] Boot over IB - support in Bproc status?
Hi, Can anyone point out the current status of Boot over IB support in Bproc? Also, what other solutions exist for "mass boot over IB" now (OpenSM, SRP...)? Thanks. HB LANL CCN-9
Re: [openib-general] OpenSM causes kernel trap
BTW, Jay, can you confirm that this patch fixes your problem too? Thanks, Roland
Re: [openib-general] OpenSM causes kernel trap
OK, I think I found it. The problem was that ib_umad_write() wrote through packet->msg in a few places where it should have used packet->msg->mad, and therefore corrupted the address of the buffer. I'll commit the patch below in a little while, which fixes this issue and the packet->length race that Sean spotted, unless someone sees a problem with it:

--- infiniband/core/user_mad.c (revision 3867)
+++ infiniband/core/user_mad.c (working copy)
@@ -297,8 +297,6 @@ static ssize_t ib_umad_write(struct file
 goto err;
 }
- packet->length = length;
-
 down_read(&file->agent_mutex);
 agent = file->agent[packet->mad.hdr.id];
@@ -398,12 +396,12 @@ static ssize_t ib_umad_write(struct file
 * transaction ID matches the agent being used to send the
 * MAD.
 */
- method = ((struct ib_mad_hdr *) packet->msg)->method;
+ method = ((struct ib_mad_hdr *) packet->msg->mad)->method;
 if (!(method & IB_MGMT_METHOD_RESP) &&
 method != IB_MGMT_METHOD_TRAP_REPRESS &&
 method != IB_MGMT_METHOD_SEND) {
- tid = &((struct ib_mad_hdr *) packet->msg)->tid;
+ tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
 *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
 (be64_to_cpup(tid) & 0x));
 }
@@ -414,7 +412,7 @@ static ssize_t ib_umad_write(struct file
 up_read(&file->agent_mutex);
- return sizeof (struct ib_user_mad_hdr) + packet->length;
+ return count;
 err_msg:
 ib_free_send_mad(packet->msg);
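For readers following the thread, here is a minimal userspace sketch of the pointer mix-up described in this message. It assumes simplified stand-ins for the kernel structures: demo_mad_hdr, demo_send_buf, set_tid_buggy(), set_tid_fixed() and mad_pointer_intact() are hypothetical names, and the layouts are not the real struct ib_mad_hdr or struct ib_mad_send_buf. It only illustrates that a store aimed at the wrapper lands on the wrapper's own fields, while a store through its mad pointer reaches the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified MAD header: tid sits at byte offset 8. */
struct demo_mad_hdr {
    uint8_t  base_version;
    uint8_t  mgmt_class;
    uint8_t  class_version;
    uint8_t  method;
    uint32_t reserved;
    uint64_t tid;
};

/* Hypothetical, simplified send-buffer wrapper: bookkeeping plus a
 * pointer to the separately allocated MAD payload (also at offset 8). */
struct demo_send_buf {
    uint64_t cookie;
    void    *mad;
    int      timeout_ms;
    int      retries;
};

/* Buggy pattern: treat the wrapper itself as the MAD header. The store
 * overwrites the bytes that hold msg->mad. memcpy is used so the demo
 * stays well-defined; the 8-byte write fits inside the wrapper. */
static void set_tid_buggy(struct demo_send_buf *msg, uint64_t tid)
{
    memcpy((unsigned char *)msg + offsetof(struct demo_mad_hdr, tid),
           &tid, sizeof tid);
}

/* Fixed pattern: follow msg->mad to reach the real header. */
static void set_tid_fixed(struct demo_send_buf *msg, uint64_t tid)
{
    struct demo_mad_hdr *hdr = msg->mad;

    assert(hdr != NULL);
    hdr->tid = tid;
}

/* Returns 1 if the bytes of msg->mad still encode 'expected'. */
static int mad_pointer_intact(const struct demo_send_buf *msg,
                              const void *expected)
{
    return memcmp(&msg->mad, &expected, sizeof expected) == 0;
}
```

With the fixed helper the header is updated and the wrapper is untouched; with the buggy one the wrapper's buffer address is overwritten, which is consistent with a later free of that buffer faulting.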
[openib-general] ping over IPoIB does not work between 2 cards on the same host
I have a host with 2 HCAs (dual port each but I only connected one port per machine) connected to a switch. With IPoIB configured, pinging each card's own IP address works. I can ping other machines that have their HCAs configured with IPoIB fine. And I can ping both local IP addresses from remote machine(s).

Details:
ifconfig ib1 192.168.0.1 netmask 255.255.0.0
ifconfig ib3 192.168.0.3 netmask 255.255.0.0

On remote machine:
ifconfig ib0 192.168.1.0 netmask 255.255.0.0

Locally:
ping -I ib3 192.168.0.3
PING 192.168.0.3 (192.168.97.3) from 192.168.0.3 ib3: 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=0 ttl=64 time=0.028 ms

ping -I ib1 192.168.0.1
PING 192.168.0.1 (192.168.97.1) from 192.168.0.1 ib1: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.028 ms

# ping -I ib3 192.168.1.0
PING 192.168.1.0 (192.168.1.0) from 192.168.0.3 ib3: 56(84) bytes of data.
64 bytes from 192.168.1.0: icmp_seq=0 ttl=64 time=1.81 ms

From remote host:
# ping -I ib0 192.168.0.1
PING 192.168.0.1 (192.168.0.1) from 192.168.1.0 ib0: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.086 ms

# ping -I ib0 192.168.0.3
PING 192.168.0.3 (192.168.0.3) from 192.168.1.0 ib0: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.086 ms

Locally between 2 cards:
# ping -I ib3 192.168.0.1
PING 192.168.0.1 (192.168.0.1) from 192.168.0.3 ib3: 56(84) bytes of data.
From 192.168.0.3 icmp_seq=1 Destination Host Unreachable
From 192.168.0.3 icmp_seq=2 Destination Host Unreachable
From 192.168.0.3 icmp_seq=3 Destination Host Unreachable

Arkady

Arkady Kanevsky             email: [EMAIL PROTECTED]
Network Appliance Inc.      phone: 781-768-5395
275 Totten Pond Rd.         Fax: 781-895-1195
Waltham, MA 02451-2010      central phone: 781-768-5300
Re: [openib-general] OpenSM causes kernel trap
Roland Dreier wrote:
> Good catch. Seems like the below patch is the right fix: we start out with

Fix looks right to me.

> packet->length = length;

I don't think that this assignment is needed. Once the packet is sent, it is simply freed.

- Sean
[openib-general] Re: [PATCH] [SRP] srp_cm_handler expanded response handling
On Thu, 27 Oct 2005, Roland Dreier wrote: >Looks good, except: > >> +if (reason == 0x00010002) > >can you add enums for all these SRP_LOGIN_REJ reason codes rather than >open-coding this magic number here? OK. Thanks, John Signed-off-by: John Kingman <[EMAIL PROTECTED]> Index: ib_srp.h === --- ib_srp.h(revision 3884) +++ ib_srp.h(working copy) @@ -76,6 +76,16 @@ enum srp_target_state { SRP_TARGET_REMOVED }; +enum srp_login_rej_reason { + SRP_UNABLE_ESTABLISH_CHANNEL= 0x0001, + SRP_INSUFFICIENT_RESOURCES = 0x00010001, + SRP_REQ_IT_IU_LENGTH_TOO_LARGE = 0x00010002, + SRP_UNABLE_ASSOCIATE_CHANNEL= 0x00010003, + SRP_UNSUPPORTED_DESCRIPTOR_FMT = 0x00010004, + SRP_MULTI_CHANNEL_UNSUPPORTED = 0x00010005, + SRP_CHANNEL_LIMIT_REACHED = 0x00010006 +}; + struct srp_host { u8 initiator_port_id[16]; struct ib_device *dev; Index: ib_srp.c === --- ib_srp.c(revision 3883) +++ ib_srp.c(working copy) @@ -975,6 +975,7 @@ static int srp_cm_handler(struct ib_cm_i struct ib_qp_attr *qp_attr = NULL; int attr_mask = 0; int comp = 0; + int rsp_opcode = 0; switch (event->event) { case IB_CM_REQ_ERROR: @@ -985,17 +986,20 @@ static int srp_cm_handler(struct ib_cm_i case IB_CM_REP_RECEIVED: comp = 1; + rsp_opcode = *(u8 *) event->private_data; - { + if (rsp_opcode == SRP_LOGIN_RSP) { struct srp_login_rsp *rsp = event->private_data; - /* XXX check that opcode is SRP RSP */ - target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len); target->req_lim = be32_to_cpu(rsp->req_lim_delta); target->scsi_host->can_queue = min(target->req_lim, target->scsi_host->can_queue); + } else { + printk(KERN_WARNING PFX "Unhandled RSP opcode %#x\n", rsp_opcode); + target->status = -ECONNRESET; + break; } target->status = srp_alloc_iu_bufs(target); @@ -1043,7 +1047,8 @@ static int srp_cm_handler(struct ib_cm_i printk(KERN_DEBUG PFX "REJ received\n"); comp = 1; - if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { + switch (event->param.rej_rcvd.reason) { + case IB_CM_REJ_PORT_CM_REDIRECT: cpi = 
event->param.rej_rcvd.ari; target->path.dlid = cpi->redirect_lid; target->path.pkey = cpi->redirect_pkey; @@ -1052,23 +1057,52 @@ static int srp_cm_handler(struct ib_cm_i target->status = target->path.dlid ? SRP_DLID_REDIRECT : SRP_PORT_REDIRECT; - } else if (topspin_workarounds && - !memcmp(&target->ioc_guid, topspin_oui, 3) && - event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { - /* -* Topspin/Cisco SRP gateways incorrectly send -* reject reason code 25 when they mean 24 -* (port redirect). -*/ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari, 16); - - printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", - (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), - (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id)); + break; - target->status = SRP_PORT_REDIRECT; - } else { + case IB_CM_REJ_PORT_REDIRECT: + if (topspin_workarounds && + !memcmp(&target->ioc_guid, topspin_oui, 3)) { + /* +* Topspin/Cisco SRP gateways incorrectly send +* reject reason code 25 when they mean 24 +* (port redirect). +*/ + memcpy(target->path.dgid.raw, + event->param.rej_rcvd.ari, 16); + + printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", + (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), +
[openib-general] Re: [PATCH] [SRP] srp_cm_handler expanded response handling
Looks good, except:

> + if (reason == 0x00010002)

can you add enums for all these SRP_LOGIN_REJ reason codes rather than open-coding this magic number here?

Thanks, Roland
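As an illustration of what this review comment asks for, here is a small sketch that replaces the magic number with named constants. The numeric values are copied from the enum in John Kingman's follow-up patch in this thread; the entry whose value did not survive archiving intact is left out. The DEMO_ prefix, demo_rej_reason_str() and demo_self_check() are hypothetical names, and the description strings are informal paraphrases of the constant names, not text from the SRP specification.

```c
#include <assert.h>
#include <stdint.h>

/* Named SRP_LOGIN_REJ reason codes (values as posted in the follow-up). */
enum demo_srp_login_rej_reason {
    DEMO_SRP_INSUFFICIENT_RESOURCES     = 0x00010001,
    DEMO_SRP_REQ_IT_IU_LENGTH_TOO_LARGE = 0x00010002,
    DEMO_SRP_UNABLE_ASSOCIATE_CHANNEL   = 0x00010003,
    DEMO_SRP_UNSUPPORTED_DESCRIPTOR_FMT = 0x00010004,
    DEMO_SRP_MULTI_CHANNEL_UNSUPPORTED  = 0x00010005,
    DEMO_SRP_CHANNEL_LIMIT_REACHED      = 0x00010006
};

/* Map a reason code to a short description for log messages. */
static const char *demo_rej_reason_str(uint32_t reason)
{
    switch (reason) {
    case DEMO_SRP_INSUFFICIENT_RESOURCES:
        return "insufficient resources";
    case DEMO_SRP_REQ_IT_IU_LENGTH_TOO_LARGE:
        return "requested max_it_iu_len too large";
    case DEMO_SRP_UNABLE_ASSOCIATE_CHANNEL:
        return "unable to associate channel";
    case DEMO_SRP_UNSUPPORTED_DESCRIPTOR_FMT:
        return "unsupported descriptor format";
    case DEMO_SRP_MULTI_CHANNEL_UNSUPPORTED:
        return "multi-channel unsupported";
    case DEMO_SRP_CHANNEL_LIMIT_REACHED:
        return "channel limit reached";
    default:
        return "unknown reason";
    }
}

/* The named constant stands in for the open-coded 0x00010002. */
static void demo_self_check(void)
{
    assert(DEMO_SRP_REQ_IT_IU_LENGTH_TOO_LARGE == 0x00010002);
}
```

A switch over named cases also gives the compiler and reviewers something to check when new reason codes are added, which is harder with bare hex literals.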
[openib-general] [PATCH] [SRP] srp_cm_handler expanded response handling
This patch expands the srp_cm_handler code to recognize more response cases and provides a place holder for future code to handle SRP target exceptions such as IB_CM_REJ_CONSUMER_DEFINED with reason code 0x00010002 (requested max_it_iu_len too large). Patch has been tested with our target. Signed-off-by: John Kingman <[EMAIL PROTECTED]> Index: ib_srp.c === --- ib_srp.c(revision 3883) +++ ib_srp.c(working copy) @@ -975,6 +975,7 @@ static int srp_cm_handler(struct ib_cm_i struct ib_qp_attr *qp_attr = NULL; int attr_mask = 0; int comp = 0; + int rsp_opcode = 0; switch (event->event) { case IB_CM_REQ_ERROR: @@ -985,17 +986,20 @@ static int srp_cm_handler(struct ib_cm_i case IB_CM_REP_RECEIVED: comp = 1; + rsp_opcode = *(u8 *) event->private_data; - { + if (rsp_opcode == SRP_LOGIN_RSP) { struct srp_login_rsp *rsp = event->private_data; - /* XXX check that opcode is SRP RSP */ - target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len); target->req_lim = be32_to_cpu(rsp->req_lim_delta); target->scsi_host->can_queue = min(target->req_lim, target->scsi_host->can_queue); + } else { + printk(KERN_WARNING PFX "Unhandled RSP opcode %#x\n", rsp_opcode); + target->status = -ECONNRESET; + break; } target->status = srp_alloc_iu_bufs(target); @@ -1043,7 +1047,8 @@ static int srp_cm_handler(struct ib_cm_i printk(KERN_DEBUG PFX "REJ received\n"); comp = 1; - if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { + switch (event->param.rej_rcvd.reason) { + case IB_CM_REJ_PORT_CM_REDIRECT: cpi = event->param.rej_rcvd.ari; target->path.dlid = cpi->redirect_lid; target->path.pkey = cpi->redirect_pkey; @@ -1052,23 +1057,52 @@ static int srp_cm_handler(struct ib_cm_i target->status = target->path.dlid ? 
SRP_DLID_REDIRECT : SRP_PORT_REDIRECT; - } else if (topspin_workarounds && - !memcmp(&target->ioc_guid, topspin_oui, 3) && - event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { - /* -* Topspin/Cisco SRP gateways incorrectly send -* reject reason code 25 when they mean 24 -* (port redirect). -*/ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari, 16); - - printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", - (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), - (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id)); + break; - target->status = SRP_PORT_REDIRECT; - } else { + case IB_CM_REJ_PORT_REDIRECT: + if (topspin_workarounds && + !memcmp(&target->ioc_guid, topspin_oui, 3)) { + /* +* Topspin/Cisco SRP gateways incorrectly send +* reject reason code 25 when they mean 24 +* (port redirect). +*/ + memcpy(target->path.dgid.raw, + event->param.rej_rcvd.ari, 16); + + printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", + (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), + (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id)); + + target->status = SRP_PORT_REDIRECT; + } else { + printk(KERN_WARNING " REJ reason: IB_CM_REJ_PORT_REDIRECT\n"); + target->status = -ECONNRESET; + } + break; + + case IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID: + printk(KERN_WARNING " REJ reason: IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID\n"); + target->status = -ECONNRESET; + break; + + case IB_CM_REJ_CONSUMER_DEFIN
Re: [openib-general] OpenSM causes kernel trap
Sean> the only bug I saw was accessing packet->length after
Sean> calling ib_post_send_mad(). The send_handler() will free
Sean> the packet, so there's a race there.

Good catch. Seems like the below patch is the right fix: we start out with

length = count - sizeof (struct ib_user_mad);

and then do

packet->length = length;

so in

return sizeof (struct ib_user_mad_hdr) + packet->length;

we're really just returning count -- in ib_user_mad.h, the definition of struct ib_user_mad is:

struct ib_user_mad {
 struct ib_user_mad_hdr hdr;
 __u8 data[0];
};

so sizeof (struct ib_user_mad) == sizeof (struct ib_user_mad_hdr).

Hal, am I missing something? Was there any reason to write the return statement like that, or is it OK to just return count directly?

- R.

--- infiniband/core/user_mad.c (revision 3867)
+++ infiniband/core/user_mad.c (working copy)
@@ -414,7 +414,7 @@ static ssize_t ib_umad_write(struct file
 up_read(&file->agent_mutex);
- return sizeof (struct ib_user_mad_hdr) + packet->length;
+ return count;
 err_msg:
 ib_free_send_mad(packet->msg);
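The size argument in this message can be checked with a short userspace sketch. The header fields below are an assumption for illustration only (demo_umad_hdr is not the real struct ib_user_mad_hdr), and the C99 flexible array member data[] stands in for the kernel's data[0]; payload_length() and old_return_value() are hypothetical helpers that mirror the arithmetic quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ib_user_mad_hdr. */
struct demo_umad_hdr {
    uint32_t id;
    uint32_t status;
    uint32_t timeout_ms;
    uint32_t retries;
};

/* Same shape as the quoted struct ib_user_mad: header, then a
 * zero-length trailing array that adds nothing to sizeof. */
struct demo_user_mad {
    struct demo_umad_hdr hdr;
    uint8_t data[];
};

/* length = count - sizeof (struct ib_user_mad) */
static size_t payload_length(size_t count)
{
    assert(count >= sizeof(struct demo_user_mad));
    return count - sizeof(struct demo_user_mad);
}

/* The old return expression: header size plus stored payload length. */
static size_t old_return_value(size_t count)
{
    return sizeof(struct demo_umad_hdr) + payload_length(count);
}
```

Because the two sizeof values agree, the old expression reduces to count for any valid write size, so returning count directly gives the same result without touching the packet.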
Re: [openib-general] [RFC] OpenSM Interactive Console
On Thu, Oct 27, 2005 at 09:29:57AM -0500, Troy Benjegerdes wrote:
> I guess the point of all this is to find an end-user use-case for the SM
> MIB, and work back from there to decide if having a MIB actually helps
> solve the problem.

The end-use case is likely to be something like "an enterprise which insists on managing as much as possible through HP OpenView." Which isn't anyone in HPC, hence the current lack of interest.

As for the things you'd actually want to monitor for a cluster, they're not really the normal stuff that's in MIBs. I'd want to know if a cable was unexpectedly unplugged, or if a node was up but its IB connection wasn't. I'd like to know if a link had an unusual error rate.

-- greg
RE: [openib-general] OpenSM causes kernel trap
I think that is likely a different issue. -- Hal From: [EMAIL PROTECTED] on behalf of James Lentini Sent: Thu 10/27/2005 1:54 PM To: Roland Dreier Cc: openib-general@openib.org Subject: Re: [openib-general] OpenSM causes kernel trap On Thu, 27 Oct 2005, Roland Dreier wrote: > Sean, looks like your MAD send buf stuff may have broken send > timeouts. Any quick ideas before I dig into this? Itamar also had a problem with the MAD layer on x86_64: http://openib.org/pipermail/openib-general/2005-October/013029.html
[openib-general] Re: Automated userspace build error
On 27.10.2005 [14:31:31 +0200], Michael S. Tsirkin wrote:
> Quoting r. Nishanth Aravamudan <[EMAIL PROTECTED]>:
> > Subject: Re: Automated userspace build error
> >
> > On 25.10.2005 [15:22:56 -0700], Roland Dreier wrote:
> > > Nishanth> Hrm, well, I'm testing the latest svn (3865), did the
> > > Nishanth> patch just get checked in?
> > >
> > > Yeah, I only noticed it and fixed it after your original email. I
> > > just meant that I had already checked it in before sending my reply.
> > > Sorry for the confusion...
> >
> > No worries, I figured that's what happened.
> >
> > On a related note, do you (or anyone else) have any suggestions for
> > build-testing all of the userspace components? There isn't a top-level
> > Makefile of any kind to make it easy :/
> >
> > Thanks,
> > Nish
>
> Yes, look at scripts in
> https://openib.org/svn/trunk/contrib/mellanox/scripts
>
> You can also, basically, cut and paste stuff from the FAQ page,
> but that relies on performing make as root.

Which luckily I can do (or is that unluckily -- if I screw up, the machine tends to fall over ;). Thanks for the pointer!

-Nish
Re: [openib-general] OpenSM causes kernel trap
Sean Hefty wrote:
> I don't see anything off there either. Timeouts seem to work fine with CM
> testing, so I'm guessing that the issue is somewhere in user_mad.c. I'm
> trying to see if there's anything wrong in ib_umad_write() that might cause
> it to crash on the completion.

Re-testing with grmpp, I didn't hit any issues running with or without RMPP. ib_umad_write() can be cleaned up a little, but the only bug I saw was accessing packet->length after calling ib_post_send_mad(). The send_handler() will free the packet, so there's a race there. This doesn't seem related to this crash though.

- Sean
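The ownership problem described in this message can be sketched in userspace as follows. This is only an illustration under stated assumptions: demo_packet, demo_post_send() and demo_write() are hypothetical names, and the completion is modelled as running synchronously, whereas in the kernel send_handler() may run at any time after ib_post_send_mad() returns. The pattern shown is to compute the return value from data the caller still owns rather than reading the packet after posting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet that records its payload length. */
struct demo_packet {
    size_t length;
};

/* Stand-in for posting a send: on success the completion path owns
 * the packet and frees it, as send_handler() does in the thread. */
static int demo_post_send(struct demo_packet *packet)
{
    assert(packet != NULL);
    free(packet);
    return 0;
}

/* Returns the number of bytes consumed, or -1 on error. */
static long demo_write(size_t count, size_t hdr_size)
{
    struct demo_packet *packet;

    if (count < hdr_size)
        return -1;

    packet = malloc(sizeof *packet);
    if (!packet)
        return -1;
    packet->length = count - hdr_size;

    if (demo_post_send(packet)) {
        /* Post failed: the caller still owns the packet. */
        free(packet);
        return -1;
    }

    /* The packet may already be freed here, so packet->length must
     * not be read; 'count' is a local value and is safe to return. */
    return (long)count;
}
```

Returning count instead of reading packet->length removes the window in which the completion handler could free the packet first.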
Re: [openib-general] Automated userspace build error
On 26.10.2005 [17:15:05 -0700], Woodruff, Robert J wrote:
> Nish wrote,
> >On a related note, do you (or anyone else) have any suggestions for
> >build-testing all of the userspace components? There isn't a top-level
> >Makefile of any kind to make it easy :/
>
> >Thanks,
> >Nish
>
> If you look at the openib download page, Makia posted a userspace
> source RPM, although it is a bit out of date.

RPMs aren't necessarily useful, but the means to get there might be.

> I also have a similar build procedure that I use
> internally, basically building all of the usermode components
> and then building an RPM to allow easy installation on other
> nodes for testing. There are also .spec files for most of the individual
> libraries, if you prefer to build RPMs for individual libraries.
> I find it easier just to lump it all into one big usermode component RPM
> and one kernel-mode component RPM.

Yes, that's my goal. But I don't necessarily want to install the libraries. Just build them. I will take a look at the SRPM you mentioned above.

Thanks,
Nish
Re: [openib-general] OpenSM causes kernel trap
Roland Dreier wrote:
> Sean> I think that the send_handler in user_mad.c is broken.
>
> I don't see anything obviously wrong -- in Jay's log, the call to
> ib_free_send_mad() is crashing. When can it be wrong to do that from the
> send handler?

I don't see anything off there either. Timeouts seem to work fine with CM testing, so I'm guessing that the issue is somewhere in user_mad.c. I'm trying to see if there's anything wrong in ib_umad_write() that might cause it to crash on the completion.

- Sean
Re: [openib-general] OpenSM causes kernel trap
Sean> I think that the send_handler in user_mad.c is broken. I don't see anything obviously wrong -- in Jay's log, the call to ib_free_send_mad() is crashing. When can it be wrong to do that from the send handler? - R.
Re: [openib-general] ifup/ifdown scripts don't work with IPoIB
On Thu, Oct 27, 2005 at 09:37:29AM -0700, Bob Woodruff wrote:
> Grant wrote,
> >What does it say when you use *192* for the first byte?
>
> Same thing, I had a typo in first email,
>
> arping -c 2 -w 3 -D -I ib0 192.168.0.1
> ARPING 192.168.0.1 from 0.0.0.0 ib0
> Sent 2 probes (2 broadcast(s))
> Received -1 response(s)

Hrm...wouldn't that be a bug in the arping program? How can one get "-1" responses? And I can't reproduce that here (ia64-linux):

gsyprf3:~# ifconfig ib0
ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
 inet addr:10.0.0.51 Bcast:10.0.0.255 Mask:255.255.255.0
 UP BROADCAST MULTICAST MTU:2044 Metric:1
 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:128
 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

gsyprf3:~# arping -c 2 -w 3 -D -I ib0 10.0.0.51
ARPING 10.0.0.51 from 0.0.0.0 ib0
Sent 2 probes (2 broadcast(s))
Received 0 response(s)

gsyprf3:~# arping -c 2 -w 3 -D -I ib0 10.0.0.55
ARPING 10.0.0.55 from 0.0.0.0 ib0
Sent 2 probes (2 broadcast(s))
Received 0 response(s)

There is no 10.0.0.55 IP in use on this network. I don't understand if the above result is correct or not and I did RTFM. BTW, I'm using Debian "iputils-arping 20020927-2".

grant
Re: [openib-general] OpenSM causes kernel trap
On Thu, 27 Oct 2005, Roland Dreier wrote: > Sean, looks like your MAD send buf stuff may have broken send > timeouts. Any quick ideas before I dig into this? Itamar also had a problem with the MAD layer on x86_64: http://openib.org/pipermail/openib-general/2005-October/013029.html
Re: [openib-general] OpenSM causes kernel trap
Roland Dreier wrote:
> Sean, looks like your MAD send buf stuff may have broken send timeouts.
> Any quick ideas before I dig into this?

I think that the send_handler in user_mad.c is broken.

- Sean
Re: [openib-general] OpenSM causes kernel trap
Roland Dreier wrote:
> Sean, looks like your MAD send buf stuff may have broken send timeouts.
> Any quick ideas before I dig into this?

No quick ideas why. I'll start looking into this as well.

- Sean
RE: [openib-general] ifup/ifdown scripts don't work with IPoIB
Hal wrote,
>I think arping needs a minor change to work for IB due to the difference in the
>HW addresses for IPoIB and other LAN MACs.
>-- Hal

Yep. That is the conclusion that we came to also. As a workaround for now, one can just skip the arping check in ifup if the device is an ib device. Not perfect, but it allows ifup to work for ib devices with the normal ifcfg- scripts. Something like,

if [ "x`echo ${REALDEVICE} | sed -e "s/^ib.//"`" != "x" ]; then
    if ! arping -q -c 2 -w 3 -D -I ${REALDEVICE} ${IPADDR} ; then
        echo $"Error, some other host already uses address ${IPADDR}."
        exit 1
    fi
fi

woody
Re: [openib-general] OpenSM causes kernel trap
Sean, looks like your MAD send buf stuff may have broken send timeouts. Any quick ideas before I dig into this? - R.
[openib-general] Re: ib_mthca panic on PPC64
OK, the latest svn should work again. - R.
[openib-general] OpenSM causes kernel trap
I am trying to start opensm on a Dell PowerEdge 2850 with a Mellanox-based InfiniBand card. We are using the x86-64 architecture. The kernel is recompiled with the latest stack from subversion, and all of the modules load OK. However, when I try to start opensm I get the following error. After this, the modules cannot be removed from the kernel and opensm is not running. I can send the output from opensm's log file if anyone is interested.

Thanks.
-Jay Higley

Oct 27 12:07:17 riba OpenSM[3321]: OpenSM Rev:openib-1.1.0
Oct 27 12:07:17 riba kernel: Unable to handle kernel paging request at RIP:
Oct 27 12:07:17 riba kernel: {kfree+107}
Oct 27 12:07:17 riba kernel: PGD 103027 PUD 5619067 PMD 0
Oct 27 12:07:17 riba kernel: Oops: [1] SMP
Oct 27 12:07:17 riba kernel: CPU 3
Oct 27 12:07:17 riba kernel: Modules linked in: nfsd exportfs lockd nfs_acl ipv6 sunrpc ib_uverbs ib_at ib_sdp ib_ucm ib_cm ib_ping ib_mthca ib_umad binfmt_misc dm_mod video thermal processor fan container button battery ac ehci_hcd uhci_hcd pcspkr floppy parport_pc parport ib_ipoib ib_sa ib_mad ib_core e1000 snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Oct 27 12:07:17 riba kernel: Pid: 1783, comm: ib_mad1 Not tainted 2.6.13.4-86.caos.smp
Oct 27 12:07:17 riba kernel: RIP: 0010:[] {kfree+107}
Oct 27 12:07:17 riba kernel: RSP: 0018:81013df97db8 EFLAGS: 00010006
Oct 27 12:07:17 riba kernel: RAX: 0003 RBX: RCX: 81013fd93518
Oct 27 12:07:17 riba kernel: RDX: 00762000 RSI: 0292 RDI: 810004b02028
Oct 27 12:07:17 riba kernel: RBP: 81010e00 R08: 81013df96000 R09:
Oct 27 12:07:17 riba kernel: R10: 0001 R11: R12: 81013e600e10
Oct 27 12:07:17 riba kernel: R13: 810037deb000 R14: 81013e600e78 R15: 880e5190
Oct 27 12:07:17 riba kernel: FS: () GS:804f3980() knlGS:
Oct 27 12:07:17 riba kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
Oct 27 12:07:17 riba kernel: CR2: CR3: 00013907a000 CR4: 06e0
Oct 27 12:07:17 riba kernel: Process ib_mad1 (pid: 1783, threadinfo 81013df96000, task 81013e40a1b0)
Oct 27 12:07:17 riba kernel: Stack: 0286 81013e600e10 81013f3db180 880e272e
Oct 27 12:07:17 riba kernel:81013df97e28 8817113f 81013e40a3c8 81013fd93500
Oct 27 12:07:17 riba kernel:81013e600e00 0292
Oct 27 12:07:17 riba kernel: Call Trace:{:ib_mad:ib_free_send_mad+14} {:ib_umad:send_handler+63}
Oct 27 12:07:17 riba kernel: {:ib_mad:timeout_sends+404} {__wake_up+67}
Oct 27 12:07:17 riba kernel: {worker_thread+498} {default_wake_function+0}
Oct 27 12:07:17 riba kernel: {__wake_up_common+64} {default_wake_function+0}
Oct 27 12:07:17 riba kernel: {keventd_create_kthread+0} {worker_thread+0}
Oct 27 12:07:17 riba kernel: {keventd_create_kthread+0} {kthread+217}
Oct 27 12:07:17 riba kernel:{child_rip+8} {keventd_create_kthread+0}
Oct 27 12:07:17 riba kernel:{kthread+0} {child_rip+0}
Oct 27 12:07:17 riba kernel:
Oct 27 12:07:17 riba kernel:
Oct 27 12:07:17 riba kernel: Code: 8b 03 3b 43 04 73 04 89 c0 eb 0a 48 89 de e8 a2 03 00 00 8b
Oct 27 12:07:17 riba kernel: RIP {kfree+107} RSP
Oct 27 12:07:17 riba kernel: CR2:

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [openib-general] ifup/ifdown scripts don't work with IPoIB
I think arping needs a minor change to work for IB due to the difference in the HW addresses for IPoIB and other LAN MACs.

-- Hal

From: [EMAIL PROTECTED] on behalf of Bob Woodruff
Sent: Thu 10/27/2005 12:37 PM
To: 'Grant Grundler'
Cc: openib-general@openib.org
Subject: RE: [openib-general] ifup/ifdown scripts don't work with IPoIB

Grant wrote,
>What does it say when you use *192* for the first byte?

Same thing, I had a typo in first email,

arping -c 2 -w 3 -D -I ib0 192.168.0.1
ARPING 192.168.0.1 from 0.0.0.0 ib0
Sent 2 probes (2 broadcast(s))
Received -1 response(s)

woody
[openib-general] [PATCH] add node_guid to struct ib_device
Here's a modified version of Roland's original patch that adds only the node_guid to struct ib_device. Signed-off-by: Sean Hefty <[EMAIL PROTECTED]> I'll rework my other patches based on this change. Index: include/rdma/ib_verbs.h === --- include/rdma/ib_verbs.h (revision 3861) +++ include/rdma/ib_verbs.h (working copy) @@ -951,6 +951,7 @@ u64 uverbs_cmd_mask; int uverbs_abi_ver; + __be64 node_guid; u8 node_type; u8 phys_port_cnt; }; Index: hw/mthca/mthca_dev.h === --- hw/mthca/mthca_dev.h(revision 3830) +++ hw/mthca/mthca_dev.h(working copy) @@ -290,7 +290,7 @@ u64 ddr_end; MTHCA_DECLARE_DOORBELL_LOCK(doorbell_lock) - struct semaphore cap_mask_mutex; + struct semaphore dev_attr_mutex; void __iomem*hcr; void __iomem*kar; @@ -528,4 +528,17 @@ return dev->mthca_flags & MTHCA_FLAG_MEMFREE; } +/* + * XXX remove once 2.6.14 is released. + */ +static inline void *mthca_kzalloc(size_t size, unsigned int __nocast flags) +{ + void *ret = kmalloc(size, flags); + if (ret) + memset(ret, 0, size); + return ret; +} +#undef kzalloc +#define kzalloc(s, f) mthca_kzalloc(s, f); + #endif /* MTHCA_DEV_H */ Index: hw/mthca/mthca_provider.c === --- hw/mthca/mthca_provider.c (revision 3830) +++ hw/mthca/mthca_provider.c (working copy) @@ -45,6 +45,14 @@ #include "mthca_user.h" #include "mthca_memfree.h" +static void init_query_mad(struct ib_smp *mad) +{ + mad->base_version = 1; + mad->mgmt_class= IB_MGMT_CLASS_SUBN_LID_ROUTED; + mad->class_version = 1; + mad->method= IB_MGMT_METHOD_GET; +} + static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { @@ -55,7 +63,7 @@ u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; @@ -64,12 +72,8 @@ props->fw_ver = mdev->fw_ver; - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - 
in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id= IB_SMP_ATTR_NODE_INFO; + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_NODE_INFO; err = mthca_MAD_IFC(mdev, 1, 1, 1, NULL, NULL, in_mad, out_mad, @@ -127,20 +131,16 @@ int err = -ENOMEM; u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; memset(props, 0, sizeof *props); - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id= IB_SMP_ATTR_PORT_INFO; - in_mad->attr_mod = cpu_to_be32(port); + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_PORT_INFO; + in_mad->attr_mod = cpu_to_be32(port); err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1, port, NULL, NULL, in_mad, out_mad, @@ -185,7 +185,7 @@ int err; u8 status; - if (down_interruptible(&to_mdev(ibdev)->cap_mask_mutex)) + if (down_interruptible(&to_mdev(ibdev)->dev_attr_mutex)) return -ERESTARTSYS; err = mthca_query_port(ibdev, port, &attr); @@ -207,7 +207,7 @@ } out: - up(&to_mdev(ibdev)->cap_mask_mutex); + up(&to_mdev(ibdev)->dev_attr_mutex); return err; } @@ -219,18 +219,14 @@ int err = -ENOMEM; u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id= IB_SMP_ATTR_PKEY_TABLE; - in_mad->attr_mod
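The refactoring in the patch above follows a common pattern: allocate zeroed memory in one call, then fill the fields shared by every query through one helper. Below is a minimal userspace sketch of that pattern, not the driver code itself. `struct query_mad`, `zalloc_compat`, `my_zalloc`, `make_query`, and the `DEMO_*` constants are all hypothetical stand-ins chosen for illustration, and `malloc` replaces `kmalloc`. Note that the wrapper macro is written without a trailing semicolon so that it can also be used inside a larger expression.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values only; stand-ins for the kernel's IB_MGMT_* constants. */
#define DEMO_CLASS_SUBN_LID_ROUTED 0x01
#define DEMO_METHOD_GET            0x01

/* Hypothetical stand-in for struct ib_smp: only the fields the patch sets. */
struct query_mad {
    uint8_t  base_version;
    uint8_t  mgmt_class;
    uint8_t  class_version;
    uint8_t  method;
    uint16_t attr_id;
    uint32_t attr_mod;
};

/* Userspace analogue of the compat helper: allocate, then zero on success. */
static void *zalloc_compat(size_t size)
{
    void *ret = malloc(size);
    if (ret)
        memset(ret, 0, size);
    return ret;
}

/* No trailing semicolon here, so the macro is safe inside expressions. */
#define my_zalloc(s) zalloc_compat(s)

/* Fill in the header fields that every query shares. */
static void init_query_mad(struct query_mad *mad)
{
    mad->base_version  = 1;
    mad->mgmt_class    = DEMO_CLASS_SUBN_LID_ROUTED;
    mad->class_version = 1;
    mad->method        = DEMO_METHOD_GET;
}

/* Returns a zeroed, initialized query, or NULL if allocation fails. */
static struct query_mad *make_query(uint16_t attr_id, uint32_t attr_mod)
{
    struct query_mad *mad = my_zalloc(sizeof *mad);
    if (!mad)
        return NULL;
    init_query_mad(mad);
    mad->attr_id  = attr_id;
    mad->attr_mod = attr_mod;
    return mad;
}
```

Each caller then only sets the fields that differ between queries, which is what the three rewritten hunks in the patch do with attr_id and attr_mod.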
Re: [openib-general] Re: ehca testing
OK, looks like you have two problems.

First of all, you seem to have two versions of ib_mthca, one of which gets picked up by hotplug on boot and one of which gets picked up by modprobe. Notice how you don't see the dev->ib_dev.node_type = 1 line when mthca runs on boot? The only explanation I can come up with for that would be that you have an old version of it in an initrd or something that's screwing things up.

As for the crash in poll_catas, I understand what's going on there. The catastrophic error polling code is ioremap()ing a PCI address instead of the correct CPU address. They're different on pSeries but not on most other architectures, so I didn't see problems in testing. I'll commit a fix for that problem shortly.

- R.
RE: [openib-general] ifup/ifdown scripts don't work with IPoIB
Grant wrote,
>What does it say when you use *192* for the first byte?

Same thing, I had a typo in first email,

arping -c 2 -w 3 -D -I ib0 192.168.0.1
ARPING 192.168.0.1 from 0.0.0.0 ib0
Sent 2 probes (2 broadcast(s))
Received -1 response(s)

woody
Re: [openib-general] Re: ehca testing
On Thu, Oct 20, 2005 at 03:32:13PM -0700, Roland Dreier wrote:
> Troy> There is some sort of strange initialization error going on here..
>
> Yes, very strange. Can you add
>
> printk(KERN_ERR "hca->node_type = %d\n", hca->node_type);
>
> to the beginning of ipoib_add_port(), and
>
> printk(KERN_ERR "dev->ib_dev.node_type = %d\n", dev->ib_dev.node_type);
>
> right before the call to ib_register_device() in
> mthca_register_device() and send the output that you get when hotplug
> loads ib_mthca vs. when you load ib_mthca by hand?

When loaded at boot:

[586811.915831] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
[586811.915849] ib_mthca: Initializing :d9:00.0
[586811.916634] PCI: Enabling device: (:d9:00.0), cmd 142
[586818.501595] openafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
[586818.504651] Found system call table at 0xc0013e68 (scan: close+ioctl)
[586818.520240] Starting AFS cache scan...Memory cache: Allocating 12500 dcacheentries...found 0 non-empty cache files (0%).
[586875.848354] afs: Lost contact with volume location server 147.155.137.10 incell scl.ameslab.gov [586875.848374] afs: Lost contact with volume location server 147.155.137.10 incell scl.ameslab.gov [587154.758768] hca->node_type = 236 [587154.760578] hca->node_type = 236 [587154.761511] hca->node_type = 236 [587154.761572] mthca0: ib_query_pkey port 3 failed (ret = -22) [587154.761584] hca->node_type = 236 [587154.761633] mthca0: ib_query_pkey port 4 failed (ret = -22) [587154.761644] hca->node_type = 236 [587154.762506] hca->node_type = 236 [587154.763422] hca->node_type = 236 [587154.763480] mthca0: ib_query_pkey port 7 failed (ret = -22) [587154.763491] hca->node_type = 236 [587154.763542] mthca0: ib_query_pkey port 8 failed (ret = -22) [587154.763553] hca->node_type = 236 [587154.765698] hca->node_type = 236 [587154.767136] hca->node_type = 236 [587154.767312] mthca0: ib_query_pkey port 11 failed (ret = -22) [587154.767324] hca->node_type = 236 [587154.767455] mthca0: ib_query_pkey port 12 failed (ret = -22) [587154.767471] hca->node_type = 236 [587154.769140] hca->node_type = 236 [587154.772116] hca->node_type = 236 [587154.772180] mthca0: ib_query_pkey port 15 failed (ret = -22) [587154.772192] hca->node_type = 236 [587154.772243] mthca0: ib_query_pkey port 16 failed (ret = -22) [587154.772255] hca->node_type = 236 [587154.773401] hca->node_type = 236 [587154.776817] hca->node_type = 236 [587154.776974] mthca0: ib_query_pkey port 19 failed (ret = -22) [587154.776986] hca->node_type = 236 [587154.778179] mthca0: ib_query_pkey port 20 failed (ret = -22) [587154.778198] hca->node_type = 236 [587154.780159] hca->node_type = 236 [587154.785406] hca->node_type = 236 [587154.785512] mthca0: ib_query_pkey port 23 failed (ret = -22) [587154.785523] hca->node_type = 236 [587154.785582] mthca0: ib_query_pkey port 24 failed (ret = -22) [587154.785599] hca->node_type = 236 [587154.789427] hca->node_type = 236 [587154.794314] hca->node_type = 236 [587154.794458] mthca0: 
ib_query_pkey port 27 failed (ret = -22) [587154.794474] hca->node_type = 236 [587154.794634] mthca0: ib_query_pkey port 28 failed (ret = -22) [587154.794646] hca->node_type = 236 [587154.797133] hca->node_type = 236 [587154.803507] hca->node_type = 236 [587154.803597] mthca0: ib_query_pkey port 31 failed (ret = -22) [587154.803608] hca->node_type = 236 [587154.803667] mthca0: ib_query_pkey port 32 failed (ret = -22) [587154.803679] hca->node_type = 236 [587154.820947] hca->node_type = 236 [587154.829795] hca->node_type = 236 [587154.831921] mthca0: ib_query_pkey port 35 failed (ret = -22) [587154.831934] hca->node_type = 236 [587154.834932] mthca0: ib_query_pkey port 36 failed (ret = -22) [587154.834946] hca->node_type = 236 [587154.844314] hca->node_type = 236 [587154.853591] hca->node_type = 236 [587154.853680] mthca0: ib_query_pkey port 39 failed (ret = -22) [587154.853692] hca->node_type = 236 [587154.853745] mthca0: ib_query_pkey port 40 failed (ret = -22) [587154.853761] hca->node_type = 236 [587154.869483] hca->node_type = 236 [587154.874749] hca->node_type = 236 [587154.874952] mthca0: ib_query_pkey port 43 failed (ret = -22) [587154.874969] hca->node_type = 236 [587154.875609] mthca0: ib_query_pkey port 44 failed (ret = -22) [587154.875624] hca->node_type = 236 [587154.894612] hca->node_type = 236 [587154.908058] hca->node_type = 236 [587154.909244] mthca0: ib_query_pkey port 47 failed (ret = -22) [587154.909261] hca->node_type = 236 [587154.909323] mthca0: ib_query_pkey port 48 failed (ret = -22) [587154.909334] hca->node_type = 236 [587154.918749] hca->node_type = 236 [587154.939629] hca->node_type = 236 [587154.939729] mthca0: ib_query_pkey port 51 failed (ret = -22) [587154.939745] hca->node_type = 236 [587154.939866] mthca0: ib_query_pkey port 52 failed (ret = -22) [587154.939883] hca->node_type = 236 [587154.957219] hca->node_type = 236 [587154.971523] hca->node_type = 236
[openib-general] ib_mthca panic on PPC64
I got this the other day (before I had a chance to add the debug code) p5l0:~# [443954.161068] mthca0: ib_query_pkey port 0 failed (ret = -22) [443988.334644] mthca0: ib_query_pkey port 0 failed (ret = -22) [444037.579342] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) [444037.579360] ib_mthca: Initializing :d9:00.0 [444101.503664] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) [444101.503682] ib_mthca: Initializing :d9:00.0 [444107.815375] Oops: Kernel access of bad area, sig: 7 [#1] [444107.815389] SMP NR_CPUS=8 NUMA PSERIES LPAR [444107.815401] Modules linked in: ib_ipoib ib_sa ib_mthca ib_mad ib_core openaf s [444107.815425] NIP: D98BF638 XER: 2018 LR: C0057B2C CTR: D0 00098BF5D0 [444107.815440] REGS: c001ee79b490 TRAP: 0300 Tainted: P (2.6.13.3-p ower5) [444107.815455] MSR: 80009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 2800 0084 [444107.815469] DAR: d10082189a04 DSISR: 4000 [444107.815481] TASK: c001ee7950e0[0] 'swapper' THREAD: c001ee798000 CPU : 6 [444107.815494] GPR00: 0010 C001EE79B710 D98D6540 D1 0082189A04 [444107.815515] GPR04: 0008 0001009D0180 00 000800 [444107.815535] GPR08: C003DDA91910 C001EE79B840 D10082189A04 [444107.815556] GPR12: 4882 C04BF400 00C00060 [444107.815576] GPR16: 0006 [444107.815595] GPR20: C05F7ED8 C05F7F40 C0606500 [444107.815617] GPR24: C001ECEFC498 C001EE79B840 C001EE798000 C003DDA91000 [444107.815639] GPR28: 0100 C003DDA91000 D98D4EC0 [444107.815661] NIP [d98bf638] .poll_catas+0x68/0x2f0 [ib_mthca] [444107.815699] LR [c0057b2c] .run_timer_softirq+0x15c/0x260 [444107.815717] Call Trace: [444107.815725] [c001ee79b710] [c001ee79b7c0] 0xc001ee79b7c0 (unreliable) [444107.815744] [c001ee79b7d0] [c0057b2c] .run_timer_softirq+0x15c/0x260 [444107.815764] [c001ee79b890] [c0051e68] .__do_softirq+0xe8/0x1c0 [444107.815783] [c001ee79b950] [c0051fc4] .do_softirq+0x84/0x90 [444107.815801] [c001ee79b9d0] [c00108f0] .timer_interrupt+0xd0/0x41 0 [444107.815821] [c001ee79bad0] [c000a2b4] 
decrementer_common+0xb4/0x100 [444107.815838] --- Exception: 901 at .pseries_dedicated_idle+0x104/0x280 [444107.815857] LR = .pseries_dedicated_idle+0x1e0/0x280 [444107.815868] [c001ee79be90] [c000f460] .cpu_idle+0x40/0x60 [444107.815886] [c001ee79bf00] [c0032fa0] .start_secondary+0x120/0x150 [444107.815905] [c001ee79bf90] [c000ba7c] .enable_64b_mode+0x0/0x28 [444107.815922] Instruction dump: [444107.815930] 3be0 4820 2fab 381f0001 7c1f07b4 409e0058 801d0908 7f9f0040 [444107.815955] 409c00c8 e97d08f8 7be91764 7c6b4a14 <7c001c2c> 0c00 4c00012c 780b0020 [444107.815983] <0>Kernel panic - not syncing: Fatal exception in interrupt [444107.815998] ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] PGI compiler issue with dat_platform_specific.h
Hi,

We ran into some trouble when compiling the OpenIB dapl provider with the PGI compiler. I believe this should appear in both the ibat-cm and the scm based providers. Has anyone compiled DAPL/Gen2 with PGI? Is there a quick workaround for this?

PGC-W-0221-Redefinition of symbol UINT64_C (/usr/include/stdint.h: 304)
PGC-S-0040-Illegal use of symbol, u_int64_t (/home/1/surs/projects/Gen2/dapl_scm_patch/dapl/dat/include/dat/dat_platform_specific.h: 139)
PGC/x86-64 Linux/x86-64 6.0-5: compilation completed with severe errors

Our machine is SuSE 9.3, with Linux kernel version 2.6.13.1 and OpenIB svn #3882.

Thanks,
Sayantan.

--
http://www.cse.ohio-state.edu/~surs
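The two diagnostics above are the kind that typically appear when a header defines UINT64_C itself before <stdint.h> is seen, and when it relies on the BSD-style u_int64_t name instead of the ISO C uint64_t. The following is only a sketch of a possible workaround under that assumption; it has not been checked against the PGI compiler or against the actual contents of dat_platform_specific.h. The names `demo_uint64` and `demo_max_handle` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>   /* include first, so the system UINT64_C is used if present */

/* Only supply a fallback when the system header did not define the macro. */
#ifndef UINT64_C
#define UINT64_C(c) c ## ULL
#endif

/* Hypothetical typedef: use ISO C uint64_t rather than BSD u_int64_t. */
typedef uint64_t demo_uint64;

/* Small user of both the type and the macro. */
static demo_uint64 demo_max_handle(void)
{
    return UINT64_C(0xFFFFFFFFFFFFFFFF);
}
```

The design choice is simply to let <stdint.h> own both the macro and the type, and to define a fallback only when nothing else has.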
Re: [openib-general] ifup/ifdown scripts don't work with IPoIB
On Thu, Oct 27, 2005 at 08:58:50AM -0700, Bob Woodruff wrote:
> If I run the arping command manually, I get
>
> arping -c 2 -w 3 -D -I ib0 102.168.0.1

What does it say when you use *192* for the first byte?
(This may not be the only problem...but need to get that right too)

grant
[openib-general] osm_console.c - compilation warnings
Hi Hal,

I think you are missing #include <stdlib.h>, as I get the following warnings:

osm_console.c: In function `loglevel_parse':
osm_console.c:112: warning: implicit declaration of function `strtoul'
osm_console.c:118: warning: implicit declaration of function `strtol'
osm_console.c: In function `osm_console':
osm_console.c:177: warning: implicit declaration of function `free'

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
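The warnings above arise when strtoul, strtol, and free are called with no prototype in scope; all three are declared in <stdlib.h>. A minimal sketch of a log-level parser that compiles cleanly is shown below. `parse_loglevel` is a hypothetical helper written for illustration and is not the loglevel_parse function from osm_console.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>   /* declares strtoul, strtol and free */

/*
 * Parse a log level given in decimal, octal (leading 0) or hex (leading 0x).
 * Returns 0 and stores the value on success, -1 on malformed input.
 */
static int parse_loglevel(const char *text, unsigned long *level)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(text, &end, 0);

    /* Reject range errors, empty input, and trailing junk. */
    if (errno != 0 || end == text || *end != '\0')
        return -1;

    *level = value;
    return 0;
}
```

With the header included, the compiler knows that strtoul returns unsigned long rather than assuming int, which matters on 64-bit targets where the implicit declaration would truncate the result.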
[openib-general] ifup/ifdown scripts don't work with IPoIB
I was trying to set up my system to use the normal /etc/sysconfig/network-scripts/ifcfg-ib0 and have the interface brought up at startup using /sbin/ifup, as it does with Ethernet. I am running on a RedHat EL4.0 U2 distribution.

My config file looks like this:

# OpenIB IPoIB Controller
DEVICE=ib0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.0.1
NETMASK=255.255.255.0
BROADCAST=192.168.0.255

When I run /sbin/ifup ib0, I get

[EMAIL PROTECTED] woody]# /sbin/ifup ib0
Error, some other host already uses address 192.168.0.1.

Looking at the ifup script, it does a

if ! arping -q -c 2 -w 3 -D -I ${REALDEVICE} ${IPADDR} ; then
    echo $"Error, some other host already uses address ${IPADDR}."
    exit 1
fi

If I run the arping command manually, I get

arping -c 2 -w 3 -D -I ib0 102.168.0.1
ARPING 102.168.0.1 from 0.0.0.0 ib0
Sent 2 probes (2 broadcast(s))
Received -1 response(s)

but when I run it on the eth0 device, I get

arping -c 2 -w 3 -D -I eth0 10.0.0.1
ARPING 10.0.0.1 from 0.0.0.0 eth0
Sent 2 probes (2 broadcast(s))
Received 0 response(s)

So why does arping return -1 for IPoIB, rather than 0 as it does with Ethernet? Is this a problem with IPoIB or the ifup script?

woody
RE: [openib-general] [RFC] OpenSM Interactive Console
Title: RE: [openib-general] [RFC] OpenSM Interactive Console Hi Hal, I still think that a "server" like behavior is much preferable to having the SM sit there and wait for console inputs. The SM is a service and thus should run like a daemon. MIB is just a standard way to avoid the need to define our own protocol to do that. In your implementation the SM should be put in console mode from the first invocation and thus will need a dedicated terminal. Even with osmsh one could implement (using standard Tcl sockets) a simple server that could just wait for remote commands (I can provide the code as I have done zillions of such servers). The MIB is nicer and I think it is not very complicated to implement. At least not the trivial groups of setting SM parameters. The more I think about it the more I get convinced we need to do it. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -Original Message- > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]] > Sent: Thursday, October 27, 2005 1:45 PM > To: Eitan Zahavi > Cc: Troy Benjegerdes; openib-general@openib.org > Subject: RE: [openib-general] [RFC] OpenSM Interactive Console > > There have been requests for this CLI functionality from at least the labs. It has been > discussed on the list. > > Also, there was the following comment in OpenSM::main.c: > > /* > Sit here forever > In the future, some sort of console interactivity could > be implemented in this loop. > */ > > -- Hal > > > > From: Eitan Zahavi [mailto:[EMAIL PROTECTED]] > Sent: Thu 10/27/2005 2:03 AM > To: Hal Rosenstock; Eitan Zahavi > Cc: Troy Benjegerdes; openib-general@openib.org > Subject: RE: [openib-general] [RFC] OpenSM Interactive Console > > > > Yes this MIB needs some cleanup. > I would love to hear from the community some feedback regarding SM MIB > usefulness. 
> > In the past we did not get any push for interactive SM or online configurable SM so I > did not see any reason to work on it. > > I do not think it is a huge task to make SM MIB work with OpenSM. At least not the > 90% of it that I glanced through. > > > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -Original Message- > > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]] > > Sent: Wednesday, October 26, 2005 7:44 PM > > To: Eitan Zahavi > > Cc: Troy Benjegerdes; openib-general@openib.org > > Subject: RE: [openib-general] [RFC] OpenSM Interactive Console > > > > Hi Eitan, > > > > I sit corrected. There are R/W parameters in the SM MIB as you indicate. I was > > thinking of all the other IPoIB MIBs. It's been a while since I looked at the SM MIB. > > > > Also, the SM MIB (draft-ietf-ipoib-subnet-manager-mib-00) expired a while ago. At > a > > minimum, it needs to be dusted off. That would include updating it for IBA 1.2. > > > > -- Hal > > > > > > > > From: Eitan Zahavi [mailto:[EMAIL PROTECTED]] > > Sent: Tue 10/25/2005 5:19 AM > > To: Hal Rosenstock > > Cc: Troy Benjegerdes; openib-general@openib.org > > Subject: Re: [openib-general] [RFC] OpenSM Interactive Console > > > > > > > > Hal Rosenstock wrote: > > > On Mon, 2005-10-24 at 14:38, Eitan Zahavi wrote: > > > > > >>Hal Rosenstock wrote: > > >> > > >>>On Mon, 2005-10-24 at 03:08, Eitan Zahavi wrote: > > >>> > > >>> > > I would suggest to use SNMP for the tasks below. IETF IPoIB group > > > > > > has > > > > > defined an SNMP MIB that can support the required functionality > > > > > > below. > > > > > >>> > > >>>The IETF SNMP MIBs are one way of presenting the information to the > > >>>outside world. There are other possible management interfaces. The > > > > > > SNMP > > > > > >>>MIB instrumentation would need to use lower layer APIs to get this > > >>>information out of the SM. 
> > >> > > >>Yes but the IETF SM MIB is the only one that is close to a standard > > > > > > way. > > > > > >>It does not require low level interface if it will integrate into the > > > > > > OpenSM code. > > > > > >>One way to do it is buy extending OpenSM with an AgentX interface. > > >> > > >>IMO one clear advantage of using SNMP for SM integration is that the > > > > > > code will work with any SM that is IETF compliant. > > > > > >>Also if you want to write a "client server" type of application on top > > > > > > of an SM you > > > > > >>can either stick to sending MADs which translate into SA client based > > > > > > application or > > > > > >>you better stay with some known protocol for management (like SNMP) > > > > > > and not develop yet another protocol for > > > > > >>doing exactly the same things as SNMP already supports. > > > > > > > > > There are limitations in the SNMP MIBs. One is that they are RO so they > > > are more for monitoring. Also, many environments do not use S
Re: [openib-general] [RFC] OpenSM Interactive Console
For me, the only purpose for an SNMP MIB would be to get the information into a network management system. In my case, I'll be using something that's open-source or has a plugin architecture like Nagios, and I'd really rather just have the network management system communicate with the subnet manager or SMA packets directly rather than introducing an extra translation to SNMP.

SNMP is only useful to me because it is (in theory) an interoperable cross-vendor standard. In the InfiniBand case, we already have a cross-vendor standard implementation (OpenIB), and adding SNMP is another dependency and layer of complexity that can break and be difficult to set up.

If I knew of an open-source tool that was actually able to use SNMP to query a random Ethernet vendor's switch and be able to tell me what port a particular MAC address was plugged into, I might be more positive. But as far as I know, each vendor's SNMP implementation is broken in subtly different ways, so that this gets to be a nightmare to actually implement.

I guess the point of all this is to find an end-user use case for the SM MIB, and work back from there to decide if having a MIB actually helps solve the problem.

On Thu, Oct 27, 2005 at 08:03:57AM +0200, Eitan Zahavi wrote:
> Yes this MIB needs some cleanup.
> I would love to hear from the community some feedback regarding SM MIB
> usefulness.
>
> In the past we did not get any push for interactive SM or online
> configurable SM so I did not see any reason to work on it.
>
> I do not think it is a huge task to make SM MIB work with OpenSM. At least
> not the 90% of it that I glanced through.
>
>
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O.
Box 586 Yokneam 20692 ISRAEL > > > > -Original Message- > > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, October 26, 2005 7:44 PM > > To: Eitan Zahavi > > Cc: Troy Benjegerdes; openib-general@openib.org > > Subject: RE: [openib-general] [RFC] OpenSM Interactive Console > > > > Hi Eitan, > > > > I sit corrected. There are R/W parameters in the SM MIB as you indicate. I > was > > thinking of all the other IPoIB MIBs. It's been a while since I looked at > the SM MIB. > > > > Also, the SM MIB (draft-ietf-ipoib-subnet-manager-mib-00) expired a while > ago. At a > > minimum, it needs to be dusted off. That would include updating it for IBA > 1.2. > > > > -- Hal > > > > > > > > From: Eitan Zahavi [mailto:[EMAIL PROTECTED] > > Sent: Tue 10/25/2005 5:19 AM > > To: Hal Rosenstock > > Cc: Troy Benjegerdes; openib-general@openib.org > > Subject: Re: [openib-general] [RFC] OpenSM Interactive Console > > > > > > > > Hal Rosenstock wrote: > > > On Mon, 2005-10-24 at 14:38, Eitan Zahavi wrote: > > > > > >>Hal Rosenstock wrote: > > >> > > >>>On Mon, 2005-10-24 at 03:08, Eitan Zahavi wrote: > > >>> > > >>> > > I would suggest to use SNMP for the tasks below. IETF IPoIB group > > > > > > has > > > > > defined an SNMP MIB that can support the required functionality > > > > > > below. > > > > > >>> > > >>>The IETF SNMP MIBs are one way of presenting the information to the > > >>>outside world. There are other possible management interfaces. The > > > > > > SNMP > > > > > >>>MIB instrumentation would need to use lower layer APIs to get this > > >>>information out of the SM. > > >> > > >>Yes but the IETF SM MIB is the only one that is close to a standard > > > > > > way. > > > > > >>It does not require low level interface if it will integrate into the > > > > > > OpenSM code. > > > > > >>One way to do it is buy extending OpenSM with an AgentX interface. 
> > >> > > >>IMO one clear advantage of using SNMP for SM integration is that the > > > > > > code will work with any SM that is IETF compliant. > > > > > >>Also if you want to write a "client server" type of application on top > > > > > > of an SM you > > > > > >>can either stick to sending MADs which translate into SA client based > > > > > > application or > > > > > >>you better stay with some known protocol for management (like SNMP) > > > > > > and not develop yet another protocol for > > > > > >>doing exactly the same things as SNMP already supports. > > > > > > > > > There are limitations in the SNMP MIBs. One is that they are RO so they > > > are more for monitoring. Also, many environments do not use SNMP. It is > > > unclear how much of a requirement it is to manage any SM or how many > > > other SMs support the SM MIB. (There are other IB associated MIBs too). > > > > SNMP MIBs are certainly not just RO a simple example from the SM MIB: > >ibSmPortInfoLMC OBJECT-TYPE > >SYNTAX Unsigned32(0..7) > >MAX-ACCESS read-write > >STATUS current > >DESCRIPTION > > "LID mask for multipath support. User should take extra caution > > when setting this value, since any ch
[openib-general] [PATCH] Opensm - fix lmc algorithm
Hi Hal,

We noticed a problem in the lmc assignment algorithm. In the current code, when trying to run opensm with lmc > 0, opensm goes into an infinite loop. While debugging, we found that the problem is in the lid assignment, so we changed the algorithm. The change is in the osm_lid_mgr_init_sweep function. We have done some testing of the new code, and the lmc assignment seems to be ok with the fix.

Thanks,
Yael

Signed-off-by: Yael Kalka <[EMAIL PROTECTED]>

Index: opensm/osm_lid_mgr.c
===
--- opensm/osm_lid_mgr.c (revision 3848)
+++ opensm/osm_lid_mgr.c (working copy)
@@ -337,7 +337,7 @@ __osm_lid_mgr_init_sweep(
   uint16_t max_defined_lid;
   uint16_t max_persistent_lid;
   uint16_t max_discovered_lid;
-  uint16_t lid, l;
+  uint16_t lid;
   uint16_t disc_min_lid;
   uint16_t disc_max_lid;
   uint16_t db_min_lid;
@@ -349,16 +349,23 @@ __osm_lid_mgr_init_sweep(
   osm_port_t *p_port;
   cl_qmap_t *p_port_guid_tbl;
   uint8_t lmc_num_lids = (uint8_t)(1 << p_mgr->p_subn->opt.lmc);
+  uint16_t lmc_mask;
+  uint16_t req_lid, num_lids;
 
   OSM_LOG_ENTER( p_mgr->p_log, __osm_lid_mgr_init_sweep );
 
+  if (p_mgr->p_subn->opt.lmc)
+    lmc_mask = ~((1 << p_mgr->p_subn->opt.lmc) - 1);
+  else
+    lmc_mask = 0x;
+
   /* if we came out of standby we need to discard any previous guid 2 lid
      info we might had */
   if ( p_mgr->p_subn->coming_out_of_standby == TRUE )
   {
     osm_db_clear( p_mgr->p_g2l );
     for (lid = 0; lid < cl_ptr_vector_get_size(&p_mgr->used_lids); lid++)
-      cl_ptr_vector_set(&p_mgr->used_lids, lid, NULL);
+      cl_ptr_vector_set(p_persistent_vec, lid, NULL);
   }
 
   /* we need to cleanup the empty ranges list */
@@ -375,7 +382,7 @@ __osm_lid_mgr_init_sweep(
 
   /* we if are on the first sweep and in re-assign lids mode
      we should ignore all the available info and simply define one
-     hufe empty range */
+     huge empty range */
   if ((p_mgr->p_subn->first_time_master_sweep == TRUE) &&
       (p_mgr->p_subn->opt.reassign_lids == TRUE ))
   {
@@ -398,6 +405,34 @@ __osm_lid_mgr_init_sweep(
       osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
       for (lid = disc_min_lid; lid <= disc_max_lid; lid++)
         cl_ptr_vector_set(p_discovered_vec, lid, p_port );
+      /* make sure the guid2lid entry is valid. If not - clean it. */
+      if (!osm_db_guid2lid_get( p_mgr->p_g2l,
+                                cl_ntoh64(osm_port_get_guid(p_port)),
+                                &db_min_lid, &db_max_lid))
+      {
+        if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) !=
+             IB_NODE_TYPE_SWITCH)
+          num_lids = lmc_num_lids;
+        else
+          num_lids = 1;
+
+        if ((num_lids != 1) &&
+            (((db_min_lid & lmc_mask) != db_min_lid) ||
+             (db_max_lid - db_min_lid + 1 < num_lids)) )
+        {
+          /* Not alligned, or not wide enough - remove the entry */
+          osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
+                   "__osm_lid_mgr_init_sweep: "
+                   "Cleaning persistent entry for guid:0x%016" PRIx64
+                   " illegal range:[0x%x:0x%x] \n",
+                   cl_ntoh64(osm_port_get_guid(p_port)), db_min_lid,
+                   db_max_lid );
+          osm_db_guid2lid_delete( p_mgr->p_g2l,
+                                  cl_ntoh64(osm_port_get_guid(p_port)));
+          for ( lid = db_min_lid ; lid <= db_max_lid ; lid++ )
+            cl_ptr_vector_set(p_persistent_vec, lid, NULL);
+        }
+      }
     }
 
   /*
@@ -434,7 +469,7 @@ __osm_lid_mgr_init_sweep(
     {
       is_free = TRUE;
       /* first check to see if the lid is used by a persistent assignment */
-      if ((lid < max_persistent_lid) && cl_ptr_vector_get(p_persistent_vec, lid))
+      if ((lid <= max_persistent_lid) && cl_ptr_vector_get(p_persistent_vec, lid))
       {
         osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
                  "__osm_lid_mgr_init_sweep: "
@@ -442,62 +477,86 @@ __osm_lid_mgr_init_sweep(
                  lid);
         is_free = FALSE;
       }
-
-      /* check the discovered port if there is one */
-      if ((lid < max_discovered_lid) &&
-          (p_port = (osm_port_t *)cl_ptr_vector_get(p_discovered_vec, lid)))
+      else
       {
-        /* get the lid range of that port - but we know how many lids we
-           are about to assign to it */
-        osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
-        if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) !=
-             IB_NODE_TYPE_SWITCH)
-          disc_max_lid = disc_min_lid + lmc_num_lids - 1;
-
+        /* check this is a discovered port */
+        CL_ASSERT(lid <= max_discovered_lid);
+        if ((p_port = (osm_port_t *)cl_ptr_vector_get(p_discovered_vec, lid)))
+        {
+          /* we have a port. Now lets see
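The validity test the patch applies to a persistent guid2lid entry (the base LID must be aligned to the LMC, and the range must be wide enough) can be sketched in isolation. This is only an illustration, not code from OpenSM: the helper names `lmc_mask_for` and `lid_range_is_valid` are made up for this example, and using a mask of 0xFFFF for lmc == 0 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask that clears the low `lmc` bits of a LID.
   InfiniBand limits LMC to 0..7. For lmc == 0 this yields 0xFFFF
   (an assumption of this sketch). */
static uint16_t lmc_mask_for(uint8_t lmc)
{
    assert(lmc <= 7);
    return (uint16_t)~((1u << lmc) - 1u);
}

/* Hypothetical helper: returns 1 if [min_lid, max_lid] can hold the
   2^lmc LIDs of one port, i.e. the base is aligned to 2^lmc and the
   range is at least 2^lmc LIDs wide; returns 0 otherwise. */
static int lid_range_is_valid(uint16_t min_lid, uint16_t max_lid, uint8_t lmc)
{
    uint32_t num_lids = 1u << lmc;

    if (max_lid < min_lid)
        return 0;                     /* empty or reversed range */
    if (num_lids == 1)
        return 1;                     /* a single LID is always aligned */
    if ((min_lid & lmc_mask_for(lmc)) != min_lid)
        return 0;                     /* base LID not aligned */
    if ((uint32_t)(max_lid - min_lid) + 1u < num_lids)
        return 0;                     /* range not wide enough */
    return 1;
}
```

With lmc = 2, for example, the range [8, 11] passes, while [9, 12] fails the alignment test and [8, 9] fails the width test. In the patch, entries that fail this kind of test are deleted from the guid2lid database and cleared from the persistent vector.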
[openib-general] Re: Automated userspace build error
Quoting r. Nishanth Aravamudan <[EMAIL PROTECTED]>:
> Subject: Re: Automated userspace build error
>
> On 25.10.2005 [15:22:56 -0700], Roland Dreier wrote:
> > Nishanth> Hrm, well, I'm testing the latest svn (3865), did the
> > Nishanth> patch just get checked in?
> >
> > Yeah, I only noticed it and fixed it after your original email. I
> > just meant that I had already checked it in before sending my reply.
> > Sorry for the confusion...
>
> No worries, I figured that's what happened.
>
> On a related note, do you (or anyone else) have any suggestions for
> build-testing all of the userspace components? There isn't a top-level
> Makefile of any kind to make it easy :/
>
> Thanks,
> Nish

Yes, look at scripts in
https://openib.org/svn/trunk/contrib/mellanox/scripts

You can also, basically, cut and paste stuff from the FAQ page, but that
relies on performing make as root.

--
MST
RE: [openib-general] [RFC] OpenSM Interactive Console
There have been requests for this CLI functionality from at least the labs. It has been discussed on the list. Also, there was the following comment in OpenSM::main.c:

/*
   Sit here forever
   In the future, some sort of console interactivity could be
   implemented in this loop.
*/

-- Hal

From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
Sent: Thu 10/27/2005 2:03 AM
To: Hal Rosenstock; Eitan Zahavi
Cc: Troy Benjegerdes; openib-general@openib.org
Subject: RE: [openib-general] [RFC] OpenSM Interactive Console

Yes this MIB needs some cleanup.
I would love to hear from the community some feedback regarding SM MIB usefulness.
In the past we did not get any push for interactive SM or online configurable SM so I did not see any reason to work on it.
I do not think it is a huge task to make SM MIB work with OpenSM. At least not the 90% of it that I glanced through.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -Original Message-
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 26, 2005 7:44 PM
> To: Eitan Zahavi
> Cc: Troy Benjegerdes; openib-general@openib.org
> Subject: RE: [openib-general] [RFC] OpenSM Interactive Console
>
> Hi Eitan,
>
> I sit corrected. There are R/W parameters in the SM MIB as you indicate. I was
> thinking of all the other IPoIB MIBs. It's been a while since I looked at the
> SM MIB.
>
> Also, the SM MIB (draft-ietf-ipoib-subnet-manager-mib-00) expired a while ago. At a
> minimum, it needs to be dusted off. That would include updating it for IBA 1.2.
>
> -- Hal
>
>
> From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
> Sent: Tue 10/25/2005 5:19 AM
> To: Hal Rosenstock
> Cc: Troy Benjegerdes; openib-general@openib.org
> Subject: Re: [openib-general] [RFC] OpenSM Interactive Console
>
> Hal Rosenstock wrote:
> > On Mon, 2005-10-24 at 14:38, Eitan Zahavi wrote:
> >
> >>Hal Rosenstock wrote:
> >>
> >>>On Mon, 2005-10-24 at 03:08, Eitan Zahavi wrote:
> >>>
> >>>>I would suggest to use SNMP for the tasks below. IETF IPoIB group has
> >>>>defined an SNMP MIB that can support the required functionality below.
> >>>
> >>>The IETF SNMP MIBs are one way of presenting the information to the
> >>>outside world. There are other possible management interfaces. The SNMP
> >>>MIB instrumentation would need to use lower layer APIs to get this
> >>>information out of the SM.
> >>
> >>Yes but the IETF SM MIB is the only one that is close to a standard way.
> >>It does not require low level interface if it will integrate into the OpenSM code.
> >>One way to do it is by extending OpenSM with an AgentX interface.
> >>
> >>IMO one clear advantage of using SNMP for SM integration is that the code will work with any SM that is IETF compliant.
> >>Also if you want to write a "client server" type of application on top of an SM you
> >>can either stick to sending MADs which translate into SA client based application or
> >>you better stay with some known protocol for management (like SNMP) and not develop yet another protocol for
> >>doing exactly the same things as SNMP already supports.
> >
> > There are limitations in the SNMP MIBs. One is that they are RO so they
> > are more for monitoring. Also, many environments do not use SNMP. It is
> > unclear how much of a requirement it is to manage any SM or how many
> > other SMs support the SM MIB. (There are other IB associated MIBs too).
>
> SNMP MIBs are certainly not just RO. A simple example from the SM MIB:
>
>    ibSmPortInfoLMC OBJECT-TYPE
>        SYNTAX      Unsigned32(0..7)
>        MAX-ACCESS  read-write
>        STATUS      current
>        DESCRIPTION
>            "LID mask for multipath support. User should take extra caution
>             when setting this value, since any change will effect packet
>             routing."
>        ::= { ibSmPortInfoEntry 19 }
>
> I agree that it is possible that currently no SM is supporting the SM MIB.
> But it does make sense to have ALL of them support it, so that they can
> be activated/deactivated and configured in the same manner.
>
> Most unix distributions and windows boxes have a standard SNMP agent and client
> included in them.
> So it does not take more than simple bash or C code to interact with the SM
> if it supports SNMP.
>
> >
> >>>>Everything but the dynamic partitioning (OpenSM does not have
> >>>>partition manager to this moment)
> >>>
> >>>What Troy meant by partitioning is not necessarily IB partitioning.
> >>
> >>How are you sure about that? Troy - please comment.
> >
> > I think you missed an email on th