Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Fri, 2010-04-09 at 17:27 -0700, Jason Gunthorpe wrote: > On Fri, Apr 09, 2010 at 05:13:24PM -0700, Ralph Campbell wrote: > > > For the QSFP data, I hope I can leave it as is since it is > > related to the link state that the other files contain. > > It is a read-only file so no issue with trying to set a value. > > There was some flack for other stuff like this a while back. > > IMHO, it would be appropriate to have a hex dump of the entire QFSP > EEPROM and leave parsing to userspace, or put the parsed version in > debugfs.. > > Jason OK. I will move it to our file system which is used to export binary data. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Fri, Apr 09, 2010 at 05:13:24PM -0700, Ralph Campbell wrote: > For the QSFP data, I hope I can leave it as is since it is > related to the link state that the other files contain. > It is a read-only file so no issue with trying to set a value. There was some flack for other stuff like this a while back. IMHO, it would be appropriate to have a hex dump of the entire QFSP EEPROM and leave parsing to userspace, or put the parsed version in debugfs.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Fri, Apr 09, 2010 at 04:31:26PM -0700, Ralph Campbell wrote: > I would be very reluctant to change this code. > One device we have doesn't have an i2c address, the "address" is the > address of memory within the chip, not the address of the chip on the > bus. Fair enough.. But mixing not-i2c + QSFP real i2c on the same bus = too scary for me :) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 2010-04-08 at 14:33 -0700, Roland Dreier wrote: > > This was for debugging and clearly could use a better method. > > Some stats are definitely useful and having files per value > > would make scripting easier too. > > I could add /sys/class/infiniband/qib0/ports/1/diag_counters/* > > > > One other place that has multiple values is dumping the QSFP > > data from the IB cable. > > You can stick debugging data under debugfs. Or just kill it for now and > figure out how to add it back later. OK. I replaced the /sys/class/infiniband/qib0/stats file with % ls /sys/class/infiniband/qib0/ports/1/diag_counters/ dmawait pkt_dropsrc_dupreq rc_seqnakrnr_naks loop_pkts rc_acks rc_qacksrc_timeouts seq_naks other_naks rc_delayed_comp rc_resends rdma_seq unaligned These are all simple counter values. For the QSFP data, I hope I can leave it as is since it is related to the link state that the other files contain. It is a read-only file so no issue with trying to set a value. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 2010-04-08 at 15:50 -0700, Jason Gunthorpe wrote: > On Thu, Apr 08, 2010 at 03:26:41PM -0700, Ralph Campbell wrote: > > On Thu, 2010-04-08 at 15:08 -0700, Jason Gunthorpe wrote: > > > On Thu, Apr 08, 2010 at 02:29:53PM -0700, Ralph Campbell wrote: > > > > > > > One other place that has multiple values is dumping the QSFP > > > > data from the IB cable. > > > > > > I was going to comment that building your own I2C subsystem is kinda > > > strange, can't the in-kernel stuff be used? > > > It is a bit strange but we have had this discussion before with > > the ib_ipath driver. Basically, the devices connected to the bus > > weren't really I2C compliant. Since we had to support the > > non-standard parts, it was easier to have only one set of code > > that could handle both. > > Is this still true on qib? I didn't notice anything too obviously > weird, and QSFPs are clearly standard i2c.. > > The i2c layer has a pretty flexible definition of i2c these days.. > > Jason I would be very reluctant to change this code. One device we have doesn't have an i2c address, the "address" is the address of memory within the chip, not the address of the chip on the bus. Reading some of the QSFP cable's EEPROM caused another EEPROM on the bus to be erased. We had to put in defensive code to check for this case. I'm not the expert who wrote this code (he retired) so I would have to do a lot of work making sure the older parts and cables with these problems worked after a major rewrite of the code. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 08 Apr 2010 16:29:06 -0700 Roland Dreier wrote: > > I for one would rather not see this die. We have debugged some critical > > issues using this data. The sysfs entries above are what Mellanox uses. > > Should those also be changed? > > Which sysfs entries? I don't see any likely looking code in either of > the Mellanox drivers. My apologies. It appears this an OFED thing.[*] 16:31:04 > ls -la /sys/class/infiniband/mlx4_0/diag_counters/ total 0 drwxr-xr-x 2 root root0 Apr 8 16:31 ./ drwxr-xr-x 4 root root0 Apr 8 16:31 ../ --w--w--w- 1 root root 4096 Apr 8 16:31 clear_diag -r--r--r-- 1 root root 4096 Apr 8 16:31 num_baddb -r--r--r-- 1 root root 4096 Apr 8 16:31 num_cqovf ... This is with the current RHEL5.4 drivers. So is there an equivalent for mlx4 in the upstream kernel? I don't see them. Do you feel it is appropriate for the port counters to be in sysfs? 16:46:19 > ls -la /sys/class/infiniband/mlx4_0/ports/1/counters/ total 0 drwxr-xr-x 2 root root0 Apr 8 16:46 ./ drwxr-xr-x 5 root root0 Apr 6 09:33 ../ -r--r--r-- 1 root root 4096 Apr 8 16:46 VL15_dropped -r--r--r-- 1 root root 4096 Apr 8 16:46 excessive_buffer_overrun_errors ... Although these "diag_counters" are internal to the cards they do track counts which are part of the IB spec like RNR NAK's. So I am not sure they are really "debug" in the strictest sense. I will rephrase my statement, I would like to see these counters included as they give valuable information about what is going on in a cluster. Ira * OFED MUST die! > > - R. > > -- > Roland Dreier || For corporate legal information go to: > http://*www.*cisco.com/web/about/doing_business/legal/cri/index.html > -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 2010-04-08 at 16:29 -0700, Roland Dreier wrote: > > I for one would rather not see this die. We have debugged some critical > > issues using this data. The sysfs entries above are what Mellanox uses. > > Should those also be changed? > > Which sysfs entries? I don't see any likely looking code in either of > the Mellanox drivers. > > - R. I think Ira means: # ls /sys/class/infiniband/mlx4_0/diag_counters/ clear_diagrq_num_lqpoerq_num_udsdprd sq_num_lqpoe sq_num_rree num_baddb rq_num_mce rq_num_wrfe sq_num_mwbesq_num_rsync num_cqovf rq_num_oos sq_num_bre sq_num_oos sq_num_tree num_eqovf rq_num_rae sq_num_ieecne sq_num_rabrte sq_num_wrfe rq_num_laerq_num_rire sq_num_ieecse sq_num_rae rq_num_leeoe rq_num_rnr sq_num_leeoesq_num_rire rq_num_llerq_num_rsyncsq_num_lle sq_num_rnr rq_num_lperq_num_ucsdprd sq_num_lpe sq_num_roe I couldn't find where these entries are added in the kernel.org code. It is in the OFED tree though. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
> I for one would rather not see this die. We have debugged some critical > issues using this data. The sysfs entries above are what Mellanox uses. > Should those also be changed? Which sysfs entries? I don't see any likely looking code in either of the Mellanox drivers. - R. -- Roland Dreier || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 2010-04-08 at 15:50 -0700, Jason Gunthorpe wrote: > On Thu, Apr 08, 2010 at 03:26:41PM -0700, Ralph Campbell wrote: > > On Thu, 2010-04-08 at 15:08 -0700, Jason Gunthorpe wrote: > > > On Thu, Apr 08, 2010 at 02:29:53PM -0700, Ralph Campbell wrote: > > > > > > > One other place that has multiple values is dumping the QSFP > > > > data from the IB cable. > > > > > > I was going to comment that building your own I2C subsystem is kinda > > > strange, can't the in-kernel stuff be used? > > > It is a bit strange but we have had this discussion before with > > the ib_ipath driver. Basically, the devices connected to the bus > > weren't really I2C compliant. Since we had to support the > > non-standard parts, it was easier to have only one set of code > > that could handle both. > > Is this still true on qib? I didn't notice anything too obviously > weird, and QSFPs are clearly standard i2c.. > > The i2c layer has a pretty flexible definition of i2c these days.. > > Jason I will take another look but don't be surprised if it takes me awhile since I'm also working on addressing the other comments too. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, Apr 08, 2010 at 03:26:41PM -0700, Ralph Campbell wrote: > On Thu, 2010-04-08 at 15:08 -0700, Jason Gunthorpe wrote: > > On Thu, Apr 08, 2010 at 02:29:53PM -0700, Ralph Campbell wrote: > > > > > One other place that has multiple values is dumping the QSFP > > > data from the IB cable. > > > > I was going to comment that building your own I2C subsystem is kinda > > strange, can't the in-kernel stuff be used? > It is a bit strange but we have had this discussion before with > the ib_ipath driver. Basically, the devices connected to the bus > weren't really I2C compliant. Since we had to support the > non-standard parts, it was easier to have only one set of code > that could handle both. Is this still true on qib? I didn't notice anything too obviously weird, and QSFPs are clearly standard i2c.. The i2c layer has a pretty flexible definition of i2c these days.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 2010-04-08 at 15:08 -0700, Jason Gunthorpe wrote: > On Thu, Apr 08, 2010 at 02:29:53PM -0700, Ralph Campbell wrote: > > > One other place that has multiple values is dumping the QSFP > > data from the IB cable. > > I was going to comment that building your own I2C subsystem is kinda > strange, can't the in-kernel stuff be used? > > Jason It is a bit strange but we have had this discussion before with the ib_ipath driver. Basically, the devices connected to the bus weren't really I2C compliant. Since we had to support the non-standard parts, it was easier to have only one set of code that could handle both. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, 08 Apr 2010 14:33:59 -0700 Roland Dreier wrote: > > This was for debugging and clearly could use a better method. > > Some stats are definitely useful and having files per value > > would make scripting easier too. > > I could add /sys/class/infiniband/qib0/ports/1/diag_counters/* > > > > One other place that has multiple values is dumping the QSFP > > data from the IB cable. > > You can stick debugging data under debugfs. Or just kill it for now and > figure out how to add it back later. I for one would rather not see this die. We have debugged some critical issues using this data. The sysfs entries above are what Mellanox uses. Should those also be changed? Ira > -- > Roland Dreier || For corporate legal information go to: > http://*www.*cisco.com/web/about/doing_business/legal/cri/index.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://*vger.kernel.org/majordomo-info.html > -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
On Thu, Apr 08, 2010 at 02:29:53PM -0700, Ralph Campbell wrote: > One other place that has multiple values is dumping the QSFP > data from the IB cable. I was going to comment that building your own I2C subsystem is kinda strange, can't the in-kernel stuff be used? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
> This was for debugging and clearly could use a better method. > Some stats are definitely useful and having files per value > would make scripting easier too. > I could add /sys/class/infiniband/qib0/ports/1/diag_counters/* > > One other place that has multiple values is dumping the QSFP > data from the IB cable. You can stick debugging data under debugfs. Or just kill it for now and figure out how to add it back later. -- Roland Dreier || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
This was for debugging and clearly could use a better method. Some stats are definitely useful and having files per value would make scripting easier too. I could add /sys/class/infiniband/qib0/ports/1/diag_counters/* One other place that has multiple values is dumping the QSFP data from the IB cable. On Thu, 2010-04-08 at 13:50 -0700, Roland Dreier wrote: > I didn't audit everything (so there may be more problems here too), but > certainly this violates the "one value per sysfs file" rule rather > egregiously: > > > +static ssize_t show_stats(struct device *device, struct device_attribute > *attr, > > +char *buf) > > +{ > > + struct qib_ibdev *dev = > > + container_of(device, struct qib_ibdev, ibdev.dev); > > + struct qib_devdata *dd = dd_from_dev(dev); > > + unsigned pidx; > > + unsigned i; > > + int len = 0; > > + unsigned long flags; > > + > > + for (pidx = 0; pidx < dd->num_pports; ++pidx) { > > + struct qib_ibport *ibp = &dd->pport[pidx].ibport_data; > > + > > + len += sprintf(buf + len, > > + "Port %d:\n" > > + "RC timeouts %d\n" > > + "RC resends %d\n" > > + "RC QACKs%d\n" > > + "RC SEQ NAKs %d\n" > > + "RC RDMA seq %d\n" > > + "RC RNR NAKs %d\n" > > + "RC OTH NAKs %d\n" > > + "RC DComp%d\n" > > + "RCr dup req %d\n" > > + "RCr SEQ NAK %d\n" > > + "wait piobuf %d\n" > > + "wait DMA%d\n" > > + "wait TX %d\n" > > + "unaligned %d\n" > > + "loop pkts %d\n" > > + "PKT drops %d\n", > > + dd->pport[pidx].port, > > + ibp->n_timeouts, ibp->n_rc_resends, > > + ibp->n_rc_qacks, ibp->n_seq_naks, > > + ibp->n_rdma_seq, ibp->n_rnr_naks, > > + ibp->n_other_naks, ibp->n_rc_delayed_comp, > > + ibp->n_rc_dupreq, ibp->n_rc_seqnak, > > + dev->n_piowait, ibp->n_dmawait, dev->n_txwait, > > + ibp->n_unaligned, ibp->n_loop_pkts, > > + ibp->n_pkt_drops); > > + for (i = 0; i < ARRAY_SIZE(ibp->opstats); i++) { > > + const struct qib_opcode_stats *si = &ibp->opstats[i]; > > + > > + if (!si->n_packets && !si->n_bytes) > > + continue; > > + len += sprintf(buf + len, "%02x %llu/%llu\n", i, > > + (unsigned long long) si->n_packets, > > + (unsigned long long) si->n_bytes); > > + } > > + } > > + len += sprintf(buf + len, "Ctx:npkts"); > > + for (i = 0; i < dd->first_user_ctxt; i++) { > > + if (!dd->rcd[i]) > > + continue; > > + len += sprintf(buf + len, " %u:%u", i, > > + dd->rcd[i]->pkt_count); > > + } > > + len += sprintf(buf + len, "\n"); > > + spin_lock_irqsave(&dev->qpt_lock, flags); > > + for (i = 0; i < dev->qp_table_size; i++) { > > + struct qib_qp *qp; > > + for (qp = dev->qp_table[i]; qp != NULL; qp = qp->next) { > > + struct qib_swqe *wqe; > > + > > + if (qp->s_last == qp->s_acked && > > + qp->s_acked == qp->s_cur && > > + qp->s_cur == qp->s_tail && > > + qp->s_tail == qp->s_head) > > + continue; > > + if (len + 128 >= PAGE_SIZE) > > + break; > > + wqe = get_swqe_ptr(qp, qp->s_last); > > + len += sprintf(buf + len, > > + "QP%u %s %u %u %u f=%x %u %u %u %u %u " > > + "PSN %x %x %x %x %x " > > + "(%u %u %u %u %u %u) QP%u LID %x\n", > > + qp->ibqp.qp_num, > > + qp_type_str[qp->ibqp.qp_type], > > + qp->state, > > + wqe->wr.opcode, > > + qp->s_hdrwords, > > + qp->s_flags, > > + atomic_read(&qp->s_dma_busy), > > + !list_empty(&qp->iowait), > > + qp->timeout, > > + wqe->ssn, > > + qp->s_lsn, > > + qp->s_last_psn, > > +
Re: [PATCH v2 38/51] IB/qib: Add qib_sysfs.c
I didn't audit everything (so there may be more problems here too), but certainly this violates the "one value per sysfs file" rule rather egregiously: > +static ssize_t show_stats(struct device *device, struct device_attribute > *attr, > + char *buf) > +{ > +struct qib_ibdev *dev = > +container_of(device, struct qib_ibdev, ibdev.dev); > +struct qib_devdata *dd = dd_from_dev(dev); > +unsigned pidx; > +unsigned i; > +int len = 0; > +unsigned long flags; > + > +for (pidx = 0; pidx < dd->num_pports; ++pidx) { > +struct qib_ibport *ibp = &dd->pport[pidx].ibport_data; > + > +len += sprintf(buf + len, > + "Port %d:\n" > + "RC timeouts %d\n" > + "RC resends %d\n" > + "RC QACKs%d\n" > + "RC SEQ NAKs %d\n" > + "RC RDMA seq %d\n" > + "RC RNR NAKs %d\n" > + "RC OTH NAKs %d\n" > + "RC DComp%d\n" > + "RCr dup req %d\n" > + "RCr SEQ NAK %d\n" > + "wait piobuf %d\n" > + "wait DMA%d\n" > + "wait TX %d\n" > + "unaligned %d\n" > + "loop pkts %d\n" > + "PKT drops %d\n", > + dd->pport[pidx].port, > + ibp->n_timeouts, ibp->n_rc_resends, > + ibp->n_rc_qacks, ibp->n_seq_naks, > + ibp->n_rdma_seq, ibp->n_rnr_naks, > + ibp->n_other_naks, ibp->n_rc_delayed_comp, > + ibp->n_rc_dupreq, ibp->n_rc_seqnak, > + dev->n_piowait, ibp->n_dmawait, dev->n_txwait, > + ibp->n_unaligned, ibp->n_loop_pkts, > + ibp->n_pkt_drops); > +for (i = 0; i < ARRAY_SIZE(ibp->opstats); i++) { > +const struct qib_opcode_stats *si = &ibp->opstats[i]; > + > +if (!si->n_packets && !si->n_bytes) > +continue; > +len += sprintf(buf + len, "%02x %llu/%llu\n", i, > + (unsigned long long) si->n_packets, > + (unsigned long long) si->n_bytes); > +} > +} > +len += sprintf(buf + len, "Ctx:npkts"); > +for (i = 0; i < dd->first_user_ctxt; i++) { > +if (!dd->rcd[i]) > +continue; > +len += sprintf(buf + len, " %u:%u", i, > + dd->rcd[i]->pkt_count); > +} > +len += sprintf(buf + len, "\n"); > +spin_lock_irqsave(&dev->qpt_lock, flags); > +for (i = 0; i < dev->qp_table_size; i++) { > +struct qib_qp *qp; > +for (qp = dev->qp_table[i]; qp != NULL; qp = qp->next) { > +struct qib_swqe *wqe; > + > +if (qp->s_last == qp->s_acked && > +qp->s_acked == qp->s_cur && > +qp->s_cur == qp->s_tail && > +qp->s_tail == qp->s_head) > +continue; > +if (len + 128 >= PAGE_SIZE) > +break; > +wqe = get_swqe_ptr(qp, qp->s_last); > +len += sprintf(buf + len, > + "QP%u %s %u %u %u f=%x %u %u %u %u %u " > + "PSN %x %x %x %x %x " > + "(%u %u %u %u %u %u) QP%u LID %x\n", > + qp->ibqp.qp_num, > + qp_type_str[qp->ibqp.qp_type], > + qp->state, > + wqe->wr.opcode, > + qp->s_hdrwords, > + qp->s_flags, > + atomic_read(&qp->s_dma_busy), > + !list_empty(&qp->iowait), > + qp->timeout, > + wqe->ssn, > + qp->s_lsn, > + qp->s_last_psn, > + qp->s_psn, qp->s_next_psn, > + qp->s_sending_psn, qp->s_sending_hpsn, > + qp->s_last, qp->s_acked, qp->s_cur, > + qp->s_tail, qp->s_head, qp->s_size, > + qp->remote_qpn, > + qp->remote_ah_attr.dlid);