On Thu, 08 Apr 2010 16:29:06 -0700 Roland Dreier <rdre...@cisco.com> wrote:
> > I for one would rather not see this die.  We have debugged some critical
> > issues using this data.  The sysfs entries above are what Mellanox uses.
> > Should those also be changed?
>
> Which sysfs entries?  I don't see any likely looking code in either of
> the Mellanox drivers.

My apologies.  It appears this is an OFED thing.[*]

16:31:04 > ls -la /sys/class/infiniband/mlx4_0/diag_counters/
total 0
drwxr-xr-x 2 root root    0 Apr  8 16:31 ./
drwxr-xr-x 4 root root    0 Apr  8 16:31 ../
--w--w--w- 1 root root 4096 Apr  8 16:31 clear_diag
-r--r--r-- 1 root root 4096 Apr  8 16:31 num_baddb
-r--r--r-- 1 root root 4096 Apr  8 16:31 num_cqovf
...

This is with the current RHEL5.4 drivers.

<sigh>

So is there an equivalent for mlx4 in the upstream kernel?  I don't see
them.  Do you feel it is appropriate for the port counters to be in sysfs?

16:46:19 > ls -la /sys/class/infiniband/mlx4_0/ports/1/counters/
total 0
drwxr-xr-x 2 root root    0 Apr  8 16:46 ./
drwxr-xr-x 5 root root    0 Apr  6 09:33 ../
-r--r--r-- 1 root root 4096 Apr  8 16:46 VL15_dropped
-r--r--r-- 1 root root 4096 Apr  8 16:46 excessive_buffer_overrun_errors
...

Although these "diag_counters" are internal to the cards, they do track
counts which are part of the IB spec, like RNR NAKs.  So I am not sure they
are really "debug" in the strictest sense.

I will rephrase my statement: I would like to see these counters included,
as they give valuable information about what is going on in a cluster.

Ira

* OFED MUST die!

>
>  - R.
>
> --
> Roland Dreier <rola...@cisco.com> || For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/index.html
>

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
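
For anyone who wants to collect the values from a script rather than by hand, here is a minimal sketch of reading one of the counter directories listed above. It assumes each readable file holds a single decimal number, which is not confirmed anywhere in this thread. The function name read_counters is hypothetical; only the directory paths come from the listings.

```python
from pathlib import Path


def read_counters(counter_dir):
    """Return {file name: integer value} for each readable counter file.

    Assumption: every counter file contains one decimal number.
    Anything else (directories, write-only entries such as clear_diag,
    non-numeric content) is skipped rather than treated as an error.
    """
    counters = {}
    root = Path(counter_dir)
    if not root.is_dir():
        # Directory absent, e.g. the driver does not expose it.
        return counters
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # OSError: the entry cannot be read.
            # ValueError: the content is not a plain integer.
            continue
    return counters
```

Calling read_counters("/sys/class/infiniband/mlx4_0/ports/1/counters") returns a dict keyed by counter name; if the directory is not present it returns an empty dict, so the same call can be pointed at the diag_counters directory on systems that may or may not have it.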