Re: [ewg] EWG/OFED meeting minutes for July 24, 2012
On Sun, 29 Jul 2012 09:46:35 -0700 Alex Netes ale...@mellanox.com wrote:

> > OFED 3.5:
> >
> > 1. Kernel base: Move to kernel 3.5 GA will be done this week
> > 2. Backports: RHEL 6.2, 6.3 and SLES 11 SP2 - available today
> >    Low level drivers: mlx4 (core ib), nes
> >    Missing: mlx4_en - Mellanox, cxgb - Chelsio, qib - Intel
> > 3. RC1: If all will provide backports by Tue - July-31 we will be able to release RC1 on Aug-2
> >    - Mellanox is committed.
> >    - Need answers from Intel (Tom) and Chelsio (Steve)
> > 4. User space: New uDAPL package and it is in the latest OFED-3.5 build.
> >    Need to include new librdmacm-1.0.16-1.src.rpm and a new ibacm-1.0.7-1.src.rpm packages
> >    Management - Alex - is what we have OK?
>
> There would be another OpenSM release on Wed Aug-1, that will include the latest bug fixes and some new contributed features such as:
> - Per Module Logging support
> - Congestion Control support
> - Perf_mgr extensions
>
> >    Diagnostic tools - Ira - is what we have OK?

There have been a number of bug fixes so I went ahead and released a 1.6.1 to the list just now.

Thanks,
Ira

> > 5. Release schedule: Will decide in next meeting - assuming RC1 will be at end of next week and testing will start.
> >
> > Tziporet

-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] Questions about CC in OFED
On Wed, 27 Jun 2012 11:41:03 +0800 Michael Zhang ziwen1...@gmail.com wrote:

> Hi everybody,
>
> I am studying the source code for IB congestion control (CC) in OFED, such as the CCT-related data structures and FECN/BECN processing. The only code related to the congestion control mechanism that I can find is in the infinipath-psm module. As far as I know, psm works for MPI applications, and I have not installed psm on our system. I ran an experiment over IPoIB and congestion control still works, which confuses me.

I am not familiar with how psm may set CC parameters. Perhaps someone from QLogic/Intel can speak up here?

> Could someone tell me where the CC-related source code is located besides infinipath-psm? Or are some functions implemented in hardware, such as moving the CCT index when a BECN message is received?
>
> Thank you for your help!

The latest version of infiniband-diags (1.6.0)[*] has 2 tools (ibccquery and ibccconfig) which allow one to query and set the CC parameters. As you will note in the man page as well as the help output, these tools are not to be used lightly. You can cause instability in your fabric if you set things wrong. However, when used properly they work well.

In addition, Al Chu from LLNL has posted a patch to OpenSM which allows for the setting of CC parameters in OpenSM. Feedback on this patch has been minimal, but Alex Netes (maintainer of OpenSM) says he will review it soon. (http://www.spinics.net/lists/linux-rdma/msg11615.html)

Finally, if you are interested in the ongoing development of this and other features of Open Fabrics code, you may want to sign up for the linux-rdma mailing list (linux-r...@vger.kernel.org or http://vger.kernel.org/vger-lists.html).

Ira

[*] git://beany.openfabrics.org/~iraweiny/infiniband-diags.git

> Best,
> Ziwen
>
> --
> Ziwen (Michael) Zhang
> Ph.D candidate
> College of Computer
> National University of Defense Technology (NUDT), P.R.China
> Email: zi...@nudt.edu.cn ziwen1...@gmail.com

-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
Re: [ewg] Getting started
On Mon, 11 Jun 2012 14:23:12 -0300 Paulo R. Panhoto pp...@netscape.net wrote:

> Hello,
>
> I have a socket application that connects to a third party server. Can it be replaced with a Verbs application?

I would say only if the 3rd party server also supports Verbs.

> So far, I have learned that the verbs API works either over InfiniBand (IB, RoCE) or iWARP (with RDMAP on top). Is this correct?

I am not sure what you mean by RDMAP on top. Verbs supports IB, RoCE, and iWARP.

Ira

> Regards,
> Paulo.

-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
Re: [ewg] server migration
On Tue, 29 May 2012 17:21:23 +0000 Marciniszyn, Mike mike.marcinis...@intel.com wrote:

> Is there something about this move that I can no longer clone the following repos:
>
>   git clone git://git.openfabrics.org/compat-rdma/linux-3.2.git
>   git clone git://git.openfabrics.org/compat-rdma/compat.git
>   git clone git://git.openfabrics.org/compat-rdma/compat-rdma.git
>
> It appears that no one is listening on the git port. Changing git to beany doesn't help. Changing git to sofa works.

Same issue here:

  15:27:41 git clone git://git.openfabrics.org/~iraweiny/infiniband-diags.git
  Initialized empty Git repository in /tmp/infiniband-diags/.git/
  fatal: read error: Connection reset by peer
  15:27:43 git clone git://sofa.openfabrics.org/~iraweiny/infiniband-diags.git
  Initialized empty Git repository in /tmp/infiniband-diags/.git/
  remote: Counting objects: 4579, done.
  remote: Compressing objects: 100% (1887/1887), done.
  remote: Total 4579 (delta 3448), reused 3433 (delta 2613)
  Receiving objects: 100% (4579/4579), 967.54 KiB, done.
  Resolving deltas: 100% (3448/3448), done.

Ira

> Mike
>
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Ken Strandberg
> Sent: Monday, May 28, 2012 2:29 PM
> To: nvme...@lists.openfabrics.org; ewg@lists.openfabrics.org; OFA Marketing Working Group (m...@lists.openfabrics.org); o...@lists.openfabrics.org; interop...@lists.openfabrics.org; Ryan, Jim
> Subject: [ewg] server migration
>
> I completed the server migration this morning. Here are items to note. Please send me email if you find any anomalies or abnormalities.
>
> To all:
>
> lists.openfabrics.org and www.openfabrics.org should function as before.
>
> There is a new direct subdomain to the downloads: downloads.openfabrics.org
>
> Please check the following:
>
> - Send yourself email through your openfabrics account, if you're forwarding mail from openfabrics.org to another account.
> - If you use an openfabrics.org email account directly, please check you can get your email.
> - If you want to use an openfabrics.org email account, you now can access it with your imap client using your login and password for your existing account and lists.openfabrics.org as the server.
>
> Developers:
>
> The new server is beany.openfabrics.org. Your username and passwd should be the same as on sofa. I rsync'd all /home dirs to the new server on Saturday (5/27). Please verify your files on beany are up to date. If not, please update your files.
>
> Sofa will remain online for about another month. The week before I de-commission sofa, I'll notify all users. And again the day before. Please make sure all your files are moved to beany.
>
> If you had cron jobs you ran, you should check that they're set up and running as intended. I did not set up or get into any of your /home files. I only copied them.
>
> git.openfabrics.org should function as before.
>
> SVN users:
>
> The svn repos should be operating as before. I dumped and loaded all databases as of Sunday morning (5/28). This is now the active repository.
>
> If you use svn:// protocol with your client, you need to change your address to the following:
>
>   svn://beany.openfabrics.org/ofw (for windows)
>   svn://beany.openfabrics.org/nvmewin (for NVMe)
>
> If you use http:// protocol, the address is still http://www.openfabrics.org/svnrepo/nvmewin (or ofw)
>
> Again, please do not use sofa.openfabrics.org/svnrepo/xxx for your repositories. Your trees will be out of date if you do.
>
> General:
>
> Why all this work to migrate to a new server?
>
> - The sofa /home and /var directories were getting full. There was only about 3% left in /home. The new server gives us more disk space (a full TB instead of 700GB).
> - Offsite storage for files. I was concerned about disaster recovery in the event of a meltdown at the hosting facility.
>
> Other than that, the hardware is about the same. We're saving about $120/month for more space.
>
> Again, please send me email if you see anything abnormal.
>
> Thanks.
>
> --
> Ken Strandberg
> Webmanager/SysAdmin
> OpenFabrics Alliance
> k...@openfabrics.org
> www.openfabrics.org

-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
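The workaround reported at the top of this thread (cloning from sofa while the git port on git.openfabrics.org resets connections) can be sketched in Python. This is only an illustration: the helper names `candidate_urls` and `clone_with_fallback` and the order in which hosts are tried are assumptions, not anything provided by the OpenFabrics servers, and since sofa is scheduled to be decommissioned it is at best a temporary measure.

```python
import subprocess
from urllib.parse import urlsplit, urlunsplit

# Hosts named in this thread; the order to try them in is an assumption.
FALLBACK_HOSTS = ["git.openfabrics.org", "sofa.openfabrics.org"]


def candidate_urls(url, hosts=FALLBACK_HOSTS):
    """Return the same repository URL once per host, original host first."""
    parts = urlsplit(url)
    ordered = [parts.netloc] + [h for h in hosts if h != parts.netloc]
    return [urlunsplit(parts._replace(netloc=h)) for h in ordered]


def clone_with_fallback(url, dest):
    """Try 'git clone' against each candidate host; return the URL that worked."""
    for candidate in candidate_urls(url):
        # A non-zero exit status (e.g. "Connection reset by peer") moves on
        # to the next host.
        if subprocess.run(["git", "clone", candidate, dest]).returncode == 0:
            return candidate
    raise RuntimeError("could not clone %s from any known host" % url)
```

For example, `candidate_urls("git://git.openfabrics.org/~iraweiny/infiniband-diags.git")` yields the original URL followed by the same path on sofa.openfabrics.org.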
Re: [ewg] ibv_create_qp fails with enomem
        qkey_viol_cntr:   0x0
        sm_sl:            0
        pkey_tbl_len:     128
        gid_tbl_len:      128
        subnet_timeout:   18
        init_type_reply:  0
        active_width:     4X (2)
        active_speed:     10.0 Gbps (4)
        phys_state:       LINK_UP (5)
        GID[ 0]:          fe80:0000:0000:0000:0002:c903:004b:d5e8

-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
Re: [ewg] EWG/OFED meeting minutes for today (Dec 13, 2011)
On Fri, 6 Jan 2012 14:10:19 -0800 Woodruff, Robert J robert.j.woodr...@intel.com wrote:

> I took an AR from the last meeting to solicit input on the Linux distro/kernel list that we will support in the next release, OFED-3.2. The suggestion in the meeting was that we drop support for all RHEL EL 5.x versions. We'd like people's input on that, as it may be hard to do backports from kernel.org 3.2 all the way back to the 2.6.18 Red Hat kernel base.

I think that is reasonable.

Ira

> For the other distro/kernel releases, here is the start of a list of the distros that I'd like to see supported.
>
> - RHEL EL 6.0, 6.1, 6.2, and possibly 6.3 if released before OFED-3.2. In the past, we have only supported the latest 2 updates of RHEL, but I received some feedback that it would be better to support the last 4.
> - SLES 11 SP 1 and SLES 11 SP 2 (if released before OFED-3.2)
> - Scientific Linux 6.1
> - Kernel.org 3.2 (what other kernel.org versions do we need to support?)

-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
[ewg] [ANNOUNCE] infiniband-diags 1.5.12 tarball release
Hello,

There is a new release of infiniband-diags. Tarball available at:

http://www.openfabrics.org/downloads/management/infiniband-diags-1.5.12.tar.gz
(listed in http://www.openfabrics.org/downloads/management/latest.txt)

7a823a3f6d9cfa3d19e1ca6889f3c122  infiniband-diags-1.5.12.tar.gz

This is a bug fix release based on the 1.5 branch. Full list of changes is below.

Author: Ira Weiny wei...@llnl.gov
    infiniband-diags: check_lft_balance.pl add -C/-P options
    infiniband-diags: clean up build

Author: Albert L.Chu ch...@llnl.gov
    check_lft_balance.pl: Add extra check when using -e heuristic flag in check_lft_balance.pl
    Add -e heuristic flag to check_lft_balance.pl to detect common scenarios where unbalanced routing will occur.
    Update check_lft_balance.pl to work with newer infiniband-diag tools
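The announcement publishes an MD5 sum next to the tarball name. A minimal sketch of checking a downloaded file against such a sum is shown below; the function names are illustrative and not part of infiniband-diags, and the local path is whatever the reader saved the tarball as.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announced(path, announced):
    """Compare a file's MD5 with the sum copied from the announcement."""
    return md5_of_file(path) == announced.strip().lower()
```

Usage would look like `matches_announced("infiniband-diags-1.5.12.tar.gz", "7a823a3f6d9cfa3d19e1ca6889f3c122")`. MD5 only guards against a corrupted download; it is not a defense against deliberate tampering.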
Re: [ewg] Web site access
On Thu, 27 Oct 2011 12:01:40 -0700 Tziporet Koren tzipo...@mellanox.com wrote:

> What is the issue? Don't you have a permission?

Correct. Hal emailed me Ken Strandberg's name. I am going to email him.

Thanks,
Ira

> Tziporet
>
> -----Original Message-----
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Ira Weiny
> Sent: Wednesday, October 26, 2011 2:17 AM
> To: EWG
> Subject: [ewg] Web site access
>
> I am not sure who to email for server access...
>
> I built a new release of infiniband-diags and I would like to put it and the MD5 sum in /var/www/openfabrics.org/downloads/management/
>
> This has bug fixes for 1.5.4, no new features.
>
> How do I officially release a new version? Last time Alex Netes put it in the above directory for me and I made an announcement and Vlad picked it up. I would like to do this without Alex's help. :-D
>
> Thanks,
> Ira
>
> --
> Ira Weiny
> Member of Technical Staff
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
[ewg] Web site access
I am not sure who to email for server access... I built a new release of infiniband-diags and I would like to put it and the MD5 sum in /var/www/openfabrics.org/downloads/management/ This has bug fixes for 1.5.4, no new features. How do I officially release a new version? Last time Alex Netes put it in the above directory for me and I made an announcement and Vlad picked it up. I would like to do this without Alex's help. :-D Thanks, Ira -- Ira Weiny Member of Technical Staff Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] What is the current version of mstflint?
Oren, What is the current version of mstflint? Looking through the git tree I see that you fixed a bug which resulted in kernel messages like this: 2011-07-31 08:38:43 mstflint:3792 freeing invalid memtype fe90-fe91 I am trying to point RedHat to this fixed version (So that they can include it in RHEL 6.2) but I don't see any tags in the git tree. Is the fix from commit 771c3d04c9a09a83c182037663032b1f53dbf87b Fix unmap typo bug in a tagged and/or released version? From the comments in the most recent commit it looks like the most current version is MFT 2.7.0 build 20 (final)? Thanks, Ira -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] Current administrator for git accounts on openfabrics.org
Who would I contact for a git account on git.openfabrics.org/git? Thanks, Ira -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] Current administrator for git accounts on openfabrics.org
Thanks!
Ira

On Wed, 9 Feb 2011 10:00:25 -0800 Alex Netes ale...@voltaire.com wrote:

> Hi Ira,
>
> > Who would I contact for a git account on git.openfabrics.org/git?
>
> Ken Strandberg k...@kenstrandberg.com is sysadmin in openfabrics.org and he would be happy to assist you.
>
> Thanks,
> Alex.

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
[ewg] RFC: Splitting of the management git tree in Open Fabrics
As I briefly mentioned in an email to Yevgeny regarding libibumad ABI's; I believe it is time to break up the management git tree. With GA of OFED 1.5.2 scheduled for Sept 13, I would like to request comments from the community about the following split after that GA. On openfabrics.org/git split management.git into the following trees. openfabrics.org/git/infiniband-diags.git openfabrics.org/git/libibumad.git openfabrics.org/git/libibmad.git openfabrics.org/git/opensm.git Sasha can populate those from the current management tree. We believe there are git commands which will do this without losing any history from the git trees. Vlad, what changes would you have to make in the OFED build to accommodate these packages being in separate git trees? Any other concerns or comments? Thanks, Ira -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
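The RFC above says there are git commands that can split a tree without losing history but does not name them. One candidate is `git filter-branch --subdirectory-filter`, which rewrites a clone so that one subdirectory becomes the repository root while keeping the commits that touched it. The sketch below only builds the command lines; it assumes management.git has top-level directories with the same names as the proposed trees, and the `-split` clone directory suffix and the function name are made up for illustration.

```python
# Proposed trees from the RFC; assumed to match top-level directory names
# inside management.git.
SUBTREES = ["infiniband-diags", "libibumad", "libibmad", "opensm"]


def split_commands(source_repo, subdir):
    """Return the git command lines that would extract one subdirectory."""
    clone_dir = subdir + "-split"  # hypothetical working directory name
    return [
        # Work on a fresh clone so the original tree is never rewritten.
        ["git", "clone", source_repo, clone_dir],
        # Make `subdir` the new project root on every ref, keeping its history.
        ["git", "-C", clone_dir, "filter-branch",
         "--subdirectory-filter", subdir, "--", "--all"],
    ]


if __name__ == "__main__":
    for tree in SUBTREES:
        for cmd in split_commands("management.git", tree):
            print(" ".join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; whoever performs the real split would want to inspect each resulting tree before publishing it.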
Re: [ewg] Allowing ib dignostics to be run without being logged in as root.
To steer the conversation in a different direction. Perhaps there is a need to have a second umad device file which allows only for Get operations? I know this could be some work and I don't know if it could be completely done (I have not thought through all the details). [*]

I know there is some discussion on the interface for userspace apps and MAD's on the developers mailing list. Is this a requirement we should look into more? I know we have some need for this and now Woody has this need as well.

Thoughts?
Ira

[*] NOTE: I am not directly volunteering to do this work ;-) But I have been interested in changing the user level MAD libraries in the past so I think I could help.

On Wed, 26 May 2010 09:51:53 -0700 Justin Clift jus...@salasaga.org wrote:

> On 05/27/2010 02:19 AM, Woodruff, Robert J wrote:
> > Hal wrote,
> > > sudo can be configured for specific commands to be allowed to specific users.
> >
> > Then perhaps that is a safer way to do it, but it would put more work on the system admin to set it up for people, but if setting the permissions of the commands to setuid root opens up a security hole, we would not want that.
>
> From an experienced SysAdmin perspective, the less setuid/setgid programs there are on a system the better. If a system could have them *all* removed, that would be great. :)
>
> Security types generally don't like them either, regarding them as a point of weakness due to circumventing finer grained access controls (sudo, ACLs, RBAC, etc).
>
> setuid/setgid binaries are also included (and queried) in *every* system audit. Good security practise will generally change the binaries back to being non-setuid/non-setgid (ie normal perms) unless there's a Very Good Reason for them to be otherwise.
>
> I have personally had to secure/harden many *nix systems over the years, plus write detailed technical best practice guides for multi-national corporates on how to do it on more than one occasion. Last time was in roughly 2006, and setuid/setgid stuff was regarded as bad old practise at that time. I'd expect it would be even less favoured now.
>
> > Does anyone know if setting the permissions to setuid root does actually open up a security hole ?
>
> Not directly. It just creates lots of secondary hassles for SysAdmins, Security Admins, policy enforcement software, and monitoring software because it introduces another vector for attack.
>
> People having a need for setuid or setgid root for these binaries can most definitely do it themselves as part of their roll out.
>
> Not sure if that perspective helps, but you do seem to be asking. :)
>
> Regards and best wishes,
>
> Justin Clift
>
> > woody
>
> --
> Salasaga - Open Source eLearning IDE
> http://www.salasaga.org
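Justin's point that setuid/setgid binaries are queried in every system audit can be illustrated with a short Python sketch that walks a directory and reports regular files carrying either bit. The function name is made up for this example, and the sketch covers only the listing step of an audit; it does not replace any real auditing tool.

```python
import os
import stat


def find_setid_files(root):
    """Yield (path, flags) for regular files under root with setuid/setgid set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed.
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(mode):
                continue
            flags = []
            if mode & stat.S_ISUID:
                flags.append("setuid")
            if mode & stat.S_ISGID:
                flags.append("setgid")
            if flags:
                yield path, flags
```

Running `for path, flags in find_setid_files("/usr/sbin"): print(path, flags)` after installing the diagnostics would show whether any of them were installed with elevated permission bits.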
Re: [ewg] Question: When should patches be submitted to EWG and when should they be submitted to linux-rdma?
On Wed, 26 May 2010 13:58:41 -0700 Mike Heinz michael.he...@qlogic.com wrote:

> My preference for bug fixes is that they be applied so that they go into the upstream kernel - assuming they don't require EWG-only changes. But I need to understand the correlation between the two source trees - if you accept a bug fix for the upstream kernel, will that end up in OFED as well, or do I need to submit the patch to both groups?

There is a reason upstream is called upstream. If you get it into the upstream kernel it will flow down and everyone will get it. If you only submit to EWG then it will stay there in OFED purgatory.

That is not to say you can't submit to OFED for critical things which your customers need but that should be an exception rather than the rule.

Ira

> -----Original Message-----
> From: Roland Dreier [mailto:rdre...@cisco.com]
> Sent: Wednesday, May 26, 2010 4:50 PM
> To: Mike Heinz
> Cc: openfabrics-...@openib.org
> Subject: Re: [ewg] Question: When should patches be submitted to EWG and when should they be submitted to linux-rdma?
>
> > The subject says it all. If I have a patch that can be applied against either the current OFED git repository or against the upstream kernel - where do I post it?
>
> What do you want to happen to the patch? If you want it applied to the upstream kernel, then send it to me and linux-rdma. If you want it applied to an OFED tree, send it to ewg.
>
> --
> Roland Dreier rola...@cisco.com || For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/index.html
Re: [ewg] ibcheckerrors Port All FAILED reported
On Thu, 6 May 2010 06:26:55 -0700 Mike Heinz michael.he...@qlogic.com wrote:

> Ira, I'm pretty sure I already fixed this problem. I submitted a patch to Sasha back in April.

The tests below are with the current master.

git://git.openfabrics.org/~sashak/management

Ira

> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ira Weiny
> Sent: Wednesday, May 05, 2010 9:10 PM
> To: Woodruff, Robert J; linux-r...@vger.kernel.org
> Cc: EWG; tzipo...@mellanox.co.il
> Subject: Re: [ewg] ibcheckerrors Port All FAILED reported
>
> Interesting... I have a switch which does this as well. Tracing through the scripts shows that the perfquery command is failing like this.
>
>   14:29:03 ./perfquery 40 255
>   ./perfquery: iberror: failed: AllPortSelect not supported
>
> It seems there is an issue with the CapabilityMask value...
>
>   14:43:32 ./perfquery 40 255
>   cap_mask 0x400 === my debug output
>   ./perfquery: iberror: failed: AllPortSelect not supported
>   14:43:38 ./saquery CPI 40
>   SA ClassPortInfo:
>   ...
>   Capability mask..0x2602
>   ...
>
> Those don't match because... perfquery has a bug... perfquery is issuing a PMA query when it should be issuing a SA query. It just so happens that on some switches the result of that PMA query indicates AllPortSelect is available. Patch to follow.
>
> Ira
>
> On Wed, 5 May 2010 13:47:54 -0700 Woodruff, Robert J robert.j.woodr...@intel.com wrote:
>
> > Hi guys,
> >
> > When I run ibcheckerrors on my Mellanox switch, it is reporting that Port all FAILED. From what I can tell, the switch is working fine and I think that this is a bogus error from the program. If this is indeed not a real problem, can the diagnostic be fixed to not report this as an error ?
Re: [ewg] ibcheckerrors Port All FAILED reported
On Thu, 6 May 2010 14:11:24 -0700 Sasha Khapyorsky sas...@voltaire.com wrote:

> On 18:09 Wed 05 May, Ira Weiny wrote:
> >
> >   14:29:03 ./perfquery 40 255
> >   ./perfquery: iberror: failed: AllPortSelect not supported
> >
> > It seems there is an issue with the CapabilityMask value...
> >
> >   14:43:32 ./perfquery 40 255
> >   cap_mask 0x400 === my debug output
> >   ./perfquery: iberror: failed: AllPortSelect not supported
> >   14:43:38 ./saquery CPI 40
> >   SA ClassPortInfo:
> >   ...
> >   Capability mask..0x2602
> >   ...
> >
> > Those don't match because... perfquery has a bug... perfquery is issuing a PMA query when it should be issuing a SA query.
>
> I'm not following. How should it be related to each other SA and PM ClassPortInfo(s)?

It's not, I was confused... :-D

Ira

> Sasha

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
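The exchange above comes down to how the performance management CapabilityMask is read. The Python sketch below assumes that AllPortSelect is bit 8 (0x100) of that mask, which is my understanding of the PMA ClassPortInfo layout and is consistent with the debug value 0x400 producing "AllPortSelect not supported". The `read_port` callable is hypothetical and stands in for whatever fetches a counter for one port. Summing per-port values is a simplification of how an all-ports query could be emulated on a switch that lacks the capability.

```python
ALL_PORT_SELECT = 1 << 8  # assumed bit position of AllPortSelect in the PMA mask
ALL_PORTS = 255           # port number used for "all ports" in the queries above


def supports_all_port_select(cap_mask):
    """True if the PMA CapabilityMask advertises AllPortSelect."""
    return bool(cap_mask & ALL_PORT_SELECT)


def read_counter(cap_mask, port, num_ports, read_port):
    """Read one counter, emulating port 255 when the switch cannot do it.

    read_port is a hypothetical callable: port number -> counter value.
    """
    if port != ALL_PORTS:
        return read_port(port)
    if supports_all_port_select(cap_mask):
        return read_port(ALL_PORTS)
    # Fall back to iterating over the physical ports and adding them up.
    return sum(read_port(p) for p in range(1, num_ports + 1))
```

With the mask from the debug output, `supports_all_port_select(0x400)` is false, so a request for port 255 takes the per-port fallback instead of failing.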
Re: [ewg] ibcheckerrors Port All FAILED reported
Interesting... I have a switch which does this as well. Tracing through the scripts shows that the perfquery command is failing like this.

  14:29:03 ./perfquery 40 255
  ./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

  14:43:32 ./perfquery 40 255
  cap_mask 0x400 === my debug output
  ./perfquery: iberror: failed: AllPortSelect not supported
  14:43:38 ./saquery CPI 40
  SA ClassPortInfo:
  ...
  Capability mask..0x2602
  ...

Those don't match because... perfquery has a bug... perfquery is issuing a PMA query when it should be issuing a SA query. It just so happens that on some switches the result of that PMA query indicates AllPortSelect is available. Patch to follow.

Ira

On Wed, 5 May 2010 13:47:54 -0700 Woodruff, Robert J robert.j.woodr...@intel.com wrote:

> Hi guys,
>
> When I run ibcheckerrors on my Mellanox switch, it is reporting that Port all FAILED. From what I can tell, the switch is working fine and I think that this is a bogus error from the program. If this is indeed not a real problem, can the diagnostic be fixed to not report this as an error ?
>
> ibcheckerrors -nocolor -v -t 100
> # Checking Switch: nodeguid 0x0002c902004046a0
> Node check lid 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK
> Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK
> Checking Ca: nodeguid 0x0002c9030002628a
> Node check lid 14: OK
> Error check on lid 14 (cstnh-2 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c90300025e0a
> Node check lid 12: OK
> Error check on lid 12 (cstnh-3 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030002615e
> Node check lid 15: OK
> Error check on lid 15 (cstnh-4 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e442
> Node check lid 11: OK
> Error check on lid 11 (cstnh-8 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e44e
> Node check lid 8: OK
> Error check on lid 8 (cstnh-11 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e3e6
> Node check lid 2: OK
> Error check on lid 2 (cstnh-13 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e44a
> Node check lid 18: OK
> Error check on lid 18 (cstnh-9 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c90300044fb4
> Node check lid 13: OK
> Error check on lid 13 (cstnh-7 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c90300044fbc
> Node check lid 10: OK
> Error check on lid 10 (cstnh-1 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e3ee
> Node check lid 9: OK
> Error check on lid 9 (cstnh-10 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e446
> Node check lid 4: OK
> Error check on lid 4 (cstnh-12 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e22e
> Node check lid 1: OK
> Error check on lid 1 (cstnh-14 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c9030008e43e
> Node check lid 19: OK
> Error check on lid 19 (cstnh-15 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0090270002000345
> Node check lid 6: OK
> Error check on lid 6 (cstnh-5 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0090270002000335
> Node check lid 5: OK
> Error check on lid 5 (cstnh-6 HCA-1) port 1: OK
> # Checking Ca: nodeguid 0x0002c90300028238
> Node check lid 3: OK
> Error check on lid 3 (cst-linux HCA-1) port 1: OK
> ## Summary: 17 nodes checked, 0 bad nodes found
> ## 32 ports checked, 0 ports have errors beyond threshold

-- 
Ira Weiny wei...@llnl.gov
Re: [ewg] ibcheckerrors Port All FAILED reported
Never mind, I am wrong about the below. However, there is an option to emulate the all-ports query when AllPortSelect is not supported. I believe that is a way to fix this.

Ira

On Wed, 5 May 2010 18:09:43 -0700 Ira Weiny wei...@llnl.gov wrote:

Interesting... I have a switch which does this as well. Tracing through the scripts shows that the perfquery command is failing like this.

14:29:03 ./perfquery 40 255
./perfquery: iberror: failed: AllPortSelect not supported

It seems there is an issue with the CapabilityMask value...

14:43:32 ./perfquery 40 255
cap_mask 0x400 === my debug output
./perfquery: iberror: failed: AllPortSelect not supported

14:43:38 ./saquery CPI 40
SA ClassPortInfo: ... Capability mask..0x2602 ...

Those don't match because... perfquery has a bug... perfquery is issuing a PMA query when it should be issuing an SA query. It just so happens that on some switches the result of that PMA query indicates AllPortSelect is available. Patch to follow.

Ira

On Wed, 5 May 2010 13:47:54 -0700 Woodruff, Robert J robert.j.woodr...@intel.com wrote:

Hi guys, When I run ibcheckerrors on my Mellanox switch, it is reporting that Port all FAILED. From what I can tell, the switch is working fine and I think that this is a bogus error from the program. If this is indeed not a real problem, can the diagnostic be fixed to not report this as an error?
ibcheckerrors -nocolor -v -t 100 # Checking Switch: nodeguid 0x0002c902004046a0 Node check lid 7: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port all: FAILED Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 2: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 3: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 7: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 8: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 9: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 10: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 17: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 18: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 20: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 25: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 26: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 27: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 28: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 34: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 35: OK Error check on lid 7 (Infiniscale-IV Mellanox Technologies) port 36: OK Checking Ca: nodeguid 0x0002c9030002628a Node check lid 14: OK Error check on lid 14 (cstnh-2 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c90300025e0a Node check lid 12: OK Error check on lid 12 (cstnh-3 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030002615e Node check lid 15: OK Error check on lid 15 (cstnh-4 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e442 Node check lid 11: OK Error check on lid 11 (cstnh-8 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e44e Node check lid 8: OK Error check on lid 8 (cstnh-11 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e3e6 Node check lid 2: OK Error check on lid 2 (cstnh-13 HCA-1) port 
1: OK # Checking Ca: nodeguid 0x0002c9030008e44a Node check lid 18: OK Error check on lid 18 (cstnh-9 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c90300044fb4 Node check lid 13: OK Error check on lid 13 (cstnh-7 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c90300044fbc Node check lid 10: OK Error check on lid 10 (cstnh-1 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e3ee Node check lid 9: OK Error check on lid 9 (cstnh-10 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e446 Node check lid 4: OK Error check on lid 4 (cstnh-12 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e22e Node check lid 1: OK Error check on lid 1 (cstnh-14 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c9030008e43e Node check lid 19: OK Error check on lid 19 (cstnh-15 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0090270002000345 Node check lid 6: OK Error check on lid 6 (cstnh-5 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0090270002000335 Node check lid 5: OK Error check on lid 5 (cstnh-6 HCA-1) port 1: OK # Checking Ca: nodeguid 0x0002c90300028238 Node check lid 3: OK Error check on lid 3 (cst-linux HCA-1) port 1: OK ## Summary: 17 nodes checked, 0 bad nodes found ## 32 ports checked, 0 ports have errors beyond threshold
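For anyone following this thread, the failing check and the emulation mentioned above can be sketched roughly as follows. This is an illustration only, not the perfquery source. It assumes that AllPortSelect is bit 8 (0x100) of the PMA ClassPortInfo CapabilityMask, which should be verified against the IBA PerfMgt definitions, and `read_port_errors` is a hypothetical stand-in for a per-port counter query. With the cap_mask of 0x400 from the debug output that bit is clear, so a query on port 255 is refused and the fallback walks the ports one at a time.

```python
# Sketch only: not the perfquery source.
ALL_PORT_SELECT = 1 << 8  # assumed bit position; check against the IBA spec
ALL_PORTS = 255           # port number used for "all ports" in the thread


def supports_all_port_select(cap_mask):
    """Return True if the PMA capability mask advertises AllPortSelect."""
    return bool(cap_mask & ALL_PORT_SELECT)


def total_errors(cap_mask, ports, read_port_errors):
    """Sum one error counter over every port of a node.

    read_port_errors(port) is a hypothetical callback that returns the
    counter for a single port, or the aggregate when given ALL_PORTS.
    """
    if supports_all_port_select(cap_mask):
        return read_port_errors(ALL_PORTS)
    # Emulation: AllPortSelect is missing, so query each port and add up.
    return sum(read_port_errors(port) for port in ports)


if __name__ == "__main__":
    fake_counters = {2: 0, 3: 1, 7: 0}  # made-up values for illustration

    def read(port):
        return fake_counters[port]

    print(supports_all_port_select(0x400))
    print(total_errors(0x400, sorted(fake_counters), read))
```

A script such as ibcheckerrors could then treat "not supported" as a reason to fall back rather than as a failure, which seems to be what the emulation option is for.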
Re: [ewg] RE: OFED Jan 5, 2009 meeting minutes on OFED plans
On Wed, 7 Jan 2009 09:35:39 -0800 Woodruff, Robert J robert.j.woodr...@intel.com wrote:

Doug wrote, I'm not so much concerned over IBTA standards. I'm concerned over what makes it into the upstream linux kernels. How much OFED's kernel differs from the upstream kernel directly impacts supportability of the OFED stack in our products. The more it diverges, the higher the support load. We actively control that divergence as a result.

In general, we discussed and decided at the last developer's workshop in Sonoma to try to make sure that any new features that were going into OFED be first accepted for inclusion in the upstream kernel, or at least queued in Roland's tree for upstream. I think we did a pretty good job in OFED 1.4 of adhering to that process, or at least we made significant progress towards that goal. We did this specifically to try to prevent major divergence between the upstream kernel and the OFED kernel. So for a major new feature like IBoE, I think it makes sense to first discuss the patches on ofa-general and perhaps even post an RFC on kernel.org before we include it in an OFED release.

my 2 cents, woody

I agree. OFED should be downstream of kernel.org as much as possible. New features should be introduced there first.

Ira
[ewg] Reminder: OpenSM BOF at OFA Sonoma Workshop
Just a reminder that we are going to have a BOF for OpenSM Monday the 7th at 6:30pm; room is TBA. Please come and share your use, experience and desires for OpenSM. Or if you have yet to try OpenSM, listen in on what others are doing with it. Thanks, Ira Weiny Comp Sci./Math Prog. Lawrence Livermore National Lab [EMAIL PROTECTED]
[ewg] Re: [ofa-general] OFED 1.2.5.4 is ready on the ofa server
It looks like there is something corrupt with the tarball.

13:29:50 tar xzf OFED-1.2.5.4.tgz
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Ira

On Tue, 4 Dec 2007 10:34:48 +0200 Tziporet Koren [EMAIL PROTECTED] wrote:

OFED-1.2.5.4 is ready: http://www.openfabrics.org/downloads/OFED/ofed-1.2.5/OFED-1.2.5.4.tgz

Changes since OFED 1.2.5
- RDS:
  - Performance enhancements
  - GA for Oracle 11
- IPoIB:
  - Use NAPI by default
  - For small received packets, allocate a new, smaller SKB to relieve accounting on the socket.
- mlx4:
  - Enable changing default max HCA resource limits using module options.
  - Support opening of more resources than the default by increasing command timeout for INIT_HCA to 10 seconds
- PPC64 support:
  - Fixed compilation problems on SLES10 SP1

Changes from OFED 1.2.5.3:
- Low level drivers update:
  - cxgb3: Pull in latest fixes.
  - ipath: Pull in latest fixes.
- OSes support:
  - Added support for SLES9 SP4 (no QA was done)
  - Added support for RHEL5 up1 (no QA was done)
- IPOIB:
  - Removed the usage of unsignalled QP in Tx due to deadlock.
- RDS:
  - Relax the header consistency check on fragment reassembly

Tziporet Vlad

Tziporet Koren Software Director Mellanox Technologies mailto: [EMAIL PROTECTED] Tel +972-4-9097200, ext 380
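The "unexpected end of file" from gzip above is what a truncated download typically looks like. As a minimal sketch (not part of any OFED tooling), the same condition can be checked before unpacking by reading the gzip stream to its end with the Python standard library; the function name `gzip_is_complete` is made up for this example.

```python
import gzip
import os
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip stream at `path` decompresses to its end."""
    if os.path.getsize(path) == 0:
        return False  # an empty download is never a usable archive
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass  # discard the data; only success or failure matters
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early (the truncated-download case)
        # OSError / zlib.error: not gzip data, or corrupt contents
        return False
    return True
```

For example, `gzip_is_complete("OFED-1.2.5.4.tgz")` would be expected to return False for a tarball cut short in transit. It only validates the gzip layer, so a complete stream can still contain a damaged tar archive.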
Re: [ewg] OFED teleconference today
On Mon, 13 Aug 2007 19:15:41 +0300 Sasha Khapyorsky [EMAIL PROTECTED] wrote:

On 08:26 Mon 13 Aug , Jeff Squyres wrote:

Friendly reminder: the OFED teleconference is several hours from now (Monday, August 13, 2007). Noon US eastern / 9am US Pacific / 7pm Israel

1. Monday, Aug 13, code 210062024
2. Monday, Aug 27, code 210062024
3. Monday, Sep 10, code 210062024

US/Canada: +1.866.432.9903
India: +91.80.4103.3979
Israel: +972.9.892.7026
Others: http://cisco.com/en/US/about/doing_business/conferencing/

What is the enter code?

210062024 worked for me. The old code 2102061 did NOT work.

Ira

Sasha
Re: [ewg] OFED teleconference today
It is nothing you did. Matt and I have internal calendar software which had the wrong (old?) code. I used it first then thought to look at the mail. I suspected Sasha had made a similar mistake.

Ira

On Mon, 13 Aug 2007 12:20:52 -0400 Jeff Squyres [EMAIL PROTECTED] wrote:

FWIW, I always list today's code in the mail (see below). I have taken to listing the next few codes as well. Is that causing confusion?

On Aug 13, 2007, at 12:17 PM, Ira Weiny wrote:

On Mon, 13 Aug 2007 19:15:41 +0300 Sasha Khapyorsky [EMAIL PROTECTED] wrote:

On 08:26 Mon 13 Aug , Jeff Squyres wrote:

Friendly reminder: the OFED teleconference is several hours from now (Monday, August 13, 2007). Noon US eastern / 9am US Pacific / 7pm Israel

1. Monday, Aug 13, code 210062024
2. Monday, Aug 27, code 210062024
3. Monday, Sep 10, code 210062024

US/Canada: +1.866.432.9903
India: +91.80.4103.3979
Israel: +972.9.892.7026
Others: http://cisco.com/en/US/about/doing_business/conferencing/

What is the enter code?

210062024 worked for me. The old code 2102061 did NOT work.

Ira

Sasha

-- Jeff Squyres Cisco Systems
[ewg] Re: ANNOUNCE: ofed kernel build updates
Michael,

I only got a chance to try ofed_makedist.sh and compile (not actually run). However, the build worked very well! So initial feedback is that this works much better.

Thanks, Ira

On Wed, 25 Jul 2007 17:11:41 +0300 Michael S. Tsirkin [EMAIL PROTECTED] wrote:

Hi! I'd like to announce a couple of updates that were recently made to the build scripts on the ofed_kernel branch. This is an attempt to answer repeated requests, aired at Sonoma, to simplify access to kernel sources. The idea is that a user of a supported kernel will just be able to download an appropriate tarball and run with it without need for patching.

These changes are available from the ofed_kernel git tree maintained by Vlad: git://git.openfabrics.org/~vlad/ofed_kernel.git ofed_kernel

The code is mine, but the ideas mostly come from criticism and code sent by Ira Weiny. Thanks, Ira!

Note that the changes were made in a backwards-compatible way, so that existing scripts using configure/make will continue working.

What's new:

1. New script ofed_scripts/ofed_patch.sh
This will apply fixes and backport patches for a specific kernel to the current tree.
Usage: ./ofed_scripts/ofed_patch.sh --with-backport=VERSION
This makes it possible for distro vendors to generate a tarball pre-patched for a specific kernel.

2. New script ofed_scripts/ofed_makedist.sh
This script repeatedly clones the current repository, runs ofed_scripts/ofed_patch.sh, and then builds tarballs of ofed kernel source pre-patched for supported kernel versions. I plan to work with Vlad to run this script as part of nightly builds, so that prepatched tarballs will become available for download.

3. configure script made re-entrant
The configure script does not apply patches anymore: all it does is create the configure.mk.kernel and autoconf.h files. This finally makes it possible to change configuration parameters just by re-running configure. For backwards compatibility, if configure detects that ofed_scripts/ofed_patch.sh was not run yet, it prints a warning and runs it automatically.

Feedback welcome.

-- MST
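The order of operations described in this announcement can be sketched as a dry run that only builds the command lines and runs nothing. The repository URL and the ofed_patch.sh usage come from the announcement itself. Several things here are assumptions and may not match the real scripts: reading the trailing "ofed_kernel" as a branch name, running configure with no arguments, and the helper name `plan_prepatched_tree`.

```python
# Sketch only: builds the command lines, does not execute anything.
REPO_URL = "git://git.openfabrics.org/~vlad/ofed_kernel.git"  # from the announcement
BRANCH = "ofed_kernel"  # assumption: the trailing word names a branch


def plan_prepatched_tree(backport_version, workdir="ofed_kernel"):
    """Return the commands for one pre-patched tree, in order.

    The second and third commands are meant to be run inside `workdir`.
    """
    return [
        ["git", "clone", "-b", BRANCH, REPO_URL, workdir],
        # Apply fixes and backport patches for the chosen kernel.
        ["./ofed_scripts/ofed_patch.sh", "--with-backport=" + backport_version],
        # configure no longer patches, so it can be re-run to change options
        # (called without arguments here as an assumption).
        ["./configure"],
    ]


if __name__ == "__main__":
    # "VERSION" is the placeholder from the announcement, not a real value.
    for command in plan_prepatched_tree("VERSION"):
        print(" ".join(command))
```

Keeping patching and configuration as separate steps is what lets the tree be patched once and then reconfigured as often as needed.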