Re: [ewg] EWG/OFED meeting minutes for July 24, 2012

2012-07-29 Thread Alex Netes

> 
> OFED 3.5:
> =
> 1. Kernel base: Move to kernel 3.5 GA will be done this week
> 
> 2. Backports:
> RHEL 6.2, 6.3 and SLES 11 SP2 - available today
> Low level drivers: mlx4 (core & ib) , nes
> Missing: mlx4_en - Mellanox, cxgb - Chelsio, qib - Intel
> 
> 
> 3. RC1:
> If everyone provides backports by Tue, July-31, we will be able to release
> RC1 on Aug-2
> - Mellanox is committed.
> - Need answers from Intel (Tom) and Chelsio (Steve)
> 
> 4. User space:
> New uDAPL package and it is in the latest OFED-3.5 build.
> Need to include new librdmacm-1.0.16-1.src.rpm and a new
> ibacm-1.0.7-1.src.rpm packages
> Management - Alex - is what we have OK?

There will be another OpenSM release on Wed, Aug-1, which will include the latest
bug fixes and some new contributed features, such as:
- Per Module Logging support
- Congestion Control support
- Perf_mgr extensions

> Diagnostic tools - Ira - is what we have OK?
> 
> 5. Release schedule:
> Will decide in next meeting - assuming RC1 will be at end of next week
> and testing will start.
> 
> Tziporet
> 
> ___
> ewg mailing list
> ewg@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] OpenSM 1.5.4 Boot Problem

2011-12-16 Thread Alex Netes
Hi Hector,

A few more questions.
Does this happen only when you try to shut down OpenSM on reboot?
What is the host CPU architecture? x86/x86_64/ppc?


> -----Original Message-----
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
> boun...@lists.openfabrics.org] On Behalf Of Hal Rosenstock
> Sent: Thursday, December 15, 2011 9:06 PM
> To: Hector Abrach
> Cc: ewg@lists.openfabrics.org
> Subject: Re: [ewg] OpenSM 1.5.4 Boot Problem
> 
> Hector,
> 
> On 12/15/2011 12:49 PM, Hector Abrach wrote:
> > Hal,
> >
> > Thank you for the response. To address your questions:
> >
> >> So the switch stays up and the servers (including the one OpenSM is
> >> on) are rebooted, right ?
> >
> > Right.
> >
> >> Do the servers run QNX rather than Linux ? Are you saying all OpenSM
> >> code is the same as stock OpenSM 3.3.12 (OFED 1.5.4-rc3) ?
> >
> > Yes, all 7 servers run QNX. The OpenSM code is 99% the same, the only
> > changes I had to make were made to some #define libraries.
> > The big changes were made for the driver, not so much OpenSM.
> 
> I would think there are also changes for porting of complib to QNX. Do you
> use osm_vendor_ibumad.c as the OpenSM vendor layer ?
> 
> > I'm using IBNet 1.3.
> 
> What's IBNet 1.3 ? I'm not familiar with that.
> 
> > OpenSM always runs on the same one server, the others don't run it.
> 
> Understood.
> 
> >> Is the topology the 7 servers and the 1 switch and if you use other
> >> switches you don't see this issue ?
> >
> > That's correct, the topology is 7 servers and 1 switch. We typically
> > use fewer servers (4) for our application but the problem is more
> > easily reproducible with more servers so we have a 7 server setup with
> > 1 switch. We don't have a great selection of switches but I know our
> > previous switch did not cause this problem. Our intention is to go to
> > production with this new switch but we can't release until we find an
> > acceptable solution.
> >
> >> I can see the responses but not the requests. What verbosity level did
> >> you use ?
> >
> > I ran OpenSM with level -D 0x06 (error, info, verbose). I don't want
> > to do -D 0xFF because I know this fixes the problem for sure.
> 
> I think -D 0x23 (error, info, frames) would do the trick...
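
For readers comparing the -D masks mentioned in this thread, here is a minimal sketch that decodes a mask into flag names. The bit assignments are an assumption taken from the opensm man page as I remember it, not from this thread, so check them against your own build. The function name decode_log_mask is hypothetical.

```python
# Assumed bit values for the opensm -D option (verify against your man page).
OSM_LOG_FLAGS = {
    0x01: "error",
    0x02: "info",
    0x04: "verbose",
    0x08: "debug",
    0x10: "funcs",
    0x20: "frames",
    0x40: "routing",
    0x80: "sys",
}


def decode_log_mask(mask):
    """Return the names of all flags whose bit is set in mask, lowest bit first."""
    return [name for bit, name in OSM_LOG_FLAGS.items() if mask & bit]


if __name__ == "__main__":
    # The three masks that come up in the discussion.
    for mask in (0x06, 0x23, 0xFF):
        print("0x%02X: %s" % (mask, ", ".join(decode_log_mask(mask))))
```

Under these assumed values, 0x23 decodes to error, info and frames, matching the description above, while 0x06 sets only the info and verbose bits.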
> 
> > -
> >
> > In summary:
> > 1. The system gets stuck in osm_vendor_ibumad.c -> umad_receiver() ->
> > "for(;;)" but keeps running properly in main.c -> osm_manager_loop().
> > 2. If I use -D 0xFF, the problem is completely fixed.
> > 3. If I use an OSM_DEFAULT_SMP_MAX_ON_WIRE of 1 instead of any other
> > value, the problem is completely fixed.
> > 4. The failure always occurs with a qp0_mads_outstanding of 1
> > remaining.
> > What do you think could be wrong?
> > Do you think the driver could be the problem?
> 
> Yes. A likely suspect, which may be missing and causing this issue, is the
> timeout/retry code for MAD transactions (built into the kernel MAD layer in
> Linux), which triggers a send error (callback) if the timeout/retries are
> exhausted. Is that implemented ?
> 
> However, I don't have a good explanation for why you see this now and not
> before with your other switches but maybe that's not important.
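
To illustrate the timeout/retry behaviour described above, here is a toy sketch. It is not OpenSM or kernel code. The names MadTracker, submit, on_response and tick are hypothetical, and time is passed in explicitly so the logic is easy to follow. The point it shows: without the final send-error step, a lost response would leave the outstanding count stuck above zero.

```python
class PendingMad:
    """Bookkeeping for one request that is still waiting for a response."""

    def __init__(self, tid, deadline, timeout, retries_left):
        self.tid = tid
        self.deadline = deadline
        self.timeout = timeout
        self.retries_left = retries_left


class MadTracker:
    """Resend timed-out requests and report an error once retries run out."""

    def __init__(self, send, on_send_error):
        self._send = send                    # callable(tid): put request on the wire
        self._on_send_error = on_send_error  # callable(tid): notify the caller
        self._pending = {}

    @property
    def outstanding(self):
        return len(self._pending)

    def submit(self, tid, now, timeout=0.2, retries=3):
        self._pending[tid] = PendingMad(tid, now + timeout, timeout, retries)
        self._send(tid)

    def on_response(self, tid):
        # A response completes the transaction; late duplicates are ignored.
        self._pending.pop(tid, None)

    def tick(self, now):
        for tid, mad in list(self._pending.items()):
            if now < mad.deadline:
                continue
            if mad.retries_left > 0:
                mad.retries_left -= 1
                mad.deadline = now + mad.timeout
                self._send(tid)
            else:
                # Retries exhausted: drop the entry and fire the error callback,
                # so the outstanding count can return to zero.
                del self._pending[tid]
                self._on_send_error(tid)
```

With retries=2, a request that never gets a response is sent three times in total and then reported through the error callback, after which the outstanding count is back to 0.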
> 
> > What debug command should I use to see the sent requests?
> 
> See above.
> 
> -- Hal
> 
> > Thank you
> >
> > Hector Abrach
> >
> >
> >
> >
> > From:   Hal Rosenstock 
> > To: Hector Abrach 
> > Cc: ewg@lists.openfabrics.org
> > Date:   12/14/2011 08:23 PM
> > Subject:Re: [ewg] OpenSM 1.5.4 Boot Problem
> >
> >
> > --
> > --
> >
> >
> >
> > Hector,
> >
> > On 12/14/2011 1:41 PM, Hector Abrach wrote:
> >> Hal,
> >>
> >> Sorry for the multiple emails, but I was thinking it may be a
> >> "freeze/stall" rather than a timeout. One reason is that it
> >> doesn't send an error message; it is as if the log completely dies.
> >
> > So nothing interesting in the log...
> >
> >> However, in
> >> file osm_vendor_ibumad.c under function umad_receiver there is an
> >> infinite loop "for(;;)" which seems to die when I get to that
> >> previously discussed vl15_poller. I checked to see if it breaks out
> >> of the loop but it doesn't seem to.
> >
> > It never breaks out of that loop except when OpenSM is shutting down.
> > That's the basic receive loop.
> >
> > -- Hal
> >
> >> I'm not sure if this may be an additional hint.
> >> Thank you
> >>
> >> Hector Abrach
> >>
> >>
> >> From:  Hector Abrach 
> >> To:  Hal Rosenstock 
> >> Cc:  ewg@lists.openfabrics.org
> >> Date:  12/14/2011 11:15 AM
> >> Subject:  Re: [ewg] OpenSM 1.5.4 Boot Problem
> >> Sent by:  ewg-boun...@lists.openfabrics.org
> >>
> >>
> >> -
> >> ---
> >>
> >>
> >>
> >> Hal,
> >>
> >> Thank yo

Re: [ewg] EWG/Meeting agenda for today - 28-Mar, 2011

2011-03-28 Thread Alex Netes
On 16:14 Mon 28 Mar , Tziporet Koren wrote:
> 
> EWG/OFED Agenda for today:
> 
> 1. OFED 1.6 schedule:
> --
> - Move to kernel 2.6.38 (since it's GA already, and we have not started
> backports)
> - Ongoing work on backports - during Q2
> - First RC - end of June
>   RCs every 2 weeks
> - GA - End of Aug
> 
> 2. OFED 1.6 main features:
> 
> - Mellanox: CX3 support
> - SRIOV support for mlx4 with CX2 & CX3
> - FDR support
> - New OSes support: As usual the latest OSes will be supported
> - Remove MPI packages from OFED
> - New management package: Alex - please send details
> 
 
OpenSM main improvements:
1. Torus-2QoS routing engine
2. Performance Manager improvements: improved redirection and extended counters
support
3. Additional port balancing options for routing
4. More bug fixes

Alex.


[ewg] [ANNOUNCE] opensm tarballs release

2011-03-07 Thread Alex Netes
Hi,

There is a new release of the OpenSM tarball available in:

http://www.openfabrics.org/downloads/management/

(listed in http://www.openfabrics.org/downloads/management/latest.txt)

5e9b461073f7cfbafe0207e014796f9f  opensm-3.3.9.tar.gz

All component versions are from recent master branch. Full list of changes is
below.

OpenSM:
Alex Netes (1):
opensm: fixed memory leak in multicast spanning tree calculation


Re: [ewg] OFA management tree seperation

2011-02-23 Thread Alex Netes

On 15:59 Sun 13 Feb , Jason Gunthorpe wrote:
> On Sun, Feb 13, 2011 at 09:55:50AM +0200, Alex Netes wrote:
> > 
> > We finished the management tree separation.
> > From now on, Ira Weiny takes responsibility for maintaining libibmad and
> > infiniband-diags. His trees are:
> > 
> > git://git.openfabrics.org/~iraweiny/libibmad
> > git://git.openfabrics.org/~iraweiny/infiniband-diags
> > 
> > The libibumad, opensm and ibsim trees stay under my responsibility:
> > 
> > git://git.openfabrics.org/~alexnetes/libibumad
> > git://git.openfabrics.org/~alexnetes/opensm
> > git://git.openfabrics.org/~alexnetes/ibsim
> 
> Can you please include the OpenSM 3.2.6 and related branches and tags in
> your repository?
> 

Done. OpenSM 3.2.6 resides on the opensm-3.2 branch as before.


[ewg] [ANNOUNCE] management tarballs release

2011-02-16 Thread Alex Netes
Hi,

There is a new release of the management (OpenSM and infiniband
diagnostics) tarballs available in:

http://www.openfabrics.org/downloads/management/

(listed in http://www.openfabrics.org/downloads/management/latest.txt)

c0b24a1053ae8b0b3caf5950b3ede6dc  infiniband-diags-1.5.8.tar.gz
c2755aa360d3f29d04865ba4e2454a98  libibmad-1.3.7.tar.gz
c7575b7620615d7dfa1c7fdbbd310ec7  libibumad-1.3.7.tar.gz
df051f5f0192d369b0b904147cb045a8  opensm-3.3.8.tar.gz
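
A minimal sketch for checking downloaded tarballs against the list above, assuming the 32-hex-digit values are MD5 digests (the announcement does not name the algorithm) and that the files sit in the current directory. The helper names md5_of and verify are hypothetical.

```python
import hashlib
from pathlib import Path

# Digests copied from the list above; assumed to be MD5.
EXPECTED_MD5 = {
    "infiniband-diags-1.5.8.tar.gz": "c0b24a1053ae8b0b3caf5950b3ede6dc",
    "libibmad-1.3.7.tar.gz": "c2755aa360d3f29d04865ba4e2454a98",
    "libibumad-1.3.7.tar.gz": "c7575b7620615d7dfa1c7fdbbd310ec7",
    "opensm-3.3.8.tar.gz": "df051f5f0192d369b0b904147cb045a8",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return a {filename: status} map with status ok, MISMATCH or missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.exists():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results


if __name__ == "__main__":
    for name, status in verify().items():
        print("%-8s %s" % (status, name))
```

Run it from the download directory; any MISMATCH line means the tarball should be fetched again.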

All component versions are from recent master branch. Full list of
changes is below.

OpenSM:
===

Alex Netes (1):
  opensm: fixed getline pointer allocation free in osm_console_io

Eli Dorfman (Voltaire) (1):
  Wrong handling of MC create and delete traps

Hal Rosenstock (6):
  opensm/osm_state_mgr.c: Don't signal DISCOVER to SM state machine when already DISCOVERING
  opensm: Fix some typos
  osmtest/osmt_service.c: In osmt_run_service_records_flow, add missing status
  opensm/osm_ucast_ftree: When roots are not connected, update hop count but not lft
  opensm/osm_trap_rcv.c: No need to check for sweep for trap 145
  opensm: Add support for SwitchInfo:MulticastFDBTop

Ira Weiny (1):
  Add node/port/qos information to some error messages

Jason Gunthorpe (1):
  Fix autotools to include the necessary M4 files

Sasha Khapyorsky (3):
  opensm/sa: simplify osm_mcmr_rcv_find_or_create_new_mgrp() function call
  opensm/osm_node_info_rcv.c: move p_physp declaration under code block
  opensm/osm_db_files.c: malloc() return value run-time check

Stan C. Smith (2):
  replace (long*)(long) casting with transportable data type (uintptr_t)
  replace (long*)(long) casting with transportable data type (uintptr_t)

Yevgeny Kliteynik (28):
  opensm/osm_qos_policy.c: change a log message
  opensm/osm_prtn.c: removing TopSpin hack
  libvendor/osm_vendor_ibumad_sa.c: remove useless "if" statement
  libvendor/osm_vendor_mlx_sa.c: remove useless "if" statement
  opensm/osm_mtree.c: removing useless 'if' statement
  opensm/osm_sminfo_rcv.c: removing unused variable
  opensm/osm_pkey.c: removing unused function
  opensm/osm_sa_pkey_record.c: removing unused variable
  opensm/osm_sa_vlarb_record.c: removed unused variable
  opensm/osm_node_info_rcv.c: remove useless code line
  osmtest/osmtest.c: handle timeouts in PR stress test
  opensm/osm_helper.c: fix potential overrun of the array
  opensm/osm_helper.c: cosmetics - move define closer to the relevant code
  opensm/osm_mesh.c: fixing a bug in compare_switches()
  opensm/osm_subnet.c: fixing small bug in error path
  opensm/osm_db_files.c: fix small memory leak
  osmtest/osmt_slvl_vl_arb.c: handling fopen() failure
  opensm/osm_helper.c: use ARR_SIZE macro instead of hardcoded values
  osm_vl15intf.c: fixing use-after-free coredump
  opensm/osm_trap_rcv.c: fix possible core dump
  opensm/osm_ucast_ftree.c: fix small memory leak in error path
  opensm/osm_ucast_ftree.c: fixing another memory leak at error path
  opensm/osm_ucast_lash.c: small bug in calculating allocated size
  opensm/osm_pkey_mgr.c: fixing small memory leak
  opensm/osm_ucast_file.c: closing file descriptor in error path
  opensm/osm_qos_parser_y.y: fixing bunch of memory leaks on invalid values
  opensm/osm_console.c: fix memory and file descriptor leaks
  opensm/st.c: fix potential core dumps

libibumad:
==

Jason Gunthorpe (1):
  Fix autotools to include the necessary M4 files

Mike Heinz (1):
  FW: [PATCH] umad_send.3 (man page)

Yevgeny Kliteynik (1):
  umad.{c,h}: moving stdlib.h include from C to H file

libibmad:
=

Ira Weiny (1):
  libibmad/fields.c: Change all PortCounter names to match the Specification

Jason Gunthorpe (1):
  Fix autotools to include the necessary M4 files

infiniband-diags:
=

Albert Chu (4):
  add --diff support to iblinkinfo
  support --diffcheck in iblinkinfo
  Add lid and node description diff options for --diffcheck in iblinkinfo
  support --filterdownports in iblinkinfo

Alex Netes (3):
  Makefile: ChangeLog and version generation script path fix
  infiniband-diags: update shared library versions
  infiniband-diags: package versions update

Eli Dorfman (Voltaire) (2):
  infiniband-diags: Do not exit when unexpected node found
  inifiband-diags: Support Voltaire switch ISR4200

Hal Rosenstock (3):
  infiniband-diags/ibtracert: Eliminate direct route (-D) option
  infiniband-diags/saquery.c: In dump_one_mcmember_record, fix flow label endian
  infiniband-diags/iblinkinfo.c: Limit some queries to switches

Ira Weiny (4):
  libibmad/fields.c: Change all PortCounter names to match the Specification
  infiniband-diags: Verify timeout value specified to diagnostics
  Further timeout paramater verifica

[ewg] OFA management tree seperation

2011-02-12 Thread Alex Netes

We finished the management tree separation.
From now on, Ira Weiny takes responsibility for maintaining libibmad and
infiniband-diags. His trees are:

git://git.openfabrics.org/~iraweiny/libibmad
git://git.openfabrics.org/~iraweiny/infiniband-diags

The libibumad, opensm and ibsim trees stay under my responsibility:

git://git.openfabrics.org/~alexnetes/libibumad
git://git.openfabrics.org/~alexnetes/opensm
git://git.openfabrics.org/~alexnetes/ibsim


Alex.


Re: [ewg] Current administrator for git accounts on openfabrics.org

2011-02-09 Thread Alex Netes
Hi Ira,

Who would I contact for a git account on git.openfabrics.org/git?

> Ken Strandberg is the sysadmin at openfabrics.org and he
> would be happy to assist you.

Thanks, Alex. 