Re: [ewg] [ANNOUNCE] OFED 4.8-rc2 release is available
On 5/4/2017 3:14 PM, Woodruff, Robert J wrote:
> Doug Ledford wrote,
>> The OFA has already learned their lesson once with XRC. I wonder if they are getting ready to get hit with another lesson over this. The issue is if the OFA ships an API and it isn't upstream, then that API needs to legitimately be a private API that should never conflict with an upstream API. If upstream then implements the same thing in a different way and users are exposed to the rather unpleasant choice of coding to OFA's API or to upstream's API for the same thing, it creates a schism in the user's code base around what API to use/support. This is entirely contrary to the OFA's stated goals about the end user experience of people using their RDMA software. So Intel can certainly make any cost analysis they want about their hardware and the software to support it, but the OFA is not Intel's personal software distributor and the OFA must look at other, bigger picture issues than Intel.
>
> The Xeon-Phi code is somewhat different than what happened in the past with XRC. In the case of XRC, it was integrated into the base OFED package and built-in by default. For the Xeon-Phi code, it is clearly marked as a technology preview and is not even compiled in at all unless specifically enabled. This is not too terribly different than the experimental branches that the kernel has. Also, the Xeon-Phi code does not add any new APIs.

It implements a new kernel internal API, yes?

> It simply implements a driver set and library, thus there is no risk in someone coding to an API in OFED that gets totally changed once it gets upstream.

I'm not sure I understand your statement here, Woody. If there's no API, then how do people even use the hardware? Or are you saying that the API is in the library, and that API can be preserved even if the underlying driver implementation is changed to match whatever upstream might implement instead of what you already have implemented?

> you really do not have a say in the matter.

Nope. I don't. But, as I pointed out in my other email, if relationships matter, then whether or not someone has a say does not negate the need to listen. Personally, I haven't really investigated this code so I'm not going to argue against the fact that the OFA ships it, other than what I have already said, which is that it has been a stated goal of the OFA to foster a unified code base, so collaborating with upstream is generally necessary. If you are saying that the Xeon Phi support is implemented in a library (like nVidia's CUDA support) that insulates the end user from a possible fracture if upstream implements things differently, then that mostly settles my concerns. I still think it would be best if the Xeon Phi people collaborated with upstream on the kernel internal Peer to Peer PCI API, as that's evidently a requirement of the Xeon Phi library. It is conceivable that failure to collaborate could in fact break the Xeon Phi library if upstream simply doesn't implement something the library has a hard requirement on. But that's outside of my particular wheelhouse so I'm just suggesting that it might be a wise thing to do.

--
Doug Ledford <dledf...@redhat.com>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD

signature.asc
Description: OpenPGP digital signature
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/mailman/listinfo/ewg
Re: [ewg] [ANNOUNCE] OFED 4.8-rc2 release is available
[linux-rdma@ was accidentally dropped on my email, so readding it on this response]

On 5/4/2017 2:44 PM, Hefty, Sean wrote:
>>> OFED is a software product of OFA, not Linux. OFA can put anything that they want in it. Why do you even care? It's no different than Intel or Mellanox or any other company shipping out of tree software.
>>
>> The primary answer to your question depends on whether or not the software will ever be upstreamed. If it will, then it really should go there first and not later, and the reason is well exemplified by what happened with XRC where the version that landed in OFED and the version that landed in upstream were two totally different things, and users had to go back and fix up all their code because of the difference once it finally did land upstream. It's not nice to put users in that position again, and this does sound like it might end up going down that exact road since upstream is pursuing ways of doing peer to peer PCI operations and such without any input from the Xeon Phi folks.
>
> I'm not defending whatever business decisions any organization (including a multi-company non-profit like OFA) wants to make wrt their software distributions. I'm claiming that that's their decision.

I'm not sure I agree with that position. A company has the right to decide for themselves what to do. An organization like the OFA is different, in that it is based upon a collective agreement entered into by multiple parties with certain specific stated intents and goals written out in bylaws (although we all know the OFA is already in violation of those at the moment, let's just assume they aren't for the purposes of this conversation). If the organization then takes to violating those bylaws, then to my layperson's legal mind it is essentially in breach of contract with itself and with all the members that originally agreed to those bylaws. I would say that does give people (at a minimum, any member of the organization who entered into this agreement under different pretenses, but possibly also non-member entities affected by the actions of the organization) grounds to complain.

But all that aside, the OFA at least has a pretense of wanting to get along with the upstream Linux community. As long as they want to preserve that relationship, then they should listen when the community has something to say. It might well be their decision, but the ramifications of that decision might sabotage their other interests.

--
Doug Ledford <dledf...@redhat.com>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [ewg] [ANNOUNCE] OFED 4.8-rc2 release is available
On 5/4/2017 1:34 PM, Hefty, Sean wrote:
>> This is exactly why we so strongly discourage this out of tree stuff - getting something unmergable in OFED is *NOT* Job Done, Time to Go Home. Down this path just creates another Lustre mess.
>
> Actually, this may be 'job done'. No individual or company is obligated to provide upstream software for any of their hardware.

Very true, but in that case you would expect Intel to be shipping the software, not OFA. Much like they already do with their compiler, their MPI, etc.

> OFA decides what to ship in their software products, not the greater linux kernel community. Individual companies can decide if out of tree maintenance is more cost effective than trying to merge code upstream. Because that's what this ultimately comes down to.

The OFA has already learned their lesson once with XRC. I wonder if they are getting ready to get hit with another lesson over this. The issue is if the OFA ships an API and it isn't upstream, then that API needs to legitimately be a private API that should never conflict with an upstream API. If upstream then implements the same thing in a different way and users are exposed to the rather unpleasant choice of coding to OFA's API or to upstream's API for the same thing, it creates a schism in the user's code base around what API to use/support. This is entirely contrary to the OFA's stated goals about the end user experience of people using their RDMA software. So Intel can certainly make any cost analysis they want about their hardware and the software to support it, but the OFA is not Intel's personal software distributor and the OFA must look at other, bigger picture issues than Intel.

> IMO, the only people who have legitimate complaints here are those people running Xeon Phi with Mellanox HCAs who are being forced to use OFED, rather than upstream code.

I suspect there are legitimate grounds to complain about the fact that it is shipped in the OFA OFED and not limited to an Intel OFED derivative similar to Mellanox OFED.

--
Doug Ledford <dledf...@redhat.com>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [ewg] [ANNOUNCE] OFED 4.8-rc2 release is available
On 5/4/2017 1:19 PM, Hefty, Sean wrote:
>> So in plain words: Intel abuses their influence in OFED to ship crap that has absolutely no chance to get upstream in the current form instead of working with the community to improve infrastructure.
>>
>> That's exactly what I guessed, thanks for confirming.
>
> OFED is a software product of OFA, not Linux. OFA can put anything that they want in it. Why do you even care? It's no different than Intel or Mellanox or any other company shipping out of tree software.

The primary answer to your question depends on whether or not the software will ever be upstreamed. If it will, then it really should go there first and not later, and the reason is well exemplified by what happened with XRC where the version that landed in OFED and the version that landed in upstream were two totally different things, and users had to go back and fix up all their code because of the difference once it finally did land upstream. It's not nice to put users in that position again, and this does sound like it might end up going down that exact road since upstream is pursuing ways of doing peer to peer PCI operations and such without any input from the Xeon Phi folks.

--
Doug Ledford <dledf...@redhat.com>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [ewg] OFED release with RoCE v2 support
On 05/18/2016 11:22 AM, Woodruff, Robert J wrote:
>> In this case, I expect you guys to be hurting hard when you try to do all the backports. The problem is that in the last year we have made lots of changes that depend on upstream kernel improvements both inside the RDMA stack and out. I suspect this backport cycle may be worse than those in the past. But I could be wrong...
>
> Yes, and of course it depends on which distro kernel we attempt to backport to. For the later model RHEL 7.x and SLES 12 kernels, it may not be as bad, if those kernels have picked up the required upstream kernel improvements. I am however worried that it might be too hard to try to backport to a RHEL 6.x series kernel. We'll have to evaluate that once we start work on the next major OFED.

I would not be surprised in the least if backports to an EL6 kernel are a simple no-go for current upstream code.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
Re: [ewg] OFED release with RoCE v2 support
On 05/18/2016 10:24 AM, Woodruff, Robert J wrote:
>> Thanks for the quick response.
>> Do we have any rough estimate on when the next major release of OFED will GA?
>
> In the past, when we move up to a new kernel version, it usually takes a fair amount of time for people to develop all the backports to the various Linux distro kernels. From past experience, I would estimate at least 6 months, which would put the release at the end of the year or so. If you want to keep closer tabs on the activities, you can subscribe to the ewg email list and/or attend the bi-weekly ewg conference calls.

In this case, I expect you guys to be hurting hard when you try to do all the backports. The problem is that in the last year we have made lots of changes that depend on upstream kernel improvements both inside the RDMA stack and out. I suspect this backport cycle may be worse than those in the past. But I could be wrong...

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
Re: [ewg] libibverbs not compiling against RHEL6.6
On 05/16/2016 12:36 PM, Steve Wise wrote: > > >> -Original Message----- >> From: Doug Ledford [mailto:dledf...@redhat.com] >> Sent: Monday, May 16, 2016 11:27 AM >> To: Steve Wise; ewg@lists.openfabrics.org >> Subject: Re: libibverbs not compiling against RHEL6.6 >> >> On 05/16/2016 12:09 PM, Steve Wise wrote: >>> See: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2598 >>> >>> Looks like some compilation problem against RHEL6.6. Who should own >>> this? It is assigned to Doug Ledford, but I'm not sure he's the correct >>> person? >> >> I'm not, but I can tell you what the problem is. Upstream versions of >> multiple packages like to instill rpath settings in libraries so they >> can have both devel and production libraries on the same box. The >> problem is that rpath can cause unexpected behaviors. Things like >> Distro X releases security update to library A used by app M. But, >> because app M uses rpath to find it's library as library B, the update >> has no effect. The user thinks they are secure, when in fact, they are >> not. So, in general rpath laden libraries and applications are >> considered a strong security risk by distros as they can silently >> prevent security updates from taking effect. For this reason, the Red >> Hat packaged version of rpmbuild includes checks for rpath and throws >> errors when it is found. You either have to turn that security check >> off in rpm, or you have to modify the libibverbs package not to use >> rpath in its final files. >> >> Here's a couple options to solve the issue. >> >> In the %build section of the rpm spec file: >> sed -i 's|^hardcode_libdir_flag_spec=.*|hardcode_libdir_flag_spec=""|g' >> libtool >> sed -i 's|^runpath_var=LD_RUN_PATH|runpath_var=DIE_RPATH_DIE|g' libtool >> make %{?_smp_mflags} CFLAGS="$CFLAGS -fno-strict-aliasing" >> >> In the %install section of the rpm spec file: >> # kill rpaths >> chrpath -d %{buildroot}%{_bindir}/* >> > > Thanks Doug. 
> So this would be a change to the spec.in file included in the libibverbs package?

No. You should have your own spec file for building packages. The spec file in the rpm itself is a good starting point, but it generally should be modified to suit the particular OS you are building on. We (meaning Red Hat) keep our spec files separate from our tarballs and build our packages from the combination. OFED should do the same thing. Otherwise you run into issues like this, and rebuilding an upstream release to solve an OFED build issue doesn't make much sense.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
Re: [ewg] libibverbs not compiling against RHEL6.6
On 05/16/2016 12:09 PM, Steve Wise wrote:
> See: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2598
>
> Looks like some compilation problem against RHEL6.6. Who should own this? It is assigned to Doug Ledford, but I'm not sure he's the correct person?

I'm not, but I can tell you what the problem is. Upstream versions of multiple packages like to instill rpath settings in libraries so they can have both devel and production libraries on the same box. The problem is that rpath can cause unexpected behaviors. Things like Distro X releases a security update to library A used by app M. But, because app M uses rpath to find its library as library B, the update has no effect. The user thinks they are secure, when in fact, they are not. So, in general rpath laden libraries and applications are considered a strong security risk by distros as they can silently prevent security updates from taking effect. For this reason, the Red Hat packaged version of rpmbuild includes checks for rpath and throws errors when it is found. You either have to turn that security check off in rpm, or you have to modify the libibverbs package not to use rpath in its final files.

Here are a couple of options to solve the issue.

In the %build section of the rpm spec file:

    sed -i 's|^hardcode_libdir_flag_spec=.*|hardcode_libdir_flag_spec=""|g' libtool
    sed -i 's|^runpath_var=LD_RUN_PATH|runpath_var=DIE_RPATH_DIE|g' libtool
    make %{?_smp_mflags} CFLAGS="$CFLAGS -fno-strict-aliasing"

In the %install section of the rpm spec file:

    # kill rpaths
    chrpath -d %{buildroot}%{_bindir}/*

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
Re: [ewg] [PATCH] libmlx5: Implement missing open_qp verb
On Mon, 2015-05-18 at 11:02 +0200, Sébastien Dugué wrote:
> Commit 0c7ac1083831 added XRC support for mlx5, however this is missing the open_qp verb.
>
> Signed-off-by: Sébastien Dugué <sebastien.du...@bull.net>

Patch looks reasonable. However, I didn't review it to make sure that it interacts with other libmlx5 internals properly, just for the obvious things (locks handled properly, proper error unwind, etc). I'm assuming Eli will process this, for now I'm removing it from patchworks (Eli isn't listed as a maintainer there so he can't remove it, which highlights one of the difficulties of this list being used for both kernel and user space patches...there are lots of user space maintainers and most of them aren't listed as such in patchworks).

> ---
>  src/mlx5.c  |  1 +
>  src/mlx5.h  |  2 ++
>  src/verbs.c | 38 ++
>  3 files changed, 41 insertions(+)
>
> diff --git a/src/mlx5.c b/src/mlx5.c
> index d02328881992..39f59975d3d2 100644
> --- a/src/mlx5.c
> +++ b/src/mlx5.c
> @@ -579,6 +579,7 @@ static int mlx5_init_context(struct verbs_device *vdev,
>  	context->ibv_ctx.ops = mlx5_ctx_ops;
>
>  	verbs_set_ctx_op(v_ctx, create_qp_ex, mlx5_create_qp_ex);
> +	verbs_set_ctx_op(v_ctx, open_qp, mlx5_open_qp);
>  	verbs_set_ctx_op(v_ctx, open_xrcd, mlx5_open_xrcd);
>  	verbs_set_ctx_op(v_ctx, close_xrcd, mlx5_close_xrcd);
>  	verbs_set_ctx_op(v_ctx, create_srq_ex, mlx5_create_srq_ex);
> diff --git a/src/mlx5.h b/src/mlx5.h
> index 6ad79fe324d3..f548e51ee338 100644
> --- a/src/mlx5.h
> +++ b/src/mlx5.h
> @@ -613,6 +613,8 @@ void *mlx5_get_send_wqe(struct mlx5_qp *qp, int n);
>  int mlx5_copy_to_recv_wqe(struct mlx5_qp *qp, int idx, void *buf, int size);
>  int mlx5_copy_to_send_wqe(struct mlx5_qp *qp, int idx, void *buf, int size);
>  int mlx5_copy_to_recv_srq(struct mlx5_srq *srq, int idx, void *buf, int size);
> +struct ibv_qp *mlx5_open_qp(struct ibv_context *context,
> +			    struct ibv_qp_open_attr *attr);
>  struct ibv_xrcd *mlx5_open_xrcd(struct ibv_context *context,
>  				struct ibv_xrcd_init_attr *xrcd_init_attr);
>  int mlx5_get_srq_num(struct ibv_srq *srq, uint32_t *srq_num);
> diff --git a/src/verbs.c b/src/verbs.c
> index 8ddf4e631c9f..dc899bce4e00 100644
> --- a/src/verbs.c
> +++ b/src/verbs.c
> @@ -1122,6 +1122,44 @@ int mlx5_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
>  	return ret;
>  }
>
> +struct ibv_qp *mlx5_open_qp(struct ibv_context *context,
> +			    struct ibv_qp_open_attr *attr)
> +{
> +	struct ibv_open_qp cmd;
> +	struct ibv_create_qp_resp resp;
> +	struct mlx5_qp *qp;
> +	int ret;
> +	struct mlx5_context *ctx = to_mctx(context);
> +
> +	qp = calloc(1, sizeof(*qp));
> +
> +	if (!qp)
> +		return NULL;
> +
> +	ret = ibv_cmd_open_qp(context, &qp->verbs_qp, sizeof(qp->verbs_qp),
> +			      attr, &cmd, sizeof(cmd), &resp, sizeof(resp));
> +	if (ret)
> +		goto err;
> +
> +	pthread_mutex_lock(&ctx->qp_table_mutex);
> +	ret = mlx5_store_qp(ctx, qp->verbs_qp.qp.qp_num, qp);
> +
> +	if (ret) {
> +		pthread_mutex_unlock(&ctx->qp_table_mutex);
> +		fprintf(stderr, "mlx5_store_qp failed ret=%d\n", ret);
> +		goto destroy;
> +	}
> +	pthread_mutex_unlock(&ctx->qp_table_mutex);
> +
> +	return (struct ibv_qp *)&qp->verbs_qp;
> +
> +destroy:
> +	ibv_cmd_destroy_qp(&qp->verbs_qp.qp);
> +err:
> +	free(qp);
> +	return NULL;
> +}
> +
>  struct ibv_ah *mlx5_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr)
>  {
>  	struct mlx5_ah *ah;

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
Re: [ewg] Huge size for opensm process (5.7GB)
On 09/26/13 15:28, Sandeep Dhavale wrote:
> Hello,
> I have a setup where 2 nodes are connected back to back. So my subnet is basically 2 nodes. This is RHEL6.3 setup and opensm running is
>
> [root@intel-eva1 ~]# rpm -qi opensm
> Name        : opensm                  Relocations: (not relocatable)
> Version     : 3.3.13                  Vendor: Red Hat, Inc.
> Release     : 1.el6                   Build Date: Tue 28 Feb 2012 07:53:06 PM PST
> Install Date: Thu 04 Jul 2013 01:44:25 AM PDT   Build Host: x86-003.build.bos.redhat.com
> Group       : System Environment/Daemons   Source RPM: opensm-3.3.13-1.el6.src.rpm
> Size        : 1317469                 License: GPLv2 or BSD
> Signature   : RSA/8, Wed 30 May 2012 11:14:47 AM PDT, Key ID 199e2f91fd431d51
> Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
> URL         : http://www.openfabrics.org/
> Summary     : OpenIB InfiniBand Subnet Manager and management utilities
>
> The opensm process has grown up and is now 5.7GB. I do not think this is normal for a subnet of size 2 nodes.
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  DATA COMMAND
> 15306 root  20   0 5851m 2568  692 S  0.0  0.0  0:06.04  5.7g opensm
>
> [root@intel-eva1 ~]# ps -ef | grep opensm
> root     15306     1  0 01:04 ?        00:00:06 /usr/sbin/opensm -B -F /etc/rdma/opensm.conf.[0-9]*
>
> One thing to notice is the /var/log/opensm.log is flooded with messages like below every 10 second:
>
> Sep 26 12:21:58 384194 [E2BA2700] 0x01 - osm_prtn_make_partitions: Partition configuration /etc/rdma/partitions.conf is not accessible (No such file or directory)
> Sep 26 12:21:58 385098 [E2BA2700] 0x02 - SUBNET UP
> Sep 26 12:22:08 384306 [E2BA2700] 0x01 - osm_prtn_make_partitions: Partition configuration /etc/rdma/partitions.conf is not accessible (No such file or directory)
> Sep 26 12:22:08 385189 [E2BA2700] 0x02 - SUBNET UP
>
> Can anybody put some light on what might be wrong? Does opensm maintain the log in memory as well? I haven’t put a limit on the size of the log in /etc/rdma/opensm.conf.

You're hitting a bug in the opensm startup script on that release. It's trying to run with a non-existent config file (this was fixed by adding shopt -s nullglob to the startup script). I would update to the later opensm from RHEL 6.4 where this is no longer an issue.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
    http://people.redhat.com/dledford
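The nullglob fix mentioned above can be illustrated outside the shell. By default bash leaves an unmatched pattern unexpanded, which is how the literal /etc/rdma/opensm.conf.[0-9]* ends up on the opensm command line in the ps output. What follows is only a minimal sketch using POSIX glob(3), not the startup script's code: expand_configs is a hypothetical helper, and the keep_literal switch stands in for the shell option (GLOB_NOCHECK returns an unmatched pattern verbatim, while the default reports no match).

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: expand `pattern` and copy the first resulting
 * path into `first`. Returns the number of paths produced.
 *
 * keep_literal != 0 mimics the default shell behaviour: a pattern that
 *                   matches nothing is handed back unchanged.
 * keep_literal == 0 mimics `shopt -s nullglob`: a pattern that matches
 *                   nothing expands to nothing.
 */
static size_t expand_configs(const char *pattern, int keep_literal,
                             char *first, size_t first_len)
{
    glob_t g;
    size_t count = 0;
    int rc;

    assert(pattern != NULL && first != NULL && first_len > 0);
    memset(&g, 0, sizeof(g));
    first[0] = '\0';

    rc = glob(pattern, keep_literal ? GLOB_NOCHECK : 0, NULL, &g);
    if (rc == 0) {
        count = g.gl_pathc;
        if (count > 0)
            snprintf(first, first_len, "%s", g.gl_pathv[0]);
    }
    /* rc == GLOB_NOMATCH (or an error) leaves count at 0. */
    globfree(&g);
    return count;
}
```

With keep_literal set, a caller that passes the result to "-F" gets a file name that does not exist, which matches the symptom in the report; with it cleared, the caller sees zero results and can skip the option entirely.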
Re: [ewg] Status of SDP? (and ib_sdp patch for Linux-3.4.x)
On 04/23/2012 01:25 PM, richard Croucher wrote:
> What's makes SDP so useful and why is will be sorely missed is that it does not require users to refactor existing TCP socket code, or even be in possession of the source code for that matter.

Neither does IPoIB, and there are no legal questions about the usability of IPoIB.

> Simply preloading libsdp is all need to do to convert that old, cranky tcp program to run (almost) native over InfiniBand.

SDP and IPoIB are not that far apart on a well tuned IPoIB setup.

> SDP is used as a wire protocol rather than a API. There are several contending API's out there, including Bob Russell's Extended Socket API which could be candidates, but I would like to see the concept of the preload preserved. It's been a powerful factor in getting InfiniBand adoption.

It's been a nice stop gap measure to help those apps that needed a boost larger than IPoIB (although not greatly so) while waiting on a native RDMA version of the application. If the app truly needed the performance of RDMA, they had plenty of time to port over to it. If they didn't deem porting important enough, then they should be able to exist with just IPoIB. There is very little impetus to maintain two versions of software to make sockets based apps work on RDMA fabrics when the relative gain of one over the other isn't that high and it has legal encumbrances to negate the positive performance benefit.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
    http://people.redhat.com/dledford
[ewg] Licensing issue with current rds-tools tarballs
This came up during package review for Fedora. I have already brought it up with Venkat at Oracle. The issue is that the licensing of this package is incomplete (the file COPYING that is referenced in several source files is not present in the tarball) and in one place outdated (in examples/rds-example.c, the address for the FSF is incorrect). Just an FYI for any companies whose legal teams are as strict as ours about not redistributing code without a proper, legal license. I hope an rds-tools-2.0.8 will be rolled soon to resolve the issue.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
    http://people.redhat.com/dledford
Re: [ewg] [RFC] – Proposal for new process for OFED releases
On 12/01/2011 02:53 PM, Tziporet Koren wrote:
> We propose a new process for the OFED releases starting from next OFED release:
>
> - OFED content will be the relevant kernel.org modules and user space released packages
> - OFED will offer only backports to the distros (no fixes)
> - OFED package will be used for easy installation of all packages in a friendly manner

Yay!!! I'm all in favor.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD
Re: [ewg] Allowing ib dignostics to be run without being logged in as root.
On 05/26/2010 03:52 PM, rich...@informatix-sol.com wrote:
> It's better to be statically linked.

This is not the opinion of most people I know. It used to be the norm back in the day, but the truth of the matter is that when it comes to system libraries, if an attacker has managed to compromise either a system library or the dynamic linker, then the system is already lost and the ability to compromise your program is moot. If, on the other hand, you have statically linked your program and then an exploit has been found in the library you statically linked, now your program is vulnerable even after the system shared library has been updated.

Having said that, I've seen packages from OFED developers that tend to follow several bad security practices. Things like installing libraries in places like /usr/local or even in home directories, or using rpath in programs to allow circumvention of system installed shared libraries. These are things that should *never* be done on production software or systems and should be purged from any software prior to release.

> However all setuid programs present a threat. The challenge as a security administrator is to assess and minimize the threat. Smaller programs where you can inspect and understand the program are more trustable than large complex programs.

This part is very true.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: CFBFF194
    http://people.redhat.com/dledford

Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] Allowing ib dignostics to be run without being logged in as root.
On 05/25/2010 07:21 PM, Woodruff, Robert J wrote:
> Hal wrote,
>> If you really want any user to do this, is changing umad permissions sufficient ? This is less of a security hole than setuid but does open things up for malicious users.
>>
>> -- Hal
>
> I wanted to avoid doing this as it would allow some malicious user to just open /dev/umad and send random mads and cause big problems with the fabric. I was thinking that if the applications like perfquery are trusted to not allow someone to do anything malicious, then having them run as setuid root would not open a security hole ? sudo sounds like if would allow them to run any command as root ID, which I think is a larger security hole than just setting the one or few trusted applications to setuid root. But then, I am not a security expert so I may not know all of the possible issues with setting a command to setuid root.

(This may sound a bit like a lecture, but you indicated you didn't know the implications of setting a program set uid root, so this is intended to shed some light on the sorts of things involved in such a decision)

The problem with setting *any* command to setuid root is not whether or not the command is supposed to do safe things, but whether or not the command can be tricked (either itself or any of the libraries it is linked against) into doing something unintended. From that standpoint, you have to examine every single possible area of user input into the command and then determine whether or not malicious input can cause things to go awry.

To use a fictitious, made up problem from the perfquery.c code: The user supplies the local port number to use on the command line, the program uses strtoul() on the argument to convert it to an unsigned long, but it does nothing to check the range of the port. So far, this is actual stuff from the program. Now, for the sake of argument, let's make up a nice, easy exploit ;-) Assume the port is passed into a listen call at some point. Let's also assume that at some point there was a check in the application to make sure the port was > 1024. Then let's assume that the glibc implementation of the listen call masked the int to just the valid port range. A malicious user could then pass in a number > 65536 to pass the application's built in check for a port > 1024, but then when glibc masked off to just the valid port range (0-65535), a properly constructed int would mask down to something inside the reserved port range of the TCP namespace (0-1023). As long as the application is not suid, no big deal because this trick would just result in the kernel telling the application it couldn't listen on a port < 1024 as those are reserved. Once you make the application suid though, it could silently use this trick to listen on a reserved port. Of course, what you *do* with that is still up in the air, but this highlights why making *any* app suid root is a security risk. It's not what the app is supposed to do that matters, it's how a malicious user can trick the app into doing something it was never intended to do that matters. And those tricks can be very non-obvious to casual inspection.

So, setting something to suid root should be an absolute last resort thing. Normally, you only do so after performing a security audit of the application. But a security audit doesn't guarantee an application is 100% exploit free. Really, an audit just provides a relative level of assurance that the app is safe to make suid root. The level of confidence is directly related to the level of expertise of the auditor as well as the care taken by the auditor in this particular audit, and it is inversely proportional to the complexity of the code. So, to be as safe as possible, you write your program to be as simple as possible, to do only what it absolutely must as suid root, and have the best auditor audit it.

--
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: CFBFF194
    http://people.redhat.com/dledford

Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
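The made-up exploit in the message above comes down to a value being range-checked at one width and used at another. The following is a minimal sketch of that idea and is not taken from perfquery.c: mask_port models the hypothetical truncation to 16 bits, and parse_port is a hypothetical helper that checks both bounds on the strtoul() result before anything can narrow it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Models the made-up library behaviour: keep only the low 16 bits. */
static uint16_t mask_port(unsigned long value)
{
    return (uint16_t)(value & 0xFFFFUL);
}

/*
 * Hypothetical helper: parse a decimal port string and accept it only
 * if it names an unprivileged port (1024..65535).
 * Returns 0 on success and stores the port, -1 on any rejection.
 */
static int parse_port(const char *text, uint16_t *port_out)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL && port_out != NULL);

    /* strtoul() would skip whitespace and accept a sign; refuse both. */
    if (text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;                  /* overflow or trailing garbage */

    /* Check BOTH bounds before the value is ever narrowed. */
    if (value < 1024UL || value > 65535UL)
        return -1;

    *port_out = (uint16_t)value;
    return 0;
}
```

A check that only enforced the lower bound would accept "65558", which mask_port() reduces to 22, a reserved port; with the upper bound in place the same input is rejected before any truncation can happen.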
Re: [ewg] [PATCH v4] IB Core: RAW ETH support
On 06/16/2010 02:35 PM, Steve Wise wrote:
> Jason Gunthorpe wrote:
>> On Wed, Jun 16, 2010 at 12:12:06PM -0500, Steve Wise wrote:
>>>>> Granted our dev process may not be documented, but I always assumed the general idea was to get changes accepted upstream, then pull into ofed. OFED is just a mechanism to make top-of-tree linux work on distro kernels. There are some exceptions, but this stuff shouldn't be an exception.
>>>>
>>>> That is what many people wish for, me included, but it is not at all what generally happens :(
>>>>
>>>> In my observation the typical flow is:
>>>> - A patch is written, it may or may not be sent to the list
>>>> - 'business drivers' get it slammed into OFED right away
>>>> - A patch is finally sent for proper review
>>>> - It is not merged, there are comments..
>>>> - Interest in doing anything is lost because it is already in OFED and that is all that matters, right?
>>>> - People complain.
>>>>
>>>> For instance, the iWarp thingy we were just discussing fits this process rather well.
>>>
>>> You're wrong. I started that iWARP change in 2007 on LKML. I proposed a few ideas and showed the pros/cons of each. And it was NAKed 100% by mister miller. It was then included in OFED as a last resort only because I couldn't get any help with trying to add this upstream in any form. I even spent a few weeks developing a way to administer iwarp only ipaddresses, but Roland didn't like that scheme for various reasons. So please don't mention that particular patch as a bad process unless you want to argue with me some more about it.
>>
>> Uhm, what you just described does fit my process outline:
>> #1 - Patch written, sent to LKML. Check.
>> #3 - Patch sent for proper review - in 2007. Check.
>> #4 - Not merged. NAK by DM. Check.
>> #2 - 'business drivers' force it into OFED - 'last resort' ie, iWarp cards can't be used without some fix. Check.
>> #5 - Interest is lost. Yep, this was done in 2007, and it was idle till now. Check.
>> #6 - People Complain. Hmm. Yep. Check.
>
> Note the ordering is different. IE I tried very hard to get the right solution designed and agreed-upon upstream. But failed. That's my bad. I did, however, help with the iWARP core code including neighbour update net events which did go in upstream before ofed.
>
>> Don't think I'm being critical toward only you, or singling out that little iWarp patch. But it really isn't special, or different, or an exception. Pick nearly any patch in OFED and someone will rush to its defense with a 'we tried to follow the process and it failed, so we did it anyway' argument.
>>
>> I also didn't say this is the only way that RDMA development goes, lots and lots of stuff goes into mainline first, from everyone.
>>
>> Jason
>
> OFED maintainers should be more rigid, perhaps, with requiring that changes be accepted upstream first. One observation is that there is no OFED RDMA maintainer, aka a Roland Dreier, for the OFED code. So each driver maintainer pretty much has free rein to do the right thing or the wrong thing...

Yep, no doubt that has an impact on things. It's for this very reason that our next operating system is not following OFED but instead is using upstream as its basis. That will be true from now on with our products.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] [PATCH v4] IB Core: RAW ETH support
On 06/16/2010 01:12 PM, Steve Wise wrote:
> Jason Gunthorpe wrote:
>> On Wed, Jun 16, 2010 at 09:09:59AM -0500, Steve Wise wrote:
>>> Granted our dev process may not be documented, but I always assumed the general idea was to get changes accepted upstream, then pull into ofed. OFED is just a mechanism to make top-of-tree linux work on distro kernels. There are some exceptions, but this stuff shouldn't be an exception.
>>
>> That is what many people wish for, me included, but it is not at all what generally happens :(
>>
>> In my observation the typical flow is:
>> - A patch is written, it may or may not be sent to the list
>> - 'business drivers' get it slammed into OFED right away
>> - A patch is finally sent for proper review
>> - It is not merged, there are comments..
>> - Interest in doing anything is lost because it is already in OFED and that is all that matters, right?
>> - People complain.
>>
>> For instance, the iWarp thingy we were just discussing fits this process rather well.
>
> You're wrong. I started that iWARP change in 2007 on LKML. I proposed a few ideas and showed the pros/cons of each. And it was NAKed 100% by mister miller. It was then included in OFED as a last resort only

The "last resort", of course, is the problem. Once you have a solution besides getting it upstream, you throw whatever you feel like into OFED instead of whatever upstream will accept. How long has OFED shipped the XRC stuff now while it *still* isn't upstream?

> because I couldn't get any help with trying to add this upstream in any form.

Again, OFED is part of the reason this failed. That users had someplace else to get working code besides upstream meant that you didn't have end users putting pressure on the upstream kernel folks to accept *some* form of solution. So, your job was harder because there were no users present to put pressure on mister miller or others, and then you perpetuated the issue by caving and going back to OFED as a last resort. It has become a last resort so often now that trying to get things upstream first is just a sort of private joke amongst some people, I think.

> I even spent a few weeks developing a way to administer iwarp only ipaddresses, but Roland didn't like that scheme for various reasons. So please don't mention that particular patch as a bad process unless you want to argue with me some more about it.
>
> Also, the chelsio iWARP driver has 100% been upstream first, then ofed. Some of us are indeed trying to do the right thing.
>
> steps off soap box

OFED just needs to go away. It's been far too abused for far too long and its mere existence is hindering upstream development. I appreciate that you attempt to do the right thing most of the time, but it really needs to be all of the time, and you need your users right there beside you in order to carry the weight you need in order to get solutions designed and accepted instead of running into the brick wall you ran into before.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ofa-general] Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k
On Jul 27, 2009, at 1:10 PM, Roland Dreier wrote:
>>> And I don't think the upstream kernel has that limit on kmalloc size either (at least with SLUB, not sure about SLAB).
>>
>> This patch was actually written as an emulation of the upstream SLUB behavior, which is exactly the same thing: on large allocations forward to __g_f_p(). See include/linux/slub_def.h's definition of kmalloc_large and kmalloc.
>
> Right. But does upstream SLAB also pass through to the page allocator the same as SLUB?

No, slab just fails, in which case you have to do your own __g_f_p call.

> How about SLQB?

No clue.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband
Re: [ofa-general] Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k
On Jul 23, 2009, at 3:06 PM, Roland Dreier wrote:
>> This will fix the 2^20 bits limit on our bitmaps once and for all.
>
> Not really... since getting 128KB of contiguous memory is likely to fail anyway.

That depends. If you mean at bootup when you are first loading the module, no. You only need large allocations on large memory boxes, and fragmentation won't have happened yet. So it's a perfectly reliable mechanism then. If you are talking about unloading and reloading the module on a busy system, then yes, it could fail then. However, I would argue that if you get a module load failure on reload, then you could always just reboot. Keep in mind that users really shouldn't *need* to ever reload the module anyway, and anything that makes them reload the module is probably a bug. I'm perfectly happy with saying the bug requires a reboot instead of a module reload; that might even provide extra incentive to fix the bug.

> And I don't think the upstream kernel has that limit on kmalloc size either (at least with SLUB, not sure about SLAB).

This patch was actually written as an emulation of the upstream SLUB behavior, which is exactly the same thing: on large allocations forward to __get_free_pages(). See include/linux/slub_def.h's definition of kmalloc_large and kmalloc.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband
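The "2^20 bits" limit and the "128KB" figure in the message above describe the same quantity: a bitmap of 2^20 bits occupies 2^20 / 8 = 131072 bytes. A minimal userspace sketch of that arithmetic follows. `bits_to_longs()` is a stand-in written for this example that mirrors what the kernel's BITS_TO_LONGS() macro computes, and `buddy_bitmap_bytes()` is a hypothetical helper, not a function from the mthca driver.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Userspace stand-in for BITS_TO_LONGS(): number of longs needed to
 * hold nbits bits, rounded up. */
size_t bits_to_longs(size_t nbits)
{
    return (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
}

/* Bytes needed for the level-i bitmap of a buddy allocator whose largest
 * order is max_order: level i tracks 2^(max_order - i) blocks. */
size_t buddy_bitmap_bytes(unsigned int max_order, unsigned int i)
{
    return bits_to_longs((size_t)1 << (max_order - i)) * sizeof(long);
}
```

For example, `buddy_bitmap_bytes(20, 0)` evaluates to 131072 (128KB), the size at which the RHEL kmalloc discussed in this thread stops working, while the top level `buddy_bitmap_bytes(20, 20)` needs only a single long.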
Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k
On Jul 23, 2009, at 4:20 AM, Jack Morgenstein wrote:
> On Thursday 16 July 2009 21:08, Doug Ledford wrote:
>> On rhel4 and rhel5 machines, the kmalloc implementation does not automatically forward kmalloc requests > 128kb to __get_free_pages. Please include this patch in all rhel4 and rhel5 backport directories so that we do the right thing in the mthca driver on rhel in regards to kmalloc requests larger than 128k (at least in this code path, there may be others lurking too, I'll forward additional patches if I find they are needed).
>>
>> commit a7f18a776785aecb5eb9967aef6f0f603b698ba0
>> Author: Doug Ledford dledf...@redhat.com
>> Date: Thu Jul 16 12:47:55 2009 -0400
>>
>>     [mthca] Fix attempts to use kmalloc on overly large allocations
>>
>>     Signed-off-by: Doug Ledford dledf...@redhat.com

This needs a correct signed-off-by: line. Mine got added when I put it in my local git tree, but the original patch came from Red Hat's bugzilla, bug #508902, author David Jeffery djeff...@redhat.com

> Roland, I think that this patch should be taken into the mainstream kernel, rather than just as a backport patch for RHEL. (We can have a similar patch for mlx4). I notice that __get_free_pages(), free_pages(), and get_order() are all in the mainstream kernel.
>
> This will fix the 2^20 bits limit on our bitmaps once and for all. If you agree, I will post this patch and one for mlx4 on the general list.
>
> Doug posted this patch on the EWG list. Thanks Doug!
diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index d606edf..312e18d 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -152,8 +152,11 @@ static int mthca_buddy_init(struct mthca_buddy *buddy, int max_order)
 		goto err_out;
 
 	for (i = 0; i <= buddy->max_order; ++i) {
-		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kmalloc(s * sizeof (long), GFP_KERNEL);
+		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
+		if(s > PAGE_SIZE)
+			buddy->bits[i] = (unsigned long *)__get_free_pages(GFP_KERNEL, get_order(s));
+		else
+			buddy->bits[i] = kmalloc(s, GFP_KERNEL);
 		if (!buddy->bits[i])
 			goto err_out_free;
 		bitmap_zero(buddy->bits[i],
@@ -166,9 +169,13 @@ static int mthca_buddy_init(struct mthca_buddy *buddy, int max_order)
 	return 0;
 
 err_out_free:
-	for (i = 0; i <= buddy->max_order; ++i)
-		kfree(buddy->bits[i]);
-
+	for (i = 0; i <= buddy->max_order; ++i){
+		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
+		if(s > PAGE_SIZE)
+			free_pages((unsigned long)buddy->bits[i], get_order(s));
+		else
+			kfree(buddy->bits[i]);
+	}
 err_out:
 	kfree(buddy->bits);
 	kfree(buddy->num_free);
@@ -178,10 +185,15 @@ err_out:
 
 static void mthca_buddy_cleanup(struct mthca_buddy *buddy)
 {
-	int i;
+	int i, s;
 
-	for (i = 0; i <= buddy->max_order; ++i)
-		kfree(buddy->bits[i]);
+	for (i = 0; i <= buddy->max_order; ++i){
+		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
+		if(s > PAGE_SIZE)
+			free_pages((unsigned long)buddy->bits[i], get_order(s));
+		else
+			kfree(buddy->bits[i]);
+	}
 
 	kfree(buddy->bits);
 	kfree(buddy->num_free);

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband
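The core idea of the patch above is that the allocation path is picked from the request size, and that the free path must repeat exactly the same decision so each buffer goes back to the allocator it came from. The following is a rough userspace sketch of that decision only; it does not allocate anything. The 4096-byte page size, the enum, and both function names are assumptions made for the example, and `sketch_get_order()` merely approximates what the kernel's get_order() returns for non-zero sizes.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this example */

enum alloc_path { USE_KMALLOC, USE_GET_FREE_PAGES };

/* Approximation of get_order(): the smallest order such that
 * (SKETCH_PAGE_SIZE << order) is at least size bytes. */
unsigned int sketch_get_order(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Same threshold test the patch uses in both the alloc and free loops:
 * anything larger than one page goes to the page allocator. */
enum alloc_path choose_alloc_path(size_t size)
{
    return size > SKETCH_PAGE_SIZE ? USE_GET_FREE_PAGES : USE_KMALLOC;
}
```

Under these assumptions a 128KB bitmap maps to `USE_GET_FREE_PAGES` with order 5 (32 contiguous pages), which is also why the reply earlier in this thread worries about finding that much contiguous memory on a fragmented system.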
[ewg] [Patch mthca backport] Don't use kmalloc > 128k
On rhel4 and rhel5 machines, the kmalloc implementation does not automatically forward kmalloc requests > 128kb to __get_free_pages. Please include this patch in all rhel4 and rhel5 backport directories so that we do the right thing in the mthca driver on rhel in regards to kmalloc requests larger than 128k (at least in this code path; there may be others lurking too, and I'll forward additional patches if I find they are needed).

bz508902.patch Description: Binary data

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband
Re: [ewg] Distribution of userspace code for OFED-1.5
this change for the beta.

tziporet

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] RFC: Do we wish to take MPI out of OFED?
On Sun, 2009-06-07 at 11:36 +0300, Pavel Shamis (Pasha) wrote:
>> I care about both. I care about the fact that a solid, well adhered to API makes for lots of happy MPI campers, not just one happy MPI camper. And the API road is the path to long term interoperability, not just short term. So if you really care about MPI, I would recommend you look to the long term, and you may find you agree with me then.
>
> I agree here with Doug, API road is way to go. But It is not reason to exclude MPIs from OFED. OFED API issues maybe resolved even when MPIs are part of package.

You're right, API issues *can* be resolved even if the latest MPIs are shipped with OFED. It's just that this hasn't been the case in the past, and throwing everything in the same bucket is a strong incentive not to worry about it in the future.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ofa-general] RE: [ewg] RFC: Do we wish to take MPI out of OFED?
On Fri, 2009-06-05 at 09:44 -0400, Jeff Squyres wrote:
> 3. As Doug described, packaging MPI and OFED together actually makes it *harder* for distros. Remember that RHEL and SUSE don't end up using any of the OFED packaging; they essentially use the individual SRPMs.

One minor clarification, it's not so much the RPM packaging that makes things difficult, it's the compatibility matrix. Since things aren't designed to cleanly inter-operate with each other in anything other than very specific combinations, it means that updates are an all or nothing affair. This is in direct contrast to the rest of our entire operating system where we isolate and target things that need fixed and only things that need fixed.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
RE: [ofa-general] RE: [ewg] RFC: Do we wish to take MPI out of OFED?
On Sun, 2009-06-07 at 13:51 -0700, Woodruff, Robert J wrote:
> Doug wrote,
>> One minor clarification, it's not so much the RPM packaging that makes things difficult, it's the compatibility matrix. Since things aren't designed to cleanly inter-operate with each other in anything other than very specific combinations, it means that updates are an all or nothing affair. This is in direct contrast to the rest of our entire operating system where we isolate and target things that need fixed and only things that need fixed.
>
> I think this was true early on with the OFA and OFED releases, but I do think things are stabilizing in this area as the code has matured and thus I think that having various components decoupled should be easier going forward.

For most things this is true, but not for all.

> BTW, Intel MPI has always been decoupled and we have not seen this to be a problem. We have recommended people install newer versions of OFED from time to time as MPI found bugs that were fixed in the newer OFED, but it was not the API that was not stable, it was just bugs that were found that required a newer OFED version.

I don't know the particulars of how Intel MPI is distributed, built, etc. But, when you tell me that you have from time to time recommended customers install a later version of OFED to get a bug fix, I get the impression that you start off by telling customers that OFED is a prerequisite to running Intel MPI in the first place. If that's the case, then I would question whether that's decoupled from OFED, or just not in the OFED tarball.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] RFC: Do we wish to take MPI out of OFED?
On Thu, 2009-06-04 at 14:02 -0400, Dhabaleswar Panda wrote: Tziporet, Main reasons to keep MPI in OFED: - All participants test with the same MPI versions, and when installing OFED it is ensured that MPI will work fine with this version. - Customers convenience in install (no need to go to more sites to get MPI) - MPI is an important RDMA ULP and although it is not developed in OFA it is widely used by OFED customers I support keeping MPI packages in the OFED because of the above positive points you have mentioned. I would also like to mention that keeping MPI packages in OFED helps to test out various new features and functionalities (such as APM and XRC in the past and the new memory registration scheme being discussed now) It's interesting that you mention XRC support as something from the past. As far as I'm concerned, it's still a hasn't landed yet feature as it still hasn't landed upstream. as they get introduced. Such an integrated approach helps to test out these features at the lower layers as well as at the MPI layer. Such an integrated approach is great for a test lab, but has no place in a production environment. This integrated approach is what you do when you are in proof of concept phase. After that, you move into development phase where you engineer it properly, define the API, clean up the code, fix bugs, etc. (this is generally what happens during the linux kernel review process, or at least partially). Finally, you reach a stable phase, where most of the bugs are fixed, the API is fixed, and you can generally rely upon the software to work as expected and to handle the majority of situations it is likely to encounter. This integrated approach you mention is really only useful when you are trying to leap frog directly from proof of concept to production use without ever going through the other phases and without ever bringing your code quality up to snuff. 
However, to make the complete OFED release process work smoothly for everybody (vendors, distros, users, etc.) without affecting the release schedule, it is essential that stable MPI packages are added to OFED. This is what we have been doing wrt MVAPICH and MVAPICH2 for the last several years. If you're just throwing in the latest stable release, then it serves no purpose. Whether it's in OFED or on your site makes absolutely 0 difference except to the size of the OFED tarball. If the developers of any MPI package do not want it to be a part of the OFED due to any constraints, it should be allowed. However, such an action should not force to remove all MPI packages. From the point of view of MVAPICH and MVAPICH2 packages in OFED, we have been providing stable packages to OFED for the last several years helping the OFED community and would like to continue with this process. Thanks, DK ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- Doug Ledford dledf...@redhat.com GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband signature.asc Description: This is a digitally signed message part ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] RFC: Do we wish to take MPI out of OFED?
On Wed, 2009-06-03 at 19:14 +0200, Pawel Dziekonski wrote: On Wed, 03 Jun 2009 at 10:14:07AM -0400, Doug Ledford wrote: On Jun 3, 2009, at 3:39 AM, Pawel Dziekonski wrote: On Tue, 02 Jun 2009 at 10:30:07PM +0300, Tziporet Koren wrote: Main reasons to keep MPI in OFED: - All participants test with the same MPI versions, and when installing OFED it is ensured that MPI will work fine with this version. - Customers convenience in install (no need to go to more sites to get MPI) - MPI is an important RDMA ULP and although it is not developed in OFA it is widely used by OFED customers As a customer I strongly support above mentioned pros. It's a guarantee for us that MPI is well tested with OFED release. MPI makes an effective test bed for the RDMA stack whether it is shipped with it or not. Removing the MPIs from the distribution would not, in all likelihood, change the fact that MPIs would be used to test the RDMA stack prior to release. I believe that this effort saves a lot of troubles that would be raised from separate releases of MPI and OFED distros. If you truly believe this, and you accept that shipping the MPI with the RDMA stack is an acceptable solution to the problem, then you are encouraging totally craptacular engineering as a customer. Since you have a non-US email address, and since craptacular is a word I use frequently, but which I also sort of just made up, let me define that. Sometimes, things are good. When they are really good, they are spectacular. Sometimes, things are crap. When they are *really* crap, they are craptacular. The RDMA stack provides an API. The MPI stacks are nothing more than you look at the problem from vendor point of view (vendor-like mail domain? ;). you care about api. i care about mpi. I care about both. I care about the fact that a solid, well adhered to API makes for lots of happy MPI campers, not just one happy MPI camper. And the API road is the path to long term interoperability, not just short term. 
So if you really care about MPI, I would recommend you look to the long term, and you may find you agree with me then. from technical point of view it is enough for me if you say in ofed docs that THIS and THAT particular versions of MPI was tested and WORKS. just like suppoerted and tested list of linux kernels and linux distros. I definitely can download and compile mpi by myself. however since you already used some versions of MPI to test the RDMA stack prior to release why not simply attach it to release? it just makes the release more complete from CUSTOMER point of view. For the cons that you stripped from the original email: it throws off one or the other's release schedule, it delays things, etc. -- Doug Ledford dledf...@redhat.com GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband signature.asc Description: This is a digitally signed message part ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] RFC: Do we wish to take MPI out of OFED?
On Jun 3, 2009, at 3:39 AM, Pawel Dziekonski wrote:
> On Tue, 02 Jun 2009 at 10:30:07PM +0300, Tziporet Koren wrote:
>> Main reasons to keep MPI in OFED:
>> - All participants test with the same MPI versions, and when installing OFED it is ensured that MPI will work fine with this version.
>> - Customers convenience in install (no need to go to more sites to get MPI)
>> - MPI is an important RDMA ULP and although it is not developed in OFA it is widely used by OFED customers
>
> As a customer I strongly support above mentioned pros. It's a guarantee for us that MPI is well tested with OFED release.

MPI makes an effective test bed for the RDMA stack whether it is shipped with it or not. Removing the MPIs from the distribution would not, in all likelihood, change the fact that MPIs would be used to test the RDMA stack prior to release.

> I believe that this effort saves a lot of troubles that would be raised from separate releases of MPI and OFED distros.

If you truly believe this, and you accept that shipping the MPI with the RDMA stack is an acceptable solution to the problem, then you are encouraging totally craptacular engineering as a customer. Since you have a non-US email address, and since craptacular is a word I use frequently, but which I also sort of just made up, let me define that. Sometimes, things are good. When they are really good, they are spectacular. Sometimes, things are crap. When they are *really* crap, they are craptacular.

The RDMA stack provides an API. The MPI stacks are nothing more than a consumer of that API. This is no different than TCP sockets and MPI stacks: API provider/consumer relationship. If there is a problem between the MPIs and the RDMA stacks from version to version, it means that one of them isn't adhering to the API. The proper solution to that problem is *NOT* to put them together and throw the API out the window, it's to get the API provider and consumer to adhere to the API and to make sure that the API works. Otherwise, the only solution is to bundle *every* consumer of that API with the API provider because the API stability is, you guessed it, craptacular.

If you aren't part of the solution, you're part of the problem. Don't encourage craptacular engineering that throws the API out the window.

> thanks, Pawel
> --
> Pawel Dziekonski pawel.dziekon...@wcss.pl
> Wroclaw Centre for Networking Supercomputing, HPC Department
> Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND
> phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband
Re: [ewg] shipping firmware with ofed
On Tue, 2009-03-03 at 12:48 -0600, Steve Wise wrote:
> Woodruff, Robert J wrote:
>> Steve Wrote,
>>> I request that we do add this. Forcing customers to download firmware isn't very user friendly. It should be fairly easy to do, yes? I would take on the work involved to add the infrastructure needed... All that would be needed, I think, is to make a src rpm that allows generating an rpm that will install the firmware in /lib/firmware. Each provider could manage their own. Or we could have a firmware src rpm that holds all the providers' firmware images...
>>>
>>> Thoughts?
>>>
>>> Steve.
>>
>> If people submit firmware to ofed, then would it have to be submitted under the dual BSD/GPL license as agreed to in the OFA bylaws. If so, then you would probably have to submit the firmware source code as well, and I am not sure if you are willing to do that...
>
> This has somehow been dealt with by RedHat. They ship chelsio binary firmware without src. Doug, can you please comment on this?

I asked around, and we've never actually shipped Chelsio firmware (we almost did, but instead Chelsio added the firmware byte code into their upstream driver and we backported that so that the kernel has the firmware built into the Chelsio driver).

We have shipped other firmware rpms in the past. Those rpms have to be built separately from regular source based rpms (or else the aggregation aspect of the GPL and most other licenses kicks in), so for instance you couldn't stick a binary firmware blob in the libcxgb3 rpm alongside its library source. We've also traditionally shipped any firmware packages on a separate CD/DVD from our main distribution, where we had certain closed source but free to distribute items (like Adobe Reader/Flash Player, that sort of thing). And I don't think the firmware rpms are in the base channel in RHN but instead require you to subscribe to the addons channel in RHN (but I could be wrong on that, I don't keep up with RHN channel assignments myself).

So, our preference is actually to have the right firmware be part of the driver itself. That's what cxgb3 does now, it's what QLogic FC adapters have done since forever, as well as quite a few other drivers, and it makes sure the driver and firmware *never* get out of sync. It also means that a kernel upgrade doesn't trigger a firmware upgrade on the file system which can render the previous kernel unusable.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194 http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] RE: RHEL 5.3 and OFED 1.4.x
On Thu, 2009-01-22 at 16:07 -0600, Steve Wise wrote:
> I understand the desire to not release new features in a point release, but at the same time, these features are ready or near ready now. And prior features have definitely been released in point releases (ConnectX, for example). Another key point is that these features do not need the kernel rebase that will happen with ofed-1.5, which will take months... Just more thoughts. :)

I'm a bit late to this discussion, and you may have already talked about this in the ewg teleconference, but I want to throw in my thoughts. As far as new features go, adding ConnectX support in a point release is hugely different from switching OpenMPI releases from a stable series to the .0 release of the next series. In the case of ConnectX, it was just another driver, and its addition should have had almost zero impact on anyone not using that driver. On the other hand, switching OpenMPI versions changes the OpenMPI stack for everyone and has the potential to create widespread regressions should something go wrong. So the risk factor comparison between these two actions simply isn't valid. One doesn't risk regressions for non-ConnectX users; the other risks regressions for everyone using OpenMPI.

> Steve.
>
> Woodruff, Robert J wrote:
> I think that we need to discuss this in the EWG meeting. In the past I think that we have agreed to only do bug fixes in point releases and not add major new features. If we do want to include the new MPI, then perhaps we should call it 1.5 and pull in the schedule for 1.5. Just a thought.
>
> woody
>
> -----Original Message-----
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Thursday, January 22, 2009 1:46 PM
> To: John Russo
> Cc: Woodruff, Robert J; gene...@lists.openfabrics.org; ewg@lists.openfabrics.org
> Subject: Re: [ewg] RE: RHEL 5.3 and OFED 1.4.x
>
> I think releasing OMPI-1.3 with iWARP support is also good justification. And there are RDS issues with ofed-1.4 even over IB that I think will add to justification.
>
> John Russo wrote:
> I understand but I think that this is another consideration that should be factored in. Even if there are no critical PRs to fix, the introduction of RHEL 5.3 (along with less critical PRs) may be enough justification. I simply want to plant the seed in everyone's mind before our next meeting. Thanks
>
> -----Original Message-----
> From: Woodruff, Robert J [mailto:robert.j.woodr...@intel.com]
> Sent: Thursday, January 22, 2009 3:44 PM
> To: John Russo; gene...@lists.openfabrics.org
> Cc: ewg@lists.openfabrics.org
> Subject: RE: RHEL 5.3 and OFED 1.4.x
>
> In the last EWG meeting, we discussed waiting a month or so and seeing what kind of bugs were reported against 1.4 to determine if a 1.4.1 release was needed.
>
> From: general-boun...@lists.openfabrics.org [mailto:general-boun...@lists.openfabrics.org] On Behalf Of John Russo
> Sent: Thursday, January 22, 2009 12:37 PM
> To: gene...@lists.openfabrics.org
> Subject: [ofa-general] RHEL 5.3 and OFED 1.4.x
>
> Does the release of RHEL 5.3 create any additional justification for a maintenance release of OFED (1.4.1) to be generated? I am already hearing requests for an OFED release that will support it.
>
> John Russo
> QLogic

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] RE: RHEL 5.3 and OFED 1.4.x
On Wed, 2009-01-28 at 09:38 -0600, Steve Wise wrote:
> Doug Ledford wrote:
>> On Thu, 2009-01-22 at 16:07 -0600, Steve Wise wrote:
>>> I understand the desire to not release new features in a point release, but at the same time, these features are ready or near ready now. And prior features have definitely been released in point releases (ConnectX, for example). Another key point is that these features do not need the kernel rebase that will happen with ofed-1.5, which will take months... Just more thoughts. :)
>>
>> I'm a bit late to this discussion, and you may have already talked about this in the ewg teleconference, but I want to throw in my thoughts. As far as new features go, adding ConnectX support in a point release is hugely different from switching OpenMPI releases from a stable series to the .0 release of the next series. In the case of ConnectX, it was just another driver, and its addition should have had almost zero impact on anyone not using that driver. On the other hand, switching OpenMPI versions changes the OpenMPI stack for everyone and has the potential to create widespread regressions should something go wrong. So the risk factor comparison between these two actions simply isn't valid. One doesn't risk regressions for non-ConnectX users; the other risks regressions for everyone using OpenMPI.
>
> Good points. One way to alleviate this is to ship both 1.2.8 and 1.3 in ofed-1.4.1 and mark 1.3 as experimental. Then remove 1.2.8 in ofed-1.5 and make 1.3.x the production version for ofed-1.5. That's certainly doable IMO. I suggested this in the last conf call but folks didn't like the thought of testing both. But perhaps marking it experimental resolves this issue? So the iWARP vendors will test 1.3 and little to no testing is required for 1.2.8 since it has been qualified with ofed-1.4 QA.

What about adding some automated tests using mpitests? Both automated build tests (which do some amount of testing of the mpicc et al. wrappers) and run tests (which would require a slightly more sophisticated test harness, in that it at least needs to know about machines to run the tests over, etc.)?

In fact, while I'm at it, let me attach the Makefile patch I use against the mpitests-3.1 package in OFED 1.4. It greatly simplifies the make environment and does something that I think the mpitests package *should* do but currently doesn't without my patch: test the mpicc wrappers. The current Makefiles set all sorts of MPIHOME and CC and other variables. These are all things that mpicc *should* take care of for you, and *not* using plain mpicc in the mpitests Makefiles simply skips one perfectly valid aspect of the testing and means you have to validate your MPI build environment separately. I would suggest that this patch, or something like it, be applied to the build environment for mpitests. Is the person responsible for that tarball on these lists?

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband

--- mpitests-3.0/IMB-3.0/src/Makefile.make 2007-11-22 09:18:07.0 -0500
+++ mpitests-3.0/IMB-3.0/src/Makefile 2008-09-18 14:08:56.0 -0400
@@ -1,21 +1,9 @@
 # Enter root directory of mpich install
-MPI_HOME=$(MPIHOME)
-
-MPICC=$(shell find ${MPI_HOME} -name mpicc -print)
-
-NULL_STRING :=
-ifneq (,$(findstring /bin/mpicc,${MPICC}))
-MPI_INCLUDE := -I${MPI_HOME}/include
-else
-$(error Variable MPI_HOME=${MPI_HOME} does not seem to contain a valid mpicc)
-endif
-LIB_PATH=
-LIBS=
-CC = ${MPI_HOME}/bin/mpicc
+CC = mpicc
 OPTFLAGS= -O3
 CLINKER = ${CC}
 LDFLAGS =
 CPPFLAGS=
 
-export MPI_INCLUDE CC LIB_PATH LIBS OPTFLAGS CLINKER LDFLAGS CPPFLAGS
+export CC OPTFLAGS CLINKER LDFLAGS CPPFLAGS
 include Makefile.base
--- mpitests-3.0/IMB-3.0/src/Makefile.base.make 2007-11-22 09:18:07.0 -0500
+++ mpitests-3.0/IMB-3.0/src/Makefile.base 2008-09-18 14:08:56.0 -0400
@@ -59,6 +59,14 @@ EXT : $(OBJEXT)
 IO: $(OBJIO)
 	$(CLINKER) $(LDFLAGS) -o IMB-IO $(OBJIO) $(LIB_PATH) $(LIBS)
 
+install:
+	mkdir -p ${DESTDIR}; \
+	for benchmark in IMB-MPI1 IMB-EXT IMB-IO; do \
+		if [ -e $$benchmark ]; then \
+			cp $$benchmark ${DESTDIR}${INSTALL_DIR}/mpitests-$$benchmark; \
+		fi; \
+	done
+
 # Make sure that we remove executables for specific architectures
 clean:
 	/bin/rm -f *.o *~ PI* core IMB-IO IMB-EXT IMB-MPI1 exe_io exe_ext exe_mpi1
--- mpitests-3.0/presta-1.4.0/Makefile.make 2006-08-01 04:25:21.0 -0400
+++ mpitests-3.0/presta-1.4.0/Makefile 2008-09-18 14:52:46.0 -0400
@@ -6,14 +6,7 @@
 #
 # Default values
-MPIHOME=
-CC=$(MPIHOME)/bin/mpicc
 DISTRIB=
-STACK_PREFIX=
-LIBS= -lm -L$(MPIHOME)/lib/shared -L$(MPIHOME)/lib -L$(DISTRIB)/$(STACK_PREFIX)/lib64 -L$(DISTRIB
RE: [ewg] RE: OFED Jan 5, 2009 meeting minutes on OFED plans
On Tue, 2009-01-06 at 14:00 -0800, Ryan, Jim wrote:
> Sean, I think that's a good point. What it suggests to me is asking, when someone proposes a non-standard feature, what process, procedures, documentation, support, etc., if any, should be made available by the entity making the proposal? It seems to me asking the same questions of all proposed features is fair and reasonable, and shouldn't represent an unreasonable barrier to progress. Thoughts? If this already exists, it's my ignorance and I will apologize in advance.
>
> Thanks again, Jim
>
> -----Original Message-----
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Tuesday, January 06, 2009 1:54 PM
> To: 'Tziporet Koren'; ewg@lists.openfabrics.org
> Cc: gene...@lists.openfabrics.org
> Subject: [ewg] RE: OFED Jan 5, 2009 meeting minutes on OFED plans
>
> * Mellanox suggested to add IB over Eth - this is similar to iWARP but more like IB (e.g. including UD), and can work over ConnectX. A concern was raised by Intel (Dave Sommers) since it is not a standard transport. Decision: This request will be raised in the MWG, and they should decide if OFA can support it.
>
> This is just my opinion, but in the past, OFED has included non-standard features, like extended connected mode, that are still not part of the IBTA spec. Do we know if such a feature would be accepted into the Linux kernel? I think OFED should base their decision more on the answer to that question than IBTA approval.

FWIW, this is the question I ask before accepting OFED kernel patches into our kernel. With the exception of SDP (which was intentionally allowed) and qlgc_vnic (which was unintentionally allowed), if it's not either in the upstream Linux kernel or slated for inclusion, then I don't include it in our kernel. That is why XRC and RDS support still isn't in our products.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
RE: [ewg] RE: OFED Jan 5, 2009 meeting minutes on OFED plans
On Tue, 2009-01-06 at 18:12 -0800, Gilad Shainer wrote:
> We need to look on this from the right angle. This is not a feature but rather a core component that adds support for a new adapter/NIC. This is the same as the core drivers for the other adapters that are supported already.

In all fairness, the comment below was to implement IB over eth. Nothing today does that. iWARP is not IB and has unique requirements. Running full IB over eth is different. Saying it's not a new feature is like saying that when iSCSI over TCP first came out, it wasn't a new feature. Sure, we had SCSI and we had TCP, but we didn't have SCSI over TCP, so adding it *was* a new feature.

> In general we need to look not only on spec related features, but also to cover features that can benefit OFED and WinOF users (such as IPoIB connected mode or WinVerbs).

I'm not so much concerned over IBTA standards. I'm concerned over what makes it into the upstream Linux kernels. How much OFED's kernel differs from the upstream kernel directly impacts supportability of the OFED stack in our products. The more it diverges, the higher the support load. We actively control that divergence as a result.

> Gilad.
>
> -----Original Message-----
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Ryan, Jim
> Sent: Tuesday, January 06, 2009 2:01 PM
> To: Hefty, Sean; Tziporet Koren; ewg@lists.openfabrics.org
> Cc: gene...@lists.openfabrics.org
> Subject: RE: [ewg] RE: OFED Jan 5, 2009 meeting minutes on OFED plans
>
> Sean, I think that's a good point. What it suggests to me is asking, when someone proposes a non-standard feature, what process, procedures, documentation, support, etc., if any, should be made available by the entity making the proposal? It seems to me asking the same questions of all proposed features is fair and reasonable, and shouldn't represent an unreasonable barrier to progress. Thoughts? If this already exists, it's my ignorance and I will apologize in advance.
>
> Thanks again, Jim
>
> -----Original Message-----
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Tuesday, January 06, 2009 1:54 PM
> To: 'Tziporet Koren'; ewg@lists.openfabrics.org
> Cc: gene...@lists.openfabrics.org
> Subject: [ewg] RE: OFED Jan 5, 2009 meeting minutes on OFED plans
>
> * Mellanox suggested to add IB over Eth - this is similar to iWARP but more like IB (e.g. including UD), and can work over ConnectX. A concern was raised by Intel (Dave Sommers) since it is not a standard transport. Decision: This request will be raised in the MWG, and they should decide if OFA can support it.
>
> This is just my opinion, but in the past, OFED has included non-standard features, like extended connected mode, that are still not part of the IBTA spec. Do we know if such a feature would be accepted into the Linux kernel? I think OFED should base their decision more on the answer to that question than IBTA approval.
>
> - Sean

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] OFED-1.4 build issue
On Sun, 2009-01-04 at 11:12 +0200, Vladimir Sokolovsky wrote:
> Doug Ledford wrote:
>> So I downloaded the OFED-1.4.tar.gz file, unpacked it, went into OFED-1.4/SRPMS and ran rpm2cpio on ofa_kernel-ofed-1.4.src.rpm, then ran cpio on the output, then unpacked the ofa_kernel-1.4.tar.gz file. I then went into the ofa-kernel-1.4 directory and attempted to run configure. So far, there are at least three patches in the fixes directory that won't apply either with or without quilt. The ipath_01_*.patch, ipath_02_*.patch, and mlx4_0390_Different*.patch files all refuse to apply. I haven't seen if there are others that fail to apply; there may be. I stopped after hitting the third one. Does the ofa_kernel tarball properly patch up for other people?
>
> Hi Doug,
> I did the same without any issues:
>
> 1. Download http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4.tgz
>
> 2. Use cpio:
>    rpm2cpio ofa_kernel-1.4-ofed1.4.src.rpm > ofa_kernel-1.4-ofed1.4.cpio
>    cpio -i < ofa_kernel-1.4-ofed1.4.cpio
>
> 3. Run configure (I tried with and without quilt installed, kernel 2.6.9-78.ELsmp):
>    tar xzf ofa_kernel-1.4.tgz
>    cd ofa_kernel-1.4/
>    ./configure

OK, I figured it out. Evidently, when I extracted the tarball, I extracted it over the top of an already existing ofa_kernel-1.4 directory from an earlier ofed-1.4 beta release. That directory contained a few files that didn't exist in the new tarball and therefore weren't overwritten, and as a result the combined patch set didn't apply. A complete removal of the directory followed by a fresh extraction from the tarball (what I thought I had in the first place) solved the problem.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
[ewg] OFED-1.4 build issue
So I downloaded the OFED-1.4.tar.gz file, unpacked it, went into OFED-1.4/SRPMS and ran rpm2cpio on ofa_kernel-ofed-1.4.src.rpm, then ran cpio on the output, then unpacked the ofa_kernel-1.4.tar.gz file. I then went into the ofa-kernel-1.4 directory and attempted to run configure. So far, there are at least three patches in the fixes directory that won't apply either with or without quilt. The ipath_01_*.patch, ipath_02_*.patch, and mlx4_0390_Different*.patch files all refuse to apply. I haven't seen if there are others that fail to apply; there may be. I stopped after hitting the third one. Does the ofa_kernel tarball properly patch up for other people?

--
Doug Ledford dledf...@redhat.com
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] OFED-1.3.2-20080728-0355.tgz issues
On Mon, 2008-09-08 at 13:21 +0300, Eli Cohen wrote:
>> Next, I was watching the compiler warnings as things compiled, and I got an array subscript out of bounds warning. Turns out it was a legitimate error. There is a thinko in ipoib_transport_dev_init (well, at least in the file I had here; I assume the final version of the files you have would end up the same or close to it once quilt has done what I did manually). The snippet to fix the bug looks like this:
>>
>> @@ -228,8 +233,9 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
>>  		priv->rx_wr_draft[i].sg_list = &priv->sglist_draft[i][0];
>>  		if (i < UD_POST_RCV_COUNT - 1)
>>  			priv->rx_wr_draft[i].next = &priv->rx_wr_draft[i + 1];
>> +		else
>> +			priv->rx_wr_draft[i].next = NULL;
>>  	}
>> -	priv->rx_wr_draft[i].next = NULL;
>>  
>>  	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
>>  		for (i = 0; i < UD_POST_RCV_COUNT; ++i) {
>
> Yes, that's a bug, and I think the reason it did not occur as a real problem is because the whole ipoib_dev_priv struct is cleared at allocation so the last entry was NULL due to that.

Whether it showed up as a problem or not, the compiler warning was an array out of bounds warning. That's not the type of compiler warning that should be ignored, as it almost always points to a bug. I guess I was a little surprised that, even though the compile tests on the kernel pass, you guys allow that type of warning to go unchecked. I make a habit out of reviewing kernel compile warnings on the code I maintain and, when possible, I fix all the warnings just so things like this get caught.

> Anyway, I think I prefer this fix to the same problem:
>
> Index: ofed_kernel-fixes/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> ===================================================================
> --- ofed_kernel-fixes.orig/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2008-09-08 13:07:02.0 +0300
> +++ ofed_kernel-fixes/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2008-09-08 13:08:41.0 +0300
> @@ -234,7 +234,7 @@ int ipoib_transport_dev_init(struct net_
>  		if (i < UD_POST_RCV_COUNT - 1)
>  			priv->rx_wr_draft[i].next = &priv->rx_wr_draft[i + 1];
>  	}
> -	priv->rx_wr_draft[i].next = NULL;
> +	priv->rx_wr_draft[UD_POST_RCV_COUNT - 1].next = NULL;
>  
>  	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
>  		for (i = 0; i < UD_POST_RCV_COUNT; ++i) {
>
> What do you think?

If you're going to keep the setting of the last item to NULL outside the loop, then you can also remove the if inside the loop, as you'll just overwrite the last entry when you exit the loop.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ewg] OFED-1.3.2-20080728-0355.tgz issues
On Thu, 2008-09-11 at 23:42 +0300, Eli Cohen wrote:
>>> Index: ofed_kernel-fixes/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
>>> ===================================================================
>>> --- ofed_kernel-fixes.orig/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2008-09-08 13:07:02.0 +0300
>>> +++ ofed_kernel-fixes/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2008-09-08 13:08:41.0 +0300
>>> @@ -234,7 +234,7 @@ int ipoib_transport_dev_init(struct net_
>>>  		if (i < UD_POST_RCV_COUNT - 1)
>>>  			priv->rx_wr_draft[i].next = &priv->rx_wr_draft[i + 1];
>>>  	}
>>> -	priv->rx_wr_draft[i].next = NULL;
>>> +	priv->rx_wr_draft[UD_POST_RCV_COUNT - 1].next = NULL;
>>>  
>>>  	if (ipoib_ud_need_sg(priv->max_ib_mtu)) {
>>>  		for (i = 0; i < UD_POST_RCV_COUNT; ++i) {
>>>
>>> What do you think?
>>
>> If you're going to keep the setting of the last item to NULL outside the loop, then you can also remove the if inside the loop as you'll just overwrite the last entry when you exit the loop.
>
> Well, yes, but then I am going to reference an entry outside the bounds of the array, which we want to prevent in the first place.

No, you won't be referencing it. You'll be calculating its address, saving that calculated result into a valid spot in the array, and then immediately overwriting that result with NULL. That's perfectly valid.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
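For readers following the exchange above, here is a minimal standalone sketch of the pattern under discussion. It is not the ipoib code: `struct draft_wr`, `POST_RCV_COUNT`, `link_chain` and `chain_length` are hypothetical stand-ins invented for illustration. It relies on the C rule that forming the address one past the last array element is allowed as long as that address is never dereferenced, and it shows why the terminating write has to name the last index explicitly rather than reuse the loop counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UD_POST_RCV_COUNT. */
#define POST_RCV_COUNT 4

/* Hypothetical, heavily simplified work request: only the chain pointer. */
struct draft_wr {
	struct draft_wr *next;
};

/* Link every entry to its successor, then terminate the chain. */
static void link_chain(struct draft_wr wr[POST_RCV_COUNT])
{
	int i;

	for (i = 0; i < POST_RCV_COUNT; ++i) {
		/*
		 * On the final iteration this stores the one-past-the-end
		 * address. Computing it is legal; it is never dereferenced.
		 */
		wr[i].next = &wr[i + 1];
	}

	/*
	 * Here i == POST_RCV_COUNT, so "wr[i].next = NULL" would write
	 * outside the array. Index the last element explicitly instead,
	 * which also overwrites the one-past-the-end address stored above.
	 */
	wr[POST_RCV_COUNT - 1].next = NULL;
}

/* Walk the chain from head and count its entries. */
static int chain_length(const struct draft_wr *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		++n;
	return n;
}
```

Calling `link_chain` on an array of `POST_RCV_COUNT` entries and then `chain_length` on its first element should walk exactly `POST_RCV_COUNT` nodes. Whether to keep the conditional inside the loop, as in the patches quoted above, is a readability choice; both forms leave the same chain.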
Re: [ewg] RE: List of libraries in OFED
On Sun, 2008-07-27 at 12:30 +0300, Sasha Khapyorsky wrote:
> On 12:19 Sun 27 Jul, Tziporet Koren wrote:
>> Betsy Zeller wrote:
>>> From what I've heard, there are currently applications using:
>>> - libopensm
>>> - libosmcomp
>>> - libosmvendor
>>> - libibcommon
>>> Now that it is well understood that these libraries are intended to be private, developers can move away from using them. But, in the meantime it would be helpful if any major planned changes in these could be posted to the list.
>>
>> Sasha - please comment on the request from Betsy
>
> Sure, we are posting even minor changes :) . Basically I'm fine if developers use (or will use) those libraries, important point is that they should not expect stable API there.
>
>>> I've also heard it suggested that it would be easier to avoid some issues with private libraries if they were not in the standard compiler search path. There are pros and cons to deciding to move them, but I thought I would mention the suggestion.
>>
>> Library owners: Any thoughts here?
>
> I don't like this idea (as well as this word - private :)). Some packages in OFED already share those libraries (for instance ibutils uses libopensm, etc.). Also somebody may want to use it - let people to decide.

I'll second Sasha's opinion here. Being in the default library search path is fine, even for private libraries. The pain of having them outside the default search path far outweighs the benefit of them not being readily available to people that don't understand they are private.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
RE: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
On Thu, 2008-01-31 at 10:07 -0800, Sean Hefty wrote:
>> In all fairness, the kernel portion of all of this, and the process of getting things into Linus' kernel, has *always* been a case of staging things in Roland's tree and then merging upstream. So, at least for the kernel, that's mostly true as OFED is pretty close to Roland's tree generally speaking. As for the user space packages though, you guys *are* the upstream. There's no one to merge upstream to and very little oversight by anyone. So, it's entirely up to all of you just how much your package seems to be a feature of the day change-athon versus a solid, stable program.
>
> I don't believe that this is the model actually in use. OFED has accepted kernel features that have not been submitted for upstream inclusion, or, in some cases, that were, but were rejected. (For examples, see local SA, SA event subscription, XRC, SDP, and some of the previous incarnations of IPoIB CM.) There are thousands of lines of code difference between OFED and the kernel upon which it's based. (To be clear, I'm not objecting to any changes, just the sheer volume.) The OFED releases of the userspace libraries are not identical to those provided by the maintainers. (See libibverbs.) Whose version of libibverbs does RedHat plan on using? How do you manage the differences between OFED and Roland's libibverbs libraries? And I'm really not trying to come across harsh here, but if the distros are willing to pull the OFED code, why should OFA bother trying to merge anything upstream?

I pull *some* OFED code. I don't pull it all. There are things in OFED I won't accept until they've gone upstream. Hence, RDS is not in our offering. We made the mistake of taking SDP long ago and we'll carry that forward, but at this point we generally look for things to be upstream before pulling them from OFED (or at least to have been submitted upstream and be working towards acceptance). In terms of user space, given a choice between a released tarball or the custom OFED tarball, I choose the released tarball. So, I currently have Roland's libibverbs, libmthca, and libmlx4.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
RE: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
On Wed, 2008-01-30 at 14:03 -0800, Sean Hefty wrote:
>> The main reason is not the bugs but the features supported by IBM - CM support for non SRQ and 4K MTU
>
> These are entirely my opinions, but... OFED isn't even at RC1 if it's not at feature freeze... OFED has moved well beyond trying to provide an enterprise distribution to simply providing an experimental code base more concerned with including the latest and greatest features. It's become the staging area for getting the code into shape for merging upstream, which wasn't what I thought was the purpose of OFED.

Well, that's not really a fair thing to say given that the CM support for non SRQ patch *is* upstream; it just isn't in OFED. As far as OFED not even being at RC1 if it isn't at feature freeze, that all depends on what's classified as a feature. I know the two patches above were called features by Tziporet, but if this were an internal Red Hat project, those would have been more correctly classified as blockers. Once we've passed our feature freeze deadline and started our testing and validation, if a bug or shortcoming is found in some new code we submitted, then that is classified as a blocker (unless it's actually unimportant enough that we can leave it, but there are very few of this sort of thing ever found). For us anyway, this will be our first release where we are turning on CM support in IPoIB. It would be a legitimate bug that the code as submitted doesn't work across all the hardware. So, that would be a blocker bug, with the fix being the non-SRQ support.

Anyway, I got the impression that the real sentiment of your mail was less about those two bugs/features and more that OFED seems to be more of an experimental source repo than an enterprise distribution. In all fairness, the kernel portion of all of this, and the process of getting things into Linus' kernel, has *always* been a case of staging things in Roland's tree and then merging upstream. So, at least for the kernel, that's mostly true as OFED is pretty close to Roland's tree generally speaking. As for the user space packages though, you guys *are* the upstream. There's no one to merge upstream to and very little oversight by anyone. So, it's entirely up to all of you just how much your package seems to be a feature of the day change-athon versus a solid, stable program.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
[ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness
On Tue, 2008-01-29 at 14:21 +0200, Tziporet Koren wrote:
> OFED Jan 28 meeting summary on RC3 readiness:
> =============================================
>
> 1. OFED 1.3 readiness toward RC3 this week
>    * RC3 is based on the official 2.6.24 release
>    * RC3 is expected on Wed
>    * RC4 is planned for Feb 13
>
> 2. All companies update:
>    * IBM - ready for RC3
>    * Voltaire - ready for RC3
>    * Qlogic - ready for RC3; will work on bug 874
>    * Intel - things looks good. Need some uDAPL update from Arlin
>    * Chelsio - ready for RC3
>    * NetEffect - ready for RC3
>    * Cisco - reported all issues in bugzilla
>    * Mellanox - ready for RC3
>    * MPI - all packages are ready
>
> 3. Request to change IPoIB to support CM without SRQ and 4K MTU
>    Decided that we cannot insert such enhancements at this stage (RC3 built today) without delaying the release since IPoIB is a critical ULP used by all customers. Since we do not want to delay the release and we wish to have a solution for the new IPoIB enhancements we plan to have 1.3.1 release

Hmmm...I'd like to put my $.02 in here. I don't have any visibility into what drives the OFED schedule, so I have no clue as to why people don't want to slip the schedule for this change. I'm sure you guys have your reasons. However, I also happen to be a consumer of this code, and I know for a fact that no one has gotten my input on this issue.

So, the deal is that I'm currently integrating OFED 1.3 into what will be RHEL5.2. The RHEL5.2 freeze date has already passed, but in order to keep what finally goes out from being too stale, I'm being allowed to submit the OFED-1.3-rc1 code prior to freeze, and then update to OFED-1.3 final during our beta test process. What this means is that anything you punt from 1.3 to 1.3.1, you are also punting out of RHEL5.2 and RHEL4.7. So, that being said, there's a whole trickle-down effect with various groups that would really like to be able to use 5.2 out of the box and that may prefer a slip in 1.3, so that this can be part of it, over punting to 1.3.1. I'm not saying this will change your mind, but I'm sure it wasn't part of the decision process before, so I'm bringing it up.

>    AIs:
>    Tziporet to define the 1.3.1 release (scope of changes, schedule etc.)
>    Vlad: open 1_3_1 branch so people will have a place to commit changes. We will not start any daily build before 1.3 release
>
> 3. Review high priority bugs:
>    846 critical [EMAIL PROTECTED] SDP crash on RHEL5 ppc64 running netserver - will be debugged
>    859 critical [EMAIL PROTECTED] Bonding configuration on Sles10 sp1 is not loaded consistently - fixed
>    863 critical [EMAIL PROTECTED] ib-bonding won't compile for RHEL4 U6 - fixed
>    874 critical [EMAIL PROTECTED] Intel MPI (IMB test) hangs intermittently on the qlogic HCA - will be debugged by Qlogic
>    760 major [EMAIL PROTECTED] UDP performance on Rx is lower than Tx - for 1.3.1
>    761 major [EMAIL PROTECTED] Poor and jittery UDP performance at small messages - for 1.3.1

Ditto for requesting these two be in 1.3. We've already had customers bring up the UDP performance issue in our previous releases.

>    869 major [EMAIL PROTECTED] mstflint won't build on SLES10 x86 - fixed
>    736 major [EMAIL PROTECTED] IBV_WC_RETRY_EXC_ERR errors with local rdma_reads - seems a FW issue (Mellanox to debug)
>    767 major [EMAIL PROTECTED] Non backport Kernels that don't build in genalloc cause compile errors for cxgb3 - no fix (document)

And we still need to get actual downloads for a number of the srpms in OFED-1.3. The various spec files list fictitious tarballs that aren't actually available on the download server. While that works for the rcs, they really need to have a tarball up there for final.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [ofa-general] Re: [ewg] New features for OFED 1.4
On Wed, 2007-11-07 at 18:08 +0200, Tziporet Koren wrote:
> Johann George wrote:
>> Tziporet,
>> So we should assess how close we are to that goal and how we can put OFED out of business.
>> Could you cover this topic during your session on OFED 1.3: Procedure and Review? It seems that this would be the right place to bring it up and we can attempt to extend your session to allow for it.
>
> I think it's more appropriate in the OFED 1.4 session. But maybe instead of talking about 1.3 status (which everybody can see from the weekly meeting reports) I should talk about OFED in the future. However, I need some input from the distros.

Splitting the RPMs up was a *huge* step in the right direction. I think my last emails on the topic relayed why we aren't able to just directly import spec files over and over again, so once we have released tarballs and a single spec import (well, if ever on the spec import; a lot of times we just write our own that does what we want), then we are good. Beyond that, thinking about the future, a collection of known interoperable tarballs is what's best for us. So, as Roland has mentioned many times, what I need is a release, not a distribution. And the release need only consist of a statement that dapl-2.0.3 + dapl-1.2.2 + ibverbs-1.1 + mthca-1.0.4 + blah, blah, blah are all known to work properly together. From that point, I just grab the appropriate tarballs that contain the releases mentioned, build them all through the build system, and that's our release cycle.

> Tziporet

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
[ewg] Re: Do you (Redhat) plan to integrate OFED 1.2.5 in the coming update of Redhat?
On Wed, 2007-08-22 at 17:34 +0300, Tziporet Koren wrote:

Hi Doug, As you know OFED 1.2.5 went GA last week. Does Redhat have any plans to include this release in any coming update for RHEL5?

For the kernel stuff possibly, but for the user space packages I'm really just looking for tarballs to download. Like Roland mentioned, what I need is a release, not a distribution.

--
Doug Ledford
[ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
On Tue, 2007-08-14 at 09:59 +0200, Hoang-Nam Nguyen wrote:

Hi Doug!

On Sat, 2007-08-11 at 21:13 +0300, Michael S. Tsirkin wrote:

Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]: Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

Hello Doug and Scott!

On Thursday 02 August 2007 18:08, Michael S. Tsirkin wrote:

ehca backports for kernel.org kernels seem to be broken.
1. Does anyone care enough to fix them? If not we'll disable ehca in build for these kernels.
2. Could you upload kernels for RHEL4U5 and SLES10 ppc64?

Don't you guys already have RHEL4U5? It had a backports directory in the OFED 1.2 release...and it's been out for quite a while...

Some part of this thread might be confusing. And really, it's not about any specific backport issue from ehca or other component(s). It's a general prereq for ofed's daily build to have rhel4.5 and sles10 ppc64 in their daily build runs too.

Thanks
Nam

All of the kernel rpms from our U5 kernel have been on my web page in my sig for *ages*. All you need to do is download the needed rpms and install.

--
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
[ewg] Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status
On Tue, 2007-08-14 at 12:40 +0300, Michael S. Tsirkin wrote:

Quoting Scott Bahling [EMAIL PROTECTED]: Subject: Re: [ofa-general] Re: ofa_1_2_c_kernel 20070802-0201 daily build status

On Tue, 2007-08-14 at 09:59 +0200, Hoang-Nam Nguyen wrote:

Hi Doug!

On Sat, 2007-08-11 at 21:13 +0300, Michael S. Tsirkin wrote:

Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]: Subject: Re: ofa_1_2_c_kernel 20070802-0201 daily build status

Hello Doug and Scott!

On Thursday 02 August 2007 18:08, Michael S. Tsirkin wrote:

ehca backports for kernel.org kernels seem to be broken.
1. Does anyone care enough to fix them? If not we'll disable ehca in build for these kernels.
2. Could you upload kernels for RHEL4U5 and SLES10 ppc64?

Don't you guys already have RHEL4U5? It had a backports directory in the OFED 1.2 release...and it's been out for quite a while...

Some part of this thread might be confusing. And really, it's not about any specific backport issue from ehca or other component(s). It's a general prereq for ofed's daily build to have rhel4.5 and sles10 ppc64 in their daily build runs too.

I have checked, and Mellanox still has an active partner account with Novell which means they can download our product isos (which contain the kernel-source rpms) at http://download.novell.com and any package updates at support.novell.com http://support.novell.com/linux/psdb/ using proper Mellanox logins. You can contact me offline if you need help here.

Also, our current cvs snapshots can be found at ftp://ftp.suse.com/pub/projects/kernel/kotd/ or the following mirror ftp://ftp.gwdg.de/pub/linux/suse/projects/kernel/kotd/

These are unreleased bits, but give you a peek into what changes might be coming in the next kernel update. Not sure if this is of use to you.

I don't have a ppc machine so I can't use your install ISOs.

Nonsense. Of course you can.
At a minimum, rpm2cpio | cpio -ivd gets around the arch of the package regardless of what machine you are on and unpacks it without involving the rpm database.

If someone cares enough about SLES/RHEL on ppc support in OFED, she can upload the unpacked kernel devel headers for these distros to ofa server and we'll use these to cross-compile OFED kernel bits, nightly.

--
Doug Ledford
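The rpm2cpio route mentioned in this message can be wrapped in a small helper. This is only a sketch: the function name `unpack_rpm` and the package and directory names in the usage comment are placeholders, and it assumes `rpm2cpio` and `cpio` are installed on the machine doing the unpacking.

```shell
#!/bin/sh
# Unpack an RPM's payload into a directory without touching the rpm
# database. Because nothing is "installed", the package architecture
# (e.g. ppc64) does not have to match the machine running this.
unpack_rpm() {
    rpm_file=$1
    dest_dir=$2
    if [ ! -f "$rpm_file" ]; then
        echo "no such package: $rpm_file" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # rpm2cpio writes the payload as a cpio archive on stdout;
    # cpio -i extracts, -v lists each file, -d creates directories.
    rpm2cpio "$rpm_file" | (cd "$dest_dir" && cpio -ivd)
}

# Usage (placeholder names, not real files):
#   unpack_rpm kernel-devel-VERSION.ppc64.rpm ./kernel-headers-ppc64
```

The unpacked tree can then be pointed at by a cross-compile build, which is the use the thread has in mind for the nightly OFED kernel builds.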
[ewg] Re: RFC OFED-1.3 installation
On Tue, 2007-07-17 at 20:12 +0300, Michael S. Tsirkin wrote:

Look, rpms are just like versioned tarballs. Once they go out in the wild, that particular name-version-release combination is FROZEN.

It really looks like this is a work around for when you want to apply a patch without going through maintainer.

Not really. When you have a customer with a sev 1 issue, you don't wait for upstream to release a new version of gcc before you get them their fix. There are also those times when you have an older, long released product that isn't up to date with upstream, for instance RHEL4 mdadm is 1.12.0 and will not be updated to the 2.6.2 version that's in Fedora. If I find a bug in that 1.12.0 version of mdadm, then I'll fix it using a patch in the spec file. If the bug also exists in upstream then it will get sent upstream to be included in the latest upstream release. But, upstream won't care about version 1.12.0, and they won't release a new version 1 mdadm just for our bugfix, so we carry those targeted fixes around as long as we have that version 1 mdadm on systems.

There are other reasons to do this as well, for instance when you need to make a change as part of package integration that simply isn't needed or wanted upstream. For example, many times upstream couldn't care less about patches that implement our particular file system layout for a package. There are lots of things that we as a distributor have to care about that upstream generally does not. The spec file and patches are how we solve our customers' problems. They are what make a stable distribution a reality, as opposed to a bleeding-edge system where you must always update to the latest upstream version to fix any problem. It's the difference between RHEL and Fedora.

The way OFED release process works, we really don't do releases all that often, and when we do, we can coordinate with the maintainer.
--
Doug Ledford
[ewg] RE: RFC OFED-1.3 installation
    cp -a $target $target-$VERSION
    sed -e 's/@VERSION@/'$VERSION'/;s/@RELEASE@/'$RELEASE'/;s/@TARBALL@/'$TARBALL'/' $target/$target.spec.in > $target-$VERSION/$target.spec
    cd $target-$VERSION
    ./autogen.sh
    cd ..
    echo Creating $TMPDIR/$TARBALL
    tar -czf $TMPDIR/$TARBALL --exclude=.git $target-$VERSION

I thought that the standard way to get tar.gz file is using autotools (3 commands) like I wrote before: autogen.sh, configure, make dist. Can you explain why your way is better?

autogen.sh yes (and it should have been in my script, my current one has it, but that one didn't). Since configure tries to figure out a bunch of stuff about the build environment, it must be run in the software development environment of the platform you are targeting the final build for. If you run it on your local RHEL4 machine, but our RHEL4 build environment for our next update has a different glibc that changes some minor thing that configure actually checks, then it would be wrong. So, even if you run configure, I can't trust the output from it. Obviously, if you aren't running configure, then make dist is irrelevant. So, you can run configure if you want, but I will ignore the output in anything I build. And if the make dist operation removes any files necessary for me to properly reconfigure the software using configure, then it will be a totally broken tarball from my perspective.

Do you have a proposal for daily builds? We need OFED daily builds for verification. We can't wait for RedHat updates to get the updated OFED packages.

I have a newer version of that make.dist script that I wrote to specifically work for the repos other than the management tree. Using that script, you could just do this:

    for repo in *; do
        ./make.dist $repo daily
        rm $RPMDIR/${repo}*
        rpmbuild --rebuild dist/$repo-git.tar.gz
        rpm -Uvh $RPMDIR/${repo}*
    done

That's really all you need for anything you are building for internal use.
And if one of you wanted to be responsible for providing the rpms, then a single person could actually maintain versioned rpms that way. It would only break down when you try to run the make.dist script from different systems since it creates a file that lets it know what the next number in sequence is each time it builds that git.tar.gz file. However, even that could be solved by putting the release file in some sort of SCM if you wanted multiple people to be able to build properly versioned rpms. Really, the strictest guidelines apply to things you make publicly available. If you want to have a private, EWG only area on the ofa server where you guys can share daily, unversioned builds, go right ahead. It's when they go out in the wild and you expect other people to pick them up that you have to care.

What OFED-1.3 structure do you propose? Should it consist of source RPMs or tgz files? What features should the install script support?

From my standpoint, tgz files are really about all I care about. For instance, no matter what install script you write, I won't be using it because we have our own install/update methods. And it's hard for you to make a spec file that's both relevant for Red Hat and SuSE and at the same time clean enough to meet our requirements.

There is one suggestion I would make though that greatly helps with the whole package versioning issue. We had this trick we used in our kernel RPMs back when we shipped a kernel-source rpm (which was different from the src.rpm; it was an already prep'ed source tree ready to be built from). When we built our own kernel RPMs, we would go into the top level Makefile in the kernel source tree and edit the extraversion to match the rpm. When we made the source tree that would become the kernel-source package, we edited extraversion to -prep, so that if a customer used it to build a kernel, the final result would be something like 2.6.9-prep in the kernel version.
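The sequence file described in this message, the one that tells make.dist what the next release number is, might look roughly like the following. This is a guess at the mechanism and not the real make.dist code; the file name `.release` and the function name `next_release` are made up for the sketch.

```shell
#!/bin/sh
# Sketch of a per-repo release counter kept in a plain file.
# Each call bumps the number stored in the file and prints the new value.
next_release() {
    counter_file=$1
    last=0
    # Pick up where the previous build left off, if there was one.
    if [ -f "$counter_file" ]; then
        last=$(cat "$counter_file")
    fi
    next=$((last + 1))
    echo "$next" > "$counter_file"
    echo "$next"
}

workdir=$(mktemp -d)
next_release "$workdir/.release"   # prints 1 (first build)
next_release "$workdir/.release"   # prints 2 (second build)
rm -rf "$workdir"
```

Because the counter lives in a local file, two machines would each count from their own copy, which is the breakdown the message describes. Keeping that one file in SCM gives every builder the same sequence.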
You guys could do something similar in all the src.rpms you ship. Since you know they will be compiled locally, you could easily put something like .local at the end of your release string, so that say dapl would be version: 1.2.1, release: 1.local or 1.ofa or something like that. It doesn't solve package version comparison issues (i.e., telling which package is newer by the number), but it does help to solve identification issues.

--
Doug Ledford
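The release-suffix suggestion in this message can be tried out with a one-line filter over a spec file. This is a sketch only: the function name `tag_spec_release` is invented here, it assumes the spec has a single plain `Release:` line with no macros or trailing whitespace, and the dapl values are the ones used as the example in the message.

```shell
#!/bin/sh
# Append a site tag such as "local" or "ofa" to the Release: line of a
# spec file read from stdin, leaving every other line untouched.
tag_spec_release() {
    # In the sed replacement, & stands for the whole matched Release: line.
    sed -e "s/^Release:.*/&.$1/"
}

# Prints the three lines with the last one changed to "Release: 1.ofa"
printf 'Name: dapl\nVersion: 1.2.1\nRelease: 1\n' | tag_spec_release ofa
```

A package built from the tagged spec would then carry the suffix in its name-version-release string, which makes a locally compiled build easy to tell apart from a vendor one, even though it does nothing for version comparison.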
[ewg] Re: [ofa-general] Re: RFC OFED-1.3 installation
On Wed, 2007-07-18 at 00:09 +0300, Michael S. Tsirkin wrote:

Quoting Roland Dreier [EMAIL PROTECTED]: Subject: Re: [ofa-general] Re: RFC OFED-1.3 installation

I don't really think we want customers to run beta code

What's the point of a beta then??

Dunno. In previous OFED releases, we had release candidates rather than beta. Openfabrics members were running RCs and reporting issues on the list and in bugzilla. Do you really ask your customers to do this for you?

Sure, as much as possible. I generally don't recommend using it in production, but just as close as they can get to production is fine with me. The more issues they find while I'm still actually working on it and making new revisions, the fewer issues they'll find after I stupidly think I'm done.

--
Doug Ledford