Re: [OMPI devel] 1.3 PML default choice

2009-01-13 Thread Brian W. Barrett
The selection logic for the PML is very confusing and doesn't follow the standard priority selection. The reasons for this are convoluted and not worth discussing here. The bottom line, however, is that the OB1 PML will be the default *UNLESS* the PSM (PathScale/Qlogic) MTL can be chosen, in

Re: [OMPI devel] RFC: [slightly] Optimize Fortran MPI_SEND / MPI_RECV

2009-02-07 Thread Brian W. Barrett
On Sat, 7 Feb 2009, Jeff Squyres wrote: End result: I guess I'm a little surprised that the difference is that clear -- does a function call really take 10ns? I'm also surprised that the layered C version has significantly more jitter than the non-layered version; I can't really explain that.

Re: [OMPI devel] RFC: Rename several OMPI_* names to OPAL_*

2009-02-10 Thread Brian W. Barrett
I have no objections to this change Brian On Tue, 10 Feb 2009, Greg Koenig wrote: RFC: Rename several OMPI_* names to OPAL_* WHAT: Rename several #define values that encode the prefix "OMPI_" to instead encode the prefix "OPAL_" throughout the entire Open MPI source code tree. Also, elimina

Re: [OMPI devel] RFC: eliminating "descriptor" argument from sendi function

2009-02-23 Thread Brian W. Barrett
At a high level, it seems reasonable to me. I am not familiar enough with the sendi code, however, to have a strong opinion either way. Brian On Mon, 23 Feb 2009, Jeff Squyres wrote: Sounds reasonable to me. George / Brian? On Feb 21, 2009, at 2:11 AM, Eugene Loh wrote: What: Eliminate

Re: [OMPI devel] RFC: eliminating "descriptor" argument from sendi function

2009-02-23 Thread Brian W. Barrett
On Mon, 23 Feb 2009, Jeff Squyres wrote: On Feb 23, 2009, at 10:37 AM, Eugene Loh wrote: I sense an opening here and rush in for the kill... :-) And, why does the PML pass a BTL argument into the sendi function? First, the BTL argument is not typically used. Second, if the BTL sendi func

Re: [OMPI devel] compiler_args in wrapper-data.txt files with Portland Group Compilers

2009-02-24 Thread Brian W. Barrett
Hi Wayne - Sorry for the delay. I'm the author of that code, and am currently trying to finish my dissertation, so I've been a bit behind. Anyway, at present, the compiler_args field only works on a single token. So you can't have something looking for -tp p7. I thought about how to do thi

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Jeff Squyres wrote: 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only difference between rc3 and rc4 was a fix for that race condition. Please test ASAP: http://www.open-mpi.org/software/ompi/v1.3/ I'm sorry, I've failed to test rc1 & rc2 on Catam

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Eugene Loh wrote: First, this behavior is basically what I was proposing and what George didn't feel comfortable with. It is arguably no compromise at all. (Uggh, why must I be so honest?) For eager messages, it favors BTLs with sendi functions, which could lead to those

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Jeff Squyres wrote: On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote: First, this behavior is basically what I was proposing and what George didn't feel comfortable with. It is arguably no compromise at all. (Uggh, why must I be so honest?) For eager messages, it favors BTL

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Brian W. Barrett wrote: On Tue, 3 Mar 2009, Jeff Squyres wrote: 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only difference between rc3 and rc4 was a fix for that race condition. Please test ASAP: http://www.open-mpi.org/software/ompi/v1.3

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-04 Thread Brian W. Barrett
On Wed, 4 Mar 2009, George Bosilca wrote: I'm churning a lot and not making much progress, but I'll try chewing on that idea (unless someone points out it's utterly ridiculous). I'll look into having PML ignore sendi functions altogether and just make the "send-immediate" path work fast with

Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer

2009-03-09 Thread Brian W. Barrett
I, not surprisingly, have serious concerns about this RFC. It assumes that the ompi_proc issues and bootstrapping issues (the entire point of the move, as I understand it) can both be solved, but offers no proof to support that claim. Without those two issues solved, we would be left with an on

Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer

2009-03-09 Thread Brian W. Barrett
nst it in the default OMPI configuration; other RTEs that want to do more meaningful stuff will need to provide more meaningful implementations of the stubs and hooks. - Hopefully the teleconference time tomorrow works out for Rich (his communications were unclear on this point). Otherwise,

Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer

2009-03-11 Thread Brian W. Barrett
On Wed, 11 Mar 2009, Richard Graham wrote: Brian, Going back over the e-mail trail it seems like you have raised two concerns: - BTL performance after the change, which I would take to be - btl latency - btl bandwidth - Code maintainability - repeated code changes that impact a large number

Re: [OMPI devel] Meta Question -- Open MPI: Is it a dessert topping or is it a floor wax?

2009-03-11 Thread Brian W. Barrett
On Wed, 11 Mar 2009, Andrew Lumsdaine wrote: Hi all -- There is a meta question that I think is underlying some of the discussion about what to do with BTLs etc. Namely, is Open MPI an MPI implementation with a portable run time system -- or is it a distributed OS with an MPI interface? It s

Re: [OMPI devel] Meta Question -- Open MPI: Is it a dessert topping or is it a floor wax?

2009-03-12 Thread Brian W. Barrett
I'm going to stay out of the debate about whether Andy correctly characterized the two points you brought up as a distributed OS or not. Sandia's position on these two points remains the same as I previously stated when the question was distributed OS or not. The primary goal of the Open MPI

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
On Thu, 30 Apr 2009, Ralph Castain wrote: We seem to have hit a problem here - it looks like we are seeing a built-in limit on the number of communicators one can create in a program. The program basically does a loop, calling MPI_Comm_split each time through the loop to create a sub-communicato

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
t. this is not new, so if there is a discrepancy between what the comm structure assumes that a cid is and what the pml assumes, then this has been in the code since the very first days of Open MPI... Thanks Edgar Brian W. Barrett wrote: On Thu, 30 Apr 2009, Ralph Castain wrote: We seem to have hit

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
On Thu, 30 Apr 2009, Edgar Gabriel wrote: Brian W. Barrett wrote: When we added the CM PML, we added a pml_max_contextid field to the PML structure, which is the maximum cid the PML can handle (because the matching interfaces don't allow 32 bits to be used for the cid). At the same

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
On Thu, 30 Apr 2009, Ralph Castain wrote: well, that's only because the code's doing something it shouldn't.  Have a look at comm_cid.c:185 - there's the check we added to the multi-threaded case (which was the only case when we added it).  The cid generation should never generate a number larger
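Brian's point is that cid generation should never produce a value above the PML's pml_max_contextid. A minimal single-process sketch of that constraint, assuming a hypothetical in_use table and using max_cid as a stand-in for pml_max_contextid. Open MPI's actual allocator (comm_cid.c) has to agree on a cid across all participating processes, which is not shown here.

```c
#include <stdbool.h>

/* Hypothetical allocator: return the lowest context id in [0, max_cid]
   that is not marked in `in_use`, or -1 when every id the PML can
   represent is already taken. `max_cid` plays the role of
   pml_max_contextid; `in_use` must have max_cid + 1 entries. */
static int alloc_cid(bool *in_use, int max_cid) {
    for (int cid = 0; cid <= max_cid; cid++) {
        if (!in_use[cid]) {
            in_use[cid] = true;   /* claim it for the new communicator */
            return cid;
        }
    }
    return -1;                    /* out of ids: report an error, never wrap */
}
```

If a communicator's id is cleared in the table when the communicator is freed, a loop that creates and frees one communicator per iteration keeps reusing the same low ids; only communicators that stay alive count against the limit.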

Re: [OMPI devel] Inherent limit on #communicators?

2009-05-01 Thread Brian W. Barrett
first days of Open MPI... Thanks Edgar Brian W. Barrett wrote: On Thu, 30 Apr 2009,

Re: [OMPI devel] Revise paffinity method?

2009-05-06 Thread Brian W. Barrett
On Wed, 6 May 2009, Ralph Castain wrote: Any thoughts on this? Should we change it? Yes, we should change this (IMHO) :). If so, who wants to be involved in the re-design? I'm pretty sure it would require some modification of the paffinity framework, plus some minor mods to the odls framewo

Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Brian W. Barrett
On Thu, 14 May 2009, Jeff Squyres wrote: On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote: A more permanent workaround could be in OpenMPI to list each library that is used *directly* by some other library as a dependency. Sigh. We actually took pains to *not* do that; we *used* to do tha

Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Brian W. Barrett
On Thu, 14 May 2009, Ralf Wildenhues wrote: Hi Brian, * Brian W. Barrett wrote on Thu, May 14, 2009 at 08:22:58PM CEST: Actually, I think that was something else. Today, libopen-rte.la lists libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la. I had removed the dependency of

Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Brian W. Barrett
On Thu, 14 May 2009, Jeff Squyres wrote: On May 14, 2009, at 2:22 PM, Brian W. Barrett wrote: We actually took pains to *not* do that; we *used* to do that and explicitly took it out. :-\ IIRC, it had something to do with dlopen'ing libmpi.so...? Actually, I think that was something

Re: [OMPI devel] opal / fortran / Flogical

2009-06-01 Thread Brian W. Barrett
I have to agree with Jeff's concerns. Brian On Mon, 1 Jun 2009, Jeff Squyres wrote: Hmm. I'm not sure that I like this commit. George, Brian, and I specifically kept Fortran out of (the non-generated code in) opal because the MPI layer is the *only* layer that uses Fortran. There was one

Re: [OMPI devel] opal / fortran / Flogical

2009-06-01 Thread Brian W. Barrett
Well, this may just be another sign that the push of the DDT to OPAL is a bad idea. That's been my opinion from the start, so I'm biased. But OPAL was intended to be single process systems portability, not MPI crud. Brian On Mon, 1 Jun 2009, Rainer Keller wrote: Hmm, OK, I see. However, I

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-23 Thread Brian W. Barrett
I think that sounds like a rational path forward. Another, more long term, option would be to move from the FIFOs to a linked list (which can even be atomic), which is what MPICH does with nemesis. In that case, there's never a queue to get backed up (although the receive queue for collective

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Brian W. Barrett
On Wed, 24 Jun 2009, Eugene Loh wrote: Brian Barrett wrote: Or go to what I proposed and USE A LINKED LIST! (as I said before, not an original idea, but one I think has merit) Then you don't have to size the fifo, because there isn't a fifo. Limit the number of send fragments any one p
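Brian's alternative to sizing the FIFOs is a linked list that senders append to atomically, so there is no fixed-size queue to back up. A minimal sketch of one such structure, assuming a single address space and C11 atomics: a lock-free multi-producer push, plus a single consumer that detaches the whole list in one atomic exchange. This is a generic Treiber-style list with batch drain, not the nemesis queue MPICH actually uses, and frag_t / frag_list_t are hypothetical names. A real shared-memory transport would store offsets rather than raw pointers, since a segment may be mapped at different addresses in each process.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical fragment type; not an Open MPI structure. */
typedef struct frag {
    struct frag *next;
    int payload;
} frag_t;

typedef struct {
    _Atomic(frag_t *) head;
} frag_list_t;

static void frag_list_init(frag_list_t *list) {
    atomic_init(&list->head, NULL);
}

/* Any number of senders may push concurrently. There is no capacity,
   so a push can never fail because the "queue" is full. */
static void frag_list_push(frag_list_t *list, frag_t *f) {
    frag_t *old = atomic_load_explicit(&list->head, memory_order_relaxed);
    do {
        f->next = old;   /* on CAS failure, `old` is refreshed and we retry */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list->head, &old, f,
                 memory_order_release, memory_order_relaxed));
}

/* The single receiver takes everything at once, then reverses the
   LIFO chain so fragments come out in arrival order. */
static frag_t *frag_list_drain(frag_list_t *list) {
    frag_t *f = atomic_exchange_explicit(&list->head, NULL,
                                         memory_order_acquire);
    frag_t *ordered = NULL;
    while (f != NULL) {
        frag_t *next = f->next;
        f->next = ordered;
        ordered = f;
        f = next;
    }
    return ordered;
}
```

Because producers only ever push and the consumer only ever swaps out the entire list, the ABA problem of a general lock-free pop does not arise.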

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Brian W. Barrett
All - Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow management issues and came to a couple of conclusions. * Jeff, Eugene, and I are all convinced that Eugene's addition of polling the receive queue to drain acks when sends start backing up is required for deadloc

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Brian W. Barrett
On Thu, 25 Jun 2009, Eugene Loh wrote: I spoke with Brian and Jeff about this earlier today. Presumably, up through 1.2, mca_btl_component_progress would poll and if it received a message fragment would return. Then, presumably in 1.3.0, behavior was changed to keep polling until the FIFO wa

Re: [OMPI devel] MPI_Accumulate() with MPI_PROC_NULL target rank

2009-07-15 Thread Brian W. Barrett
On Wed, 15 Jul 2009, Lisandro Dalcin wrote: The MPI 2.1 standard says: "MPI_PROC_NULL is a valid target rank in the MPI RMA calls MPI_ACCUMULATE, MPI_GET, and MPI_PUT. The effect is the same as for MPI_PROC_NULL in MPI point-to-point communication. After any RMA operation with rank MPI_PROC_NUL

Re: [OMPI devel] autodetect broken

2009-07-22 Thread Brian W. Barrett
The current autodetect implementation seems like the wrong approach to me. I'm rather unhappy the base functionality was hacked up like it was without any advance notice or questions about original design intent. We seem to have a set of base functions which are now more unreadable than before

Re: [OMPI devel] RFC: meaning of "btl_XXX_eager_limit"

2009-07-23 Thread Brian W. Barrett
On Thu, 23 Jul 2009, Jeff Squyres wrote: There are two solutions I can think of. Which should we do? a. Pass the (max?) PML header size down into the BTL during initialization such that the btl_XXX_eager_limit can represent the max MPI data payload size (i.e., the BTL can siz

Re: [OMPI devel] libtool issue with crs/self

2009-07-29 Thread Brian W. Barrett
What are you trying to do with lt_dlopen? It seems like you should always go through the MCA base utilities. If one's missing, adding it there seems like the right mechanism. Brian On Wed, 29 Jul 2009, Josh Hursey wrote: George suggested that to me as well yesterday after the meeting. So we

Re: [OMPI devel] Shared library versioning

2009-07-29 Thread Brian W. Barrett
On Wed, 29 Jul 2009, Jeff Squyres wrote: On Jul 28, 2009, at 1:56 PM, Ralf Wildenhues wrote: - support files are not versioned (e.g., show_help text files) - include files are not versioned (e.g., mpi.h) - OMPI's DSOs actually are versioned, but more work would be needed in this area to make t

Re: [OMPI devel] libtool issue with crs/self

2009-07-29 Thread Brian W. Barrett
you're looking for the first occurrence of a symbol, then you can just do: dlsym(RTLD_DEFAULT, symbol); The lt_dlsym only really helps if you're running on really obscure platforms which don't support dlsym and loading "preloaded" components. Brian On Wed, 29 Jul 200

Re: [OMPI devel] Device failover on ob1

2009-08-03 Thread Brian W. Barrett
On Sun, 2 Aug 2009, Ralph Castain wrote: Perhaps a bigger question needs to be addressed - namely, does the ob1 code need to be refactored? Having been involved a little in the early discussion with bull when we debated over where to put this, I know the primary concern was that the code not

Re: [OMPI devel] libtool issue with crs/self

2009-08-05 Thread Brian W. Barrett
cutable, in which case you might be better off just using dlsym() directly. If you're looking for the first occurrence of a symbol, then you can just do: dlsym(RTLD_DEFAULT, symbol); The lt_dlsym only really helps if you're running on really obscure platforms which don't su
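A minimal sketch of the dlsym(RTLD_DEFAULT, ...) lookup described above, assuming glibc or another platform that defines RTLD_DEFAULT (glibc exposes it only with _GNU_SOURCE, and glibc versions before 2.34 also need -ldl at link time). find_global_symbol is a hypothetical helper name. As noted in the thread, no dlopen() of the executable is needed first.

```c
#define _GNU_SOURCE   /* glibc: needed for RTLD_DEFAULT */
#include <dlfcn.h>
#include <stdio.h>

/* Search the global scope (the executable plus every library loaded with
   global visibility) in the normal lookup order and return the first
   definition found, or NULL if there is none. */
static void *find_global_symbol(const char *name) {
    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(RTLD_DEFAULT, name);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s): %s\n", name,
                err ? err : "symbol resolved to NULL");
    }
    return sym;
}
```

Symbols defined in the executable itself are only found if they are in its dynamic symbol table, for example by linking with -rdynamic on Linux.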

Re: [OMPI devel] libtool issue with crs/self

2009-08-05 Thread Brian W. Barrett
On Wed, 5 Aug 2009, Josh Hursey wrote: On Aug 5, 2009, at 11:35 AM, Brian W. Barrett wrote: Josh - Just in case it wasn't clear -- if you're only looking for a symbol in the executable (which you know is there), you do *NOT* have to dlopen() the executable first (you do with

Re: [OMPI devel] RFC: PML/CM priority

2009-08-11 Thread Brian W. Barrett
On Tue, 11 Aug 2009, Rainer Keller wrote: When compiling on systems with MX or Portals, we offer MTLs and BTLs. If MTLs are used, the PML/CM is loaded as well as the PML/OB1. Question 1: Is favoring OB1 over CM required for any MTL (MX, Portals, PSM)? George has in the past had strong feelin

Re: [OMPI devel] Oversubscription/Scheduling Bug

2006-05-26 Thread Brian W. Barrett
On Fri, 26 May 2006, Jeff Squyres (jsquyres) wrote: You can see this by slightly modifying your test command -- run "env" instead of "hostname". You'll see that the environment variable OMPI_MCA_mpi_yield_when_idle is set to the value that you passed in on the mpirun command line, regardless of

Re: [OMPI devel] Oversubscription/Scheduling Bug

2006-05-26 Thread Brian W. Barrett
On Fri, 26 May 2006, Brian W. Barrett wrote: On Fri, 26 May 2006, Jeff Squyres (jsquyres) wrote: You can see this by slightly modifying your test command -- run "env" instead of "hostname". You'll see that the environment variable OMPI_MCA_mpi_yield_when_idle is

Re: [OMPI devel] memory_malloc_hooks.c and dlclose()

2006-05-30 Thread Brian W. Barrett
On Mon, 22 May 2006, Neil Ludban wrote: I'm getting a core dump when using openmpi-1.0.2 with the MPI extensions we're developing for the MATLAB interpreter. This same build of openmpi is working great with C programs and our extensions for gnu octave. The machine is AMD64 running Linux: Linu

Re: [OMPI devel] configure & Fortran problem

2006-10-06 Thread Brian W. Barrett
Before you go off and file a bug, this is not an Open MPI issue, but a windows / autoconf issue. Please don't file a bug on this, or I'm just going to have to close it as notabug... Brian On Fri, 6 Oct 2006, Jeff Squyres wrote: Oops. That's a bug. I'll file a ticket. On 10/5/06 12:51 PM

[OMPI devel] Shared memory file changes

2006-10-11 Thread Brian W. Barrett
Hi all - A couple of weeks ago, I committed some changes to the trunk that greatly reduced the size of the shared memory file for small numbers of processes. I haven't heard any complaints (the non-blocking send/receive issue is at proc counts greater than the size this patch affected). Anyon

[OMPI devel] configure changes tonight

2006-10-12 Thread Brian W. Barrett
Hi all - There will be three configure changes committed to the trunk tonight: - Some cleanups resulting from the update to the wrapper compilers for 32/64 bit support - A new configure option to deal with some fixes for the MPI::SEEK_SET (and friends) issue - Some cleanups in the

Re: [OMPI devel] help config.status to not mess up substitutions

2006-10-23 Thread Brian W. Barrett
Thanks, I'll apply ASAP. Brian On Mon, 23 Oct 2006, Ralf Wildenhues wrote: Please apply this robustness patch, which helps to avoid accidental unwanted substitutions done by config.status. From all I can tell, they do not happen now, but first the Autoconf manual warns against them, second th

Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Brian W. Barrett
The create_listen_thread code should be on both the trunk and v1.2 branch right now. You are correct that the heterogeneous fixes haven't moved just yet, because they aren't quite right. Hope to have that fixed in the near future... brian On Wed, 25 Oct 2006, Ralph H Castain wrote: There

Re: [OMPI devel] Building OpenMPI on windows

2006-11-21 Thread Brian W Barrett
At one point, a long time ago (before anyone started working on the native windows port), I had unpatched OMPI tarballs building on Cygwin, using Cygwin's gcc. Which I believe is all Greg and Beth want to do for now. But I believe that the recent code to support Windows natively has cause

[OMPI devel] Build system changes

2006-11-29 Thread Brian W Barrett
Hi all - Just wanted to give everyone a heads up that there will be two changes to the build system that should have minimal impact on everyone, but are worth noting: 1) If you are using Autoconf 2.60 or later, you *MUST* be using Automake 1.10 or later. Most people are still using

Re: [OMPI devel] incorrect definition of MPI_ERRCODES_IGNORE?

2006-12-30 Thread Brian W. Barrett
Thanks for the bug report. You are absolutely correct - the #define is incorrect in Open MPI. I've committed a fix to our development trunk and it should be included in future releases. In the meantime, it is safe to change the line in the installed mpi.h for Open MPI from: #define MPI_E

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r12945

2007-01-02 Thread Brian W. Barrett
Because that's what we had been using and I was going for minimal change (since this is for v1.2). Also note that *none* of this code is in performance critical areas. Last I checked, we don't really care how fast attribute updates and error handlers are fired... I think there are much b

Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-22 Thread Brian W. Barrett
On Jan 22, 2007, at 10:39 AM, Greg Watson wrote: On Jan 22, 2007, at 9:48 AM, Ralph H Castain wrote: On 1/22/07 9:39 AM, "Greg Watson" wrote: I tried adding '-mca btl ^sm -mca mpi_preconnect_all 1' to the mpirun command line but it still fails with identical error messages. I don't under

[OMPI devel] Libtool update for v1.2

2007-01-23 Thread Brian W. Barrett
Hi all - In December I had brought up the idea of updating the snapshot of Libtool 2 that we use for building the v1.2 branch to a more recent snapshot. The group seemed to think this was a good idea and I was going to do it, then got sidetracked working around a bug in their support for

[OMPI devel] v1.2 / trunk tarball libtool change

2007-01-25 Thread Brian W. Barrett
Hi all - As of tonight, the version of Libtool used to build "official" tarballs for the v1.2 branch and the trunk (this includes nightly snapshots, beta releases, and official releases) has been updated from a snapshot of Libtool 2 from June/July 2006 to one from Jan 23, 2007. This updat

Re: [OMPI devel] [OMPI svn] svn:open-mpi r13644

2007-02-13 Thread Brian W. Barrett
On Feb 13, 2007, at 5:16 PM, Jeff Squyres wrote: On Feb 13, 2007, at 7:10 PM, George Bosilca wrote: It's already in the 1.2 !!! I don't know how much you care about performance, but I do. This patch increases the latency by 10%. It might be correct for the pathscale compiler, but it didn't look as

Re: [OMPI devel] [OMPI svn] svn:open-mpi r13644

2007-02-13 Thread Brian W. Barrett
On Feb 13, 2007, at 7:37 PM, Brian W. Barrett wrote: On Feb 13, 2007, at 5:16 PM, Jeff Squyres wrote: On Feb 13, 2007, at 7:10 PM, George Bosilca wrote: It's already in the 1.2 !!! I don't know how much you care about performance, but I do. This patch increases the latency by 10%. I

Re: [OMPI devel] [PATCH] ompi_get_libtool_linker_flags.m4: fix $extra_ldflags detection

2007-02-24 Thread Brian W. Barrett
Thanks for the bug report and the patch. Unfortunately, the remove-smallest-prefix pattern syntax doesn't work with Solaris /bin/sh (standards would be better if everyone followed them...), but I committed something to our development trunk that handles the issue. It should be released as

Re: [OMPI devel] replace 'atoi' with 'strtol'

2007-04-18 Thread Brian W. Barrett
The patch is so that you can pass in hex in addition to decimal, right? I think that makes sense. But since we're switching to strtol, it might also make sense to add some error detection while we're at it. Not a huge deal, but it would be nice :). Brian > Hi, > > I want to add a patch to opa
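A minimal sketch of switching from atoi to strtol while adding the error detection suggested above. parse_int is a hypothetical helper, not the opal patch itself. It uses base 0 so that a 0x prefix selects hex; note that base 0 also treats a leading 0 as octal, so "010" parses as 8 rather than 10, which may or may not be wanted. The range check happens before the assignment, so the value can go into an int without an explicit cast.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a decimal, hex ("0x..."), or octal ("0...") string into an int.
   Returns false on empty input, trailing garbage, or overflow, leaving
   *out untouched in those cases. */
static bool parse_int(const char *str, int *out) {
    char *end = NULL;
    errno = 0;
    long val = strtol(str, &end, 0);   /* base 0: honor 0x / 0 prefixes */
    if (end == str || *end != '\0') {
        return false;                  /* no digits, or trailing characters */
    }
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
        return false;                  /* does not fit in an int */
    }
    *out = val;                        /* in range, so the conversion is exact */
    return true;
}
```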

Re: [OMPI devel] replace 'atoi' with 'strtol'

2007-04-18 Thread Brian W. Barrett
> > Because the target variable is an (int). > > If I were writing the code, I would leave the cast out. By assigning > the value to an int variable, you get the same effect anyway, so the > cast is redundant. And if you ever change the variable to a long, now > you have to remember to delete th

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14782

2007-05-27 Thread Brian W. Barrett
> On Sun, May 27, 2007 at 10:34:33AM -0600, Galen Shipman wrote: >> Actually, we still need MCA_BTL_FLAGS_FAKE_RDMA , it can be used as >> a hint for components such as one-sided. > What is the purpose of the hint if it should be set for each interconnect. > Just assume that it is set and behave a

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r15474

2007-07-17 Thread Brian W. Barrett
So first, there's an error in the patch (e-mail with details coming shortly, as there are many errors in the patch). There's no need for both isends (the new one and the one in there already). Second, this is in code that's a crutch around the real issue, which is that for a very small class of a

Re: [OMPI devel] PML cm and heterogeneous support

2007-10-25 Thread Brian W. Barrett
I'm surprised that ompi_mtl_datatype_{pack, unpack} are properly handling the heterogeneous issues - I certainly didn't take that into account when I wrote them. The CM code has never been audited for heterogeneous safety, which is why there was protection at that level for not running in hete

Re: [OMPI devel] Question regarding MCA_PML_CM_SEND_REQUEST_INIT_COMMON

2007-10-31 Thread Brian W. Barrett
This is correct -- the MPI_ERROR field should be filled in by the MTL upon completion of the request (or when it knows what to stick in there). The CM PML should generally not fill in that field. Brian On Wed, 31 Oct 2007, Jeff Squyres wrote: Again, I'm not a CM guy :-), but in general, I w

Re: [OMPI devel] Environment forwarding

2007-11-05 Thread Brian W. Barrett
This is extremely tricky to do. How do you know which environment variables to forward (foo in this case) and which not to (hostname). SLURM has a better chance, since it's linux only and generally only run on tightly controlled clusters. But there's a whole variety of things that shouldn't b

Re: [OMPI devel] Environment forwarding

2007-11-05 Thread Brian W. Barrett
On Mon, 5 Nov 2007, Torsten Hoefler wrote: On Mon, Nov 05, 2007 at 04:57:19PM -0500, Brian W. Barrett wrote: This is extremely tricky to do. How do you know which environment variables to forward (foo in this case) and which not to (hostname). SLURM has a better chance, since it's linux

[OMPI devel] Incorrect one-sided test

2007-11-07 Thread Brian W. Barrett
Hi all - Lisa Glendenning, who's working on a Portals one-sided component, discovered that the test onesided/test_start1.c in our repository is incorrect. It assumes that MPI_Win_start is non-blocking, but the standard says that "MPI_WIN_START is allowed to block until the corresponding MPI_

Re: [OMPI devel] THREAD_MULTIPLE

2007-11-28 Thread Brian W. Barrett
On Wed, 28 Nov 2007, Jeff Squyres wrote: We've had a few users complain about trying to use THREAD_MULTIPLE lately and having it not work. Here's a proposal: why don't we disable it (at least in the 1.2 series)? Or, at the very least, put in a big stderr warning that is displayed when THREAD_M

Re: [OMPI devel] RTE Issue II: Interaction between the ROUTED and GRPCOMM frameworks

2007-12-05 Thread Brian W. Barrett
To me, (a) is dumb and (c) isn't a non-starter. The whole point of the component system is to separate concerns. Routing topology and collectives operations are two different concerns. While there's some overlap (a topology-aware collective doesn't make sense when using the unity routing st

Re: [OMPI devel] vt-integration

2007-12-05 Thread Brian W. Barrett
OS X enforces a no duplicate symbol rule when flat namespaces are in use (the default on OS X). If all the libraries are two-level namespace libraries (libSystem.dylib, aka libm.dylib is two-level), then duplicate symbols are mostly ok. Libtool by default forces a flat namespace in sharedlibr

Re: [OMPI devel] opal_condition_wait

2007-12-06 Thread Brian W. Barrett
On Thu, 6 Dec 2007, Tim Prins wrote: Tim Prins wrote: First, in opal_condition_wait (condition.h:97) we do not release the passed mutex if opal_using_threads() is not set. Is there a reason for this? I ask since this violates the way condition variables are supposed to work, and it seems like t
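For reference, the POSIX contract Tim is describing: pthread_cond_wait() atomically releases the mutex while the caller sleeps and re-acquires it before returning, which is what lets the signaling thread take the lock and make progress. A minimal sketch using plain pthreads (this is not opal_condition_wait itself); completion_t and its helpers are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
} completion_t;

static void completion_init(completion_t *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Wait until another thread marks the completion done. The mutex is
   released inside pthread_cond_wait() while sleeping; the loop re-checks
   the predicate to handle spurious wakeups. */
static void completion_wait(completion_t *c) {
    pthread_mutex_lock(&c->lock);
    while (!c->done) {
        pthread_cond_wait(&c->cond, &c->lock);
    }
    pthread_mutex_unlock(&c->lock);
}

/* This can only acquire the lock because the waiter released it. */
static void completion_signal(completion_t *c) {
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void *signaler(void *arg) {
    completion_signal(arg);
    return NULL;
}
```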

Re: [OMPI devel] Dynamically Turning On and Off Memory Manager of Open MPI at Runtime??

2007-12-10 Thread Brian W. Barrett
On Mon, 10 Dec 2007, Peter Wong wrote: Open MPI defines its own malloc (by default), so malloc of glibc is not called. But, without calling malloc of glibc, the allocator of libhugetlbfs to back text and dynamic data by large pages, e.g., 16MB pages on POWER systems, is not used. Indeed, we ca

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Brian W. Barrett
On Tue, 11 Dec 2007, Gleb Natapov wrote: I did a rewrite of matching code in OB1. I made it much simpler and 2 times smaller (which is good, less code - less bugs). I also got rid of huge macros - very helpful if you need to debug something. There is no performance degradation, actually I even

Re: [OMPI devel] matching code rewrite in OB1

2007-12-12 Thread Brian W. Barrett
On Wed, 12 Dec 2007, Gleb Natapov wrote: On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote: This is better than nothing, but really not very helpful for looking at the specific issues that can arise with this, unless these systems have several parallel networks, with tests that wil

Re: [OMPI devel] IPv4 mapped IPv6 addresses

2007-12-14 Thread Brian W. Barrett
On Fri, 14 Dec 2007, Adrian Knoth wrote: Should we consider moving towards these mapped addresses? The implications: - less code, only one socket to handle - better FD consumption - breaks WinXP support, but not Vista/Longhorn or later - requires non-default kernel runtime setting on Op
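An AF_INET6 socket with the IPV6_V6ONLY option set to 0 can accept IPv4 peers, which it reports as IPv4-mapped IPv6 addresses of the form ::ffff:a.b.c.d; that is what makes the one-socket approach above possible. A minimal sketch of building and recognizing such an address with the standard socket APIs; ipv4_to_mapped is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Convert a dotted-quad IPv4 string into its IPv4-mapped IPv6 form:
   80 zero bits, 16 one bits (0xffff), then the 32-bit IPv4 address. */
static bool ipv4_to_mapped(const char *v4, struct in6_addr *out) {
    struct in_addr a4;
    if (inet_pton(AF_INET, v4, &a4) != 1) {
        return false;                  /* not a valid IPv4 literal */
    }
    memset(out, 0, sizeof(*out));
    out->s6_addr[10] = 0xff;
    out->s6_addr[11] = 0xff;
    memcpy(&out->s6_addr[12], &a4, sizeof(a4));  /* network byte order */
    return true;
}
```

Whether a dual-stack socket actually delivers IPv4 traffic this way depends on the operating system's default for IPV6_V6ONLY and its kernel settings, as the thread notes.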

Re: [OMPI devel] ptmalloc and pin down cache problems again

2008-01-07 Thread Brian W. Barrett
Nope, I think that's a valid approach. For some reason, I believe it was problematic for the OpenIB guys to do that at the time we were hacking up that code. But if it works, it sounds like a much better approach. When you make the change to the openib mpool, I'd also MORECORE_CANNOT_TRIM

Re: [OMPI devel] Fwd: === CREATE FAILURE ===

2008-01-24 Thread Brian W. Barrett
Automake forces v7 mode so that Solaris tar can untar the tarball, IIRC. Brian On Thu, 24 Jan 2008, Aurélien Bouteiller wrote: According to posix, tar should not limit the file name length. Only the v7 implementation of tar is limited to 99 characters. GNU tar has never been limited in the num

Re: [OMPI devel] xensocket - callbacks through OPAL/libevent

2008-02-05 Thread Brian W. Barrett
On Mon, 4 Feb 2008, Muhammad Atif wrote: I am trying to port xensockets to openmpi. In principle, I have the framework and everything, but there seems to be a small issue, I cannot get libevent (or OPAL) to give callbacks for receive (or send) for xensockets. I have tried to implement native c

Re: [OMPI devel] 3rd party code contributions

2008-02-08 Thread Brian W. Barrett
On Fri, 8 Feb 2008, Ralph Castain wrote: 1. event library 2. ROMIO 3. VT 4. backtrace 5. PLPA - this one is a little less obvious, but still being released as a separate package 6. libNBC Sorry to Ralph, but I clipped everything from his e-mail, then am going to make references to it. oh wel

Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-11 Thread Brian W. Barrett
Out of curiosity, why is the one-sided rdma component struck from 1.3? As far as I'm aware, the code is in the trunk and ready for release. Brian On Mon, 11 Feb 2008, Brad Benton wrote: All: The latest scrub of the 1.3 release schedule and contents is ready for review and comment. Please use

Re: [OMPI devel] New address selection for btl-tcp (was Re: [OMPI svn] svn:open-mpi r17307)

2008-02-22 Thread Brian W. Barrett
On Fri, 22 Feb 2008, Adrian Knoth wrote: I see three approaches: a) remove lo globally (in if.c). I expect objections. ;) I object! :). But for a good reason -- it'll break things. Someone tried this before, and the issue is when a node (like a laptop) only has lo -- then there are no
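One way to avoid the breakage Brian describes is to drop loopback only when some other interface is available. A minimal sketch of that policy, assuming a hypothetical iface_t list rather than Open MPI's real interface table in opal/util/if.c; it illustrates the constraint and is not necessarily any of the approaches listed in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface record. */
typedef struct {
    const char *name;
    bool is_loopback;
} iface_t;

/* Fill use[i] for each of the n interfaces: skip loopback when any other
   interface exists, but keep it when it is the only one (e.g. an offline
   laptop), so single-node runs still have something to connect over.
   Returns the number of interfaces selected. */
static size_t select_ifaces(const iface_t *ifs, size_t n, bool *use) {
    size_t non_lo = 0;
    for (size_t i = 0; i < n; i++) {
        if (!ifs[i].is_loopback) {
            non_lo++;
        }
    }
    size_t selected = 0;
    for (size_t i = 0; i < n; i++) {
        use[i] = (non_lo == 0) || !ifs[i].is_loopback;
        if (use[i]) {
            selected++;
        }
    }
    return selected;
}
```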

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Brian W. Barrett
Jeff / George - Did you add a way to specify which event modules are used? Because epoll pushes the socket list into the kernel, I can see how it would screw up BLCR. I bet everything would work if we forced the use of poll / select. Brian On Tue, 18 Mar 2008, Jeff Squyres wrote: Crud, ok

[OMPI devel] Libtool for 1.3 / trunk builds

2008-03-19 Thread Brian W. Barrett
Hi all - Now that Libtool 2.2 has gone stable (2.0 was skipped entirely), it probably makes sense to update the version of Libtool used to build the nightly tarball and releases for the trunk (and eventually v1.3) from the nightly snapshot we have been using to the stable LT 2.2 release. I'v

Re: [OMPI devel] Libtool for 1.3 / trunk builds

2008-03-19 Thread Brian W. Barrett
about going to 2.2 now or not) On Mar 19, 2008, at 12:26 PM, Brian W. Barrett wrote: Hi all - Now that Libtool 2.2 has gone stable (2.0 was skipped entirely), it probably makes sense to update the version of Libtool used to build the nightly tarball and releases for the trunk (and eventually

[OMPI devel] Proc modex change

2008-03-20 Thread Brian W. Barrett
Hi all - Does anyone know why we go through the modex receive for the local process in ompi_proc_get_info()? It doesn't seem like it's necessary, and it causes some problems on platforms that don't implement the modex (since it zeros out useful information determined during the init step)

Re: [OMPI devel] IRIX autoconf failure.

2008-03-21 Thread Brian W. Barrett
On Fri, 21 Mar 2008, Regan Russell wrote: I am having problems with the Assembler section of the GNU autoconf stuff on OpenMPI. Is anyone willing to work with me to get this up and running...? As a warning, MIPS / IRIX is not currently on the list of Open MPI supported platforms, so there m

Re: [OMPI devel] FreeBSD timer_base_open error?

2008-03-26 Thread Brian W. Barrett
George - Good catch -- that's going to cause a problem :). But I think we should add yet another check to also make sure that we're on Linux. So the three tests would be: 1) Am I on a platform that we have timer assembly support for? (That's the long list of architectures that we rec

Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Brian W. Barrett
On Mon, 21 Apr 2008, Ralph H Castain wrote: So it appears to be a combination of memchecker=yes automatically requiring valgrind, and the override on the configure line of a param set by a platform file not working. So I can't speak to the valgrind/memchecker issue, but can to the platform/co

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-06 Thread Brian W. Barrett
On Tue, 6 May 2008, Jeff Squyres wrote: On May 5, 2008, at 6:27 PM, Steve Wise wrote: There is a larger question regarding why the remote node is still polling the hca and not shutting down, but my immediate question is if it is an acceptable fix to simply disregard this "error" if it is an iW

Re: [OMPI devel] btl_openib_iwarp.c : making platform specific calls

2008-05-13 Thread Brian W. Barrett
On Tue, 13 May 2008, Don Kerr wrote: I believe there are similar operations being used by other areas of open mpi, place to start looking would be, opal/util/if.c. Yes, opal/util/if.h and opal/util/net.h provide a portable interface to almost everything that comes from getifaddrs(). Brian
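
For reference, the raw call that opal/util/if.h and opal/util/net.h wrap is getifaddrs(). A minimal sketch of walking its result directly, with count_family() and count_system_ipv4() as illustrative helpers (not Open MPI functions); inside Open MPI the portable wrappers remain the better choice, as noted above.

```c
#include <ifaddrs.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Count entries in a getifaddrs() list that carry an address of the given
 * family (AF_INET or AF_INET6). Entries whose ifa_addr is NULL are skipped,
 * since getifaddrs() can return interfaces without an address. */
static int count_family(const struct ifaddrs *list, int family)
{
    int n = 0;
    for (const struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr != NULL && ifa->ifa_addr->sa_family == family) {
            n++;
        }
    }
    return n;
}

/* Query the live system for IPv4 addresses; returns -1 if getifaddrs()
 * fails. The list must always be released with freeifaddrs(). */
static int count_system_ipv4(void)
{
    struct ifaddrs *list = NULL;
    if (getifaddrs(&list) != 0) {
        return -1;
    }
    int n = count_family(list, AF_INET);
    freeifaddrs(list);
    return n;
}
```

Keeping the list walk separate from the getifaddrs() call makes the filtering logic testable against a hand-built list, independent of the host's network configuration.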

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
I think having a parameter to turn off the warning is a great idea. So great in fact, that it already exists in the trunk and v1.2 :)! Setting the default value for the btl_base_warn_component_unused flag from 0 to 1 will have the desired effect. I'm not sure I agree with setting the default

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
Pasha) wrote: I agree with Brian. We could add to the warning message a detailed description of how to disable it. Pasha Brian W. Barrett wrote: I think having a parameter to turn off the warning is a great idea. So great in fact, that it already exists in the trunk and v1.2 :)! Setting the def

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: 2. An out-of-the-box "mpirun a.out" will print warning messages in perfectly valid/good configurations (no verbs-capable hardware, but just happen to have libibverbs installed). This is a Big Deal. Which is easily solved with a better error message, as

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
d running. So we're only talking about the Open MPI warning message here. More below. On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote: 2. An out-of-the-box "mpirun a.out" will print warning messages in perfectly valid/good configurations (no verbs-capable hardware, but just hap

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: On May 21, 2008, at 3:38 PM, Jeff Squyres wrote: It would be great if libibverbs could return two different error messages - one for "there's no IB card in this machine" and one for "there's an IB card here, but we can't initialize it". I think that wo

Re: [OMPI devel] openib btl build question

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: On May 21, 2008, at 4:17 PM, Don Kerr wrote: Just want to make sure what I think I see is true: Linux build. openib btl requires ptmalloc2 and ptmalloc2 requires posix threads, is that correct? ptmalloc2 is not *required* by the openib btl. But it

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Brian W. Barrett
On Wed, 21 May 2008, Jeff Squyres wrote: I'm only concerned about the case where there's an IB card, the user expects the IB card to be used, and the IB card isn't used. Can you put in a site-wide btl = ^tcp to avoid the problem? If the IB card fails, then you'll get unreachable MPI errors.

Re: [OMPI devel] openib btl build question

2008-05-22 Thread Brian W. Barrett
't live without mpi_leave_pinned so threads are back. Jeff Squyres wrote: On May 21, 2008, at 4:37 PM, Brian W. Barrett wrote: ptmalloc2 is not *required* by the openib btl. But it is required on Linux if you want to use the mpi_leave_pinned functionality. I see one function call to __

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Brian W. Barrett
On Thu, 22 May 2008, Terry Dontje wrote: The major difference here is that libmyriexpress is not being included in mainline Linux distributions. Specifically: if you can find/use libmyriexpress, it's likely because you have that hardware. The same *used* to be true for libibverbs, but is no lo
