Re: [OMPI users] sm btl choices
On Mar 1, 2010, at 10:04 AM, David Turner wrote:

> Hi Ralph,
>
>> Which version of OMPI are you using? We know that the 1.2 series was
>> unreliable about removing the session directories, but 1.3 and above appear
>> to be quite good about it. If you are having problems with the 1.3 or 1.4
>> series, I would definitely like to know about it.
>>
>> When I was at LANL, I ran a number of tests in exactly this configuration.
>> While the sm btl did provide some performance advantage, it wasn't very much
>> (the bandwidth was only about 10% greater, and the latency wasn't all that
>> different either). I set the default configuration for users to include sm
>> as 10% isn't something to sneer at, but you could disable it without an
>> enormous impact.
>
> I realize I have another question about this. When you say "exactly"
> this configuration, do you mean the mmap files were backed to /tmp
> via ramdisk, or to a remote file system over the communications fabric?

Backed to /tmp via ramdisk.

> We have historically redefined TMPDIR to point somewhere other than
> /tmp, and have told our users *never* to use /tmp (if possible).
> I suppose that if OMPI cleans up after itself, and we use a
> prologue/epilogue, and regular scrubbing, we can keep /tmp under
> control.

That's what LANL does, i.e., OMPI cleanup + epilogue.

>> Another option would be to run an epilog that hammers the session directory.
>> That's what LANL does, even though we didn't see much trouble with cleanup
>> starting with the 1.3 series (still have a bunch of users stuck on 1.2).
>> Depending on what environment you are running, you might contact folks there
>> and get a copy of their epilog script.
>
> --
> Best regards,
>
> David Turner
> User Services Group        email: dptur...@lbl.gov
> NERSC Division             phone: (510) 486-4027
> Lawrence Berkeley Lab      fax: (510) 486-4316
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
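Whether the sm backing files end up on a ramdisk or on a network file system depends on which mount holds the session directory. The following is a minimal sketch for checking that, assuming a Linux node whose mount table uses the /proc/mounts layout (device, mount point, type, ...). The helper name `fs_type_for` and the sample mount table are hypothetical and not part of Open MPI.

```python
import os

def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount containing path.

    mounts_text is expected in /proc/mounts format. Returns None if no
    mount entry matches.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        # ">=" lets a later entry for the same mount point win,
        # since later mounts shadow earlier ones.
        if contains and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, fs_type)
    return best

# Hypothetical mount table for a stateless node with /tmp in RAM.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
tmpfs /tmp tmpfs rw 0 0
server:/export/scratch /scratch nfs rw 0 0
"""

if __name__ == "__main__":
    tmpdir = os.environ.get("TMPDIR", "/tmp")
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError:
        mounts = SAMPLE_MOUNTS  # fall back to the illustrative table
    print(tmpdir, fs_type_for(tmpdir, mounts))
```

A result such as `tmpfs` or `rootfs` suggests memory-backed storage, while `nfs` or a parallel file system type suggests that mmap traffic would cross the fabric.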
Re: [OMPI users] sm btl choices
Hi Ralph,

> Which version of OMPI are you using? We know that the 1.2 series was
> unreliable about removing the session directories, but 1.3 and above appear
> to be quite good about it. If you are having problems with the 1.3 or 1.4
> series, I would definitely like to know about it.
>
> When I was at LANL, I ran a number of tests in exactly this configuration.
> While the sm btl did provide some performance advantage, it wasn't very much
> (the bandwidth was only about 10% greater, and the latency wasn't all that
> different either). I set the default configuration for users to include sm
> as 10% isn't something to sneer at, but you could disable it without an
> enormous impact.

I realize I have another question about this. When you say "exactly"
this configuration, do you mean the mmap files were backed to /tmp
via ramdisk, or to a remote file system over the communications fabric?

We have historically redefined TMPDIR to point somewhere other than
/tmp, and have told our users *never* to use /tmp (if possible).
I suppose that if OMPI cleans up after itself, and we use a
prologue/epilogue, and regular scrubbing, we can keep /tmp under
control.

> Another option would be to run an epilog that hammers the session directory.
> That's what LANL does, even though we didn't see much trouble with cleanup
> starting with the 1.3 series (still have a bunch of users stuck on 1.2).
> Depending on what environment you are running, you might contact folks there
> and get a copy of their epilog script.

--
Best regards,

David Turner
User Services Group        email: dptur...@lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab      fax: (510) 486-4316
Re: [OMPI users] sm btl choices
On Mar 1, 2010, at 8:41 AM, David Turner wrote:

> On 3/1/10 1:51 AM, Ralph Castain wrote:
>> Which version of OMPI are you using? We know that the 1.2 series was
>> unreliable about removing the session directories, but 1.3 and above appear
>> to be quite good about it. If you are having problems with the 1.3 or 1.4
>> series, I would definitely like to know about it.
>
> Oops; sorry! OMPI 1.4.1, compiled with PGI 10.0 compilers,
> running on Scientific Linux 5.4, ofed 1.4.2.
>
> The session directories are *frequently* left behind. I have
> not really tried to characterize under what circumstances they
> are removed. But please confirm: they *should* be removed by
> OMPI.

Most definitely - they should always be removed by OMPI. This is the first report we have had of them -not- being removed in the 1.4 series, so it is disturbing.

What environment are you running under? Does this happen under normal termination, or under abnormal failures (the more you can tell us, the better)?

>> When I was at LANL, I ran a number of tests in exactly this configuration.
>> While the sm btl did provide some performance advantage, it wasn't very much
>> (the bandwidth was only about 10% greater, and the latency wasn't all that
>> different either). I set the default configuration for users to include sm
>> as 10% isn't something to sneer at, but you could disable it without an
>> enormous impact.
>
> I'd prefer to provide as much performance as possible, also.
>
>> Another option would be to run an epilog that hammers the session directory.
>> That's what LANL does, even though we didn't see much trouble with cleanup
>> starting with the 1.3 series (still have a bunch of users stuck on 1.2).
>> Depending on what environment you are running, you might contact folks there
>> and get a copy of their epilog script.
>
> Yes, we are already planning our prologues and epilogues, just
> haven't implemented them yet. Even if I can find and fix a
> reason why OMPI is currently not doing this, we will probably
> do it in an epilogue anyway.
>
> Thanks for your help!
Re: [OMPI users] sm btl choices
On 3/1/10 1:51 AM, Ralph Castain wrote:
> Which version of OMPI are you using? We know that the 1.2 series was
> unreliable about removing the session directories, but 1.3 and above appear
> to be quite good about it. If you are having problems with the 1.3 or 1.4
> series, I would definitely like to know about it.

Oops; sorry! OMPI 1.4.1, compiled with PGI 10.0 compilers,
running on Scientific Linux 5.4, ofed 1.4.2.

The session directories are *frequently* left behind. I have
not really tried to characterize under what circumstances they
are removed. But please confirm: they *should* be removed by
OMPI.

> When I was at LANL, I ran a number of tests in exactly this configuration.
> While the sm btl did provide some performance advantage, it wasn't very much
> (the bandwidth was only about 10% greater, and the latency wasn't all that
> different either). I set the default configuration for users to include sm
> as 10% isn't something to sneer at, but you could disable it without an
> enormous impact.

I'd prefer to provide as much performance as possible, also.

> Another option would be to run an epilog that hammers the session directory.
> That's what LANL does, even though we didn't see much trouble with cleanup
> starting with the 1.3 series (still have a bunch of users stuck on 1.2).
> Depending on what environment you are running, you might contact folks there
> and get a copy of their epilog script.

Yes, we are already planning our prologues and epilogues, just
haven't implemented them yet. Even if I can find and fix a
reason why OMPI is currently not doing this, we will probably
do it in an epilogue anyway.

Thanks for your help!

--
Best regards,

David Turner
User Services Group        email: dptur...@lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab      fax: (510) 486-4316
Re: [OMPI users] sm btl choices
Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or 1.4 series, I would definitely like to know about it.

When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency wasn't all that different either). I set the default configuration for users to include sm as 10% isn't something to sneer at, but you could disable it without an enormous impact.

Another option would be to run an epilog that hammers the session directory. That's what LANL does, even though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy of their epilog script.

On Mar 1, 2010, at 1:42 AM, David Turner wrote:

> Hi all,
>
> Running on a large cluster of 8-core nodes. I understand
> that the SM BTL is a "good thing". But I'm curious about
> its use of memory-mapped files. I believe these files will
> be in $TMPDIR, which defaults to /tmp.
>
> In our cluster, the compute nodes are stateless, so /tmp
> is actually in RAM. Keeping memory-mapped "files" in
> memory seems kind of circular, although I know little
> about these things. A bigger problem is that it appears
> OMPI does not remove the files upon completion.
>
> Another option is to redefine $TMPDIR to point to a
> "real" file system. In our cluster, all the available
> file systems are accessed over the IB fabric. So it
> seems that there will be IB traffic, even though the
> point of the SM BTL is to avoid this traffic.
>
> Given the above two constraints, might it just be
> better to disable the SM BTL entirely, and use the
> IB BTL even within a node? Of course, the "self"
> BTL should still be used if appropriate.
>
> Any thoughts clarifying these issues would be
> greatly appreciated. Thanks!
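The epilog idea above can be sketched as follows. This is an illustration only, not LANL's script: it assumes session directories sit directly under the temporary directory and are named `openmpi-sessions-<user>@<host>_<n>` (check the naming on your own installation), and that the epilog runs only after all of that user's jobs on the node have finished. The function name `remove_stale_sessions` is hypothetical.

```python
import os
import shutil
import tempfile

def remove_stale_sessions(tmp_root, user, prefix="openmpi-sessions-"):
    """Delete session directories under tmp_root that belong to `user`.

    Assumes the directory naming '<prefix><user>@<host>_<n>'.
    Returns the list of paths that were selected for removal.
    """
    removed = []
    for name in sorted(os.listdir(tmp_root)):
        path = os.path.join(tmp_root, name)
        if not name.startswith(prefix + user + "@"):
            continue  # some other user's directory, or unrelated file
        if os.path.islink(path) or not os.path.isdir(path):
            continue  # never follow symlinks from a privileged epilog
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed

if __name__ == "__main__":
    # Demonstration in a throwaway directory instead of the real /tmp.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "openmpi-sessions-alice@node01_0", "1234"))
        os.makedirs(os.path.join(root, "openmpi-sessions-bob@node01_0"))
        print(remove_stale_sessions(root, "alice"))
        print(os.listdir(root))
```

Matching on the user name keeps the epilog from touching session directories of other jobs that may still be running on a shared node.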
[OMPI users] sm btl choices
Hi all,

Running on a large cluster of 8-core nodes. I understand
that the SM BTL is a "good thing". But I'm curious about
its use of memory-mapped files. I believe these files will
be in $TMPDIR, which defaults to /tmp.

In our cluster, the compute nodes are stateless, so /tmp
is actually in RAM. Keeping memory-mapped "files" in
memory seems kind of circular, although I know little
about these things. A bigger problem is that it appears
OMPI does not remove the files upon completion.

Another option is to redefine $TMPDIR to point to a
"real" file system. In our cluster, all the available
file systems are accessed over the IB fabric. So it
seems that there will be IB traffic, even though the
point of the SM BTL is to avoid this traffic.

Given the above two constraints, might it just be
better to disable the SM BTL entirely, and use the
IB BTL even within a node? Of course, the "self"
BTL should still be used if appropriate.

Any thoughts clarifying these issues would be
greatly appreciated. Thanks!

--
Best regards,

David Turner
User Services Group        email: dptur...@lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab      fax: (510) 486-4316
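For reference, running without the sm BTL comes down to restricting the `btl` MCA parameter on the mpirun command line. Below is a minimal sketch that builds such a command, assuming the InfiniBand BTL is the `openib` component, as in the 1.4 series. The helper name `mpirun_command`, the program `./a.out`, and the process count are hypothetical.

```python
def mpirun_command(program, nprocs, btls=("self", "openib")):
    """Build an mpirun argv that restricts Open MPI to the listed BTLs.

    With the default tuple, sm is left out, so on-node traffic
    goes through the IB BTL as well.
    """
    if "self" not in btls:
        # Without "self", a process cannot send messages to itself.
        raise ValueError("the 'self' BTL should stay in the list")
    return ["mpirun", "--mca", "btl", ",".join(btls),
            "-np", str(nprocs), program]

if __name__ == "__main__":
    print(" ".join(mpirun_command("./a.out", 16)))
    # prints: mpirun --mca btl self,openib -np 16 ./a.out
```

Passing `btls=("self", "sm", "openib")` restores shared memory, which makes it easy to run the same benchmark both ways and compare bandwidth and latency on the actual cluster.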