Chris Samuel wrote:

> ----- "Eugene Loh" <eugene....@sun.com> wrote:
>
>> This is an important discussion.
>
> Indeed! My big fear is that people won't pick up the significance of the change and will complain about performance regressions in the middle of an OMPI stable release cycle.
>
>> 2) The proposed OMPI bind-to-socket default is less severe. In the general case, it would allow multiple jobs to bind in the same way without oversubscribing any core or socket. (This comment added to the trac ticket.)
>
> That's a nice clarification, thanks. I suspect, though, that the same issue we have with MVAPICH would occur if two 4-core jobs both bound themselves to the first socket.

Okay, so let me point out a second distinction from MVAPICH: the default policy would be to spread out over sockets. Say you have two sockets with four cores each, and you submit two four-core jobs. The first job would put two processes on the first socket and two on the second. The second job would do the same, so the loading would be even.

I'm not saying there couldn't be problems. It's just that MVAPICH2 (at least what I looked at) has multiple shortfalls: it fills up one socket after another (which decreases memory bandwidth per process and increases the chance of collisions with other jobs), and it binds to cores (increasing the chance of oversubscribing cores). The proposed OMPI behavior distributes over sockets (improving memory bandwidth per process and reducing collisions with other jobs) and binds to sockets (reducing the chance of oversubscribing cores, whether due to other MPI jobs or to multithreaded processes). So, the proposed OMPI behavior mitigates those problems. It would be even better to have binding selections adapt to other bindings on the system.

In any case, regardless of what the best behavior is, I appreciate the point about changing behavior in the middle of a stable release.
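The two placement policies being contrasted can be sketched in a few lines. This is a minimal simulation, not either implementation; the policy names "fill" and "spread" are my labels for MVAPICH-style core filling and the proposed OMPI round-robin over sockets:

```python
from collections import Counter

CORES_PER_SOCKET = 4
NSOCKETS = 2

def place(policy, nprocs):
    """Socket index chosen for each local rank of one job.
    Neither policy coordinates with other jobs already on the node."""
    if policy == "fill":    # MVAPICH-style: bind rank r to core r, filling socket 0 first
        return [r // CORES_PER_SOCKET for r in range(nprocs)]
    if policy == "spread":  # proposed OMPI-style: round-robin ranks over sockets
        return [r % NSOCKETS for r in range(nprocs)]
    raise ValueError(policy)

# Two independent four-process jobs land on the same 2-socket, 8-core node.
for policy in ("fill", "spread"):
    load = Counter()
    for _job in range(2):   # each job binds as if it were alone on the node
        load.update(place(policy, 4))
    print(policy, dict(load))
# fill   -> {0: 8}        : 8 processes on socket 0's 4 cores (oversubscribed)
# spread -> {0: 4, 1: 4}  : load stays even, one process per core
```

The "fill" case is the scenario Chris describes: both jobs independently pick the first socket and collide; "spread" leaves the node evenly loaded without any cross-job coordination.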
Arguably, leaving significant performance on the table in typical situations is a bug that warrants fixing even in the middle of a release, but I won't try to settle that debate here.

>> 3) Defaults (if I understand correctly) can be set differently on each cluster.
>
> Yes, but the defaults should be sensible for the majority of clusters. If the majority do indeed share nodes between jobs, then I would suggest that the default should be off, and the minority who don't share nodes should have to enable it.

In debates on this subject, I've heard people argue that:

*) Though nodes are getting fatter, most are still thin.
*) Resource managers tend to space-share the cluster.
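As a concrete illustration of point 3), a site can override a built-in default in the system-wide MCA parameter file that mpirun reads at startup. The parameter name below is illustrative only (check `ompi_info --param all all` on the release in question for the actual binding knob):

```
# $prefix/etc/openmpi-mca-params.conf -- system-wide MCA parameter defaults.
# Hypothetical setting: a cluster that shares nodes between jobs could
# ship binding disabled here, while a space-shared cluster leaves it on.
mpi_paffinity_alone = 0
```

Users can still override a site default per job with `-mca` on the mpirun command line, so a conservative site default doesn't lock anyone out of binding.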