IIRC, hwloc can read its topology from an XML file. If that is not already possible, should we provide a simple mechanism to tell hwloc to take the topology from a config file instead of detecting it from the OS? For example, when working on a new OS and/or new hardware, one could manually generate the hwloc XML file on each node and then do something like

mpirun --mca hwloc_file /etc/hwloc.xml ...
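To make the idea a bit more concrete, here is a rough sketch of a launcher-side helper. It does not use the `hwloc_file` parameter mentioned above, since that is only a proposal. Instead it relies on the `HWLOC_XMLFILE` and `HWLOC_THISSYSTEM` environment variables documented by hwloc and on mpirun's `-x` option for exporting variables. The helper name `mpirun_with_xml_topology` is made up for illustration, and whether the hwloc embedded in a given Open MPI build honors these variables is an assumption I have not checked.

```python
import shlex


def mpirun_with_xml_topology(xml_path, app_argv):
    """Build an mpirun argv that asks hwloc to load its topology from XML.

    Assumption: the hwloc used by Open MPI reads HWLOC_XMLFILE and
    HWLOC_THISSYSTEM from the environment of the launched processes.
    """
    # The file is opened on every node, so a relative path would be fragile.
    if not xml_path.startswith("/"):
        raise ValueError("xml_path must be an absolute path")
    return [
        "mpirun",
        # Ask hwloc to load the topology from the XML file.
        "-x", "HWLOC_XMLFILE=" + xml_path,
        # Tell hwloc the XML describes the local machine, so binding
        # calls are not turned into no-ops.
        "-x", "HWLOC_THISSYSTEM=1",
    ] + list(app_argv)


if __name__ == "__main__":
    argv = mpirun_with_xml_topology("/etc/hwloc.xml", ["./a.out"])
    # Print a copy-pastable command line.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The XML file itself could be written by hand or exported with lstopo on a machine where detection works. A dedicated MCA parameter would still be nicer for users than exporting environment variables.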
Does that make sense?

On Friday, September 4, 2015, Ralph Castain <r...@open-mpi.org> wrote:

> It sounds, then, like removing --without-hwloc will do no harm. At worst,
> hwloc might report inaccurate info, but that won’t stop us from running
> with appropriate cmd line options (e.g., to set the #slots and bind-to
> none).
>
> Unless there are any further concerns, I’ll prep the PR
>
> On Sep 4, 2015, at 1:08 AM, Kawashima, Takahiro <t-kawash...@jp.fujitsu.com> wrote:
>
> Brice,
>
> I'm a developer of Fujitsu MPI for K computer and Fujitsu
> PRIMEHPC FX10/FX100 (SPARC-based CPU).
>
> Though I'm not familiar with the hwloc code and didn't know
> the issue reported by Gilles, I also would be able to help
> you to fix the issue.
>
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
>
> Thanks Brice,
>
> bottom line, even if hwloc is not fully ported, it should build and ompi
> should get something usable.
> in this case, i have no objection removing the --without-hwloc configure
> option.
>
> you can contact me off-list regarding the FX10 specific issue
>
> Cheers,
>
> Gilles
>
> On 9/4/2015 2:31 PM, Brice Goglin wrote:
>
> On 04/09/2015 00:36, Gilles Gouaillardet wrote:
>
> Ralph,
>
> just to be clear, your proposal is to abort if openmpi is configured
> with --without-hwloc, right ?
> ( the --with-hwloc option is not removed because we want to keep the
> option of using an external hwloc library )
>
> if I understand correctly, Paul's point is that if openmpi is ported
> to a new architecture for which hwloc has not been ported yet
> (embedded hwloc or external hwloc), then the very first step is to
> port hwloc before ompi can be built.
>
> did I get it right Paul ?
>
> Brice, what would happen in such a case ?
> embedded hwloc cannot be built ?
> hwloc returns little or no information ?
>
> If it's a new operating system and it supports at least things like
> sysconf, you will get a Machine object with one PU per logical processor.
>
> If it's a new platform running Linux, they are supposed to tell Linux
> at least package/core/thread information. That's what we have for ARM
> for instance.
>
> Missing topology detection can be worked around easily (with XML and
> synthetic description, what we did for BlueGene/Q before adding manual
> support for that specific processor). Binding support can't.
> And once you get binding, you get x86-topology even if the operating
> system isn't supported (using cpuid).
>
> for example, on Fujitsu FX10 node (single socket, 16 cores), hwloc
> reports 16 sockets with one core each and no cache. though this is
> not correct, that can be seen as equivalent to the real config by
> ompi, so this is not really an issue for ompi.
>
> Can you help fixing this?
>
> The issue is indeed with supercomputers with uncommon architectures
> like this one.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/17961.php