Strange, the backtrace below looks totally crazy; I don't see how the debug checks could still pass in that case. Any chance you could run that thing under valgrind?
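For reference, the checks suggested in this thread could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names `diagnostic_runs` and `run_all` are hypothetical, and the command to test is passed in by the caller because the exact orterun command line is not given in the thread. Only the two environment variables (HWLOC_DEBUG_CHECK, HWLOC_NO_LIBXML_IMPORT) and the use of valgrind come from the messages themselves.

```python
import os
import shutil
import subprocess


def diagnostic_runs(cmd, base_env=None):
    """Return (label, env, argv) tuples for the checks discussed in the thread.

    cmd is the command that loads and uses the topology, as an argv list.
    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    return [
        # Enable the extra assertions at the end of hwloc_topology_load().
        ("debug-check", {**env, "HWLOC_DEBUG_CHECK": "1"}, list(cmd)),
        # Same checks, but with the libxml import disabled, to rule out
        # a problem in one of the XML backends.
        ("no-libxml",
         {**env, "HWLOC_DEBUG_CHECK": "1", "HWLOC_NO_LIBXML_IMPORT": "1"},
         list(cmd)),
        # Run the unmodified command under valgrind to look for memory errors.
        ("valgrind", env, ["valgrind"] + list(cmd)),
    ]


def run_all(cmd):
    """Run each diagnostic whose executable is on PATH; return exit codes.

    A skipped run is recorded as None. On POSIX, a negative code means the
    process was killed by a signal (for example a segfault).
    """
    results = {}
    for label, env, argv in diagnostic_runs(cmd):
        if shutil.which(argv[0]) is None:
            results[label] = None  # executable not found, skipped
            continue
        results[label] = subprocess.run(argv, env=env).returncode
    return results
```

Something like `run_all(["lstopo", "-i", "topo.xml"])` would exercise the failing file with lstopo (assuming `-i` selects the XML input in the installed version). Since the segfault was reported in OMPI rather than in lstopo, the same helper can be given the failing OMPI command line instead.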
Brice

On 21/09/2013 01:09, Ralph Castain wrote:
> Hmmm...nope, not a peep (no extra output at all). Just segfaulted like
> before.
>
> On Sep 20, 2013, at 4:06 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
>> Try adding HWLOC_DEBUG_CHECK=1 to your environment; it will enable
>> many assertions at the end of hwloc_topology_load().
>>
>> Brice
>>
>> On 21/09/2013 01:03, Ralph Castain wrote:
>>> I didn't try loading it with lstopo - just tried the OMPI trunk. It
>>> loads okay, but segfaults when you try to find an object by depth:
>>>
>>> #0 0x00000001005fe5dc in opal_hwloc172_hwloc_get_obj_by_depth
>>> (topology=Cannot access memory at address 0xfffffffffffffff7
>>> ) at traversal.c:623
>>> #1 0x0000000100b6dfaa in opal_hwloc172_hwloc_get_root_obj
>>> (topology=Cannot access memory at address 0xfffffffffffffff7
>>> ) at rmaps_rr_mappers.c:747
>>> #2 0x0000000100b6e139 in orte_rmaps_rr_byslot (jdata=Cannot access
>>> memory at address 0xffffffffffffff77
>>> ) at rmaps_rr_mappers.c:774
>>> #3 0x0000000100b6d6da in orte_rmaps_rr_map (jdata=Cannot access
>>> memory at address 0xffffffffffffff17
>>> ) at rmaps_rr.c:211
>>> #4 0x0000000100353098 in orte_rmaps_base_map_job (fd=Cannot access
>>> memory at address 0xfffffffffffffe7b
>>> ) at base/rmaps_base_map_job.c:320
>>> #5 0x00000001005ce28c in event_process_active_single_queue
>>> (base=Cannot access memory at address 0xffffffffffffffe7
>>> ) at event.c:1367
>>> #6 0x00000001005ce500 in event_process_active (base=Cannot access
>>> memory at address 0xffffffffffffffe7
>>> ) at event.c:1437
>>> #7 0x00000001005ceb71 in opal_libevent2021_event_base_loop
>>> (base=Cannot access memory at address 0xffffffffffffffb7
>>> ) at event.c:1645
>>> #8 0x00000001002c5158 in orterun (argc=Cannot access memory at
>>> address 0xfffffffffffffd1b
>>> ) at orterun.c:3039
>>> #9 0x00000001002c32a4 in main (argc=Cannot access memory at address
>>> 0xfffffffffffffffb
>>> ) at main.c:14
>>>
>>> Looks to me like memory may be getting hosed
>>>
>>> On Sep 20, 2013, at 2:59 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>
>>>> I can't see any segfault. Where does the segfault occur for you?
>>>> In OMPI only (or lstopo too)? When loading or when using the topology?
>>>>
>>>> I tried lstopo on that file with and without
>>>> HWLOC_NO_LIBXML_IMPORT=1 (in case the bug is in one of the XML
>>>> backends), looks ok.
>>>>
>>>> Brice
>>>>
>>>> On 20/09/2013 23:53, Ralph Castain wrote:
>>>>> Here are the two files I tried - not from the same machine. The foo.xml
>>>>> works, the topo.xml segfaults
>>>>>
>>>>> One of our users reported it from their machine, but I don't have their
>>>>> topo file.
>>>>>
>>>>> On Sep 20, 2013, at 2:41 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>>>
>>>>>> Hello,
>>>>>> I don't see any reason for such an incompatibility. But there are
>>>>>> many combinations, we can't test everything.
>>>>>> I can't reproduce that on my machines. Can you send the XML output of
>>>>>> both versions on one of your machines?
>>>>>> Brice
>>>>>>
>>>>>> On 20/09/2013 23:32, Ralph Castain wrote:
>>>>>>> Hi folks
>>>>>>>
>>>>>>> I've run across a rather strange behavior. We have two branches in OMPI
>>>>>>> - the devel trunk (using hwloc v1.7.2) and our feature release series
>>>>>>> (using hwloc 1.5.2). I have found the following:
>>>>>>>
>>>>>>> * the feature series can correctly load an xml file generated by lstopo
>>>>>>> of versions 1.5 or greater
>>>>>>>
>>>>>>> * the devel series can correctly load an xml file generated by lstopo
>>>>>>> of versions 1.7 or greater, but not files generated by prior versions.
>>>>>>> In the latter case, I segfault as soon as I try to use the loaded
>>>>>>> topology.
>>>>>>>
>>>>>>> Any ideas why the discrepancy? Can I at least detect the version used
>>>>>>> to create a file when loading it so I can error out instead of
>>>>>>> segfaulting?
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> hwloc-devel mailing list
>>>>>>> hwloc-de...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel