[hwloc-devel] Create success (hwloc r1.2.1rc1r3540)
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot: hwloc 1.2.1rc1r3540
Start time: Mon Jul 4 21:03:32 EDT 2011
End time: Mon Jul 4 21:05:48 EDT 2011

Your friendly daemon,
Cyrador
[hwloc-devel] Create success (hwloc r1.3a1r3537)
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot: hwloc 1.3a1r3537
Start time: Mon Jul 4 21:01:02 EDT 2011
End time: Mon Jul 4 21:03:31 EDT 2011

Your friendly daemon,
Cyrador
Re: [OMPI devel] TIPC BTL Segmentation fault
Hi, here is the result:

ehhexxn@oak:~/git/test$ mpirun -n 2 -mca btl tipc,self valgrind ./hello_c > 11.out
==30850== Memcheck, a memory error detector
==30850== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==30850== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==30850== Command: ./hello_c
==30850==
==30849== Memcheck, a memory error detector
==30849== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==30849== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==30849== Command: ./hello_c
==30849==
==30849== Jump to the invalid address stated on the next line
==30849==    at 0xDEAFBEEDDEAFBEED: ???
==30849==    by 0x50151F1: opal_list_construct (opal_list.c:88)
==30849==    by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30849==    by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30849==    by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30849==    by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30849==    by 0xA8A12FA: opal_obj_new_debug (opal_object.h:252)
==30849==    by 0xA8A2A5F: mca_pml_ob1_add_comm (pml_ob1.c:182)
==30849==    by 0x4E95F50: ompi_mpi_init (ompi_mpi_init.c:770)
==30849==    by 0x4EC6C32: PMPI_Init (pinit.c:84)
==30849==    by 0x400935: main (in /home/ehhexxn/git/test/hello_c)
==30849== Address 0xdeafbeeddeafbeed is not stack'd, malloc'd or (recently) free'd
==30849==
[oak:30849] *** Process received signal ***
[oak:30849] Signal: Segmentation fault (11)
[oak:30849] Signal code: Invalid permissions (2)
[oak:30849] Failing at address: 0xdeafbeeddeafbeed
==30849== Invalid read of size 1
==30849==    at 0xA011FDB: ??? (in /lib/libgcc_s.so.1)
==30849==    by 0xA012B0B: _Unwind_Backtrace (in /lib/libgcc_s.so.1)
==30849==    by 0x60BE69D: backtrace (backtrace.c:91)
==30849==    by 0x4FAB055: opal_backtrace_buffer (backtrace_execinfo.c:54)
==30849==    by 0x5026DF3: show_stackframe (stacktrace.c:348)
==30849==    by 0x5DB1B3F: ??? (in /lib/libpthread-2.12.1.so)
==30849==    by 0xDEAFBEEDDEAFBEEC: ???
==30849==    by 0x50151F1: opal_list_construct (opal_list.c:88)
==30849==    by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30849==    by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30849==    by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30849==    by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30849== Address 0xdeafbeeddeafbeed is not stack'd, malloc'd or (recently) free'd
==30849==
==30849==
==30849== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==30849== General Protection Fault
==30849==    at 0xA011FDB: ??? (in /lib/libgcc_s.so.1)
==30849==    by 0xA012B0B: _Unwind_Backtrace (in /lib/libgcc_s.so.1)
==30849==    by 0x60BE69D: backtrace (backtrace.c:91)
==30849==    by 0x4FAB055: opal_backtrace_buffer (backtrace_execinfo.c:54)
==30849==    by 0x5026DF3: show_stackframe (stacktrace.c:348)
==30849==    by 0x5DB1B3F: ??? (in /lib/libpthread-2.12.1.so)
==30849==    by 0xDEAFBEEDDEAFBEEC: ???
==30849==    by 0x50151F1: opal_list_construct (opal_list.c:88)
==30849==    by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30849==    by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30849==    by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30849==    by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30850== Jump to the invalid address stated on the next line
==30850==    at 0xDEAFBEEDDEAFBEED: ???
==30850==    by 0x50151F1: opal_list_construct (opal_list.c:88)
==30850==    by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30850==    by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30850==    by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30850==    by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30850==    by 0xA8A12FA: opal_obj_new_debug (opal_object.h:252)
==30850==    by 0xA8A2A5F: mca_pml_ob1_add_comm (pml_ob1.c:182)
==30850==    by 0x4E95F50: ompi_mpi_init (ompi_mpi_init.c:770)
==30850==    by 0x4EC6C32: PMPI_Init (pinit.c:84)
==30850==    by 0x400935: main (in /home/ehhexxn/git/test/hello_c)
==30850== Address 0xdeafbeeddeafbeed is not stack'd, malloc'd or (recently) free'd
==30850==
[oak:30850] *** Process received signal ***
[oak:30850] Signal: Segmentation fault (11)
[oak:30850] Signal code: Invalid permissions (2)
[oak:30850] Failing at address: 0xdeafbeeddeafbeed
==30849==
==30849== HEAP SUMMARY:
==30849==     in use at exit: 2,338,964 bytes in 3,213 blocks
==30849==   total heap usage: 5,205 allocs, 1,992 frees, 12,942,078 bytes allocated
==30849==
==30850== Invalid read of size 1
==30850==    at 0xA011FDB: ??? (in /lib/libgcc_s.so.1)
==30850==    by 0xA012B0B: _Unwind_Backtrace (in /lib/libgcc_s.so.1)
==30850==    by 0x60BE69D: backtrace (backtrace.c:91)
==30850==    by 0x4FAB055: opal_backtrace_buffer (backtrace_execinfo.c:54)
==30850==    by 0x5026DF3: show_stackframe
Re: [OMPI devel] TIPC BTL Segmentation fault
Keep in mind, too, that opal_object is the "base" object -- put in C++ terms, it's the abstract class that all other classes are made of. So it's rare that we would create an opal_object by itself; opal_objects are usually created as part of some other, higher-level object.

What's the full call stack of where Valgrind is showing the error?

Make sure you have the most recent valgrind (www.valgrind.org); the versions that ship in various distros may be somewhat old. Newer valgrind versions show lots of things that older versions don't. A new valgrind *might* be able to show some prior memory fault that is causing the issue...?

On Jul 4, 2011, at 7:45 AM, Xin He wrote:

> Hi,
>
> I ran the program with valgrind, and it showed almost the same error. It
> appeared that the segmentation fault happened during the initiation of an
> opal_object. That's why it puzzled me.
>
> /Xin
>
> On 07/04/2011 01:40 PM, Jeff Squyres wrote:
>> Ah -- so this is in the template code. I suspect this code might have bit
>> rotted a bit. :-\
>>
>> If you run this through valgrind, does anything obvious show up? I ask
>> because this kind of error is typically a symptom of the real error. I.e.,
>> the real error was some kind of memory corruption that occurred earlier, and
>> this is the memory access that exposes that prior memory corruption.
>>
>> On Jul 4, 2011, at 5:08 AM, Xin He wrote:
>>
>>> Yes, it is an opal_object.
>>>
>>> And this error seems to be caused by this code:
>>>
>>> void mca_btl_template_proc_construct(mca_btl_template_proc_t* template_proc)
>>> {
>>>     ...
>>>     /* add to list of all proc instances */
>>>     OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
>>>     opal_list_append(&mca_btl_template_component.template_procs,
>>>                      &template_proc->super);
>>>     OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
>>> }
>>>
>>> /Xin
>>>
>>> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
>>>> Do you know which object it is that is being constructed? When you compile
>>>> with debugging enabled, there are strings in the object struct that
>>>> identify the file and line where the obj was created.
>>>>
>>>> Sent from my phone. No type good.
>>>>
>>>> On Jun 29, 2011, at 8:48 AM, "Xin He" wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> As I advanced in my implementation of the TIPC BTL, I added the component
>>>>> and tried to run the hello_c program to test.
>>>>>
>>>>> Then I got this segmentation fault. It seemed to happen after the call
>>>>> "mca_btl_tipc_add_procs".
>>>>>
>>>>> The error message displayed:
>>>>>
>>>>> [oak:23192] *** Process received signal ***
>>>>> [oak:23192] Signal: Segmentation fault (11)
>>>>> [oak:23192] Signal code: (128)
>>>>> [oak:23192] Failing at address: (nil)
>>>>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>>>>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>>>>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>>>>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>>>>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>>>>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>>>>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>>>>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>>>>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>>>>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>>>>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>>>>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>>>>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>>>>> [oak:23192] [13] hello_i() [0x400859]
>>>>> [oak:23192] *** End of error message ***
>>>>>
>>>>> I used gdb to check the stack:
>>>>> (gdb) bt
>>>>> #0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980)
>>>>>     at ../opal/class/opal_object.h:427
>>>>> #1  0x77afb1f2 in opal_list_construct (list=0x6ca958)
>>>>>     at class/opal_list.c:88
>>>>> #2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>>>>     at ../../../../opal/class/opal_object.h:427
>>>>> #3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>>>>     at pml_ob1_comm.c:55
>>>>> #4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>>>>     at ../../../../opal/class/opal_object.h:427
>>>>> #5  0x72d444a0 in opal_obj_new (cls=0x72f6c040)
>>>>>     at ../../../../opal/class/opal_object.h:477
>>>>> #6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040,
>>>>>     file=0x72d62840 "pml_ob1.c", line=182)
>>>>>     at ../../../../opal/class/opal_object.h:252
>>>>> #7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
>>>>> #8  0x7797bf51 in ompi_mpi_init (argc=1,
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
On Jul 3, 2011, at 8:40 PM, Kawashima wrote:
>> Does your LLP send path preserve MPI matching ordering? E.g., if some prior isend
>> is already queued, could the LLP send overtake it?
>
> Yes, an LLP send may overtake a queued isend.
> But we use the correct PML send_sequence, so the LLP message is queued as an
> unexpected message on the receiver side, and I think it's no problem.

Good! I just wanted to ask because I couldn't quite tell from your prior description.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] TIPC BTL Segmentation fault
Ah -- so this is in the template code. I suspect this code might have bit rotted a bit. :-\

If you run this through valgrind, does anything obvious show up? I ask because this kind of error is typically a symptom of the real error. I.e., the real error was some kind of memory corruption that occurred earlier, and this is the memory access that exposes that prior memory corruption.

On Jul 4, 2011, at 5:08 AM, Xin He wrote:

> Yes, it is an opal_object.
>
> And this error seems to be caused by this code:
>
> void mca_btl_template_proc_construct(mca_btl_template_proc_t* template_proc)
> {
>     ...
>     /* add to list of all proc instances */
>     OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
>     opal_list_append(&mca_btl_template_component.template_procs,
>                      &template_proc->super);
>     OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
> }
>
> /Xin
>
> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
>> Do you know which object it is that is being constructed? When you compile
>> with debugging enabled, there are strings in the object struct that identify
>> the file and line where the obj was created.
>>
>> Sent from my phone. No type good.
>>
>> On Jun 29, 2011, at 8:48 AM, "Xin He" wrote:
>>
>>> Hi,
>>>
>>> As I advanced in my implementation of the TIPC BTL, I added the component
>>> and tried to run the hello_c program to test.
>>>
>>> Then I got this segmentation fault. It seemed to happen after the call
>>> "mca_btl_tipc_add_procs".
>>>
>>> The error message displayed:
>>>
>>> [oak:23192] *** Process received signal ***
>>> [oak:23192] Signal: Segmentation fault (11)
>>> [oak:23192] Signal code: (128)
>>> [oak:23192] Failing at address: (nil)
>>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>>> [oak:23192] [13] hello_i() [0x400859]
>>> [oak:23192] *** End of error message ***
>>>
>>> I used gdb to check the stack:
>>> (gdb) bt
>>> #0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980)
>>>     at ../opal/class/opal_object.h:427
>>> #1  0x77afb1f2 in opal_list_construct (list=0x6ca958)
>>>     at class/opal_list.c:88
>>> #2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>>     at ../../../../opal/class/opal_object.h:427
>>> #3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>>     at pml_ob1_comm.c:55
>>> #4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>>     at ../../../../opal/class/opal_object.h:427
>>> #5  0x72d444a0 in opal_obj_new (cls=0x72f6c040)
>>>     at ../../../../opal/class/opal_object.h:477
>>> #6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040,
>>>     file=0x72d62840 "pml_ob1.c", line=182)
>>>     at ../../../../opal/class/opal_object.h:252
>>> #7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
>>> #8  0x7797bf51 in ompi_mpi_init (argc=1, argv=0x7fffdf58, requested=0,
>>>     provided=0x7fffde28) at runtime/ompi_mpi_init.c:770
>>> #9  0x779acc33 in PMPI_Init (argc=0x7fffde5c, argv=0x7fffde50)
>>>     at pinit.c:84
>>> #10 0x00400936 in main (argc=1, argv=0x7fffdf58) at hello_c.c:17
>>>
>>> It seems the error happened when an object is constructed. Any idea why
>>> this is happening?
>>>
>>> Thanks.
>>>
>>> Best regards,
>>> Xin
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [hwloc-devel] hwloc trunk nightly 1.3a1r3511 fails to build on CentOS 5.6 & RHEL 5.6
All this should be fixed now, and the configure output is now clear (it doesn't change its mind about pci_init/cleanup or pci_lookup_name without any obvious reason anymore).

FC7:
  checking for pci/pci.h... yes
  checking for pci_init in -lpci... no
  checking for pci_init in -lpci with -lz... yes
  checking for pci_lookup_name in -lpci... no
  checking for inet_ntoa in -lresolv... yes
  checking for pci_lookup_name in -lpci with -lresolv... yes

RHEL5.6:
  checking for pci/pci.h... yes
  checking for pci_init in -lpci... yes
  checking for pci_lookup_name in -lpci... no
  checking for inet_ntoa in -lresolv... yes
  checking for pci_lookup_name in -lpci with -lresolv... yes

RHEL5.3:
  checking for pci/pci.h... yes
  checking for pci_init in -lpci... yes
  checking for pci_lookup_name in -lpci... yes

Christopher, it should work starting with trunk r3535.

Brice

On 30/06/2011 07:50, Brice Goglin wrote:
> On 29/06/2011 13:18, Brice Goglin wrote:
>> I don't think we ever fixed this.
>>
>> IIRC, we need either a way to bypass the cache, or to always add -lresolv
>> even if it's useless (or find another way to detect whether -lresolv is
>> needed).
>
> Redefining our own HWLOC_AC_CHECK_LIB_NO_CACHE looks possible.
>
> Otherwise, we could use something different from AC_CHECK_LIB for the
> second check (AC_SEARCH_LIBS uses a different cache name).
> Or even use AC_LINK_IFELSE/AC_TRY_LINK, which never cache anything.
>
> Brice
>
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel
Re: [OMPI devel] TIPC BTL Segmentation fault
Yes, it is an opal_object.

And this error seems to be caused by this code:

void mca_btl_template_proc_construct(mca_btl_template_proc_t* template_proc)
{
    ...
    /* add to list of all proc instances */
    OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
    opal_list_append(&mca_btl_template_component.template_procs,
                     &template_proc->super);
    OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
}

/Xin

On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
> Do you know which object it is that is being constructed? When you compile
> with debugging enabled, there are strings in the object struct that identify
> the file and line where the obj was created.
>
> Sent from my phone. No type good.
>
> On Jun 29, 2011, at 8:48 AM, "Xin He" wrote:
>
>> Hi,
>>
>> As I advanced in my implementation of the TIPC BTL, I added the component
>> and tried to run the hello_c program to test.
>>
>> Then I got this segmentation fault. It seemed to happen after the call
>> "mca_btl_tipc_add_procs".
>>
>> The error message displayed:
>>
>> [oak:23192] *** Process received signal ***
>> [oak:23192] Signal: Segmentation fault (11)
>> [oak:23192] Signal code: (128)
>> [oak:23192] Failing at address: (nil)
>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>> [oak:23192] [13] hello_i() [0x400859]
>> [oak:23192] *** End of error message ***
>>
>> I used gdb to check the stack:
>> (gdb) bt
>> #0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980)
>>     at ../opal/class/opal_object.h:427
>> #1  0x77afb1f2 in opal_list_construct (list=0x6ca958)
>>     at class/opal_list.c:88
>> #2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>     at ../../../../opal/class/opal_object.h:427
>> #3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>     at pml_ob1_comm.c:55
>> #4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>     at ../../../../opal/class/opal_object.h:427
>> #5  0x72d444a0 in opal_obj_new (cls=0x72f6c040)
>>     at ../../../../opal/class/opal_object.h:477
>> #6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040,
>>     file=0x72d62840 "pml_ob1.c", line=182)
>>     at ../../../../opal/class/opal_object.h:252
>> #7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
>> #8  0x7797bf51 in ompi_mpi_init (argc=1, argv=0x7fffdf58, requested=0,
>>     provided=0x7fffde28) at runtime/ompi_mpi_init.c:770
>> #9  0x779acc33 in PMPI_Init (argc=0x7fffde5c, argv=0x7fffde50)
>>     at pinit.c:84
>> #10 0x00400936 in main (argc=1, argv=0x7fffdf58) at hello_c.c:17
>>
>> It seems the error happened when an object is constructed. Any idea why
>> this is happening?
>>
>> Thanks.
>>
>> Best regards,
>> Xin
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [hwloc-devel] hwloc_distances as utility?
On 03/07/2011 23:55, Jiri Hladky wrote:
> Hi all,
>
> I have come across the tests/hwloc_distances test and I believe it would
> be great to convert it into a utility, "hwloc-report-instances", published
> under the utils/ directory. Please let me know what you think about it.
>
> It would take the same input as hwloc-info (read the topology from
> different formats instead of discovering the topology on the local
> machine) and support both logical and physical indexes (the -l and -p
> switches).

By the way, lstopo shows distance information, but it does not change it depending on -l/-p. We may want to fix this.

> I have used the STREAM memory bandwidth benchmark
> (http://www.cs.virginia.edu/stream/) in the past to produce output
> similar to tests/hwloc_distances. It was interesting to see that numactl
> and the kernel scheduler both use the number of hops instead of memory
> bandwidth.

Actually, Linux only uses the number of hops on one specific MIPS architecture (SGI IP27 Origin 200/2000). In other cases, it uses the CPU-to-memory latency (usually reported by ACPI or so).

> On some systems the number of hops does not represent memory bandwidth.
> I have reported this in BZ 655041:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=655041

This bug is private, unfortunately.

> In any case, I believe hwloc-report-instances would be a useful utility.
> Please let me know your opinion.

Agreed. There are still several things to improve regarding distances. Everything should be in https://svn.open-mpi.org/trac/hwloc/ticket/43

Brice