Sure. The processors were scaled down to 1000 MHz while idling. (I hope this will show up as an attachment instead of inlined...)
* on Wednesday, 16.12.09 at 18:12, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:

> Hi,
> can you provide $cat /proc/cpuinfo
> I am not optimistic that it will help, but still...
> thanks
> Lenny.
>
> On Wed, Dec 16, 2009 at 6:01 PM, Daan van Rossum <d...@flash.uchicago.edu> wrote:
>
> > Hi Terry,
> >
> > Thanks for your hint. I tried configure --enable-debug and even compiled it
> > with all kinds of manual debug flags turned on, but it doesn't help to get
> > rid of this problem. So it definitely is not an optimization flaw.
> > One more interesting test would be to try an older version of the Intel
> > compiler. But the next older version that I have is 10.0.015, which is too
> > old for the operating system (must be >10.1).
> >
> >
> > A good thing is that this bug is very easy to test. You only need one line
> > of MPI code and one process in the execution.
> >
> > A few more test cases:
> > rank 0=node01 slot=1-7
> > and
> > rank 0=node01 slot=0,2-7
> > and
> > rank 0=node01 slot=0-1,3-7
> > work WELL.
> > But
> > rank 0=node01 slot=0-2,4-7
> > FAILS.
> >
> > As long as either slot 0, 1, OR 2 is excluded from the list it's all right.
> > Excluding a different slot, like slot 3, does not help.
> >
> >
> > I'll try to get hold of an Intel v10.1 compiler version.
> >
> > Best,
> > Daan
> >
> > * on Monday, 14.12.09 at 14:57, Terry Dontje <terry.don...@sun.com> wrote:
> >
> > > I don't really want to throw fud on this list but we've seen all
> > > sorts of oddities with OMPI 1.3.4 being built with Intel's 11.1
> > > compiler versus their 11.0 or other compilers (gcc, Sun Studio, pgi,
> > > and pathscale). I have not tested your specific failing case but
> > > considering your issue doesn't show up with gcc I am wondering if
> > > there is some sort of optimization issue with the 11.1 compiler.
> > >
> > > It might be interesting to see if using certain optimization levels
> > > with the Intel 11.1 compiler produces a working OMPI library.
> > >
> > > --td
> > >
> > > Daan van Rossum wrote:
> > > >Hi Ralph,
> > > >
> > > >I took the Dec 10th snapshot, but got exactly the same behavior as with version 1.3.4.
> > > >
> > > >I just noticed that even this rankfile doesn't work, with a single process:
> > > > rank 0=node01 slot=0-3
> > > >
> > > >------------
> > > >[node01:31105] mca:base:select:(paffinity) Querying component [linux]
> > > >[node01:31105] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >[node01:31105] mca:base:select:(paffinity) Selected component [linux]
> > > >[node01:31105] paffinity slot assignment: slot_list == 0-3
> > > >[node01:31105] paffinity slot assignment: rank 0 runs on cpu #0 (#0)
> > > >[node01:31105] paffinity slot assignment: rank 0 runs on cpu #1 (#1)
> > > >[node01:31105] paffinity slot assignment: rank 0 runs on cpu #2 (#2)
> > > >[node01:31105] paffinity slot assignment: rank 0 runs on cpu #3 (#3)
> > > >[node01:31106] mca:base:select:(paffinity) Querying component [linux]
> > > >[node01:31106] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >[node01:31106] mca:base:select:(paffinity) Selected component [linux]
> > > >[node01:31106] paffinity slot assignment: slot_list == 0-3
> > > >[node01:31106] paffinity slot assignment: rank 0 runs on cpu #0 (#0)
> > > >[node01:31106] paffinity slot assignment: rank 0 runs on cpu #1 (#1)
> > > >[node01:31106] paffinity slot assignment: rank 0 runs on cpu #2 (#2)
> > > >[node01:31106] paffinity slot assignment: rank 0 runs on cpu #3 (#3)
> > > >[node01:31106] *** An error occurred in MPI_Comm_rank
> > > >[node01:31106] *** on a NULL communicator
> > > >[node01:31106] *** Unknown error
> > > >[node01:31106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > > >forrtl: severe (174): SIGSEGV, segmentation fault occurred
> > > >------------
> > > >
> > > >The spawned compute process doesn't sense that it should skip the setting paffinity...
> > > >
> > > >
> > > >I saw the posting from last July about a similar problem (the problem that I mentioned on the bottom, with the slot=0:* notation not working). But that is a different problem (besides, that is still not working as it seems).
> > > >
> > > >Best,
> > > >Daan
> > > >
> > > >* on Saturday, 12.12.09 at 18:48, Ralph Castain <r...@open-mpi.org> wrote:
> > > >
> > > >>This looks like an uninitialized variable that gnu c handles one way and intel another. Someone recently contributed a patch to the ompi trunk to fix just such a thing in this code area - don't know if it addresses this problem or not.
> > > >>
> > > >>Can you try the ompi trunk (a nightly tarball from the last day or so forward) and see if this still occurs?
> > > >>
> > > >>Thanks
> > > >>Ralph
> > > >>
> > > >>On Dec 11, 2009, at 4:06 PM, Daan van Rossum wrote:
> > > >>
> > > >>>Hi all,
> > > >>>
> > > >>>There's a problem with ompi 1.3.4 when compiled with the intel 11.1.059 c compiler, related to the built-in processor binding functionality. The problem does not occur when ompi is compiled with the gnu c compiler.
> > > >>>
> > > >>>An mpi program execution fails (segfault) on mpi_init() when the following rank file is used:
> > > >>>rank 0=node01 slot=0-3
> > > >>>rank 1=node01 slot=0-3
> > > >>>but runs fine with:
> > > >>>rank 0=node01 slot=0
> > > >>>rank 1=node01 slot=1-3
> > > >>>and fine with:
> > > >>>rank 0=node01 slot=0-1
> > > >>>rank 1=node01 slot=1-3
> > > >>>but segfaults with:
> > > >>>rank 0=node01 slot=0-2
> > > >>>rank 1=node01 slot=1-3
> > > >>>
> > > >>>This is on a two-processor quad-core opteron machine (occurs on all nodes of the cluster) with Ubuntu 8.10, kernel 2.6.27-16.
> > > >>>This is the simplest case that fails.
> > > >>>Generally, I would like to bind processors to physical procs but always allow any core, like
> > > >>>rank 0=node01 slot=p0:0-3
> > > >>>rank 1=node01 slot=p0:0-3
> > > >>>rank 2=node01 slot=p0:0-3
> > > >>>rank 3=node01 slot=p0:0-3
> > > >>>rank 4=node01 slot=p1:0-3
> > > >>>rank 5=node01 slot=p1:0-3
> > > >>>rank 6=node01 slot=p1:0-3
> > > >>>rank 7=node01 slot=p1:0-3
> > > >>>which fails too.
> > > >>>
> > > >>>This happens with a test code that contains only two lines of code, calling mpi_init and mpi_finalize subsequently, and happens in both fortran and in c.
> > > >>>
> > > >>>One more interesting thing is that the problem with setting the process affinity does not occur on our four-processor quad-core opteron nodes, with exactly the same OS etc.
> > > >>>
> > > >>>
> > > >>>Setting "--mca paffinity_base_verbose 5" shows what is going wrong for this rankfile:
> > > >>>rank 0=node01 slot=0-3
> > > >>>rank 1=node01 slot=0-3
> > > >>>------------- WRONG -----------------
> > > >>>[node01:23174] mca:base:select:(paffinity) Querying component [linux]
> > > >>>[node01:23174] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >>>[node01:23174] mca:base:select:(paffinity) Selected component [linux]
> > > >>>[node01:23174] paffinity slot assignment: slot_list == 0-3
> > > >>>[node01:23174] paffinity slot assignment: rank 0 runs on cpu #0 (#0)
> > > >>>[node01:23174] paffinity slot assignment: rank 0 runs on cpu #1 (#1)
> > > >>>[node01:23174] paffinity slot assignment: rank 0 runs on cpu #2 (#2)
> > > >>>[node01:23174] paffinity slot assignment: rank 0 runs on cpu #3 (#3)
> > > >>>[node01:23174] paffinity slot assignment: slot_list == 0-3
> > > >>>[node01:23174] paffinity slot assignment: rank 1 runs on cpu #0 (#0)
> > > >>>[node01:23174] paffinity slot assignment: rank 1 runs on cpu #1 (#1)
> > > >>>[node01:23174] paffinity slot assignment: rank 1 runs on cpu #2 (#2)
> > > >>>[node01:23174] paffinity slot assignment: rank 1 runs on cpu #3 (#3)
> > > >>>[node01:23175] mca:base:select:(paffinity) Querying component [linux]
> > > >>>[node01:23175] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >>>[node01:23175] mca:base:select:(paffinity) Selected component [linux]
> > > >>>[node01:23176] mca:base:select:(paffinity) Querying component [linux]
> > > >>>[node01:23176] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >>>[node01:23176] mca:base:select:(paffinity) Selected component [linux]
> > > >>>[node01:23175] paffinity slot assignment: slot_list == 0-3
> > > >>>[node01:23175] paffinity slot assignment: rank 0 runs on cpu #0 (#0)
> > > >>>[node01:23175] paffinity slot assignment: rank 0 runs on cpu #1 (#1)
> > > >>>[node01:23175] paffinity slot assignment: rank 0 runs on cpu #2 (#2)
> > > >>>[node01:23175] paffinity slot assignment: rank 0 runs on cpu #3 (#3)
> > > >>>[node01:23176] paffinity slot assignment: slot_list == 0-3
> > > >>>[node01:23176] paffinity slot assignment: rank 1 runs on cpu #0 (#0)
> > > >>>[node01:23176] paffinity slot assignment: rank 1 runs on cpu #1 (#1)
> > > >>>[node01:23176] paffinity slot assignment: rank 1 runs on cpu #2 (#2)
> > > >>>[node01:23176] paffinity slot assignment: rank 1 runs on cpu #3 (#3)
> > > >>>[node01:23175] *** Process received signal ***
> > > >>>[node01:23176] *** Process received signal ***
> > > >>>[node01:23175] Signal: Segmentation fault (11)
> > > >>>[node01:23175] Signal code: Address not mapped (1)
> > > >>>[node01:23175] Failing at address: 0x30
> > > >>>[node01:23176] Signal: Segmentation fault (11)
> > > >>>[node01:23176] Signal code: Address not mapped (1)
> > > >>>[node01:23176] Failing at address: 0x30
> > > >>>------------- WRONG -----------------
> > > >>>
> > > >>>------------- RIGHT -----------------
> > > >>>[node25:23241] mca:base:select:(paffinity) Querying component [linux]
> > > >>>[node25:23241] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >>>[node25:23241] mca:base:select:(paffinity) Selected component [linux]
> > > >>>[node25:23241] paffinity slot assignment: slot_list == 0-3
> > > >>>[node25:23241] paffinity slot assignment: rank 0 runs on cpu #0 (#0)
> > > >>>[node25:23241] paffinity slot assignment: rank 0 runs on cpu #1 (#1)
> > > >>>[node25:23241] paffinity slot assignment: rank 0 runs on cpu #2 (#2)
> > > >>>[node25:23241] paffinity slot assignment: rank 0 runs on cpu #3 (#3)
> > > >>>[node25:23241] paffinity slot assignment: slot_list == 0-3
> > > >>>[node25:23241] paffinity slot assignment: rank 1 runs on cpu #0 (#0)
> > > >>>[node25:23241] paffinity slot assignment: rank 1 runs on cpu #1 (#1)
> > > >>>[node25:23241] paffinity slot assignment: rank 1 runs on cpu #2 (#2)
> > > >>>[node25:23241] paffinity slot assignment: rank 1 runs on cpu #3 (#3)
> > > >>>[node25:23242] mca:base:select:(paffinity) Querying component [linux]
> > > >>>[node25:23242] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >>>[node25:23242] mca:base:select:(paffinity) Selected component [linux]
> > > >>>[node25:23243] mca:base:select:(paffinity) Querying component [linux]
> > > >>>[node25:23243] mca:base:select:(paffinity) Query of component [linux] set priority to 10
> > > >>>[node25:23243] mca:base:select:(paffinity) Selected component [linux]
> > > >>>------------- RIGHT -----------------
> > > >>>
> > > >>>Apparently, only a master process (ID [node01:23174] and [node25:23241]) sets the paffinity in the RIGHT case, but in the WRONG case, also the compute processes ([node01:23175] and [node01:23176], rank0 and rank1) try to set their own paffinity properties.
> > > >>>
> > > >>>
> > > >>>
> > > >>>Note that for the rankfile also the notation does not work. But that seems to have a different origin, as it tries to bind to a core# 4, whereas there are just 0-3.
> > > >>>rank 0=node01 slot=0:*
> > > >>>rank 1=node01 slot=0:*
> > > >>>
> > > >>>
> > > >>>Thanks for your help on this!
> > > >>>
> > > >>>--
> > > >>>Daan van Rossum
> > > >>>_______________________________________________
> > > >>>devel mailing list
> > > >>>de...@open-mpi.org
> > > >>>http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > >>_______________________________________________
> > > >>devel mailing list
> > > >>de...@open-mpi.org
> > > >>http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > >
> > > >--
> > > >Daan van Rossum
> > > >
> > > >University of Chicago
> > > >Department of Astronomy and Astrophysics
> > > >5640 S. Ellis Ave
> > > >Chicago, IL 60637
> > > >phone: 773-7020624
> > > >_______________________________________________
> > > >devel mailing list
> > > >de...@open-mpi.org
> > > >http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> > --
> > Daan van Rossum
> >
> > University of Chicago
> > Department of Astronomy and Astrophysics
> > 5640 S. Ellis Ave
> > Chicago, IL 60637
> > phone: 773-7020624
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Daan van Rossum

University of Chicago
Department of Astronomy and Astrophysics
5640 S. Ellis Ave
Chicago, IL 60637
phone: 773-7020624

processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4020.53
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4024.42
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 2
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4024.32
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4024.32
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 4
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4020.69
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 5
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4020.64
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 6
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4020.63
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 7
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2350
stepping : 3
cpu MHz : 1000.000
cache size : 512 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4020.69
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
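
For reference, the socket/core layout in the cpuinfo listing above can be pulled out with a short script. This is only a sketch: the names parse_cpuinfo, cores_by_socket and the abbreviated SAMPLE text are made up for illustration and are not part of Open MPI or of the original report. It groups the logical processor numbers by "physical id", which is the information needed when picking rankfile slots.

```python
# Sketch only: helper names and the SAMPLE text are illustrative assumptions.
# SAMPLE keeps just three fields per processor, with the values taken from
# the cpuinfo listing in this message.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 2

processor : 3
physical id : 0
core id : 3

processor : 4
physical id : 1
core id : 0

processor : 5
physical id : 1
core id : 1

processor : 6
physical id : 1
core id : 2

processor : 7
physical id : 1
core id : 3
"""


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    records = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a key/value separator
        key, value = key.strip(), value.strip()
        if key == "processor":
            records.append({})  # a "processor" line starts a new record
        if records:
            records[-1][key] = value
    return records


def cores_by_socket(records):
    """Map each physical id to a sorted list of (core id, processor) pairs."""
    sockets = {}
    for rec in records:
        socket = int(rec["physical id"])
        pair = (int(rec["core id"]), int(rec["processor"]))
        sockets.setdefault(socket, []).append(pair)
    return {socket: sorted(pairs) for socket, pairs in sockets.items()}


if __name__ == "__main__":
    for socket, pairs in sorted(cores_by_socket(parse_cpuinfo(SAMPLE)).items()):
        print("socket", socket, "->", pairs)
```

On a Linux node the same functions can be fed the contents of /proc/cpuinfo instead of SAMPLE. For this machine the grouping puts logical processors 0-3 on physical id 0 and 4-7 on physical id 1.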