Re: [OMPI devel] New OMPI MPI extension
Hi Jeff,

There's a typo in trunk/README:

-> 1175  ...unrelated to wach other

I guess you mean "unrelated to each other".

Rayson


On Wed, Apr 21, 2010 at 12:35 PM, Jeff Squyres wrote:
> Per the telecon Tuesday, I committed a new OMPI MPI extension to the trunk:
>
>     https://svn.open-mpi.org/trac/ompi/changeset/23018
>
> Please read the commit message and let me know what you think.  Suggestions
> are welcome.
>
> If everyone is ok with it, I'd like to see this functionality hit the 1.5
> series at some point.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI devel] sendrecv_replace: long time to allocate/free memory
Hi all,

The sendrecv_replace in Open MPI seems to allocate/free memory with
MPI_Alloc_mem()/MPI_Free_mem(). I measured the time to allocate/free a 1 MB
buffer: MPI_Alloc_mem/MPI_Free_mem take 350us, while malloc/free only take 8us.

malloc/free in ompi/mpi/c/sendrecv_replace.c was replaced by
MPI_Alloc_mem/MPI_Free_mem with this commit:

  user:    twoodall
  date:    Thu Sep 22 16:43:17 2005
  summary: use MPI_Alloc_mem/MPI_Free_mem for internally allocated buffers

Is there a real reason to use these functions, or can we move back to
malloc/free? Is there something in my configuration that would explain such
slow performance with MPI_Alloc_mem?

Pascal
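A minimal sketch of this kind of measurement, for reference; the 1 MB size
matches the report above, but the single-shot timing (no warm-up, no
averaging) is only illustrative and not the actual benchmark used:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        const size_t len = 1 << 20;   /* 1 MB buffer, as in the report */
        void *buf = NULL;
        double t0, t1;

        MPI_Init(&argc, &argv);

        /* allocate/free through MPI */
        t0 = MPI_Wtime();
        MPI_Alloc_mem(len, MPI_INFO_NULL, &buf);
        MPI_Free_mem(buf);
        t1 = MPI_Wtime();
        printf("MPI_Alloc_mem/MPI_Free_mem: %.1f us\n", (t1 - t0) * 1e6);

        /* allocate/free through libc */
        t0 = MPI_Wtime();
        buf = malloc(len);
        free(buf);
        t1 = MPI_Wtime();
        printf("malloc/free:                %.1f us\n", (t1 - t0) * 1e6);

        MPI_Finalize();
        return 0;
    }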
[OMPI devel] Segmentation fault on x86_64 on heterogeneous environment
Hello, list.

I have a strange segmentation fault on an x86_64 machine running together
with an x86 machine. I am running the attached program, which sends some
bytes from process 0 to process 1. My configuration is:

Machine #1 (process 0):
  arch: x86
  hostname: magomedov-desktop
  linux distro: Ubuntu 9.10
  Open MPI: v1.4 configured with --enable-heterogeneous --enable-debug

Machine #2 (process 1):
  arch: x86_64
  hostname: linuxtche
  linux distro: Fedora 12
  Open MPI: v1.4 configured with --enable-heterogeneous --prefix=/home/magomedov/openmpi/ --enable-debug

They are connected by ethernet. My user environment on the second (x86_64)
machine is set up to use Open MPI from /home/magomedov/openmpi/. I compile
the attached program on both machines (at the same path) and run it.
Process 0 on the x86 machine should send data to process 1 on the x86_64
machine.

First, let's send 65530 bytes:

mpirun -host timur,linuxtche -np 2 /home/magomedov/workspace/mpi-test/mpi-send-test 65530
magomedov@linuxtche's password:
*** processor magomedov-desktop, comm size is 2, my rank is 0, pid 21875 ***
*** processor linuxtche, comm size is 2, my rank is 1, pid 11357 ***
Received 65530 bytes

That's OK. Now let's send 65537 bytes:

magomedov@magomedov-desktop:~/workspace/mpi-test$ mpirun -host timur,linuxtche -np 2 /home/magomedov/workspace/mpi-test/mpi-send-test 65537
magomedov@linuxtche's password:
*** processor magomedov-desktop, comm size is 2, my rank is 0, pid 9205 ***
*** processor linuxtche, comm size is 2, my rank is 1, pid 28858 ***
[linuxtche:28858] *** Process received signal ***
[linuxtche:28858] Signal: Segmentation fault (11)
[linuxtche:28858] Signal code: Address not mapped (1)
[linuxtche:28858] Failing at address: 0x201143bf8
[linuxtche:28858] [ 0] /lib64/libpthread.so.0() [0x3600c0f0f0]
[linuxtche:28858] [ 1] /home/magomedov/openmpi/lib/openmpi/mca_pml_ob1.so(+0xfc27) [0x7f5e94076c27]
[linuxtche:28858] [ 2] /home/magomedov/openmpi/lib/openmpi/mca_btl_tcp.so(+0xadac) [0x7f5e935c3dac]
[linuxtche:28858] [ 3] /home/magomedov/openmpi/lib/libopen-pal.so.0(+0x27611) [0x7f5e96575611]
[linuxtche:28858] [ 4] /home/magomedov/openmpi/lib/libopen-pal.so.0(+0x27c57) [0x7f5e96575c57]
[linuxtche:28858] [ 5] /home/magomedov/openmpi/lib/libopen-pal.so.0(opal_event_loop+0x1f) [0x7f5e96575848]
[linuxtche:28858] [ 6] /home/magomedov/openmpi/lib/libopen-pal.so.0(opal_progress+0x89) [0x7f5e965648dd]
[linuxtche:28858] [ 7] /home/magomedov/openmpi/lib/openmpi/mca_pml_ob1.so(+0x762f) [0x7f5e9406e62f]
[linuxtche:28858] [ 8] /home/magomedov/openmpi/lib/openmpi/mca_pml_ob1.so(+0x777d) [0x7f5e9406e77d]
[linuxtche:28858] [ 9] /home/magomedov/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8246) [0x7f5e9406f246]
[linuxtche:28858] [10] /home/magomedov/openmpi/lib/libmpi.so.0(MPI_Recv+0x2d2) [0x7f5e96af832c]
[linuxtche:28858] [11] /home/magomedov/workspace/mpi-test/mpi-send-test(main+0x1e4) [0x400ee8]
[linuxtche:28858] [12] /lib64/libc.so.6(__libc_start_main+0xfd) [0x360001eb1d]
[linuxtche:28858] [13] /home/magomedov/workspace/mpi-test/mpi-send-test() [0x400c49]
[linuxtche:28858] *** End of error message ***
--
mpirun noticed that process rank 1 with PID 28858 on node linuxtche exited on signal 11 (Segmentation fault).
--

If I try to send >= 65537 bytes from the x86 machine, I always get a
segfault on the x86_64 machine. I did some investigation and found that the
"bad" pointer always has what looks like a valid pointer in its lower 32-bit
word and "2" or "1" in its upper word. The program segfaults in
pml_ob1_recvfrag.c, in mca_pml_ob1_recv_frag_callback_fin(); the rdma
pointer is broken.
I inserted the line

  rdma = (mca_btl_base_descriptor_t*)((unsigned long)rdma & 0xffffffff);

which truncates the 64-bit pointer to 32 bits, and the segfaults
disappeared. However, this is not the solution. After some investigation
with gdb it looks to me like this pointer was sent to the x86 machine and
came back from it broken, but I don't understand what is going on well
enough to fix it... Can anyone reproduce it? I get the same results with
openmpi-1.4.2rc1 too.

It looks like the same problem was described here in the ompi-users list:
http://www.open-mpi.org/community/lists/users/2010/02/12182.php

--
Kind regards,
Timur Magomedov
Senior C++ Developer
DevelopOnBox LLC / Zodiac Interactive
http://www.zodiac.tv/


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int ret;
    int size;
    int rank;
    int name_len;
    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    int sender = 0;
    int receiver = 1;
    uint8_t *val;
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(name, &name_len);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("*** processor %s, comm size is %d, my rank is %d, pid %u ***\n",
           name, size, rank, getpid());
    if (argc != 2) {
        printf("Usage: %s message_length\n", argv[0]);
        exit(1);
    }
    /* The attachment was truncated in the archive at this point; the
     * remainder below is reconstructed from the behavior described in the
     * mail: rank 0 sends len bytes to rank 1, which prints the count. */
    len = atoi(argv[1]);
    val = malloc(len);
    if (rank == sender) {
        ret = MPI_Send(val, len, MPI_BYTE, receiver, 0, MPI_COMM_WORLD);
    } else if (rank == receiver) {
        ret = MPI_Recv(val, len, MPI_BYTE, sender, 0, MPI_COMM_WORLD, &stat);
        printf("Received %d bytes\n", len);
    }
    free(val);
    MPI_Finalize();
    return ret == MPI_SUCCESS ? 0 : 1;
}
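A note on the 65536/65537 boundary seen above: 64 KB is the default eager
limit of the TCP BTL, so the failing sizes are presumably exactly those
where the ob1 PML switches from the eager to the rendezvous protocol, i.e.
the path that exchanges rdma descriptors in FIN fragments (where the crash
occurs). The effective limit for a particular build can be checked with
ompi_info; the grep is only for convenience:

  ompi_info --param btl tcp | grep eager_limit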
Re: [OMPI devel] New OMPI MPI extension
Fixed -- thanks!


On Apr 22, 2010, at 12:35 AM, Rayson Ho wrote:

> Hi Jeff,
>
> There's a typo in trunk/README:
>
> -> 1175  ...unrelated to wach other
>
> I guess you mean "unrelated to each other".
>
> Rayson
>
> On Wed, Apr 21, 2010 at 12:35 PM, Jeff Squyres wrote:
> > Per the telecon Tuesday, I committed a new OMPI MPI extension to the trunk:
> >
> >     https://svn.open-mpi.org/trac/ompi/changeset/23018
> >
> > Please read the commit message and let me know what you think.  Suggestions
> > are welcome.
> >
> > If everyone is ok with it, I'd like to see this functionality hit the 1.5
> > series at some point.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
To sum up and give an update:

The extended communication times while using shared memory communication
between Open MPI processes are caused by the Open MPI session directory
living on the network via NFS.

The problem is resolved by establishing a ramdisk on each diskless node or
mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to point to
that mountpoint, the shared memory communication and its files are kept
local, decreasing the communication times by orders of magnitude.

The relation of the problem to the kernel version is not really resolved,
but the kernel is maybe not "the problem" in this respect. My benchmark is
now running fine on a single node with 4 CPUs, kernel 2.6.33.1 and
Open MPI 1.4.1. Running on multiple nodes I still see higher (TCP)
communication times than I would expect, but that requires some deeper
research on my part (e.g. collisions on the network) and should probably be
posted in a new thread.

Thank you guys for your help.

oli
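For the record, such a setup could look roughly like this; the mountpoint,
tmpfs size and application name are placeholders, not the actual values
from this cluster:

  # on each diskless node: provide a local tmpfs for the session directory
  mount -t tmpfs -o size=64m tmpfs /scratch/ompi-tmp

  # tell Open MPI to put its session directory there
  mpirun --mca orte_tmpdir_base /scratch/ompi-tmp -np 4 ./benchmark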
Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
Oliver,

Thank you for this summary insight. This substantially affects the
structural design of software implementations, which points to a new
analysis "opportunity" in our software.

Ken Lloyd


-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Oliver Geisler
Sent: Thursday, April 22, 2010 9:38 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

To sum up and give an update:

The extended communication times while using shared memory communication
between Open MPI processes are caused by the Open MPI session directory
living on the network via NFS.

The problem is resolved by establishing a ramdisk on each diskless node or
mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to point to
that mountpoint, the shared memory communication and its files are kept
local, decreasing the communication times by orders of magnitude.

The relation of the problem to the kernel version is not really resolved,
but the kernel is maybe not "the problem" in this respect. My benchmark is
now running fine on a single node with 4 CPUs, kernel 2.6.33.1 and
Open MPI 1.4.1. Running on multiple nodes I still see higher (TCP)
communication times than I would expect, but that requires some deeper
research on my part (e.g. collisions on the network) and should probably be
posted in a new thread.

Thank you guys for your help.

oli
Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
Hello Oliver,
thanks for the update.

Just my $0.02: the upcoming Open MPI v1.5 will warn users if their session
directory is on NFS (or Lustre).

Best regards,
Rainer


On Thursday 22 April 2010 11:37:48 am Oliver Geisler wrote:
> To sum up and give an update:
>
> The extended communication times while using shared memory communication
> between Open MPI processes are caused by the Open MPI session directory
> living on the network via NFS.
>
> The problem is resolved by establishing a ramdisk on each diskless node or
> mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to point to
> that mountpoint, the shared memory communication and its files are kept
> local, decreasing the communication times by orders of magnitude.
>
> The relation of the problem to the kernel version is not really resolved,
> but the kernel is maybe not "the problem" in this respect. My benchmark is
> now running fine on a single node with 4 CPUs, kernel 2.6.33.1 and
> Open MPI 1.4.1. Running on multiple nodes I still see higher (TCP)
> communication times than I would expect, but that requires some deeper
> research on my part (e.g. collisions on the network) and should probably be
> posted in a new thread.
>
> Thank you guys for your help.
>
> oli

--
Rainer Keller, PhD       Tel: +1 (865) 241-6293
Oak Ridge National Lab   Fax: +1 (865) 241-4811
PO Box 2008 MS 6164      Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008 AIM/Skype: rusraink
Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
On Apr 22, 2010, at 10:08 AM, Rainer Keller wrote:

> Hello Oliver,
> thanks for the update.
>
> Just my $0.02: the upcoming Open MPI v1.5 will warn users if their session
> directory is on NFS (or Lustre).

... or panfs :-)

Samuel K. Gutierrez

> Best regards,
> Rainer
>
> On Thursday 22 April 2010 11:37:48 am Oliver Geisler wrote:
> > To sum up and give an update:
> >
> > The extended communication times while using shared memory communication
> > between Open MPI processes are caused by the Open MPI session directory
> > living on the network via NFS.
> >
> > The problem is resolved by establishing a ramdisk on each diskless node
> > or mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to
> > point to that mountpoint, the shared memory communication and its files
> > are kept local, decreasing the communication times by orders of magnitude.
> >
> > The relation of the problem to the kernel version is not really resolved,
> > but the kernel is maybe not "the problem" in this respect. My benchmark
> > is now running fine on a single node with 4 CPUs, kernel 2.6.33.1 and
> > Open MPI 1.4.1. Running on multiple nodes I still see higher (TCP)
> > communication times than I would expect, but that requires some deeper
> > research on my part (e.g. collisions on the network) and should probably
> > be posted in a new thread.
> >
> > Thank you guys for your help.
> >
> > oli
>
> --
> Rainer Keller, PhD       Tel: +1 (865) 241-6293
> Oak Ridge National Lab   Fax: +1 (865) 241-4811
> PO Box 2008 MS 6164      Email: kel...@ornl.gov
> Oak Ridge, TN 37831-2008 AIM/Skype: rusraink
Re: [OMPI devel] New OMPI MPI extension
Jeff,

Seems like OMPI_Affinity_str()'s finest granularity is at the core level.
However, in SGE (Sun Grid Engine) we also offer thread-level (SMT) binding:

http://wikis.sun.com/display/gridengine62u5/Using+Job+to+Core+Binding

Will Open MPI support thread-level binding in the future??

BTW, another 2 typos in README:

-> 1193  ...subdirectory off   <- should be "of"
-> 1199  ...thse extensions    <- should be "these"

Rayson


On Thu, Apr 22, 2010 at 10:35 AM, Jeff Squyres wrote:
> Fixed -- thanks!
>
> On Apr 22, 2010, at 12:35 AM, Rayson Ho wrote:
>
>> Hi Jeff,
>>
>> There's a typo in trunk/README:
>>
>> -> 1175  ...unrelated to wach other
>>
>> I guess you mean "unrelated to each other".
>>
>> Rayson
>>
>> On Wed, Apr 21, 2010 at 12:35 PM, Jeff Squyres wrote:
>> > Per the telecon Tuesday, I committed a new OMPI MPI extension to the trunk:
>> >
>> >     https://svn.open-mpi.org/trac/ompi/changeset/23018
>> >
>> > Please read the commit message and let me know what you think.
>> > Suggestions are welcome.
>> >
>> > If everyone is ok with it, I'd like to see this functionality hit the 1.5
>> > series at some point.
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> > For corporate legal information go to:
>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] New OMPI MPI extension
On Apr 22, 2010, at 12:34 PM, Rayson Ho wrote:

> Seems like OMPI_Affinity_str()'s finest granularity is at the core level.
> However, in SGE (Sun Grid Engine) we also offer thread-level (SMT) binding:
>
> http://wikis.sun.com/display/gridengine62u5/Using+Job+to+Core+Binding
>
> Will Open MPI support thread-level binding in the future??

Yes, but two things have to happen first:

1. Successfully import hwloc. I tried importing hwloc 1.0rc1 earlier this
   week and ran into some problems; I unfortunately got side-tracked before
   I could fix them. I need to fix those and get hwloc 1.0 out the door (it
   isn't clear to me yet if the problem was in OMPI or hwloc, but I want to
   resolve it before hwloc hits v1.0).

2. Update our internal handling inside OMPI to understand hardware threads
   (and possibly boards). Our current internal APIs were written before
   hardware threads really mattered to HPC, so we need to do some updates.
   It probably won't be too hard to do, but it does touch a bunch of places
   in OPAL and ORTE.

This likely puts OMPI hardware thread support in the 1.5.1 or 1.5.2
timeframe.

> BTW, another 2 typos in README:
>
> -> 1193  ...subdirectory off   <- should be "of"
> -> 1199  ...thse extensions    <- should be "these"

Awesome; thanks! I had apparently enabled "typo-mode" in emacs when I wrote
this stuff. :-)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/