Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
On Oct 24, 2008, at 12:10 PM, V. Ram wrote:

> Resuscitating this thread... Well, we spent some time testing the various options, and Leonardo's suggestion seems to work! We disabled TCP Segmentation Offload (TSO) on the e1000 NICs using "ethtool -K eth tso off" and this type of crash no longer happens. I hope this message can help anyone else experiencing the same issues. Thanks Leonardo!
>
> OMPI devs: does this imply bug(s) in the e1000 driver/chip? Should I contact the driver authors?

Maybe? :-) I don't think that we do anything particularly whacky, TCP-wise -- we just open sockets and read/write plain vanilla data down the fds. So it might be worth contacting them and asking if there are any known issues...?

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
Resuscitating this thread... Well, we spent some time testing the various options, and Leonardo's suggestion seems to work! We disabled TCP Segmentation Offload (TSO) on the e1000 NICs using "ethtool -K eth tso off" and this type of crash no longer happens. I hope this message can help anyone else experiencing the same issues. Thanks Leonardo!

OMPI devs: does this imply bug(s) in the e1000 driver/chip? Should I contact the driver authors?

On Fri, 10 Oct 2008 12:42:19 -0400, "V. Ram" said:

> [earlier quoted text trimmed; the full message of 10 Oct 2008 appears below in this thread]
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
On Oct 10, 2008, at 12:42 PM, V. Ram wrote:

> Can anyone else suggest why the code might be crashing when running over ethernet and not over shared memory? Any suggestions on how to debug this or interpret the error message issued from btl_tcp_frag.c?

Unfortunately, this is a standard error message that does not enlighten us as to what the real error is/was. It simply states that one node failed to read data from a socket, which usually happens when the remote peer dies unexpectedly (such as from a seg-fault).

george.
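George's point can be reproduced outside Open MPI: when one end of a TCP connection dies abruptly, the surviving end's read fails with ECONNRESET (errno 104), one of the two errno values in the original report. Below is a minimal standalone sketch in plain Python sockets (not Open MPI code); setting SO_LINGER with a zero timeout before close() makes the kernel send a TCP RST, standing in for a crashed peer.

```python
import errno
import socket
import struct

# Sketch (not Open MPI code): reproduce the failure mode btl_tcp_frag.c
# reports when a peer process dies. Closing a socket with SO_LINGER set
# to {on, 0 seconds} sends a TCP RST instead of a normal FIN, so the
# surviving side's read fails with ECONNRESET -- the same errno=104 seen
# in "readv failed with errno=104".
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
peer, _ = server.accept()

# Simulate the peer dying abruptly (e.g. a segfault): RST, not FIN.
peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
peer.close()

try:
    client.recv(4096)
    observed = None
except ConnectionResetError as exc:
    observed = exc.errno  # ECONNRESET (104 on Linux)

client.close()
server.close()
print(observed == errno.ECONNRESET)
```

errno 110 (ETIMEDOUT) arises in a similar way when the dead peer simply stops acknowledging and the connection eventually times out, rather than sending a RST.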
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
Leonardo,

These nodes are all using Intel e1000 chips. As the nodes are AMD K7-based, these are the older chips, not the new ones with all the EEPROM issues with the newer kernel.

The kernel in use is from the 2.6.22 family, and the e1000 driver is the one shipped with the kernel. I am running it compiled into the kernel, not as a module.

When testing using the Intel MPI Benchmarks, I found that increasing the receive ring buffer size to the max (4096) helped performance, so I use ethtool -G on startup.

Checking ethtool -k, I see that TCP segmentation offload is on. I can try turning that off to see what happens.

Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or have these same issues, and I'm not having to turn off TSO.

Can anyone else suggest why the code might be crashing when running over ethernet and not over shared memory? Any suggestions on how to debug this or interpret the error message issued from btl_tcp_frag.c?

Thanks.

On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho" said:

> [quoted text trimmed; Leonardo's full message of 01 Oct 2008 appears below in this thread]

--
V. Ram
v_r_...@fastmail.fm
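The "ethtool -k" check described above can be scripted. A sketch of parsing its output follows; the sample text and the feature name "tcp segmentation offload" reflect ethtool's output format circa 2008 and may differ by ethtool version and driver, so treat both as assumptions.

```python
def parse_offload_settings(ethtool_k_output: str) -> dict:
    """Parse `ethtool -k <iface>` output into a {feature: bool} map."""
    settings = {}
    for line in ethtool_k_output.splitlines():
        if ":" not in line:
            continue
        feature, _, state = line.partition(":")
        state = state.strip()
        if not state:
            continue  # skips the "Offload parameters for eth0:" header
        settings[feature.strip()] = state.split()[0] == "on"
    return settings

# Sample output in the format ethtool printed around 2008; newer
# versions use hyphenated names like "tcp-segmentation-offload".
sample = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
"""

tso_enabled = parse_offload_settings(sample)["tcp segmentation offload"]
print(tso_enabled)
```

On a live system the same function could be fed the output of `subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout`; that requires the ethtool binary and usually root, and "eth0" is a placeholder interface name.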
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
Sorry for replying to this so late, but I have been away. Reply below...

On Wed, 1 Oct 2008 11:58:30 -0400, "Aurélien Bouteiller" said:

> If you have several network cards in your system, it can sometimes get the endpoints confused, especially if you don't have the same number of cards or don't use the same subnet for all "eth0, eth1". You should try to restrict Open MPI to use only one of the available networks by passing the --mca btl_tcp_if_include ethx parameter to mpirun, where x is the network interface that is always connected to the same logical and physical network on your machine.

I was pretty sure this wasn't the problem since basically all the nodes only have one interface configured, but I had the user try the --mca btl_tcp_if_include parameter. The same result / crash occurred.

> [remainder of quoted text trimmed; the original message of 1 Oct 2008 appears below in this thread]

--
V. Ram
v_r_...@fastmail.fm
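For reference, Aurélien's suggestion amounts to an mpirun command line like the one assembled below. This is only a sketch: "eth0" and "./mpi_app" are placeholders rather than values from the thread, while --mca btl_tcp_if_include is the parameter he names and --bynode matches the launch mode the reporter used.

```python
# Sketch: restrict Open MPI's TCP BTL to a single interface.
# "eth0" and "./mpi_app" are placeholders.
cmd = [
    "mpirun", "--bynode",
    "--mca", "btl_tcp_if_include", "eth0",
    "-np", "2",
    "./mpi_app",
]
print(" ".join(cmd))
```

Building the argument list programmatically like this (rather than a single shell string) keeps the interface name easy to substitute per cluster.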
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
Ram,

What is the name and version of the kernel module for your NIC? I have experienced something similar with my tg3 module. The error that appeared for me was different:

[btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: No route to host (113)

I solved it by changing the following NIC setting with ethtool:

/sbin/ethtool -K eth0 tso off

Leonardo

Aurélien Bouteiller wrote:

> [quoted text trimmed; Aurélien's full message appears below in this thread]

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory
If you have several network cards in your system, it can sometimes get the endpoints confused, especially if you don't have the same number of cards or don't use the same subnet for all "eth0, eth1". You should try to restrict Open MPI to use only one of the available networks by passing the --mca btl_tcp_if_include ethx parameter to mpirun, where x is the network interface that is always connected to the same logical and physical network on your machine.

Aurelien

On 1 Oct 08, at 11:47, V. Ram wrote:

> I wrote earlier about one of my users running a third-party Fortran code on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash behavior.
>
> Our cluster's nodes all have 2 single-core processors. If this code is run on 2 processors on 1 node, it runs seemingly fine. However, if the job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then it crashes and gives messages like:
>
> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=110
> mca_btl_tcp_frag_recv: readv failed with errno=104
>
> Essentially, if any network communication is involved, the job crashes in this form.
>
> I do have another user that runs his own MPI code on 10+ of these processors for days at a time without issue, so I don't think it's hardware.
>
> The original code also runs fine across many networked nodes if the architecture is x86-64 (also running OMPI 1.2.7).
>
> We have also tried different Fortran compilers (both PathScale and gfortran) and keep getting these crashes.
>
> Are there any suggestions on how to figure out if it's a problem with the code or the OMPI installation/software on the system? We have tried "--debug-daemons" with no new/interesting information being revealed. Is there a way to trap segfault messages or more detailed MPI transaction information or anything else that could help diagnose this?
>
> Thanks.
> --
> V. Ram
> v_r_...@fastmail.fm

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
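As an aside for readers landing on this thread: the numeric errno values quoted throughout it decode as follows on Linux (other platforms number errno differently), for example with Python's errno module.

```python
import errno
import os

# Linux errno values seen in this thread's btl_tcp_frag.c messages:
#   104 -> ECONNRESET   (peer sent a TCP RST, e.g. the remote process died)
#   110 -> ETIMEDOUT    (connection timed out)
#   113 -> EHOSTUNREACH (no route to host; Leonardo's variant)
for code in (104, 110, 113):
    print(code, errno.errorcode[code], os.strerror(code))
```

Using the symbolic names from the errno module, rather than hard-coded numbers, keeps such a lookup portable across platforms.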