Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-27 Thread Jeff Squyres

On Oct 24, 2008, at 12:10 PM, V. Ram wrote:


Resuscitating this thread...

Well, we spent some time testing the various options, and Leonardo's
suggestion seems to work!

We disabled TCP Segmentation Offload (TSO) on the e1000 NICs using
"ethtool -K ethN tso off" and this type of crash no longer happens.

I hope this message can help anyone else experiencing the same issues.
Thanks Leonardo!

OMPI devs: does this imply bug(s) in the e1000 driver/chip?  Should I
contact the driver authors?


Maybe?  :-)

I don't think that we do anything particularly whacky, TCP-wise -- we  
just open sockets and read/write plain vanilla data down the fd's.  So  
it might be worth contacting them and asking if there are any known  
issues...?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-24 Thread V. Ram
Resuscitating this thread...

Well, we spent some time testing the various options, and Leonardo's
suggestion seems to work!

We disabled TCP Segmentation Offload (TSO) on the e1000 NICs using
"ethtool -K ethN tso off" and this type of crash no longer happens.
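
For anyone applying the same fix, a rough sketch of checking the offload
state and then disabling it (the interface name eth0 and the rc.local
suggestion are assumptions; adjust for your own setup):

    # confirm whether TCP segmentation offload is currently enabled
    ethtool -k eth0 | grep -i "segmentation offload"

    # turn it off (does not survive a reboot)
    ethtool -K eth0 tso off

    # to make it persistent, put the same line in a boot script such as
    # /etc/rc.local or your distribution's network interface hooks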

I hope this message can help anyone else experiencing the same issues. 
Thanks Leonardo!

OMPI devs: does this imply bug(s) in the e1000 driver/chip?  Should I
contact the driver authors?


On Fri, 10 Oct 2008 12:42:19 -0400, "V. Ram"  said:
> Leonardo,
> 
> These nodes are all using Intel e1000 chips.  As the nodes are AMD
> K7-based, these are the older chips, not the newer ones with the
> EEPROM issues under recent kernels.
> 
> The kernel in use is from the 2.6.22 family, and the e1000 driver is the
> one shipped with the kernel.  I am running it compiled into the kernel,
> not as a module.
> 
> When testing using the Intel MPI Benchmarks, I found that increasing the
> receive ring buffer size to the max (4096) helped performance, so I use
> ethtool -G on startup.
> 
> Checking ethtool -k, I see that TCP segmentation offload is on.  I can try
> turning that off to see what happens.
> 
> Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or
> have these same issues, and I'm not having to turn off tso.
> 
> Can anyone else suggest why the code might be crashing when running over
> ethernet and not over shared memory?  Any suggestions on how to debug
> this or interpret the error message issued from btl_tcp_frag.c ?
> 
> Thanks.
> 
> 
> On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho" said:
> > Ram,
> > 
> > What is the name and version of the kernel module for your NIC? I have
> > experienced something similar with my tg3 module. The error that appeared
> > for me was different:
> > 
> > [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv 
> > failed: No route to host (113)
> > 
> > I solved it by changing the following setting on the NIC:
> > 
> > /sbin/ethtool -K eth0 tso off
> > 
> > Leonardo
> > 
> > 
> > Aurélien Bouteiller wrote:
> > > If you have several network cards in your system, it can sometimes get
> > > the endpoints confused, especially if you don't have the same number
> > > of cards or don't use the same subnet for all of them ("eth0", "eth1",
> > > ...). You should try to restrict Open MPI to use only one of the
> > > available networks by using the --mca btl_tcp_if_include ethX parameter
> > > to mpirun, where X is the network interface that is always connected to
> > > the same logical and physical network on your machine.
> > >
> > > Aurelien
> > >
> > > On Oct 1, 2008, at 11:47, V. Ram wrote:
> > >
> > >> I wrote earlier about one of my users running a third-party Fortran code
> > >> on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash
> > >> behavior.
> > >>
> > >> Our cluster's nodes all have 2 single-core processors.  If this code is
> > >> run on 2 processors on 1 node, it runs seemingly fine.  However, if the
> > >> job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
> > >> it crashes and gives messages like:
> > >>
> > >> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > >> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > >> mca_btl_tcp_frag_recv: readv failed with errno=110
> > >> mca_btl_tcp_frag_recv: readv failed with errno=104
> > >>
> > >> Essentially, if any network communication is involved, the job crashes
> > >> in this form.
> > >>
> > >> I do have another user that runs his own MPI code on 10+ of these
> > >> processors for days at a time without issue, so I don't think it's
> > >> hardware.
> > >>
> > >> The original code also runs fine across many networked nodes if the
> > >> architecture is x86-64 (also running OMPI 1.2.7).
> > >>
> > >> We have also tried different Fortran compilers (both PathScale and
> > >> gfortran) and keep getting these crashes.
> > >>
> > >> Are there any suggestions on how to figure out if it's a problem with
> > >> the code or the OMPI installation/software on the system? We have tried
> > >> "--debug-daemons" with no new/interesting information being revealed.
> > >> Is there a way to trap segfault messages or more detailed MPI
> > >> transaction information or anything else that could help diagnose this?
> > >>
> > >> Thanks.
> > >> -- 
> > >>  V. Ram
> > >>  v_r_...@fastmail.fm
> > >>
> > >> -- 
> > >> http://www.fastmail.fm - Same, same, but different...
> > >>
> > 
> > 
> > -- 
> > Leonardo Fialho
> > Computer Architecture and Operating Systems Department - CAOS
> > Universidad Autonoma de Barcelona - UAB

Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-10 Thread George Bosilca


On Oct 10, 2008, at 12:42 PM, V. Ram wrote:

Can anyone else suggest why the code might be crashing when running
over ethernet and not over shared memory?  Any suggestions on how to
debug this or interpret the error message issued from btl_tcp_frag.c?


Unfortunately this is a standard error message which does not enlighten
us on what the real error was. It simply states that one node failed
to read data from a socket, which usually happens when the remote peer
died unexpectedly (e.g., from a segfault).
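
For what it's worth, on Linux errno 110 is ETIMEDOUT and errno 104 is
ECONNRESET, both consistent with the remote side going away. A quick way
to confirm the mapping on a node (a sketch; assumes perl is installed,
as it is on most Linux systems):

    # translate the errno values from the btl_tcp_frag.c messages
    # (110 -> "Connection timed out", 104 -> "Connection reset by peer")
    perl -MPOSIX -e 'print "110: ", strerror(110), "\n104: ", strerror(104), "\n"'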


  george.



Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-10 Thread V. Ram
Leonardo,

These nodes are all using Intel e1000 chips.  As the nodes are AMD
K7-based, these are the older chips, not the newer ones with the
EEPROM issues under recent kernels.

The kernel in use is from the 2.6.22 family, and the e1000 driver is the
one shipped with the kernel.  I am running it compiled into the kernel,
not as a module.

When testing using the Intel MPI Benchmarks, I found that increasing the
receive ring buffer size to the max (4096) helped performance, so I use
ethtool -G on startup.
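
For reference, that ring-buffer tuning looks roughly like this (a sketch;
eth0 is an assumed interface name, and 4096 was simply the maximum these
e1000s reported):

    # show the current and pre-set maximum ring sizes for the NIC
    ethtool -g eth0

    # raise the receive ring to its maximum
    ethtool -G eth0 rx 4096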

Checking ethtool -k, I see that TCP segmentation offload is on.  I can try
turning that off to see what happens.

Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or
have these same issues, and I'm not having to turn off tso.

Can anyone else suggest why the code might be crashing when running over
ethernet and not over shared memory?  Any suggestions on how to debug
this or interpret the error message issued from btl_tcp_frag.c ?

Thanks.


On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho" said:
> Ram,
> 
> What is the name and version of the kernel module for your NIC? I have
> experienced something similar with my tg3 module. The error that appeared
> for me was different:
> 
> [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv 
> failed: No route to host (113)
> 
> I solved it by changing the following setting on the NIC:
> 
> /sbin/ethtool -K eth0 tso off
> 
> Leonardo
> 
> 
> Aurélien Bouteiller wrote:
> > If you have several network cards in your system, it can sometimes get
> > the endpoints confused, especially if you don't have the same number
> > of cards or don't use the same subnet for all of them ("eth0", "eth1",
> > ...). You should try to restrict Open MPI to use only one of the
> > available networks by using the --mca btl_tcp_if_include ethX parameter
> > to mpirun, where X is the network interface that is always connected to
> > the same logical and physical network on your machine.
> >
> > Aurelien
> >
> > On Oct 1, 2008, at 11:47, V. Ram wrote:
> >
> >> I wrote earlier about one of my users running a third-party Fortran code
> >> on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash
> >> behavior.
> >>
> >> Our cluster's nodes all have 2 single-core processors.  If this code is
> >> run on 2 processors on 1 node, it runs seemingly fine.  However, if the
> >> job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
> >> it crashes and gives messages like:
> >>
> >> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> >> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> >> mca_btl_tcp_frag_recv: readv failed with errno=110
> >> mca_btl_tcp_frag_recv: readv failed with errno=104
> >>
> >> Essentially, if any network communication is involved, the job crashes
> >> in this form.
> >>
> >> I do have another user that runs his own MPI code on 10+ of these
> >> processors for days at a time without issue, so I don't think it's
> >> hardware.
> >>
> >> The original code also runs fine across many networked nodes if the
> >> architecture is x86-64 (also running OMPI 1.2.7).
> >>
> >> We have also tried different Fortran compilers (both PathScale and
> >> gfortran) and keep getting these crashes.
> >>
> >> Are there any suggestions on how to figure out if it's a problem with
> >> the code or the OMPI installation/software on the system? We have tried
> >> "--debug-daemons" with no new/interesting information being revealed.
> >> Is there a way to trap segfault messages or more detailed MPI
> >> transaction information or anything else that could help diagnose this?
> >>
> >> Thanks.
> >> -- 
> >>  V. Ram
> >>  v_r_...@fastmail.fm
> >>
> >> -- 
> >> http://www.fastmail.fm - Same, same, but different...
> >>
> 
> 
> -- 
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edificio Q, QC/3088
> http://www.caos.uab.es
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478
> 
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - Faster than the air-speed velocity of an
  unladen european swallow




Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-10 Thread V. Ram
Sorry for replying to this so late, but I have been away.  Reply
below...

On Wed, 1 Oct 2008 11:58:30 -0400, "Aurélien Bouteiller" said:
> If you have several network cards in your system, it can sometimes get
> the endpoints confused, especially if you don't have the same number
> of cards or don't use the same subnet for all of them ("eth0", "eth1",
> ...). You should try to restrict Open MPI to use only one of the
> available networks by using the --mca btl_tcp_if_include ethX parameter
> to mpirun, where X is the network interface that is always connected to
> the same logical and physical network on your machine.

I was pretty sure this wasn't the problem since basically all the nodes
only have one interface configured, but I had the user try the --mca
btl_tcp_if_include parameter.  The same result / crash occurred.

> 
> Aurelien
> 
> On Oct 1, 2008, at 11:47, V. Ram wrote:
> 
> > I wrote earlier about one of my users running a third-party Fortran
> > code on 32-bit x86 machines, using OMPI 1.2.7, that is having some
> > odd crash behavior.
> >
> > Our cluster's nodes all have 2 single-core processors.  If this code
> > is run on 2 processors on 1 node, it runs seemingly fine.  However,
> > if the job runs on 1 processor on each of 2 nodes (e.g., mpirun
> > --bynode), then it crashes and gives messages like:
> >
> > [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > mca_btl_tcp_frag_recv: readv failed with errno=110
> > mca_btl_tcp_frag_recv: readv failed with errno=104
> >
> > Essentially, if any network communication is involved, the job crashes
> > in this form.
> >
> > I do have another user that runs his own MPI code on 10+ of these
> > processors for days at a time without issue, so I don't think it's
> > hardware.
> >
> > The original code also runs fine across many networked nodes if the
> > architecture is x86-64 (also running OMPI 1.2.7).
> >
> > We have also tried different Fortran compilers (both PathScale and
> > gfortran) and keep getting these crashes.
> >
> > Are there any suggestions on how to figure out if it's a problem with
> > the code or the OMPI installation/software on the system? We have
> > tried "--debug-daemons" with no new/interesting information being
> > revealed.  Is there a way to trap segfault messages or more detailed
> > MPI transaction information or anything else that could help
> > diagnose this?
> >
> > Thanks.
> > -- 
> >  V. Ram
> >  v_r_...@fastmail.fm
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - A no graphics, no pop-ups email service




Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-01 Thread Leonardo Fialho

Ram,

What is the name and version of the kernel module for your NIC? I have
experienced something similar with my tg3 module. The error that appeared
for me was different:


[btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv 
failed: No route to host (113)


I solved it by changing the following setting on the NIC:

/sbin/ethtool -K eth0 tso off

Leonardo


Aurélien Bouteiller wrote:
If you have several network cards in your system, it can sometimes get
the endpoints confused, especially if you don't have the same number
of cards or don't use the same subnet for all of them ("eth0", "eth1",
...). You should try to restrict Open MPI to use only one of the
available networks by using the --mca btl_tcp_if_include ethX parameter
to mpirun, where X is the network interface that is always connected to
the same logical and physical network on your machine.


Aurelien

On Oct 1, 2008, at 11:47, V. Ram wrote:


I wrote earlier about one of my users running a third-party Fortran code
on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash
behavior.

Our cluster's nodes all have 2 single-core processors.  If this code is
run on 2 processors on 1 node, it runs seemingly fine.  However, if the
job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
it crashes and gives messages like:

[node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
[node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104

Essentially, if any network communication is involved, the job crashes
in this form.

I do have another user that runs his own MPI code on 10+ of these
processors for days at a time without issue, so I don't think it's
hardware.

The original code also runs fine across many networked nodes if the
architecture is x86-64 (also running OMPI 1.2.7).

We have also tried different Fortran compilers (both PathScale and
gfortran) and keep getting these crashes.

Are there any suggestions on how to figure out if it's a problem with
the code or the OMPI installation/software on the system? We have tried
"--debug-daemons" with no new/interesting information being revealed.
Is there a way to trap segfault messages or more detailed MPI
transaction information or anything else that could help diagnose this?

Thanks.
--
 V. Ram
 v_r_...@fastmail.fm

--
http://www.fastmail.fm - Same, same, but different...




--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478



Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-01 Thread Aurélien Bouteiller
If you have several network cards in your system, it can sometimes get
the endpoints confused, especially if you don't have the same number
of cards or don't use the same subnet for all of them ("eth0", "eth1",
...). You should try to restrict Open MPI to use only one of the
available networks by using the --mca btl_tcp_if_include ethX parameter
to mpirun, where X is the network interface that is always connected to
the same logical and physical network on your machine.
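
A minimal invocation of that restriction might look like the following
(a sketch; eth0 and the executable name are placeholders):

    # tell the TCP BTL to use only eth0 on every node
    mpirun --mca btl_tcp_if_include eth0 -np 16 ./a.out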


Aurelien

On Oct 1, 2008, at 11:47, V. Ram wrote:

I wrote earlier about one of my users running a third-party Fortran
code on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd
crash behavior.

Our cluster's nodes all have 2 single-core processors.  If this code is
run on 2 processors on 1 node, it runs seemingly fine.  However, if the
job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
it crashes and gives messages like:

[node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
[node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104

Essentially, if any network communication is involved, the job crashes
in this form.

I do have another user that runs his own MPI code on 10+ of these
processors for days at a time without issue, so I don't think it's
hardware.

The original code also runs fine across many networked nodes if the
architecture is x86-64 (also running OMPI 1.2.7).

We have also tried different Fortran compilers (both PathScale and
gfortran) and keep getting these crashes.

Are there any suggestions on how to figure out if it's a problem with
the code or the OMPI installation/software on the system? We have tried
"--debug-daemons" with no new/interesting information being revealed.
Is there a way to trap segfault messages or more detailed MPI
transaction information or anything else that could help diagnose this?


Thanks.
--
 V. Ram
 v_r_...@fastmail.fm

--
http://www.fastmail.fm - Same, same, but different...






[OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-01 Thread V. Ram
I wrote earlier about one of my users running a third-party Fortran code
on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash
behavior.

Our cluster's nodes all have 2 single-core processors.  If this code is
run on 2 processors on 1 node, it runs seemingly fine.  However, if the
job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
it crashes and gives messages like:

[node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
[node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104

Essentially, if any network communication is involved, the job crashes
in this form.
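
For concreteness, the two launch patterns being contrasted are roughly
these (a sketch; the executable name is a placeholder, node3/node4 are
the nodes from the messages above, and hostfile/slot details are omitted):

    # works: both ranks on one node, so they talk over the shared-memory BTL
    mpirun -np 2 -host node3,node3 ./solver.x

    # crashes: one rank per node, so traffic goes over the TCP BTL
    mpirun -np 2 --bynode -host node3,node4 ./solver.x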

I do have another user that runs his own MPI code on 10+ of these
processors for days at a time without issue, so I don't think it's
hardware.

The original code also runs fine across many networked nodes if the
architecture is x86-64 (also running OMPI 1.2.7).

We have also tried different Fortran compilers (both PathScale and
gfortran) and keep getting these crashes.

Are there any suggestions on how to figure out if it's a problem with
the code or the OMPI installation/software on the system? We have tried
"--debug-daemons" with no new/interesting information being revealed. 
Is there a way to trap segfault messages or more detailed MPI
transaction information or anything else that could help diagnose this?
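
For example, would something along these lines on each node be a
reasonable way to catch a crashing rank (a sketch; assumes bash, and
root access for the core_pattern line)?

    # allow core files for processes launched from this shell
    ulimit -c unlimited

    # optionally give cores a predictable location/name (needs root)
    echo '/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern

    # after a crash, look for segfault records left by the kernel
    dmesg | grep -i segfault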

Thanks.
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - Same, same, but different...