Re: bnx2x - occasional high packet loss (on LAN)

2015-10-09 Thread Nikola Ciprich
Hello Ariel,

Finally, I was able to reproduce the problem with debugging enabled as you
requested.

> > Two things you can collect for me to help debug this issue:
> > 
> > 1.
> > Output of ethtool -d eth1 after problem occurs (redirect to a file)
> 
> here it is:
> http://nik.lbox.cz/download/ethtool-eth4.txt
> http://nik.lbox.cz/download/ethtool-eth5.txt
> 
> > 2.
> > dmesg after enabling link related debug messages. Use
> > modprobe bnx2x debug=0x4
> > Or
> > ethtool -s eth1 msglvl 0x4

here it is:

http://nik.lbox.cz/download/dmesg3-box1.txt

http://nik.lbox.cz/download/dmesg3-box2.txt

Will this help?

BR

nik


> 
> I have set this for both interfaces, but don't see anything new
> in dmesg..
> 
> 
> > to enable these prints.
> > 
> > This holds for both of your problems (unless it is the same issue).
> > 
> > Thanks,
> > Ariel



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-




RE: bnx2x - occasional high packet loss (on LAN)

2015-09-21 Thread Ariel Elior
> -----Original Message-----
> From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz]
> Sent: Monday, September 21, 2015 1:32 PM
> To: Ariel Elior <ariel.el...@qlogic.com>
> Cc: netdev <netdev@vger.kernel.org>; n...@linuxbox.cz
> Subject: Re: bnx2x - occasional high packet loss (on LAN)
> 
> Hello Ariel,
> 
> after few days of torturing NICs with flood pings, card
> seems to have given up with lots of errors..
> 
> I've uploaded new kernel log here:
> 
> http://nik.lbox.cz/download/dmesg.txt
> 
> Will this help?
> 
> I still have it in this hung state now, in case I could provide
> more info for diagnostics..
> 
> however please note this can be different problem then I was reporting
> originaly, since I only had high packet loss, while now whole card
> seems to be blocked..  but maybe it just is worse case of the same
> problem?

Hi Nikola,
It seems the link points to the same file you shared before; I don't see any
errors there.

Two things you can collect for me to help debug this issue:

1. Output of ethtool -d eth1 after the problem occurs (redirect it to a file).

2. dmesg after enabling link-related debug messages. Use
modprobe bnx2x debug=0x4
or
ethtool -s eth1 msglvl 0x4
to enable these prints.

This holds for both of your problems (unless it is the same issue).
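A minimal shell sketch of the two collection steps Ariel describes, assuming a root shell and interface names passed as arguments (for example eth4 and eth5, as used later in this thread). The function names enable_link_debug and dump_regs and the ETHTOOL variable are hypothetical; only the ethtool invocations come from the message. ETHTOOL defaults to the real ethtool binary and can be pointed at a stub for a dry run.

```shell
#!/bin/sh
# Sketch only: ETHTOOL, enable_link_debug and dump_regs are hypothetical names.
# ETHTOOL defaults to the real ethtool binary; override it to dry-run.
ETHTOOL=${ETHTOOL:-ethtool}

# Step 2 (run BEFORE reproducing): enable link-related bnx2x debug prints.
# Module-wide alternative from the same message: modprobe bnx2x debug=0x4
enable_link_debug() {
    for ifc in "$@"; do
        "$ETHTOOL" -s "$ifc" msglvl 0x4 || return 1
    done
}

# Step 1 (run AFTER the problem occurs): dump NIC registers, one file per port.
dump_regs() {
    for ifc in "$@"; do
        "$ETHTOOL" -d "$ifc" > "ethtool-$ifc.txt" || return 1
    done
}
```

Usage, assuming ports eth4 and eth5: run enable_link_debug eth4 eth5 before the problem reappears; once it does, run dump_regs eth4 eth5 and save the kernel log with dmesg > dmesg-debug.txt. The debug level has to be set first, otherwise the link prints Ariel asks for never reach dmesg.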

Thanks,
Ariel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bnx2x - occasional high packet loss (on LAN)

2015-09-21 Thread Nikola Ciprich
Hello Ariel,

after a few days of torturing the NICs with flood pings, the card
seems to have given up with lots of errors.

I've uploaded a new kernel log here:

http://nik.lbox.cz/download/dmesg.txt

Will this help?

I still have it in this hung state now, in case I can provide
more info for diagnostics.

However, please note this may be a different problem than the one I
originally reported, since I only had high packet loss, while now the
whole card seems to be blocked. But maybe it is just a worse case of
the same problem?

BR

nik





On Wed, Sep 16, 2015 at 10:18:34AM +0200, Nikola Ciprich wrote:
> On Wed, Sep 16, 2015 at 08:15:41AM +, Ariel Elior wrote:
> > Hi Nikola,
> > Please provide dmesg output from your system.
> > Thanks,
> > Ariel
> 
> Hello Ariel,
> 
> here it is:
> 
> http://nik.lbox.cz/download/dmesg.txt
> 
> BR
> 
> nik
> 
> 







Re: bnx2x - occasional high packet loss (on LAN)

2015-09-21 Thread Nikola Ciprich
> Hi Nikola,
> Seems like the link below is the same file you shared before - I don't see 
> any errors there.

Ouch, the file was correct, but the permissions were wrong,
so maybe you were getting an older copy from some proxy?

Anyway, I've copied the file so you can get it from
a new location:

http://nik.lbox.cz/download/dmesg2.txt



> 
> Two things you can collect for me to help debug this issue:
> 
> 1.
> Output of ethtool -d eth1 after problem occurs (redirect to a file)

here it is:
http://nik.lbox.cz/download/ethtool-eth4.txt
http://nik.lbox.cz/download/ethtool-eth5.txt

> 2.
> dmesg after enabling link related debug messages. Use
> modprobe bnx2x debug=0x4
> Or
> ethtool -s eth1 msglvl 0x4

I have set this for both interfaces, but I don't see anything new
in dmesg.


> to enable these prints.
> 
> This holds for both of your problems (unless it is the same issue).
> 
> Thanks,
> Ariel
> 





Re: bnx2x - occasional high packet loss (on LAN)

2015-09-16 Thread Nikola Ciprich
On Wed, Sep 16, 2015 at 08:15:41AM +, Ariel Elior wrote:
> Hi Nikola,
> Please provide dmesg output from your system.
> Thanks,
> Ariel

Hello Ariel,

here it is:

http://nik.lbox.cz/download/dmesg.txt

BR

nik


> 






RE: bnx2x - occasional high packet loss (on LAN)

2015-09-16 Thread Ariel Elior
Hi Nikola,
Please provide dmesg output from your system.
Thanks,
Ariel

> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Nikola Ciprich
> Sent: Tuesday, September 15, 2015 7:17 AM
> To: netdev <netdev@vger.kernel.org>
> Cc: n...@linuxbox.cz
> Subject: bnx2x - occasional high packet loss (on LAN)


bnx2x - occasional high packet loss (on LAN)

2015-09-14 Thread Nikola Ciprich
Hello,

I'm trying to track down a strange issue with one of our servers and
would like to ask for recommendations.

I've got a three-node cluster (nodes A..C) interconnected with a stack of Brocade
ICX6610 switches. eth0 of each box is connected to the first switch, eth1 to the
second one, with bonding set as follows:
"mode=802.3ad lacp_rate=fast xmit_hash_policy=layer2+3 miimon=100"
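For reference, on a CentOS 6-based system these options would typically live in BONDING_OPTS in the bond's ifcfg file. The fragment below is a sketch assuming the standard initscripts layout; IP addressing is omitted because the thread does not give it. Each slave file (ifcfg-eth0, ifcfg-eth1) would then carry MASTER=bond0 and SLAVE=yes. Note that miimon only watches carrier state, which matches the observation that packet loss on a link that stays up is not caught by the bond.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (sketch; addressing omitted)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# Options quoted in this thread: LACP with the fast LACPDU rate, layer2+3
# transmit hashing, and MII link monitoring every 100 ms (carrier loss only).
BONDING_OPTS="mode=802.3ad lacp_rate=fast xmit_hash_policy=layer2+3 miimon=100"
```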

It has happened a few times that eth1 on box A suddenly started misbehaving and
communication with the other nodes (tested with flood ping) started dropping up to
30% of packets. When this port was shut down on both sides, the problem immediately
vanished.

We've tried replacing the card and the cable, and using a different port on the
switch, but the problem reappeared yesterday.

Since it's "only" packet loss and not link loss, bonding doesn't help me much.

However, during the weekend the port also had a strange link issue:

Sep 12 15:23:45 remrprv1a kernel: [676373.296786] bnx2x 0000:03:00.1 eth1: NIC Link is Down
Sep 12 15:23:46 remrprv1a kernel: [676373.356638] bond0: link status definitely down for interface eth1, disabling it
Sep 12 15:23:46 remrprv1a kernel: [676374.299571] bnx2x 0000:03:00.1 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Sep 12 15:23:47 remrprv1a kernel: [676374.364428] bond0: link status definitely up for interface eth1, 10000 Mbps full duplex
Sep 12 15:23:47 remrprv1a kernel: [676374.372902] bond0: first active interface up!
Sep 12 15:24:24 remrprv1a kernel: [676411.402511] bnx2x 0000:03:00.1 eth1: NIC Link is Down
Sep 12 15:24:24 remrprv1a kernel: [676411.407422] bond0: link status definitely down for interface eth1, disabling it
Sep 12 15:24:25 remrprv1a kernel: [676412.405311] bnx2x 0000:03:00.1 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Sep 12 15:24:25 remrprv1a kernel: [676412.408123] bond0: link status definitely up for interface eth1, 0 Mbps full duplex
Sep 12 15:24:51 remrprv1a kernel: [676438.477641] bnx2x 0000:03:00.1 eth1: NIC Link is Down
Sep 12 15:24:51 remrprv1a kernel: [676438.528513] bond0: link status definitely down for interface eth1, disabling it
Sep 12 15:24:52 remrprv1a kernel: [676439.480472] bnx2x 0000:03:00.1 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Sep 12 15:24:52 remrprv1a kernel: [676439.536282] bond0: link status definitely up for interface eth1, 10000 Mbps full duplex

A 0 Mbps link speed is quite weird, I guess.
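To see how often the port flapped over a longer period, the "NIC Link is Down" events can be tallied per interface from the syslog. This is a minimal sketch, assuming the kernel messages land in a syslog file with one event per line, in the same format as the excerpt above; count_link_downs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count bnx2x "NIC Link is Down" events per interface.
# count_link_downs is a hypothetical helper; it reads syslog lines on stdin.
count_link_downs() {
    awk '
        /NIC Link is Down/ {
            for (i = 1; i < NF; i++) {
                # The interface name is the field just before "NIC", e.g. "eth1:".
                if ($(i + 1) == "NIC") {
                    ifc = $i
                    sub(/:$/, "", ifc)
                    downs[ifc]++
                }
            }
        }
        # One "interface count" line per interface (order is unspecified).
        END { for (ifc in downs) print ifc, downs[ifc] }
    '
}
```

Usage on a CentOS 6-based box: count_link_downs < /var/log/messages. The bond0 "link status definitely down" lines are deliberately not counted, so each flap is counted once per NIC rather than twice.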

All three boxes are identical, running a CentOS 6-based system with a 4.0.5 x86_64 kernel.

The only difference I noticed between them is that irqbalance was enabled on the
problematic box and not on the others, so I disabled it and rebooted the box. The
trouble is, I can't really wait for the problem to reappear, so I'd like to ask:
has anybody seen a similar problem? If so, was it fixed in some newer kernel
release? I haven't found any mention in the changelogs, but still... Or does
somebody have a hint on what else I should check?

I'll try to reproduce this on a test system (enabling irqbalance and running some
network benchmarks), but I'd be happiest if I could prevent it on this production
system.

Thanks a lot for any advice.

with best regards

nikola ciprich

PS: here's the lspci -vv output for the eth interfaces. If I should provide any
further information, please let me know:

http://nik.lbox.cz/download/lspci.txt


