Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-03-02 Thread Nick Rogers
Seconded. I get daily panics on a Tyan board with a BCM5704. Unfortunately
I can't provide a crash dump, and I was forced to switch to a different NIC.
For what it's worth, here is the relevant pciconf -lv output.

b...@pci0:2:9:0: class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
b...@pci0:2:9:1: class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet


On Sat, Feb 27, 2010 at 2:50 PM, Erik Klavon wrote:
>
> I have BCM5704 hardware (Tyan S2882 system board). I am seeing kernel
> panics very similar to those described in this thread on this
> hardware. pciconf -lcv output below. If you'd like access to this
> hardware I can arrange it; please contact me off list.
>
> Erik
>
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-03-01 Thread Dmitry Rybin
Broadcom 5714 and 5715: no problems here.

2010/2/22 Denis Lamanov :
> Yes, PCIX BCM5704
> FreeBSD vpn2 8.0-STABLE FreeBSD 8.0-STABLE #1 r204028: Thu Feb 18 08:29:42
> EET 2010     ad...@vpn2:/usr/obj/usr/src/sys/GENERIC  i386
>
> 2010/2/22 Pyun YongHyeon 
>
>> On Mon, Feb 22, 2010 at 03:17:17PM +0200, Denis Lamanov wrote:
>> > I see same trouble (lost packets after 4 day uptime and reboot) :(
>> >
>> > dev.bge.0.stats.rx.FCSErrors: 18
>> >
>>
>> You also have PCIX BCM5704 controller? What FreeBSD version do you
>> use?
>>
>> > 2010/2/19 Slawa Olhovchenkov 
>> >
>> > > On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
>> > >
>> > > >
>> > > > > dev.bge.1.stats.rx.Fragments: 1
>> > > >
>> > > > You received a frame that is less than 64 bytes with a bad FCS.
>> > > >
>> > > > > dev.bge.1.stats.rx.UcastPkts: 2956515
>> > > > > dev.bge.1.stats.rx.MulticastPkts: 0
>> > > > > dev.bge.1.stats.rx.FCSErrors: 18
>> > > >
>> > > > You have a lot of FCS errors here.
>> > > > Please double check cabling. If the statistics counter is right,
>> > > > sender is guilty or you have bad cabling issues here.
>> > >
>> > > 1. lost packets much more 18. I think hundreds, or thousands.
>> > > 2. packets lost on both (bge0 & bge1) interfaces
>> > > 3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-27 Thread Erik Klavon
Hi Pyun,

On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
> Since I don't have BCM5704 hardware it's hard to find which
> revision may affect to this issue. Could you narrow down which
> revision number started showing the issue?

I have BCM5704 hardware (Tyan S2882 system board). I am seeing kernel
panics very similar to those described in this thread on this
hardware. pciconf -lcv output below. If you'd like access to this
hardware I can arrange it; please contact me off list.

Erik

b...@pci0:2:9:0: class=0x02 card=0x164414e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit
b...@pci0:2:9:1: class=0x02 card=0x164414e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Denis Lamanov
Yes, PCIX BCM5704
FreeBSD vpn2 8.0-STABLE FreeBSD 8.0-STABLE #1 r204028: Thu Feb 18 08:29:42
EET 2010 ad...@vpn2:/usr/obj/usr/src/sys/GENERIC  i386

2010/2/22 Pyun YongHyeon 

> On Mon, Feb 22, 2010 at 03:17:17PM +0200, Denis Lamanov wrote:
> > I see same trouble (lost packets after 4 day uptime and reboot) :(
> >
> > dev.bge.0.stats.rx.FCSErrors: 18
> >
>
> You also have PCIX BCM5704 controller? What FreeBSD version do you
> use?
>
> > 2010/2/19 Slawa Olhovchenkov 
> >
> > > On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
> > >
> > > >
> > > > > dev.bge.1.stats.rx.Fragments: 1
> > > >
> > > > You received a frame that is less than 64 bytes with a bad FCS.
> > > >
> > > > > dev.bge.1.stats.rx.UcastPkts: 2956515
> > > > > dev.bge.1.stats.rx.MulticastPkts: 0
> > > > > dev.bge.1.stats.rx.FCSErrors: 18
> > > >
> > > > You have a lot of FCS errors here.
> > > > Please double check cabling. If the statistics counter is right,
> > > > sender is guilty or you have bad cabling issues here.
> > >
> > > 1. lost packets much more 18. I think hundreds, or thousands.
> > > 2. packets lost on both (bge0 & bge1) interfaces
> > > 3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Pyun YongHyeon
On Mon, Feb 22, 2010 at 03:17:17PM +0200, Denis Lamanov wrote:
> I see same trouble (lost packets after 4 day uptime and reboot) :(
> 
> dev.bge.0.stats.rx.FCSErrors: 18
> 

Do you also have a PCI-X BCM5704 controller? Which FreeBSD version
are you using?

> 2010/2/19 Slawa Olhovchenkov 
> 
> > On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
> >
> > >
> > > > dev.bge.1.stats.rx.Fragments: 1
> > >
> > > You received a frame that is less than 64 bytes with a bad FCS.
> > >
> > > > dev.bge.1.stats.rx.UcastPkts: 2956515
> > > > dev.bge.1.stats.rx.MulticastPkts: 0
> > > > dev.bge.1.stats.rx.FCSErrors: 18
> > >
> > > You have a lot of FCS errors here.
> > > Please double check cabling. If the statistics counter is right,
> > > sender is guilty or you have bad cabling issues here.
> >
> > 1. lost packets much more 18. I think hundreds, or thousands.
> > 2. packets lost on both (bge0 & bge1) interfaces
> > 3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Slawa Olhovchenkov
On Mon, Feb 22, 2010 at 10:18:47AM +0300, Slawa Olhovchenkov wrote:

> On Sun, Feb 21, 2010 at 03:41:53PM -0800, Pyun YongHyeon wrote:
> 
> > On Sun, Feb 21, 2010 at 12:44:50AM +0300, Slawa Olhovchenkov wrote:
> > > On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
> > > 
> > > > Normally you should not have any FCS errors, it could be related
> > > > with signal quality and these errors might not be correctly
> > > > counted.
> > > 
> > > I can't check cable and switch counters on bge1 before Feb 24.
> > > 
> > > > > 3. packets don't lost on sources at Aug'09
> > > > 
> > > > Since I don't have BCM5704 hardware it's hard to find which
> > > > revision may affect to this issue. Could you narrow down which
> > > > revision number started showing the issue?
> > > 
> > > I am don't update source between Aug'09 and Feb 16.
> > > 
> > 
> > There were many bge(4) changes in that time frame. So it's hard to
> > find which commit is guilty for the packet drop issue. If you can
> > narrow down possible changes that might affect the issue that could
> > help me a lot. You can do binary searching technique for the SVN
> > revisions to know possible candidates.
> > http://svn.freebsd.org/viewvc/base/head/sys/dev/bge/if_bge.c
> 
> How I can do this?
> I don't work w/ svn before and don't know optimal way for one file.

mail# rm sys/dev/bge/*
mail# svn checkout -r 201697 svn://svn.freebsd.org/base/stable/8/sys/dev/bge/ sys/dev/bge
A    sys/dev/bge/if_bgereg.h
A    sys/dev/bge/if_bge.c
Checked out revision 201697.
mail# make -DNO_CLEAN -DKERNFAST buildkernel
===> bge (all)
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   
-DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/src/sys/MAIL/opt_global.h 
-I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer 
-I/usr/obj/usr/src/sys/MAIL -mcmodel=kernel -mno-red-zone  -mfpmath=387 
-mno-sse -mno-sse2 -mno-sse3 -mno-mmx -mno-3dnow  -msoft-float 
-fno-asynchronous-unwind-tables -ffreestanding -fstack-protector 
-std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs 
-Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-Wundef -Wno-pointer-sign -fformat-extensions -c 
/usr/src/sys/modules/bge/../../dev/bge/if_bge.c
ld  -d -warn-common -r -d -o if_bge.ko.debug if_bge.o
:> export_syms
awk -f /usr/src/sys/conf/kmod_syms.awk if_bge.ko.debug  export_syms | xargs -J% 
objcopy % if_bge.ko.debug
objcopy --only-keep-debug if_bge.ko.debug if_bge.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=if_bge.ko.symbols if_bge.ko.debug 
if_bge.ko
===> mii (all)
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   
-DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/src/sys/MAIL/opt_global.h 
-I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer 
-I/usr/obj/usr/src/sys/MAIL -mcmodel=kernel -mno-red-zone  -mfpmath=387 
-mno-sse -mno-sse2 -mno-sse3 -mno-mmx -mno-3dnow  -msoft-float 
-fno-asynchronous-unwind-tables -ffreestanding -fstack-protector 
-std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs 
-Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-Wundef -Wno-pointer-sign -fformat-extensions -c 
/usr/src/sys/modules/mii/../../dev/mii/brgphy.c
ld  -d -warn-common -r -d -o miibus.ko.debug acphy.o amphy.o atphy.o axphy.o 
bmtphy.o brgphy.o ciphy.o e1000phy.o exphy.o gentbi.o icsphy.o inphy.o 
ip1000phy.o jmphy.o lxtphy.o miibus_if.o mii.o mii_physubr.o mlphy.o nsgphy.o 
nsphy.o nsphyter.o pnaphy.o qsphy.o rgephy.o rlphy.o ruephy.o tdkphy.o tlphy.o 
truephy.o ukphy.o ukphy_subr.o xmphy.o
echo mii_mediachg mii_phy_probe mii_phy_reset mii_pollstat mii_tick > export_syms
awk -f /usr/src/sys/conf/kmod_syms.awk miibus.ko.debug  export_syms | xargs -J% 
objcopy % miibus.ko.debug
objcopy --only-keep-debug miibus.ko.debug miibus.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=miibus.ko.symbols miibus.ko.debug 
miibus.ko



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Denis Lamanov
vpn2# sysctl dev.bge.0.stats
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 0
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 36622
dev.bge.0.stats.DmaReadQueueFull: 17
dev.bge.0.stats.DmaReadHighPriQueueFull: 0
dev.bge.0.stats.SendDataCompQueueFull: 0
dev.bge.0.stats.RingSetSendProdIndex: 116130
dev.bge.0.stats.RingStatusUpdate: 79240
dev.bge.0.stats.Interrupts: 79240
dev.bge.0.stats.AvoidedInterrupts: 0
dev.bge.0.stats.SendThresholdHit: 0
dev.bge.0.stats.rx.Octets: 132390898
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UcastPkts: 117696
dev.bge.0.stats.rx.MulticastPkts: 1
dev.bge.0.stats.rx.FCSErrors: 41
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.rx.inRangeLengthError: 0
dev.bge.0.stats.rx.outRangeLengthError: 0
dev.bge.0.stats.tx.Octets: 125971311
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.flowControlDone: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UcastPkts: 115417
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 0
dev.bge.0.stats.tx.CarrierSenseErrors: 0
dev.bge.0.stats.tx.Discards: 0
dev.bge.0.stats.tx.Errors: 0
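
A small sketch of how a dump like the one above could be screened
automatically. It is not part of bge(4) or sysctl(8): parse_sysctl,
nonzero_errors and the WATCHED set are hypothetical names, and the choice
of which counters count as "errors" is an assumption made for
illustration. The sample lines are copied from the output above.

```python
# Hypothetical helper, not part of sysctl(8) or bge(4).
# Counter names treated as error indicators; this selection is an assumption.
WATCHED = {"FCSErrors", "Fragments", "AlignmentErrors",
           "InputErrors", "InputDiscards", "Errors", "Discards"}

SAMPLE = """\
dev.bge.0.stats.InputDiscards: 0
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UcastPkts: 117696
dev.bge.0.stats.rx.FCSErrors: 41
dev.bge.0.stats.tx.Errors: 0
"""


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Lines without a colon or without a numeric value are skipped.
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def nonzero_errors(counters):
    """Keep only watched counters whose value is above zero."""
    return {name: value for name, value in counters.items()
            if value > 0 and name.rsplit(".", 1)[-1] in WATCHED}


if __name__ == "__main__":
    for name, value in nonzero_errors(parse_sysctl(SAMPLE)).items():
        print(f"{name} = {value}")
```

Feeding it the full output of "sysctl dev.bge.0.stats" instead of SAMPLE
should work the same way, since every line has the same "name: value"
shape.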


2010/2/19 Pyun YongHyeon 

> On Fri, Feb 19, 2010 at 11:13:59PM +0300, Slawa Olhovchenkov wrote:
> > On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
> >
> > >
> > > > dev.bge.1.stats.rx.Fragments: 1
> > >
> > > You received a frame that is less than 64 bytes with a bad FCS.
> > >
> > > > dev.bge.1.stats.rx.UcastPkts: 2956515
> > > > dev.bge.1.stats.rx.MulticastPkts: 0
> > > > dev.bge.1.stats.rx.FCSErrors: 18
> > >
> > > You have a lot of FCS errors here.
> > > Please double check cabling. If the statistics counter is right,
> > > sender is guilty or you have bad cabling issues here.
> >
> > 1. lost packets much more 18. I think hundreds, or thousands.
> > 2. packets lost on both (bge0 & bge1) interfaces
>
> If you see the MAC statistics counter, you have the following
> number of status updates and interrupts. Both numbers are same
> which means the controller didn't lost interrupts for state
> updates.
> dev.bge.0.stats.RingStatusUpdate: 950302
> dev.bge.0.stats.Interrupts: 950302
> and
> dev.bge.1.stats.RingStatusUpdate: 5518912
> dev.bge.1.stats.Interrupts: 5518912
>
> You received 582767 unicast packets and lost 0 packet for bge0.
> dev.bge.0.stats.rx.UcastPkts: 582767
> And you also received 2956515 unicast packets and lost 19 packets
> for bge1.
> dev.bge.1.stats.rx.Fragments: 1
> dev.bge.1.stats.rx.UcastPkts: 2956515
> dev.bge.1.stats.rx.FCSErrors: 18
> I don't see such a large number packet drops from these MAC
> statistics unless upper stack drops received packets.
> I fixed some counter updates which were ignored in previous
> releases so you may happen to see lost counters in recent version.
>
> Normally you should not have any FCS errors, it could be related
> with signal quality and these errors might not be correctly
> counted.
>
> > 3. packets don't lost on sources at Aug'09
>
> Since I don't have BCM5704 hardware it's hard to find which
> revision may affect to this issue. Could you narrow down which
> revision number started showing the issue?


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Denis Lamanov
I see the same trouble (lost packets after 4 days of uptime and a reboot) :(

dev.bge.0.stats.rx.FCSErrors: 18

2010/2/19 Slawa Olhovchenkov 

> On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
>
> >
> > > dev.bge.1.stats.rx.Fragments: 1
> >
> > You received a frame that is less than 64 bytes with a bad FCS.
> >
> > > dev.bge.1.stats.rx.UcastPkts: 2956515
> > > dev.bge.1.stats.rx.MulticastPkts: 0
> > > dev.bge.1.stats.rx.FCSErrors: 18
> >
> > You have a lot of FCS errors here.
> > Please double check cabling. If the statistics counter is right,
> > sender is guilty or you have bad cabling issues here.
>
> 1. lost packets much more 18. I think hundreds, or thousands.
> 2. packets lost on both (bge0 & bge1) interfaces
> 3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-21 Thread Slawa Olhovchenkov
On Sun, Feb 21, 2010 at 03:41:53PM -0800, Pyun YongHyeon wrote:

> On Sun, Feb 21, 2010 at 12:44:50AM +0300, Slawa Olhovchenkov wrote:
> > On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
> > 
> > > Normally you should not have any FCS errors, it could be related
> > > with signal quality and these errors might not be correctly
> > > counted.
> > 
> > I can't check cable and switch counters on bge1 before Feb 24.
> > 
> > > > 3. packets don't lost on sources at Aug'09
> > > 
> > > Since I don't have BCM5704 hardware it's hard to find which
> > > revision may affect to this issue. Could you narrow down which
> > > revision number started showing the issue?
> > 
> > I am don't update source between Aug'09 and Feb 16.
> > 
> 
> There were many bge(4) changes in that time frame. So it's hard to
> find which commit is guilty for the packet drop issue. If you can
> narrow down possible changes that might affect the issue that could
> help me a lot. You can do binary searching technique for the SVN
> revisions to know possible candidates.
> http://svn.freebsd.org/viewvc/base/head/sys/dev/bge/if_bge.c

How can I do this?
I haven't worked with svn before and don't know the best way to do it for a single file.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-21 Thread Pyun YongHyeon
On Sun, Feb 21, 2010 at 12:44:50AM +0300, Slawa Olhovchenkov wrote:
> On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
> 
> > Normally you should not have any FCS errors, it could be related
> > with signal quality and these errors might not be correctly
> > counted.
> 
> I can't check cable and switch counters on bge1 before Feb 24.
> 
> > > 3. packets don't lost on sources at Aug'09
> > 
> > Since I don't have BCM5704 hardware it's hard to find which
> > revision may affect to this issue. Could you narrow down which
> > revision number started showing the issue?
> 
> I am don't update source between Aug'09 and Feb 16.
> 

There were many bge(4) changes in that time frame, so it's hard to
tell which commit is responsible for the packet drop issue. If you can
narrow down the changes that might be involved, that would help me a
lot. You can binary search the SVN revisions to find likely
candidates.
http://svn.freebsd.org/viewvc/base/head/sys/dev/bge/if_bge.c
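
The binary search over revisions can be sketched as follows. This is
only an illustration under stated assumptions: first_bad_revision and
is_bad are hypothetical names, the revision numbers in the example are
made up, and in practice is_bad would mean "check out that revision of
sys/dev/bge, rebuild the kernel, and watch for packet loss", which has
to be done by hand. It also assumes the oldest revision in the list is
known good and the newest is known bad.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad() is true.

    revisions must be sorted ascending, with revisions[0] known good
    and revisions[-1] known bad.
    """
    lo, hi = 0, len(revisions) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid                    # breakage is at mid or earlier
        else:
            lo = mid                    # breakage is after mid
    return revisions[hi]


if __name__ == "__main__":
    # Made-up revision numbers; pretend the problem appeared in r140.
    candidates = [100, 110, 125, 140, 150, 170, 190]
    print(first_bad_revision(candidates, lambda rev: rev >= 140))
```

With N candidate revisions this needs about log2(N) kernel builds
instead of N, which matters when each test takes hours of uptime to
show the loss.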

> 4. Packets don't lost immediately after reboot.
> 
> PS: I got kernel panic.
> 

I think this is the same crash (NULL pointer dereference in
m_copym(9)) that you reported earlier, which means the patch I
posted did not fix the panic.

> ===
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x18
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x802eb3b7
> stack pointer           = 0x28:0xff80001c66e0
> frame pointer           = 0x28:0xff8  01c6740
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 724 (named)
> [thread pid 724 tid 100051 ]
> Stopped at      m_copym+0x37:   movl    0x18(%r12),%eax
> db> panic
> panic: from debugger
> cpuid = 0
> Uptime: 1d5h55m33s
> Physical memory: 2039 MB
> Dumping 1448 MB: 1433 1417 1401
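
Why the trap output points at a NULL pointer can be checked with simple
arithmetic. The faulting instruction reads from the address held in
%r12 plus a displacement of 0x18, and the reported fault virtual
address is 0x18, so %r12 must have been zero. The sketch below only
restates that calculation; base_register_value is a hypothetical name,
and which structure member sits at offset 0x18 is not stated in this
thread, so it is not assumed here.

```python
def base_register_value(fault_address, displacement):
    """Solve fault_address = base + displacement for the base register."""
    return fault_address - displacement


# Values taken from the trap output quoted above.
fault_address = 0x18      # "fault virtual address"
displacement = 0x18       # from "movl 0x18(%r12),%eax"

r12 = base_register_value(fault_address, displacement)
print(f"%r12 = {r12:#x}", "(NULL pointer)" if r12 == 0 else "")
```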


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-20 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:

> Normally you should not have any FCS errors, it could be related
> with signal quality and these errors might not be correctly
> counted.

I can't check the cable and the switch counters for bge1 before Feb 24.

> > 3. packets don't lost on sources at Aug'09
> 
> Since I don't have BCM5704 hardware it's hard to find which
> revision may affect to this issue. Could you narrow down which
> revision number started showing the issue?

I did not update the sources between Aug '09 and Feb 16.

4. Packets are not lost immediately after a reboot.

PS: I got a kernel panic.

===
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x802eb3b7
stack pointer           = 0x28:0xff80001c66e0
frame pointer           = 0x28:0xff8  01c6740
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 724 (named)
[thread pid 724 tid 100051 ]
Stopped at      m_copym+0x37:   movl    0x18(%r12),%eax
db> panic
panic: from debugger
cpuid = 0
Uptime: 1d5h55m33s
Physical memory: 2039 MB
Dumping 1448 MB: 1433 1417 1401


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 11:13:59PM +0300, Slawa Olhovchenkov wrote:
> On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
> 
> > 
> > > dev.bge.1.stats.rx.Fragments: 1
> > 
> > You received a frame that is less than 64 bytes with a bad FCS.
> > 
> > > dev.bge.1.stats.rx.UcastPkts: 2956515
> > > dev.bge.1.stats.rx.MulticastPkts: 0
> > > dev.bge.1.stats.rx.FCSErrors: 18
> > 
> > You have a lot of FCS errors here.
> > Please double check cabling. If the statistics counter is right,
> > sender is guilty or you have bad cabling issues here.
> 
> 1. lost packets much more 18. I think hundreds, or thousands.
> 2. packets lost on both (bge0 & bge1) interfaces

If you look at the MAC statistics counters, you have the following
numbers of status updates and interrupts. The two numbers are equal on
each interface, which means the controller didn't lose any interrupts
for status updates.
dev.bge.0.stats.RingStatusUpdate: 950302
dev.bge.0.stats.Interrupts: 950302
and
dev.bge.1.stats.RingStatusUpdate: 5518912
dev.bge.1.stats.Interrupts: 5518912

You received 582767 unicast packets and lost 0 packets on bge0.
dev.bge.0.stats.rx.UcastPkts: 582767
You also received 2956515 unicast packets and lost 19 packets
on bge1.
dev.bge.1.stats.rx.Fragments: 1
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.FCSErrors: 18
I don't see anything like that many packet drops in these MAC
statistics, unless the upper stack is dropping received packets.
I fixed some counter updates that were ignored in previous
releases, so recent versions may show loss counters that older
ones did not report.

Normally you should not have any FCS errors. They could be related
to signal quality, and these errors might not be counted correctly.

> 3. packets don't lost on sources at Aug'09

Since I don't have BCM5704 hardware, it's hard for me to find which
revision introduced this issue. Could you narrow down the revision
at which the issue first appeared?
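
To put the counter values from this message next to the loss seen with
ping, here is a rough calculation. It is a sketch under assumptions:
loss_percent is a hypothetical helper, rx.UcastPkts is assumed to count
only frames that were received intact, and the ping figures (100 sent,
97 received) come from the ping run quoted elsewhere in this thread.

```python
def loss_percent(lost, received):
    """Lost frames as a percentage of all frames seen (lost + received)."""
    total = lost + received
    return 100.0 * lost / total if total else 0.0


# bge1 counters quoted in this message.
bge1_lost = 1 + 18            # rx.Fragments + rx.FCSErrors
bge1_received = 2956515       # rx.UcastPkts
mac_loss = loss_percent(bge1_lost, bge1_received)

# ping run quoted elsewhere in the thread: 100 sent, 97 received.
ping_loss = loss_percent(100 - 97, 97)

# Equal counts mean no status-update interrupt went missing.
missed_interrupts = 5518912 - 5518912   # RingStatusUpdate - Interrupts

print(f"MAC-level loss on bge1: {mac_loss:.5f}%")
print(f"ping loss:              {ping_loss:.1f}%")
print(f"missed interrupts:      {missed_interrupts}")
```

The MAC counters account for well under a thousandth of a percent of
received frames, while ping shows 3%, so the bulk of the loss is not
visible in these counters.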


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:

> 
> > dev.bge.1.stats.rx.Fragments: 1
> 
> You received a frame that is less than 64 bytes with a bad FCS.
> 
> > dev.bge.1.stats.rx.UcastPkts: 2956515
> > dev.bge.1.stats.rx.MulticastPkts: 0
> > dev.bge.1.stats.rx.FCSErrors: 18
> 
> You have a lot of FCS errors here.
> Please double check cabling. If the statistics counter is right,
> sender is guilty or you have bad cabling issues here.

1. Far more than 18 packets are lost; I think hundreds or thousands.
2. Packets are lost on both interfaces (bge0 and bge1).
3. Packets were not lost with the Aug '09 sources.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 10:11:03PM +0300, Slawa Olhovchenkov wrote:
> On Fri, Feb 19, 2010 at 11:03:59AM -0800, Pyun YongHyeon wrote:
> 
> > On Fri, Feb 19, 2010 at 03:24:15PM +0300, Slawa Olhovchenkov wrote:
> > > On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:
> > > 
> > > > On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
> > > > 
> > > > > 
> > > > > I'm still not sure whether the panic is related with bge(4) but
> > > > > there are a couple of missing workaround for PCIX BCM5704 silicon
> > > > > bug in bge(4). Did you also see the panic before updating to
> > > > > stable/8?
> > > > 
> > > > Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
> > > > 2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
> > > > days uptime.
> > > > 
> > > > 
> > > > > Anyway, try attached patch and let me know how it works.
> > > > 
> > > > Thanks, I try.
> > > > 
> > > 
> > > I don't get trap after 2 hour, but already see next trouble:
> > > 
> > > ===
> > > PING 10.200.0.1 (10.200.0.1): 56 data bytes
> > > 
> > > --- 10.200.0.1 ping statistics ---
> > > 100 packets transmitted, 97 packets received, 3.0% packet loss
> > > round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
> > > ===
> > > 
> > > w/o patch, but with fresh source I see same trouble: after 12 hour 7% lost.
> > > netstat -i don't show any errors.
> > 
> > I think BCM5704 supports HW MAC statistics counter. Try extract it
> > with "sysctl dev.bge.0.stats". It will give you much more
> > information.
> 

[...]

> dev.bge.1.stats.rx.Fragments: 1

You received a frame that is less than 64 bytes with a bad FCS.

> dev.bge.1.stats.rx.UcastPkts: 2956515
> dev.bge.1.stats.rx.MulticastPkts: 0
> dev.bge.1.stats.rx.FCSErrors: 18

You have a lot of FCS errors here.
Please double check the cabling. If the statistics counter is right,
either the sender is at fault or you have a cabling problem.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 11:03:59AM -0800, Pyun YongHyeon wrote:

> On Fri, Feb 19, 2010 at 03:24:15PM +0300, Slawa Olhovchenkov wrote:
> > On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:
> > 
> > > On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
> > > 
> > > > 
> > > > I'm still not sure whether the panic is related with bge(4) but
> > > > there are a couple of missing workaround for PCIX BCM5704 silicon
> > > > bug in bge(4). Did you also see the panic before updating to
> > > > stable/8?
> > > 
> > > Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
> > > 2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
> > > days uptime.
> > > 
> > > 
> > > > Anyway, try attached patch and let me know how it works.
> > > 
> > > Thanks, I try.
> > > 
> > 
> > I don't get trap after 2 hour, but already see next trouble:
> > 
> > ===
> > PING 10.200.0.1 (10.200.0.1): 56 data bytes
> > 
> > --- 10.200.0.1 ping statistics ---
> > 100 packets transmitted, 97 packets received, 3.0% packet loss
> > round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
> > ===
> > 
> > w/o patch, but with fresh source I see same trouble: after 12 hour 7% lost.
> > netstat -i don't show any errors.
> 
> I think BCM5704 supports HW MAC statistics counter. Try extract it
> with "sysctl dev.bge.0.stats". It will give you much more
> information.

dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 0
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 561594
dev.bge.0.stats.DmaReadQueueFull: 41972
dev.bge.0.stats.DmaReadHighPriQueueFull: 0
dev.bge.0.stats.SendDataCompQueueFull: 0
dev.bge.0.stats.RingSetSendProdIndex: 705180
dev.bge.0.stats.RingStatusUpdate: 950302
dev.bge.0.stats.Interrupts: 950302
dev.bge.0.stats.AvoidedInterrupts: 0
dev.bge.0.stats.SendThresholdHit: 0
dev.bge.0.stats.rx.Octets: 196013834
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UcastPkts: 582767
dev.bge.0.stats.rx.MulticastPkts: 0
dev.bge.0.stats.rx.FCSErrors: 0
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.rx.inRangeLengthError: 0
dev.bge.0.stats.rx.outRangeLengthError: 0
dev.bge.0.stats.tx.Octets: 654902713
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.flowControlDone: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UcastPkts: 699931
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 492
dev.bge.0.stats.tx.CarrierSenseErrors: 0
dev.bge.0.stats.tx.Discards: 0
dev.bge.0.stats.tx.Errors: 0

dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.stats.InputDiscards: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.RecvThresholdHit: 2889283
dev.bge.1.stats.DmaReadQueueFull: 79
dev.bge.1.stats.DmaReadHighPriQueueFull: 0
dev.bge.1.stats.SendDataCompQueueFull: 0
dev.bge.1.stats.RingSetSendProdIndex: 2861918
dev.bge.1.stats.RingStatusUpdate: 5518912
dev.bge.1.stats.Interrupts: 5518912
dev.bge.1.stats.AvoidedInterrupts: 0
dev.bge.1.stats.SendThresholdHit: 0
dev.bge.1.stats.rx.Octets: 930931282
dev.bge.1.stats.rx.Fragments: 1
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.MulticastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 18
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.rx.inRangeLengthError: 0
dev.bge.1.stats.rx.outRangeLengthError: 0
dev.bge.1.stats.tx.Octets: 305055886
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.flowControlDone: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.UcastPkts: 2860335
dev.bge.1.stats.tx.MulticastPkts: 0
dev.bge.1.stats.tx.BroadcastPkts: 447
dev.bge.1.stats.tx.CarrierSenseErrors: 0
dev.bge.1.stats.tx.Discards: 0
dev.bge.1.stats.tx.Errors

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 03:24:15PM +0300, Slawa Olhovchenkov wrote:
> On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:
> 
> > On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
> > 
> > > 
> > > I'm still not sure whether the panic is related with bge(4) but
> > > there are a couple of missing workaround for PCIX BCM5704 silicon
> > > bug in bge(4). Did you also see the panic before updating to
> > > stable/8?
> > 
> > Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
> > 2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
> > days uptime.
> > 
> > 
> > > Anyway, try attached patch and let me know how it works.
> > 
> > Thanks, I try.
> > 
> 
> I don't get trap after 2 hour, but already see next trouble:
> 
> ===
> PING 10.200.0.1 (10.200.0.1): 56 data bytes
> 
> --- 10.200.0.1 ping statistics ---
> 100 packets transmitted, 97 packets received, 3.0% packet loss
> round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
> ===
> 
> w/o patch, but witch fresh source I see same trouble: after 12 hour 7% lost.
> netstat -i don't show any errors.

I think the BCM5704 supports hardware MAC statistics counters. Try extracting
them with "sysctl dev.bge.0.stats". It will give you much more
information.
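
As a rough illustration of that suggestion, the counters can be filtered so that only the non-zero ones stand out. The Python sketch below parses text already captured from "sysctl dev.bge.N.stats". The function name nonzero_counters is an assumption made for this example and is not part of bge(4) or sysctl(8); the sample lines are copied from the dev.bge.1 counters posted in this thread, including one line that arrived without a value.

```python
def nonzero_counters(sysctl_text):
    """Return {counter_name: value} for every counter whose value is not 0.

    Expects lines of the form "dev.bge.1.stats.rx.FCSErrors: 18".
    Lines without a numeric value (for example a truncated line) are skipped.
    """
    result = {}
    for line in sysctl_text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue
        if int(value) != 0:
            result[name.strip()] = int(value)
    return result


# Sample text in sysctl output format (lines taken from this thread).
sample = """\
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.rx.Fragments: 1
dev.bge.1.stats.rx.FCSErrors: 18
dev.bge.1.stats.tx.Errors
"""

for name, value in nonzero_counters(sample).items():
    print(name, value)
```

On a live system the input text could come from something like subprocess.run(["sysctl", "dev.bge.1.stats"], capture_output=True, text=True).stdout; that part is left out here so the sketch runs anywhere.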


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:

> On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
> 
> > 
> > I'm still not sure whether the panic is related with bge(4) but
> > there are a couple of missing workaround for PCIX BCM5704 silicon
> > bug in bge(4). Did you also see the panic before updating to
> > stable/8?
> 
> Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
> 2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
> days uptime.
> 
> 
> > Anyway, try attached patch and let me know how it works.
> 
> Thanks, I try.
> 

I haven't had a trap after 2 hours, but I already see another problem:

===
PING 10.200.0.1 (10.200.0.1): 56 data bytes

--- 10.200.0.1 ping statistics ---
100 packets transmitted, 97 packets received, 3.0% packet loss
round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
===

Without the patch, but with fresh source, I see the same trouble: 7% loss after
12 hours. netstat -i doesn't show any errors.
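
For reference, the loss figure in the ping summary is just (transmitted - received) / transmitted. A minimal Python sketch of that arithmetic follows, assuming the summary line format shown above; packet_loss_percent is a hypothetical helper name, not a ping(8) feature.

```python
import re


def packet_loss_percent(summary_line):
    """Compute packet loss (in percent) from a ping(8) summary line."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) packets received", summary_line
    )
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent


line = "100 packets transmitted, 97 packets received, 3.0% packet loss"
print(packet_loss_percent(line))
```

Running the same calculation over periodic ping runs would make it easier to see whether the loss grows with uptime, as the 3% and 7% figures suggest.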


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:

> 
> I'm still not sure whether the panic is related with bge(4) but
> there are a couple of missing workaround for PCIX BCM5704 silicon
> bug in bge(4). Did you also see the panic before updating to
> stable/8?

Before updating to stable/8 (2010-Feb-16), I saw network freezes on stable/8
(2009-Sep): bge stopped receiving packets (as seen with tcpdump) after
approx. 40-50 days of uptime.


> Anyway, try attached patch and let me know how it works.

Thanks, I'll try it.



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Thu, Feb 18, 2010 at 03:32:54PM -0800, Jeremy Chadwick wrote:
> On Fri, Feb 19, 2010 at 12:50:39AM +0300, Slawa Olhovchenkov wrote:
> > On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:
> > 
> > > > > dmesg output(only bge(4) related one).
> > > > 
> > > > dmesg from boot:
> > > > 
> > > > bge0:  mem 
> > > > 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> > > > miibus0:  on bge0
> > > > brgphy0:  PHY 1 on miibus0
> > > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > > 1000baseT-FDX, auto
> > > > bge0: Ethernet address: 00:14:c2:3d:e5:52
> > > > bge0: [ITHREAD]
> > > > bge1:  mem 
> > > > 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> > > > miibus1:  on bge1
> > > > brgphy1:  PHY 1 on miibus1
> > > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > > 1000baseT-FDX, auto
> > > > bge1: Ethernet address: 00:14:c2:3d:e5:51
> > > > bge1: [ITHREAD]
> > > > bge1: link state changed to UP
> > > > bge0: link state changed to UP
> > > > 
> > > > Nothing in dmesg before trap.
> > > > 
> > > 
> > > Is this PCI-X controller? It would be even better if you can post
> > 
> > This integrated controller (HP DL360-G4)
> > 
> > > bge(4) related dmesg output of verbosed boot and the output of
> >
> > ...
> > pci0:2:2:0: bad VPD cksum, remain 19
> > bge0:  mem 
> > 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> > bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
> > bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
> > ...
> > pci0:2:2:1: bad VPD cksum, remain 19
> > bge1:  mem 
> > 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> > bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
> > bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
> 
> Are the "bad VPD checksum" messages somehow responsible for this?
> They're both related to the bge(4) interfaces:
> 
> > b...@pci0:2:2:0:class=0x02 card=0x00d00e11 chip=0x164814e4 
> > rev=0x10 hdr=0x00
> > b...@pci0:2:2:1:class=0x02 card=0x00d00e11 chip=0x164814e4 
> > rev=0x10 hdr=0x00
>  

The driver tries to read VPD from the controller, but it seems it failed to
fully parse the data. It did manage to get the PN part, so it
successfully extracted the device name string from the controller.
I don't think this is related to the driver instability, though.
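
For context, here is a sketch of what a VPD checksum test amounts to. As I understand the PCI VPD format, the "RV" keyword carries a checksum byte chosen so that all bytes from the start of the VPD up to and including that byte sum to zero modulo 256; "bad VPD cksum" would then mean the sum did not come out to zero. The Python below works on a made-up byte string, not on data read from this controller, and both function names are hypothetical.

```python
def vpd_checksum_byte(data_before_checksum):
    """Return the byte that makes the running sum zero modulo 256."""
    return (-sum(data_before_checksum)) % 256


def vpd_checksum_ok(data_through_checksum):
    """True if the bytes up to and including the checksum byte sum to 0 mod 256."""
    return sum(data_through_checksum) % 256 == 0


# Made-up payload standing in for the VPD bytes that precede the checksum.
# It is not a well-formed VPD image; only the summing rule is illustrated.
payload = b"made-up VPD bytes"
good = payload + bytes([vpd_checksum_byte(payload)])
bad = payload + bytes([(vpd_checksum_byte(payload) + 1) % 256])

print(vpd_checksum_ok(good), vpd_checksum_ok(bad))
```

A failed check of this kind only says the stored VPD image is inconsistent; it does not by itself say anything about DMA or packet handling.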


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 12:50:39AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:
> 
> > > > dmesg output(only bge(4) related one).
> > > 
> > > dmesg from boot:
> > > 
> > > bge0:  mem 
> > > 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> > > miibus0:  on bge0
> > > brgphy0:  PHY 1 on miibus0
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-FDX, auto
> > > bge0: Ethernet address: 00:14:c2:3d:e5:52
> > > bge0: [ITHREAD]
> > > bge1:  mem 
> > > 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> > > miibus1:  on bge1
> > > brgphy1:  PHY 1 on miibus1
> > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-FDX, auto
> > > bge1: Ethernet address: 00:14:c2:3d:e5:51
> > > bge1: [ITHREAD]
> > > bge1: link state changed to UP
> > > bge0: link state changed to UP
> > > 
> > > Nothing in dmesg before trap.
> > > 
> > 
> > Is this PCI-X controller? It would be even better if you can post
> 
> This integrated controller (HP DL360-G4)
> 
> > bge(4) related dmesg output of verbosed boot and the output of
> 
> Preloaded elf kernel "/boot/kernel/kernel" at 0x8088e000.
> Preloaded elf obj module "/boot/kernel/if_bge.ko" at 0x8088e1d0.
> Preloaded elf obj module "/boot/kernel/miibus.ko" at 0x8088e7f8.
> pci0:2:2:0: bad VPD cksum, remain 19
> bge0:  mem 
> 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
> bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
> miibus0:  on bge0
> brgphy0:  PHY 1 on miibus0
> brgphy0: OUI 0x000818, model 0x0019, rev. 0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> 1000baseT-FDX, auto
> bge0: bpf attached
> bge0: Ethernet address: 00:14:c2:3d:e5:52
> ioapic1: routing intpin 1 (PCI IRQ 25) to lapic 0 vector 50
> bge0: [MPSAFE]
> bge0: [ITHREAD]
> pci0:2:2:1: bad VPD cksum, remain 19
> bge1:  mem 
> 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
> bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
> miibus1:  on bge1
> brgphy1:  PHY 1 on miibus1
> brgphy1: OUI 0x000818, model 0x0019, rev. 0
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> 1000baseT-FDX, auto
> bge1: bpf attached
> bge1: Ethernet address: 00:14:c2:3d:e5:51
> ioapic1: routing intpin 2 (PCI IRQ 26) to lapic 0 vector 51
> bge1: [MPSAFE]
> bge1: [ITHREAD]
> bge1: link UP
> bge1: link state changed to UP
> 
> 
> > "pciconf -lcv".
> 

[...]

> b...@pci0:2:2:0:class=0x02 card=0x00d00e11 chip=0x164814e4 
> rev=0x10 hdr=0x00
> vendor = 'Broadcom Corporation'
> device = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
> class  = network
> subclass   = ethernet
> cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
> transaction
> cap 01[48] = powerspec 2  supports D0 D3  current D0
> cap 03[50] = VPD
> cap 05[58] = MSI supports 8 messages, 64 bit 
> b...@pci0:2:2:1:class=0x02 card=0x00d00e11 chip=0x164814e4 
> rev=0x10 hdr=0x00
> vendor = 'Broadcom Corporation'
> device = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
> class  = network
> subclass   = ethernet
> cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
> transaction
> cap 01[48] = powerspec 2  supports D0 D3  current D0
> cap 03[50] = VPD
> cap 05[58] = MSI supports 8 messages, 64 bit 

I'm still not sure whether the panic is related to bge(4), but
there are a couple of missing workarounds for PCIX BCM5704 silicon
bugs in bge(4). Did you also see the panic before updating to
stable/8?
Anyway, try the attached patch and let me know how it works.
Index: sys/dev/bge/if_bge.c
===
--- sys/dev/bge/if_bge.c	(revision 204011)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -1342,6 +1342,7 @@
 bge_chipinit(struct bge_softc *sc)
 {
 	uint32_t dma_rw_ctl;
+	uint16_t val;
 	int i;
 
 	/* Set endianness before we access any non-PCI registers. */
@@ -1362,6 +1363,17 @@
 	i < BGE_STATUS_BLOCK_END + 1; i += sizeof(uint32_t))
 		BGE_MEMWIN_WRITE(sc, i, 0);
 
+	if (sc->bge_chiprev == BGE_CHIPREV_5704_BX) {
+		/*
+		 *  Fix data corruption caused by non-qword write with WB.
+		 *  Fix master abort in PCI mode.
+		 *  Fix PCI latency timer.
+		 */
+		val = pci_read_config(sc->bge_dev, BGE_PCI_MSI_DATA + 2, 2);
+		val |= (1 << 10) | (1 << 12) | (1 << 13);
+		pci_write_config(sc->bge_dev, BGE_PCI_MSI_DATA + 2, val, 2);
+	}
+
 	/*
 	 * Set up the PCI DMA control register.
 	 */
@@ -3157,6 +3169,26 @@
 	pci_write_config(dev, BGE_PCI_CMD, command, 4);
 	write_op(sc, BGE_MISC_CFG, BGE_32BITTIME_66MHZ);
 
+	/*
+	 * Disable PCIX relaxed ordering to ensure status block update
+	 * comes first than packet buffer DMA. Otherwise driver may
+	 * read stale s

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Jeremy Chadwick
On Fri, Feb 19, 2010 at 12:50:39AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:
> 
> > > > dmesg output(only bge(4) related one).
> > > 
> > > dmesg from boot:
> > > 
> > > bge0:  mem 
> > > 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> > > miibus0:  on bge0
> > > brgphy0:  PHY 1 on miibus0
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-FDX, auto
> > > bge0: Ethernet address: 00:14:c2:3d:e5:52
> > > bge0: [ITHREAD]
> > > bge1:  mem 
> > > 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> > > miibus1:  on bge1
> > > brgphy1:  PHY 1 on miibus1
> > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-FDX, auto
> > > bge1: Ethernet address: 00:14:c2:3d:e5:51
> > > bge1: [ITHREAD]
> > > bge1: link state changed to UP
> > > bge0: link state changed to UP
> > > 
> > > Nothing in dmesg before trap.
> > > 
> > 
> > Is this PCI-X controller? It would be even better if you can post
> 
> This integrated controller (HP DL360-G4)
> 
> > bge(4) related dmesg output of verbosed boot and the output of
>
> ...
> pci0:2:2:0: bad VPD cksum, remain 19
> bge0:  mem 
> 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
> bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
> ...
> pci0:2:2:1: bad VPD cksum, remain 19
> bge1:  mem 
> 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
> bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X

Are the "bad VPD checksum" messages somehow responsible for this?
They're both related to the bge(4) interfaces:

> b...@pci0:2:2:0:class=0x02 card=0x00d00e11 chip=0x164814e4 
> rev=0x10 hdr=0x00
> b...@pci0:2:2:1:class=0x02 card=0x00d00e11 chip=0x164814e4 
> rev=0x10 hdr=0x00

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:

> > > dmesg output(only bge(4) related one).
> > 
> > dmesg from boot:
> > 
> > bge0:  mem 
> > 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> > miibus0:  on bge0
> > brgphy0:  PHY 1 on miibus0
> > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > 1000baseT-FDX, auto
> > bge0: Ethernet address: 00:14:c2:3d:e5:52
> > bge0: [ITHREAD]
> > bge1:  mem 
> > 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> > miibus1:  on bge1
> > brgphy1:  PHY 1 on miibus1
> > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > 1000baseT-FDX, auto
> > bge1: Ethernet address: 00:14:c2:3d:e5:51
> > bge1: [ITHREAD]
> > bge1: link state changed to UP
> > bge0: link state changed to UP
> > 
> > Nothing in dmesg before trap.
> > 
> 
> Is this PCI-X controller? It would be even better if you can post

This is an integrated controller (HP DL360-G4).

> bge(4) related dmesg output of verbosed boot and the output of

Preloaded elf kernel "/boot/kernel/kernel" at 0x8088e000.
Preloaded elf obj module "/boot/kernel/if_bge.ko" at 0x8088e1d0.
Preloaded elf obj module "/boot/kernel/miibus.ko" at 0x8088e7f8.
pci0:2:2:0: bad VPD cksum, remain 19
bge0:  mem 
0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0: OUI 0x000818, model 0x0019, rev. 0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge0: bpf attached
bge0: Ethernet address: 00:14:c2:3d:e5:52
ioapic1: routing intpin 1 (PCI IRQ 25) to lapic 0 vector 50
bge0: [MPSAFE]
bge0: [ITHREAD]
pci0:2:2:1: bad VPD cksum, remain 19
bge1:  mem 
0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
miibus1:  on bge1
brgphy1:  PHY 1 on miibus1
brgphy1: OUI 0x000818, model 0x0019, rev. 0
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge1: bpf attached
bge1: Ethernet address: 00:14:c2:3d:e5:51
ioapic1: routing intpin 2 (PCI IRQ 26) to lapic 0 vector 51
bge1: [MPSAFE]
bge1: [ITHREAD]
bge1: link UP
bge1: link state changed to UP
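
The three values on the "CHIP ID" lines above appear to be related by simple shifts: 0x2100 >> 12 gives 0x02 and 0x2100 >> 8 gives 0x21. The Python sketch below only reproduces that arithmetic from the printed values. The shift amounts are inferred from this output, not taken from the driver source, and decode_chip_id is a hypothetical helper.

```python
def decode_chip_id(chip_id):
    """Split a chip ID into (asic_rev, chip_rev) using shifts inferred from dmesg."""
    asic_rev = chip_id >> 12  # assumption: bits above the low 12 bits
    chip_rev = chip_id >> 8   # assumption: bits above the low 8 bits
    return asic_rev, chip_rev


asic_rev, chip_rev = decode_chip_id(0x2100)
print("ASIC REV 0x%02x; CHIP REV 0x%02x" % (asic_rev, chip_rev))
```

The chip revision is the value a driver would compare against when deciding whether a revision-specific workaround applies.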


> "pciconf -lcv".

hos...@pci0:0:0:0:  class=0x06 card=0x32000e11 chip=0x35908086 rev=0x0c 
hdr=0x00
vendor = 'Intel Corporation'
device = 'E7520 Server Memory Controller Hub'
class  = bridge
subclass   = HOST-PCI
cap 09[40] = vendor (length 5) Intel cap 4 version 1
pc...@pci0:0:2:0:   class=0x060400 card=0x chip=0x35958086 rev=0x0c 
hdr=0x01
vendor = 'Intel Corporation'
device = 'E752x Memory Controller Hub PCIe Port A0'
class  = bridge
subclass   = PCI-PCI
cap 01[50] = powerspec 2  supports D0 D3  current D0
cap 05[58] = MSI supports 2 messages 
cap 10[64] = PCI-Express 1 root port max data 256(256) link x0(x8)
pc...@pci0:0:4:0:   class=0x060400 card=0x chip=0x35978086 rev=0x0c 
hdr=0x01
vendor = 'Intel Corporation'
device = 'E752x Memory Controller Hub PCIe Port B0'
class  = bridge
subclass   = PCI-PCI
cap 01[50] = powerspec 2  supports D0 D3  current D0
cap 05[58] = MSI supports 2 messages 
cap 10[64] = PCI-Express 1 root port max data 256(256) link x8(x8)
pc...@pci0:0:6:0:   class=0x060400 card=0x chip=0x35998086 rev=0x0c 
hdr=0x01
vendor = 'Intel Corporation'
device = 'E752x Memory Controller Hub PCIe Port C0'
class  = bridge
subclass   = PCI-PCI
cap 01[50] = powerspec 2  supports D0 D3  current D0
cap 05[58] = MSI supports 2 messages 
cap 10[64] = PCI-Express 1 root port max data 256(256) link x0(x8)
pc...@pci0:0:28:0:  class=0x060400 card=0x chip=0x25ae8086 rev=0x02 
hdr=0x01
vendor = 'Intel Corporation'
device = 'Hub Interface to PCI-X Bridge (6300ESB)'
class  = bridge
subclass   = PCI-PCI
cap 07[50] = PCI-X 64-bit bridge 
no...@pci0:0:29:0:  class=0x0c0300 card=0x32010e11 chip=0x25a98086 rev=0x02 
hdr=0x00
vendor = 'Intel Corporation'
device = 'USB 1.1 UHCI Controller *1 (6300ESB)'
class  = serial bus
subclass   = USB
no...@pci0:0:29:1:  class=0x0c0300 card=0x32010e11 chip=0x25aa8086 rev=0x02 
hdr=0x00
vendor = 'Intel Corporation'
device = 'USB 1.1 UHCI Controller *2 (6300ESB)'
class  = serial bus
subclass   = USB
no...@pci0:0:29:4:  class=0x088000 card=0x32010e11 chip=0x25ab8086 rev=0x02 
hdr=0x00
vendor = 'Intel Corporation'
device = 'Watchdog Timer (6300ESB)'
class  = base peripheral
ioap...@pci0:0:29:5:class=0x080020 card=0x32010e11 chip=0x25ac8086 rev=0x02 
hdr=0x00
vendor = 'Intel Corporation'
device = '6300ES

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 12:24:28AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Feb 18, 2010 at 11:36:12AM -0800, Pyun YongHyeon wrote:
> 
> > On Thu, Feb 18, 2010 at 05:38:22PM +0300, Slawa Olhovchenkov wrote:
> > > On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:
> > > 
> > > > On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
> > > > > I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
> > > > > shed light on the below error? I unfortunately cannot provide a proper crash
> > > > > dump. The pointer addresses are always the same. The only other thing I've
> > > > > noticed that may be related is a watchdog timeout on bge0 error before the
> > > > > panic. Thanks.
> > > > > 
> > > > 
> > > > Any chance to get backtrace from the crash?
> > > 
> > > I got same trouble on the same platform (8.0-STABLE/amd64).
> > > hw.bge.allow_asf=0 already
> > > 
> > > I got 2 proper crash dump (first w/ net.inet.ip.forwarding=1
> > > and second w/ net.inet.ip.forwarding=0).
> > > 
> > 
> > It looks like mbuf pointer was changed to NULL in the middle of IP
> > forwarding and IP fragment stage. If bge(4) frees passed mbufs this
> > may happen but I'm not sure this comes from bge(4).
> > By chance, are you using polling(4) on bge(4)? Also show me the
> 
> I am not using polling.
> 

Ok.

> > dmesg output(only bge(4) related one).
> 
> dmesg from boot:
> 
> bge0:  mem 
> 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
> miibus0:  on bge0
> brgphy0:  PHY 1 on miibus0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> 1000baseT-FDX, auto
> bge0: Ethernet address: 00:14:c2:3d:e5:52
> bge0: [ITHREAD]
> bge1:  mem 
> 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
> miibus1:  on bge1
> brgphy1:  PHY 1 on miibus1
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> 1000baseT-FDX, auto
> bge1: Ethernet address: 00:14:c2:3d:e5:51
> bge1: [ITHREAD]
> bge1: link state changed to UP
> bge0: link state changed to UP
> 
> Nothing in dmesg before trap.
> 

Is this a PCI-X controller? It would be even better if you could post the
bge(4)-related dmesg output from a verbose boot and the output of
"pciconf -lcv".


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Thu, Feb 18, 2010 at 11:36:12AM -0800, Pyun YongHyeon wrote:

> On Thu, Feb 18, 2010 at 05:38:22PM +0300, Slawa Olhovchenkov wrote:
> > On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:
> > 
> > > On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
> > > > I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
> > > > shed light on the below error? I unfortunately cannot provide a proper crash
> > > > dump. The pointer addresses are always the same. The only other thing I've
> > > > noticed that may be related is a watchdog timeout on bge0 error before the
> > > > panic. Thanks.
> > > > 
> > > 
> > > Any chance to get backtrace from the crash?
> > 
> > I got same trouble on the same platform (8.0-STABLE/amd64).
> > hw.bge.allow_asf=0 already
> > 
> > I got 2 proper crash dump (first w/ net.inet.ip.forwarding=1
> > and second w/ net.inet.ip.forwarding=0).
> > 
> 
> It looks like mbuf pointer was changed to NULL in the middle of IP
> forwarding and IP fragment stage. If bge(4) frees passed mbufs this
> may happen but I'm not sure this comes from bge(4).
> By chance, are you using polling(4) on bge(4)? Also show me the

I am not using polling.

> dmesg output(only bge(4) related one).

dmesg from boot:

bge0:  mem 
0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge0: Ethernet address: 00:14:c2:3d:e5:52
bge0: [ITHREAD]
bge1:  mem 
0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
miibus1:  on bge1
brgphy1:  PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge1: Ethernet address: 00:14:c2:3d:e5:51
bge1: [ITHREAD]
bge1: link state changed to UP
bge0: link state changed to UP

Nothing in dmesg before trap.

-- 
Slawa Olhovchenkov


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Thu, Feb 18, 2010 at 05:38:22PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:
> 
> > On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
> > > I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
> > > shed light on the below error? I unfortunately cannot provide a proper crash
> > > dump. The pointer addresses are always the same. The only other thing I've
> > > noticed that may be related is a watchdog timeout on bge0 error before the
> > > panic. Thanks.
> > > 
> > 
> > Any chance to get backtrace from the crash?
> 
> I got same trouble on the same platform (8.0-STABLE/amd64).
> hw.bge.allow_asf=0 already
> 
> I got 2 proper crash dump (first w/ net.inet.ip.forwarding=1
> and second w/ net.inet.ip.forwarding=0).
> 

It looks like the mbuf pointer was changed to NULL in the middle of the IP
forwarding and IP fragmentation stage. This may happen if bge(4) frees mbufs
that were passed to it, but I'm not sure this comes from bge(4).
By chance, are you using polling(4) on bge(4)? Also, show me the
dmesg output (only the bge(4)-related part).
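
One detail consistent with a NULL mbuf pointer: the trap reports fault virtual address 0x18, which is what reading a field at offset 0x18 through a NULL pointer would produce. The Python sketch below uses ctypes to compute field offsets for a simplified, assumed layout of the start of an mbuf header on amd64 (three 8-byte pointers followed by an int length). The structure and field names here are illustrative assumptions; check sys/mbuf.h for the real definition.

```python
import ctypes


class AssumedMbufHeader(ctypes.Structure):
    """Simplified, assumed layout of the first mbuf header fields on amd64."""
    _fields_ = [
        ("next", ctypes.c_uint64),     # pointer to next mbuf in chain (8 bytes)
        ("nextpkt", ctypes.c_uint64),  # pointer to next packet (8 bytes)
        ("data", ctypes.c_uint64),     # pointer to the data area (8 bytes)
        ("len", ctypes.c_int32),       # amount of data in this mbuf
    ]


# Address touched when "len" is read through a NULL (0x0) base pointer.
null_base = 0x0
fault_address = null_base + AssumedMbufHeader.len.offset
print(hex(fault_address))  # prints 0x18
```

Under that assumed layout, a fault at 0x18 points at a read of the length field of a NULL mbuf rather than at a wild or corrupted pointer value.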

> backtrace from the first crash:
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x18
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x802ea751
> stack pointer   = 0x28:0xff8ef930
> frame pointer   = 0x28:0xff8ef970
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 12 (irq26: bge1)
> panic: from debugger
> cpuid = 0
> Uptime: 5h23m50s
> Physical memory: 2039 MB
> Dumping 1316 MB: 1301 1285 1269 1253 1237 1221 1205 1189 1173 1157 1141 1125 
> 1109 1093 1077 1061 1045 1029 1013 997 981 965 949 933 917 901 885 869 853 
> 837 821 805 789 773 757 741 725 709 693 677 661 645 629 613 597 581 565 549 
> 533 517 501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 
> 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5
> 
> Reading symbols from /boot/kernel/if_bge.ko...Reading symbols from 
> /boot/kernel/if_bge.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/if_bge.ko
> Reading symbols from /boot/kernel/miibus.ko...Reading symbols from 
> /boot/kernel/miibus.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/miibus.ko
> Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
> /boot/kernel/ipfw.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ipfw.ko
> Reading symbols from /boot/kernel/nfsserver.ko...Reading symbols from 
> /boot/kernel/nfsserver.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/nfsserver.ko
> Reading symbols from /boot/kernel/krpc.ko...Reading symbols from 
> /boot/kernel/krpc.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/krpc.ko
> Reading symbols from /boot/kernel/nfssvc.ko...Reading symbols from 
> /boot/kernel/nfssvc.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/nfssvc.ko
> #0  doadump () at pcpu.h:223
> 223 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:223
> #1  0x802909b9 in boot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:416
> #2  0x80290e0c in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:579
> #3  0x801a5bc7 in db_panic (addr=Variable "addr" is not available.
> ) at /usr/src/sys/ddb/db_command.c:478
> #4  0x801a5fd1 in db_command (last_cmdp=0x806b1fa0, 
> cmd_table=Variable "cmd_table" is not available.
> ) at /usr/src/sys/ddb/db_command.c:445
> #5  0x801a6220 in db_command_loop () at 
> /usr/src/sys/ddb/db_command.c:498
> #6  0x801a81e9 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:229
> #7  0x802c0995 in kdb_trap (type=12, code=0, tf=0xff8ef880) 
> at /usr/src/sys/kern/subr_kdb.c:535
> #8  0x8049ee0d in trap_fatal (frame=0xff8ef880, eva=Variable 
> "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:852
> #9  0x8049f1e4 in trap_pfault (frame=0xff8ef880, usermode=0) 
> at /usr/src/sys/amd64/amd64/trap.c:773
> #10 0x8049fa6a in trap (frame=0xff8ef880) at 
> /usr/src/sys/amd64/amd64/trap.c:499
> #11 0x80484ff3 in calltrap () at 
> /usr/src/sys/amd64/am

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:

> On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
> > I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
> > shed light on the below error? I unfortunately cannot provide a proper crash
> > dump. The pointer addresses are always the same. The only other thing I've
> > noticed that may be related is a watchdog timeout on bge0 error before the
> > panic. Thanks.
> > 
> 
> Any chance to get backtrace from the crash?

I have the same trouble on the same platform (8.0-STABLE/amd64).
hw.bge.allow_asf=0 is already set.

I got 2 proper crash dumps (the first with net.inet.ip.forwarding=1
and the second with net.inet.ip.forwarding=0).

backtrace from the first crash:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x802ea751
stack pointer   = 0x28:0xff8ef930
frame pointer   = 0x28:0xff8ef970
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 12 (irq26: bge1)
panic: from debugger
cpuid = 0
Uptime: 5h23m50s
Physical memory: 2039 MB
Dumping 1316 MB: 1301 1285 1269 1253 1237 1221 1205 1189 1173 1157 1141 1125 
1109 1093 1077 1061 1045 1029 1013 997 981 965 949 933 917 901 885 869 853 837 
821 805 789 773 757 741 725 709 693 677 661 645 629 613 597 581 565 549 533 517 
501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197 
181 165 149 133 117 101 85 69 53 37 21 5

Reading symbols from /boot/kernel/if_bge.ko...Reading symbols from /boot/kernel/if_bge.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_bge.ko
Reading symbols from /boot/kernel/miibus.ko...Reading symbols from /boot/kernel/miibus.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/miibus.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
Reading symbols from /boot/kernel/nfsserver.ko...Reading symbols from /boot/kernel/nfsserver.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nfsserver.ko
Reading symbols from /boot/kernel/krpc.ko...Reading symbols from /boot/kernel/krpc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/krpc.ko
Reading symbols from /boot/kernel/nfssvc.ko...Reading symbols from /boot/kernel/nfssvc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nfssvc.ko
#0  doadump () at pcpu.h:223
223 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:223
#1  0x802909b9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2  0x80290e0c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:579
#3  0x801a5bc7 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0x801a5fd1 in db_command (last_cmdp=0x806b1fa0, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0x801a6220 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0x801a81e9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0x802c0995 in kdb_trap (type=12, code=0, tf=0xff8ef880) at /usr/src/sys/kern/subr_kdb.c:535
#8  0x8049ee0d in trap_fatal (frame=0xff8ef880, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:852
#9  0x8049f1e4 in trap_pfault (frame=0xff8ef880, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
#10 0x8049fa6a in trap (frame=0xff8ef880) at /usr/src/sys/amd64/amd64/trap.c:499
#11 0x80484ff3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#12 0x802ea751 in m_copydata (m=0x0, off=0, len=108, cp=0xff0027865194 "б\026zHqJВ\220ЦПЫСPo~@<22>Feb 17 15:10:2") at /usr/src/sys/kern/uipc_mbuf.c:816
#13 0x8035e72d in ip_forward (m=0xff0001530900, srcrt=Variable "srcrt" is not available.
) at /usr/src/sys/netinet/ip_input.c:1444
#14 0x8035fef7 in ip_input (m=0xff0001530900) at /usr/src/sys/netinet/ip_input.c:717
#15 0x80342e9e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
) at /usr/src/sys/net/netisr.c:917
#16 0xf
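
A note on reading the trap above: frame #12 shows m_copydata() called
with m=0x0, and the fault virtual address is 0x18. A small fault address
like that usually means a field was read through a NULL pointer, with the
address equal to the field's offset. The sketch below checks which field
would sit at 0x18. The struct layout is an assumption (three 64-bit
pointers followed by an int length and an int flags word, modeled on the
mbuf header of that era and written from memory, not taken from this
thread); MbufHeader and field_address are hypothetical names used only
for illustration.

```python
import ctypes


class MbufHeader(ctypes.Structure):
    """Assumed layout of the start of an mbuf header on amd64.

    Pointers are modeled as 64-bit integers so the offsets do not
    depend on the machine running this script.
    """
    _fields_ = [
        ("mh_next", ctypes.c_uint64),     # next mbuf in the chain
        ("mh_nextpkt", ctypes.c_uint64),  # next packet in the queue
        ("mh_data", ctypes.c_uint64),     # pointer to the data
        ("mh_len", ctypes.c_int32),       # amount of data in this mbuf
        ("mh_flags", ctypes.c_int32),     # flags
    ]


def field_address(base, field_name):
    """Address touched when field_name is accessed through pointer base."""
    return base + getattr(MbufHeader, field_name).offset


fault_address = 0x18  # "fault virtual address" from the trap message
null_pointer = 0x0    # m=0x0 in frame #12

# Find the field whose address, through a NULL pointer, equals the fault address.
matching = [
    name
    for name, _ctype in MbufHeader._fields_
    if field_address(null_pointer, name) == fault_address
]
print(matching)
```

Under that assumed layout the only match is the length field, which
would be consistent with m_copydata() reading the length of a NULL mbuf.
It does not say why ip_forward() ended up with a NULL mbuf in the first
place.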

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-16 Thread Pyun YongHyeon
On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
> I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
> shed light on the below error? I unfortunately cannot provide a proper crash
> dump. The pointer addresses are always the same. The only other thing I've
> noticed that may be related is a watchdog timeout on bge0 error before the
> panic. Thanks.
> 

Any chance to get backtrace from the crash?

> Jan 27 15:25:01 wifi kernel:
> Jan 27 15:25:01 wifi kernel:
> Jan 27 15:25:01 wifi kernel: Fatal trap 12: page fault while in kernel mode
> Jan 27 15:25:01 wifi kernel: cpuid = 4; apic id = 04
> Jan 27 15:25:02 wifi kernel:
> Jan 27 15:25:02 wifi kernel: fault virtual address  = 0x28
> Jan 27 15:25:02 wifi kernel: fault code = supervisor write data, page not present
> Jan 27 15:25:02 wifi kernel: instruction pointer = 0x20:0x803263b7
> Jan 27 15:25:02 wifi kernel: stack pointer  = 0x28:0xff8073acdb40
> Jan 27 15:25:02 wifi kernel: frame pointer  = 0x28:0xff8073acdba0
> Jan 27 15:25:02 wifi kernel: code segment   = base 0x0, limit 0xf, type 0x1b
> Jan 27 15:25:02 wifi kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> Jan 27 15:25:02 wifi kernel: processor eflags   =
> Jan 27 15:25:02 wifi kernel: interrupt enabled,
> Jan 27 15:25:02 wifi kernel: resume,
> Jan 27 15:25:02 wifi kernel: IOPL = 0
> Jan 27 15:25:02 wifi kernel:
> Jan 27 15:25:02 wifi kernel: current process


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-15 Thread Nick Rogers
hw.bge.allow_asf: 0

On Mon, Feb 15, 2010 at 2:23 AM, Giacomo Olgeni  wrote:

>
> Hello,
>
> Are you running with hw.bge.allow_asf enabled?
>
>
>