Re: panic during work with jailed postgresql8.4

2010-04-02 Thread Oleg Lomaka

On Apr 2, 2010, at 4:12 PM, Bjoern A. Zeeb wrote:

> On Fri, 2 Apr 2010, Oleg Lomaka wrote:
> 
>>>> uname -a
>>>> FreeBSD cerberus.regredi.com 8.0-STABLE FreeBSD 8.0-STABLE #7 r206031: Thu 
>>>> Apr  1 13:43:57 EEST 2010 
>>>> r...@cerberus.regredi.com:/usr/obj/usr/src/sys/GENERIC  amd64
>>>> 
>>>> Link to dmesg.boot:
>>>> http://docs.google.com/leaf?id=0B-irbkAqk9i7OGY2ZWJiODgtOWJmMy00NDQ1LTliZDctZjU3N2YwNmMxNjZl&hl=en
>>>> 
>>>> Link to kernel core backtrace:
>>>> http://docs.google.com/Doc?docid=0AeirbkAqk9i7ZGc5Yzc2ZndfM2M4NzYydmRw&hl=en
>>>> 
>>>> Can I help to spot this trouble by providing additional info?
>>> 
>>> Looking at the info I doubt it's related to jails or Pg in the first
>>> place.  Have you been running that same setup before your Apr 1st,
>>> r206031, kernel?  If so, from when was your last kernel?
>> 
>> Yes, this configuration works fine on another server (8.0-STABLE FreeBSD 
>> 8.0-STABLE #3 r205202).
>> 
>> I ran a few more tests, all with the psql command, since that is 100% 
>> reproducible (next I may try to reproduce it with telnet/netcat, without 
>> involving pg). psql logs in fine; the panic appears after I run any 
>> command like \d, so I think it depends on packet size.
>> 
>> The current picture is:
>> 1. When I connect from the host machine - works fine.
>> 2. When I connect from another server - works fine.
>> 3. When I connect from another jail on the same box as the db's jail (tried 
>> from a few jails) - kernel fault.
>> 
>> I also tried security.jail.allow_raw_sockets on/off - nothing changes.
> 
> In addition to the private mail I have just sent you, the first thing
> you might try is to update again; I hadn't realized before that your
> r206031 seems to be in the middle of a multi-commit merge from two
> people.
> 
> It would be worth updating to the latest stable/8 and trying again
> first.

That's it. r206088 works fine. Thank you for your help.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: panic during work with jailed postgresql8.4

2010-04-02 Thread Oleg Lomaka

On Apr 2, 2010, at 3:02 PM, Bjoern A. Zeeb wrote:

> On Thu, 1 Apr 2010, Oleg Lomaka wrote:
>> I get a kernel panic when I connect from one jail to a postgresql8.4 server 
>> installed in another jail. It's 100% reproducible.
>> I have also tried connecting from the host machine to the jailed pg server. 
>> That way it works fine, without a crash.
>> 
>> The server configuration uses geli and zfs. Four disks are encrypted using 
>> geli, and raidz2 uses the ad8.eli, ad10.eli, ad12.eli, ad14.eli providers. 
>> All jails are located on this raidz2 pool.
>> 
>> I also use ezjail for jail management, and it uses NFS to mount directories 
>> with the base system.
>> 
>> Fatal double fault
>> rip = 0x8063510a
>> rsp = 0xff80eaec5f50
>> rbp = 0xff80eaec6040
>> cpuid = 1; apic id = 02
>> panic: double fault
>> cpuid = 1
>> Uptime: 7m11s
>> Physical memory: 8169 MB
>> 
>> uname -a
>> FreeBSD cerberus.regredi.com 8.0-STABLE FreeBSD 8.0-STABLE #7 r206031: Thu 
>> Apr  1 13:43:57 EEST 2010 
>> r...@cerberus.regredi.com:/usr/obj/usr/src/sys/GENERIC  amd64
>> 
>> Link to dmesg.boot:
>> http://docs.google.com/leaf?id=0B-irbkAqk9i7OGY2ZWJiODgtOWJmMy00NDQ1LTliZDctZjU3N2YwNmMxNjZl&hl=en
>> 
>> Link to kernel core backtrace:
>> http://docs.google.com/Doc?docid=0AeirbkAqk9i7ZGc5Yzc2ZndfM2M4NzYydmRw&hl=en
>> 
>> Can I help to spot this trouble by providing additional info?
> 
> Looking at the info I doubt it's related to jails or Pg in the first
> place.  Have you been running that same setup before your Apr 1st,
> r206031, kernel?  If so, from when was your last kernel?

Yes, this configuration works fine on another server (8.0-STABLE FreeBSD 
8.0-STABLE #3 r205202).

I ran a few more tests, all with the psql command, since that is 100% 
reproducible (next I may try to reproduce it with telnet/netcat, without 
involving pg). psql logs in fine; the panic appears after I run any 
command like \d, so I think it depends on packet size.

The current picture is:
1. When I connect from the host machine - works fine.
2. When I connect from another server - works fine.
3. When I connect from another jail on the same box as the db's jail (tried 
from a few jails) - kernel fault.

I also tried security.jail.allow_raw_sockets on/off - nothing changes.
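
One way to check the packet-size theory without involving pg is a plain TCP echo test with growing payloads. The following is a minimal Python sketch, not something from this thread: the function names, the payload sizes and the loopback demo are all assumptions for illustration. In the real setup the echo side would run in the db's jail and the probe in another jail on the same box, and on an affected kernel the larger payloads may well trigger the same panic.

```python
import socket
import threading


def echo_server(listener):
    """Accept one connection and echo everything back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)


def probe(host, port, sizes):
    """Send one payload per size and report whether it came back intact."""
    results = []
    with socket.create_connection((host, port), timeout=10) as s:
        for size in sizes:
            payload = b"x" * size
            s.sendall(payload)
            received = b""
            while len(received) < size:
                chunk = s.recv(65536)
                if not chunk:
                    break
                received += chunk
            results.append((size, received == payload))
    return results


if __name__ == "__main__":
    # Loopback demo only; host, port and sizes are placeholders.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()
    print(probe("127.0.0.1", listener.getsockname()[1], [64, 1024, 16384]))
```

If small payloads pass and the box only falls over once the payload grows, that would support the packet-size theory; if every size fails, the trigger is probably something else.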




Re: panic during work with jailed postgresql8.4

2010-04-01 Thread Oleg Lomaka

On Apr 2, 2010, at 4:52 AM, pluknet wrote:

> On 1 April 2010 22:18, Oleg Lomaka  wrote:
>> 
>> 
>> I get a kernel panic when I connect from one jail to a postgresql8.4 server 
>> installed in another jail. It's 100% reproducible.
>> I have also tried connecting from the host machine to the jailed pg server. 
>> That way it works fine, without a crash.
>> 
>> The server configuration uses geli and zfs. Four disks are encrypted using 
>> geli, and raidz2 uses the ad8.eli, ad10.eli, ad12.eli, ad14.eli providers. 
>> All jails are located on this raidz2 pool.
>> 
>> I also use ezjail for jail management, and it uses NFS to mount directories 
>> with the base system.
>> 
>> Fatal double fault
>> rip = 0x8063510a
>> rsp = 0xff80eaec5f50
>> rbp = 0xff80eaec6040
>> cpuid = 1; apic id = 02
>> panic: double fault
>> cpuid = 1
>> Uptime: 7m11s
>> Physical memory: 8169 MB
>> 
>> uname -a
>> FreeBSD cerberus.regredi.com 8.0-STABLE FreeBSD 8.0-STABLE #7 r206031: Thu 
>> Apr  1 13:43:57 EEST 2010 
>> r...@cerberus.regredi.com:/usr/obj/usr/src/sys/GENERIC  amd64
>> 
>> Link to dmesg.boot:
>> http://docs.google.com/leaf?id=0B-irbkAqk9i7OGY2ZWJiODgtOWJmMy00NDQ1LTliZDctZjU3N2YwNmMxNjZl&hl=en
>> 
>> Link to kernel core backtrace:
>> http://docs.google.com/Doc?docid=0AeirbkAqk9i7ZGc5Yzc2ZndfM2M4NzYydmRw&hl=en
> 
> Looking at backtrace, I wonder whether tp->t_maxseg changes in
> tcp_mtudisc() at all.
> You should be able to extract its value on each 2*n frame in that big
> recursive call.


You are right, tp->t_maxseg doesn't change:

(kgdb) frame 9
#9  0x807097e8 in tcp_mtudisc (inp=0xff00193c53f0, errno=Variable 
"errno" is not available.
) at tcp_offload.h:282
282 return (tcp_output(tp));
(kgdb) p tp->t_maxseg
$1 = 14336
(kgdb) frame 11
#11 0x807097e8 in tcp_mtudisc (inp=0xff00193c53f0, errno=Variable 
"errno" is not available.
) at tcp_offload.h:282
282 return (tcp_output(tp));
(kgdb) p tp->t_maxseg
$2 = 14336

... (full log at 
http://docs.google.com/Doc?docid=0AeirbkAqk9i7ZGc5Yzc2ZndfNGQ4cWpia2dz&hl=en )

(kgdb) frame 81
#81 0x807097e8 in tcp_mtudisc (inp=0xff00193c53f0, errno=Variable 
"errno" is not available.
) at tcp_offload.h:282
282 return (tcp_output(tp));
(kgdb) p tp->t_maxseg
$37 = 14336
(kgdb) 
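
The repeated tcp_mtudisc frames with an unchanged t_maxseg suggest why the stack is exhausted: if the MTU-discovery step never lowers the segment size, retrying the send cannot succeed and each retry nests one level deeper. The toy Python model below only illustrates that reasoning. It is not kernel code; the function name, the 1500-byte path MTU and the depth limit standing in for the kernel stack are assumptions, and only the value 14336 comes from the kgdb output above.

```python
def retry_send(maxseg, path_mtu, shrinks, depth=0, max_depth=100):
    """Toy model of a send that retries itself after an MTU-discovery step.

    Returns the nesting depth at which the segment finally fits, or None
    when max_depth (a stand-in for the kernel stack) is reached first.
    """
    if maxseg <= path_mtu:
        return depth  # the segment fits, so the nesting stops here
    if depth >= max_depth:
        return None  # out of "stack": the analogue of the double fault
    # The discovery step only helps if it actually lowers maxseg.
    next_maxseg = path_mtu if shrinks else maxseg
    return retry_send(next_maxseg, path_mtu, shrinks, depth + 1, max_depth)


print(retry_send(14336, 1500, shrinks=True))   # maxseg is lowered
print(retry_send(14336, 1500, shrinks=False))  # maxseg stays at 14336
```

With a step that lowers maxseg the model stops after one retry; with maxseg stuck at 14336 it runs until the depth limit, which is the pattern the frames above show.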

panic during work with jailed postgresql8.4

2010-04-01 Thread Oleg Lomaka
Hello,

I get a kernel panic when I connect from one jail to a postgresql8.4 server 
installed in another jail. It's 100% reproducible.
I have also tried connecting from the host machine to the jailed pg server. 
That way it works fine, without a crash.

The server configuration uses geli and zfs. Four disks are encrypted using 
geli, and raidz2 uses the ad8.eli, ad10.eli, ad12.eli, ad14.eli providers. 
All jails are located on this raidz2 pool.

I also use ezjail for jail management, and it uses NFS to mount directories 
with the base system.

Fatal double fault
rip = 0x8063510a
rsp = 0xff80eaec5f50
rbp = 0xff80eaec6040
cpuid = 1; apic id = 02
panic: double fault
cpuid = 1
Uptime: 7m11s
Physical memory: 8169 MB

uname -a
FreeBSD cerberus.regredi.com 8.0-STABLE FreeBSD 8.0-STABLE #7 r206031: Thu Apr  
1 13:43:57 EEST 2010 r...@cerberus.regredi.com:/usr/obj/usr/src/sys/GENERIC 
 amd64

Link to dmesg.boot:
http://docs.google.com/leaf?id=0B-irbkAqk9i7OGY2ZWJiODgtOWJmMy00NDQ1LTliZDctZjU3N2YwNmMxNjZl&hl=en

Link to kernel core backtrace:
http://docs.google.com/Doc?docid=0AeirbkAqk9i7ZGc5Yzc2ZndfM2M4NzYydmRw&hl=en

Can I help to spot this trouble by providing additional info?

Thanks.

Re: any hope for nfe/msk?

2007-11-07 Thread Oleg Lomaka

Hello,

Pyun YongHyeon wrote:

On Thu, Nov 01, 2007 at 10:59:48AM +0200, Oleg Lomaka wrote:
 > Hello,
 > 
 > Pyun YongHyeon wrote:

 > >On Tue, Oct 30, 2007 at 04:01:04PM +0200, Oleg Lomaka wrote:
 > >
 > >[...]
 > >
 > > > I had RxFIFO overrun again :(
 > > > from dmesg:
 > > > msk0: Rx FIFO overrun!
 > >
 > >[...]
 > >
 > >Please try the attached patch again. Sorry for the trouble.
 > >After applying the patch, show me the verbose dmesg output related to the
 > >msk(4)/PHY driver.
 > >
 > >Thanks for testing.
 > >  
 > pcib1:  irq 16 at device 28.0 on pci0

 > pcib1:   domain0
 > pcib1:   secondary bus 2
 > pcib1:   subordinate bus   2
 > pcib1:   I/O decode0x2000-0x2fff
 > pcib1:   memory decode 0xd010-0xd01f
 > pcib1:   no prefetched decode
 > pci2:  on pcib1
 > pci2: domain=0, physical bus=2
 > found-> vendor=0x11ab, dev=0x4352, revid=0x14
 >domain=0, bus=2, slot=0, func=0
 >class=02-00-00, hdrtype=0x00, mfdev=0
 >cmdreg=0x0007, statreg=0x4010, cachelnsz=16 (dwords)
 >lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
 >intpin=a, irq=11
 >powerspec 2  supports D0 D1 D2 D3  current D0
 >MSI supports 2 messages, 64 bit
 >map[10]: type Memory, range 64, base 0xd010, size 14, enabled
 > pcib1: requested memory range 0xd010-0xd0103fff: good
 >map[18]: type I/O Port, range 32, base 0x2000, size  8, enabled
 > pcib1: requested I/O range 0x2000-0x20ff: in range
 > pcib1: slot 0 INTA routed to irq 16
 > mskc0:  port 0x2000-0x20ff mem 
 > 0xd010-0xd0103fff irq 16 at device 0.0 on pci2

 > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
 > mskc0: MSI count : 2
 > mskc0: RAM buffer size : 4KB
 > mskc0: Port 0 : Rx Queue 2KB(0x:0x07ff)
 > mskc0: Port 0 : Tx Queue 2KB(0x0800:0x0fff)
 > msk0:  on mskc0
 > msk0: bpf attached
 > msk0: Ethernet address: 00:1b:24:0e:bc:26
 > miibus0:  on msk0
 > e1000phy0:  PHY 0 on miibus0
 > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 > ioapic0: routing intpin 16 (PCI IRQ 16) to vector 49
 > mskc0: [MPSAFE]
 > mskc0: [FILTER]
 > 


So far all looks good to me. If you encounter watchdog timeouts
or Rx FIFO overruns let me know.

  


Got it again:
msk0: Rx FIFO overrun!
I believe this happens under heavy CPU usage. I had Firefox compiling 
and was viewing pictures on a remote Windows box using rdesktop, and 
after a few minutes the network froze.
But it looks like I didn't lose any packets :). Take a look at the ping 
statistics... funny...

tdevil% ping 10.1.1.254
PING 10.1.1.254 (10.1.1.254): 56 data bytes
64 bytes from 10.1.1.254: icmp_seq=0 ttl=64 time=35926.404 ms
64 bytes from 10.1.1.254: icmp_seq=1 ttl=64 time=34925.694 ms
64 bytes from 10.1.1.254: icmp_seq=2 ttl=64 time=33924.729 ms
64 bytes from 10.1.1.254: icmp_seq=3 ttl=64 time=32923.814 ms
64 bytes from 10.1.1.254: icmp_seq=4 ttl=64 time=31922.833 ms
64 bytes from 10.1.1.254: icmp_seq=5 ttl=64 time=30921.878 ms
64 bytes from 10.1.1.254: icmp_seq=6 ttl=64 time=29920.923 ms
64 bytes from 10.1.1.254: icmp_seq=7 ttl=64 time=28919.960 ms
64 bytes from 10.1.1.254: icmp_seq=8 ttl=64 time=27919.009 ms
64 bytes from 10.1.1.254: icmp_seq=9 ttl=64 time=26918.042 ms
64 bytes from 10.1.1.254: icmp_seq=10 ttl=64 time=25917.078 ms
64 bytes from 10.1.1.254: icmp_seq=11 ttl=64 time=24916.115 ms
64 bytes from 10.1.1.254: icmp_seq=12 ttl=64 time=23915.144 ms
64 bytes from 10.1.1.254: icmp_seq=13 ttl=64 time=22914.192 ms
64 bytes from 10.1.1.254: icmp_seq=14 ttl=64 time=21913.214 ms
64 bytes from 10.1.1.254: icmp_seq=15 ttl=64 time=20912.278 ms
64 bytes from 10.1.1.254: icmp_seq=16 ttl=64 time=19911.330 ms
64 bytes from 10.1.1.254: icmp_seq=17 ttl=64 time=18910.375 ms
64 bytes from 10.1.1.254: icmp_seq=18 ttl=64 time=17909.419 ms
64 bytes from 10.1.1.254: icmp_seq=19 ttl=64 time=16853.821 ms
64 bytes from 10.1.1.254: icmp_seq=20 ttl=64 time=15854.710 ms
64 bytes from 10.1.1.254: icmp_seq=21 ttl=64 time=14701.312 ms
64 bytes from 10.1.1.254: icmp_seq=22 ttl=64 time=13701.003 ms
64 bytes from 10.1.1.254: icmp_seq=23 ttl=64 time=12700.052 ms
64 bytes from 10.1.1.254: icmp_seq=24 ttl=64 time=11699.098 ms
64 bytes from 10.1.1.254: icmp_seq=25 ttl=64 time=10698.148 ms
64 bytes from 10.1.1.254: icmp_seq=36 ttl=64 time=0.463 ms
64 bytes from 10.1.1.254: icmp_seq=37 ttl=64 time=0.379 ms
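
The round-trip times above drop by almost exactly one second per sequence number, which is what you would expect if the replies were held back during the freeze and then delivered in one burst. A quick way to check is to add each reply's send time to its round-trip time. The Python sketch below does that for a few of the replies listed above, assuming ping's default interval of one second between requests (the interval is not shown in the output).

```python
# (icmp_seq, round-trip time in ms), copied from the ping output above
replies = [(0, 35926.404), (1, 34925.694), (10, 25917.078),
           (19, 16853.821), (21, 14701.312), (25, 10698.148)]

INTERVAL_S = 1.0  # assumption: ping's default one-second send interval


def arrival_times(samples, interval=INTERVAL_S):
    """Estimate when each reply arrived, in seconds after the first request."""
    return [seq * interval + rtt_ms / 1000.0 for seq, rtt_ms in samples]


times = arrival_times(replies)
spread = max(times) - min(times)
print(f"replies arrived between {min(times):.3f}s and {max(times):.3f}s "
      f"(spread {spread:.3f}s)")
```

If the estimate holds, these replies all arrived within a fraction of a second of each other, roughly 36 seconds after the first request, so the stall lasted about that long. The output also jumps from icmp_seq=25 to icmp_seq=36, so the sequence numbers in between do not appear in it.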



Re: any hope for nfe/msk?

2007-11-01 Thread Oleg Lomaka

Hello,

Pyun YongHyeon wrote:

On Tue, Oct 30, 2007 at 04:01:04PM +0200, Oleg Lomaka wrote:

[...]

 > I had RxFIFO overrun again :(
 > from dmesg:
 > msk0: Rx FIFO overrun!

[...]

Please try the attached patch again. Sorry for the trouble.
After applying the patch, show me the verbose dmesg output related to the
msk(4)/PHY driver.

Thanks for testing.
  

pcib1:  irq 16 at device 28.0 on pci0
pcib1:   domain0
pcib1:   secondary bus 2
pcib1:   subordinate bus   2
pcib1:   I/O decode0x2000-0x2fff
pcib1:   memory decode 0xd010-0xd01f
pcib1:   no prefetched decode
pci2:  on pcib1
pci2: domain=0, physical bus=2
found-> vendor=0x11ab, dev=0x4352, revid=0x14
   domain=0, bus=2, slot=0, func=0
   class=02-00-00, hdrtype=0x00, mfdev=0
   cmdreg=0x0007, statreg=0x4010, cachelnsz=16 (dwords)
   lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
   intpin=a, irq=11
   powerspec 2  supports D0 D1 D2 D3  current D0
   MSI supports 2 messages, 64 bit
   map[10]: type Memory, range 64, base 0xd010, size 14, enabled
pcib1: requested memory range 0xd010-0xd0103fff: good
   map[18]: type I/O Port, range 32, base 0x2000, size  8, enabled
pcib1: requested I/O range 0x2000-0x20ff: in range
pcib1: slot 0 INTA routed to irq 16
mskc0:  port 0x2000-0x20ff mem 
0xd010-0xd0103fff irq 16 at device 0.0 on pci2

mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
mskc0: MSI count : 2
mskc0: RAM buffer size : 4KB
mskc0: Port 0 : Rx Queue 2KB(0x:0x07ff)
mskc0: Port 0 : Tx Queue 2KB(0x0800:0x0fff)
msk0:  on mskc0
msk0: bpf attached
msk0: Ethernet address: 00:1b:24:0e:bc:26
miibus0:  on msk0
e1000phy0:  PHY 0 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ioapic0: routing intpin 16 (PCI IRQ 16) to vector 49
mskc0: [MPSAFE]
mskc0: [FILTER]



Re: any hope for nfe/msk?

2007-10-30 Thread Oleg Lomaka

Pyun YongHyeon wrote:

On Thu, Oct 25, 2007 at 05:30:32PM +0900, To Oleg Lomaka wrote:

[...]

 >  > tdevil% grep -iE "msk|phy" /var/run/dmesg.boot
 >  > pci0: domain=0, physical bus=0
 >  > pci2: domain=0, physical bus=2
 >  > mskc0:  port 0x2000-0x20ff mem 
 >  > 0xd010-0xd0103fff irq 16 at device 0.0 on pci2

 >  > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
 >  > mskc0: MSI count : 2
 >  > mskc0: RAM buffer size : 16KB
 >  > mskc0: Port 0 : Rx Queue 10KB(0x:0x27ff)
 >  > mskc0: Port 0 : Tx Queue 10KB(0x2800:0x4fff)
 >  > msk0:  on mskc0
 >  > msk0: bpf attached
 >  > msk0: Ethernet address: 00:1b:24:0e:bc:26
 >  > miibus0:  on msk0
 >  > e1000phy0:  PHY 0 on miibus0
 >  > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 >  > ukphy0:  PHY 3 on miibus0
 >  > ukphy0: OUI 0x001000, model 0x0004, rev. 0
 >  > ukphy0:  no media present
 >  > ukphy1:  PHY 6 on miibus0
 >  > ukphy1: OUI 0x004400, model 0x0011, rev. 0
 >  > ukphy1:  no media present
 >  > mskc0: [MPSAFE]
 >  > mskc0: [FILTER]
 >  > pci3: domain=0, physical bus=3
 >  > pci4: domain=0, physical bus=4
 >  > pci5: domain=0, physical bus=5
 >  > pci10: domain=0, physical bus=10
 >  > 
 > 
 > Thanks for the info. Would you please try the attached patch?
 > 


Any progress here?
I guess it's very important to fix the bug as it would affect all
Yukon FE based NICs.

  
I applied your patch again yesterday. There have been no halts for a few 
hours now (after a ports cvs up and other network/cpu loads). I'll 
let you know in a day or two whether any trouble shows up.

Thanks for your help.

--
 Oleg Lomaka, 
 System Administrator

 Kiev Zoral Development Center
 Tel: +380-44-4928018
 ALEK-RIPE, ALEK-UANIC



Re: any hope for nfe/msk?

2007-10-30 Thread Oleg Lomaka

Pyun YongHyeon wrote:

On Tue, Oct 30, 2007 at 10:42:33AM +0200, Oleg Lomaka wrote:
 > Pyun YongHyeon wrote:
 > >On Thu, Oct 25, 2007 at 05:30:32PM +0900, To Oleg Lomaka wrote:
 > >
 > >[...]
 > >
 > > >  > tdevil% grep -iE "msk|phy" /var/run/dmesg.boot
 > > >  > pci0: domain=0, physical bus=0
 > > >  > pci2: domain=0, physical bus=2
 > > >  > mskc0:  port 0x2000-0x20ff mem 
 > > >  > 0xd010-0xd0103fff irq 16 at device 0.0 on pci2

 > > >  > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
 > > >  > mskc0: MSI count : 2
 > > >  > mskc0: RAM buffer size : 16KB
 > > >  > mskc0: Port 0 : Rx Queue 10KB(0x:0x27ff)
 > > >  > mskc0: Port 0 : Tx Queue 10KB(0x2800:0x4fff)
 > > >  > msk0:  on mskc0

 > > >  > msk0: bpf attached
 > > >  > msk0: Ethernet address: 00:1b:24:0e:bc:26
 > > >  > miibus0:  on msk0
 > > >  > e1000phy0:  PHY 0 on miibus0

 > > >  > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 > > >  > ukphy0:  PHY 3 on miibus0
 > > >  > ukphy0: OUI 0x001000, model 0x0004, rev. 0
 > > >  > ukphy0:  no media present
 > > >  > ukphy1:  PHY 6 on miibus0
 > > >  > ukphy1: OUI 0x004400, model 0x0011, rev. 0
 > > >  > ukphy1:  no media present
 > > >  > mskc0: [MPSAFE]
 > > >  > mskc0: [FILTER]
 > > >  > pci3: domain=0, physical bus=3
 > > >  > pci4: domain=0, physical bus=4
 > > >  > pci5: domain=0, physical bus=5
 > > >  > pci10: domain=0, physical bus=10
 > > >  > 
 > > > 
 > > > Thanks for the info. Would you please try the attached patch?
 > > > 
 > >

 > >Any progress here?
 > >I guess it's very important to fix the bug as it would affect all
 > >Yukon FE based NICs.
 > >
 > >  
 > I applied your patch again yesterday. There have been no halts for a few 
 > hours now (after a ports cvs up and other network/cpu loads). I'll 
 > let you know in a day or two whether any trouble shows up.

 > Thanks for your help.
 > 


Glad to hear that. Would you show me the verbose boot messages
related to msk(4)?

According to your dmesg output I guess you have phantom PHYs
attached to msk(4) too. So I'd also like to see the output of
"devinfo -rv".

  


I had RxFIFO overrun again :(
from dmesg:
msk0: Rx FIFO overrun!
pid 1245 (gnome-vfs-daemon), uid 1001: exited on signal 11
msk0: watchdog timeout (missed Tx interrupts) -- recovering

from boot log:
pci2:  on pcib1
pci2: domain=0, physical bus=2
found-> vendor=0x11ab, dev=0x4352, revid=0x14
   domain=0, bus=2, slot=0, func=0
   class=02-00-00, hdrtype=0x00, mfdev=0
   cmdreg=0x0007, statreg=0x4010, cachelnsz=16 (dwords)
   lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
   intpin=a, irq=11
   powerspec 2  supports D0 D1 D2 D3  current D0
   MSI supports 2 messages, 64 bit
   map[10]: type Memory, range 64, base 0xd010, size 14, enabled
pcib1: requested memory range 0xd010-0xd0103fff: good
   map[18]: type I/O Port, range 32, base 0x2000, size  8, enabled
pcib1: requested I/O range 0x2000-0x20ff: in range
pcib1: slot 0 INTA routed to irq 16
mskc0:  port 0x2000-0x20ff mem 
0xd010-0xd0103fff irq 16 at device 0.0 on pci2

mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
mskc0: MSI count : 2
mskc0: RAM buffer size : 4KB
mskc0: Port 0 : Rx Queue 10KB(0x:0x27ff)
mskc0: Port 0 : Tx Queue -6KB(0x2800:0x0fff)
msk0:  on mskc0
msk0: bpf attached
msk0: Ethernet address: 00:1b:24:0e:bc:26
miibus0:  on msk0
e1000phy0:  PHY 0 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ukphy0:  PHY 3 on miibus0
ukphy0: OUI 0x001000, model 0x0004, rev. 0
ukphy0:  no media present
ukphy1:  PHY 6 on miibus0
ukphy1: OUI 0x004400, model 0x0011, rev. 0
ukphy1:  no media present
ioapic0: routing intpin 16 (PCI IRQ 16) to vector 49
mskc0: [MPSAFE]
mskc0: [FILTER]
pcib2:  irq 17 at device 28.1 on pci0
pcib2:   domain0
pcib2:   secondary bus 3
pcib2:   subordinate bus   3
pcib2:   I/O decode0xf000-0xfff
pcib2:   memory decode 0xd000-0xd
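
The Tx Queue line in this boot log reports a negative size. Assuming the size is simply the span of the printed start and end offsets, the numbers can be checked with a few lines of Python. The helper name is made up for this note, and the offsets are the ones printed in the logs in this thread.

```python
def queue_kb(start, end):
    """Size in KB of a queue occupying the inclusive offset range start..end."""
    return (end - start + 1) // 1024


# Tx queue from the earlier dmesg.boot output: 0x2800:0x4fff, 16KB RAM buffer
print(queue_kb(0x2800, 0x4FFF))
# Tx queue from the boot log above: 0x2800:0x0fff, 4KB RAM buffer
print(queue_kb(0x2800, 0x0FFF))
```

The first call gives 10 and the second gives -6, matching the logged "10KB" and "-6KB". The end offset 0x0fff is the last byte of a 4KB buffer while the start offset 0x2800 still sits 10KB in, so the Tx queue ends before it starts. That reading is a guess from the log alone.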


and devinfo:
tdevil% devinfo -rv
nexus0
 cryptosoft0
 apic0
 I/O memory addresses:
 0xfec0-0xfec0001f
 0xfee0-0xfee003ff
 legacy0
   cpu0
   pcib0
 pci0
   hostb0 pnpinfo vendor=0x8086 device=0x27a0 subvendor=0x1025 
subdevice=0x0110 class=0x06 at slot=0 function=0
   vgapci0 pnpinfo vendor=0x8086 device=0x27a2 subvendor=0x1025 
subdevice=0x0110 class=0x03 at slot=2 function=0

   I/O ports:

Re: any hope for nfe/msk?

2007-10-25 Thread Oleg Lomaka

Pyun YongHyeon wrote:

On Thu, Oct 25, 2007 at 09:59:15AM +0300, Oleg Lomaka wrote:
 > Hello,
 > 
 > Pyun YongHyeon wrote:

 > >On Wed, Oct 24, 2007 at 05:12:44PM +0300, Oleg Lomaka wrote:
 > > > Pyun YongHyeon wrote:
 > > > >On Wed, Oct 24, 2007 at 09:33:48AM +0200, Danny Braniss wrote:
 > > > > > Hi,
 > > > > >   these drivers don't work under 7.0
 > > > > > As soon as some mild pressure is applied, they start losing 
 > > > > > interrupts, and in my case the hosts come to a total stand-still, 
 > > > > > since they are diskless and rely on the network.
 > > > > > This happens at 1Gb and at 100Mb.
 > > > > > 
 > > > > > Maybe the problem is with the shared interrupts?

 > > > > >   
 > > > > >   irq16: mskc0 uhci0   3308351 13
 > > > > > or
 > > > > >   irq21: nfe0 ohci0   1584415 24
 > > > > > 
 > > > > > but I have no idea how to uncouple this
 > > > > > 
 > > > >

 > > > >If you see watchdog timeout errors on your console, a shared interrupt
 > > > >would be the culprit.
 > > > >For msk(4) set hw.msk.legacy_intr="1" in loader.conf or use kenv(1)
 > > > >to set it before loading msk(4) kernel module.
 > > > >For nfe(4) you can switch to polling(4).
 > > > >
 > > > >  
 > > > I have some msk troubles too. On my laptop (acer travelmate 2483wxmi) 
 > > > under heavy cpu & network load msk periodically stops working for a few 
 > > > minutes.

 > >
 > >When that happens, does msk(4) recover from the non-working state?
 > >  
 > Yes, sometimes in a few seconds, sometimes in 5 - 10 minutes, but it always 
 > recovers.

 > > > sysctl -a|grep msk
 > > > <118>msk0: no link ...
 > > > <118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
 > > > <118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
 > > > <118>DHCPDISCOVER on msk0 to 255.255.255.255 port 67 interval 3
 > > > <118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
 > > > <118>msk0: flags=8843 metric 0 
 > > > mtu 1500

 > > > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > > > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > > > msk0: Rx FIFO overrun!
 > > 
 > >This looks bad. Would you show me the verbose boot messages related to
 > >msk(4) and the PHY driver, as well as the "vmstat -i" output?
 > >
 > >  
 > Here are the values from the freshly booted laptop. If msk halts again 
 > today, I'll resend.
 > 
 > tdevil% vmstat -i

 > interrupt                    total       rate
 > irq1: atkbd0                  3275          1
 > irq12: psm0                  11157          6
 > irq14: ata0                  22500         13
 > irq15: ata1                     85          0
 > irq16: mskc0 uhci+           17334         10
 > irq18: uhci2                     1          0
 > irq22: pcm0                  46530         27
 > irq23: uhci0 ehci0           95882         57
 > cpu0: timer                3322705       1999
 > Total                      3519469       2117
 > 
 > 
 > tdevil% grep -iE "msk|phy" /var/run/dmesg.boot

 > pci0: domain=0, physical bus=0
 > pci2: domain=0, physical bus=2
 > mskc0:  port 0x2000-0x20ff mem 
 > 0xd010-0xd0103fff irq 16 at device 0.0 on pci2

 > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
 > mskc0: MSI count : 2
 > mskc0: RAM buffer size : 16KB
 > mskc0: Port 0 : Rx Queue 10KB(0x:0x27ff)
 > mskc0: Port 0 : Tx Queue 10KB(0x2800:0x4fff)
 > msk0:  on mskc0
 > msk0: bpf attached
 > msk0: Ethernet address: 00:1b:24:0e:bc:26
 > miibus0:  on msk0
 > e1000phy0:  PHY 0 on miibus0
 > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 > ukphy0:  PHY 3 on miibus0
 > ukphy0: OUI 0x001000, model 0x0004, rev. 0
 > ukphy0:  no media present
 > ukphy1:  PHY 6 on miibus0
 > ukphy1: OUI 0x004400, model 0x0011, rev. 0
 > ukphy1:  no media present
 > mskc0: [MPSAFE]
 > mskc0: [FILTER]
 > pci3: domain=0, physical bus=3
 > pci4: domain=0, physical bus=4
 > pci5: domain=0, physical bus=5
 > pci10: domain=0, physical bus=10
 > 


Thanks for the info. Would you please try the attached patch?

  
After kldunload/kldload I got the following and had to revert to the original 
one (1.1

Re: any hope for nfe/msk?

2007-10-25 Thread Oleg Lomaka

Hello,

Pyun YongHyeon wrote:

On Wed, Oct 24, 2007 at 05:12:44PM +0300, Oleg Lomaka wrote:
 > Pyun YongHyeon wrote:
 > >On Wed, Oct 24, 2007 at 09:33:48AM +0200, Danny Braniss wrote:
 > > > Hi,
 > > > these drivers don't work under 7.0
 > > > As soon as some mild pressure is applied, they start losing 
 > > > interrupts, and in my case the hosts come to a total stand-still, 
 > > > since they are diskless and rely on the network.
 > > > This happens at 1Gb and at 100Mb.
 > > > 
 > > > Maybe the problem is with the shared interrupts?

 > > > 
 > > > irq16: mskc0 uhci0   3308351 13
 > > > or
 > > > irq21: nfe0 ohci0   1584415 24
 > > > 
 > > > but I have no idea how to uncouple this
 > > > 
 > >

 > >If you see watchdog timeout errors on your console, a shared interrupt
 > >would be the culprit.
 > >For msk(4) set hw.msk.legacy_intr="1" in loader.conf or use kenv(1)
 > >to set it before loading msk(4) kernel module.
 > >For nfe(4) you can switch to polling(4).
 > >
 > >  
 > I have some msk troubles too. On my laptop (acer travelmate 2483wxmi) 
 > under heavy cpu & network load msk periodically stops working for a few 
 > minutes.


When that happens, does msk(4) recover from the non-working state?
  
Yes, sometimes in a few seconds, sometimes in 5 - 10 minutes, but it always 
recovers.

 > sysctl -a|grep msk
 > <118>msk0: no link ...
 > <118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
 > <118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
 > <118>DHCPDISCOVER on msk0 to 255.255.255.255 port 67 interval 3
 > <118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
 > <118>msk0: flags=8843 metric 0 
 > mtu 1500

 > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > msk0: Rx FIFO overrun!
 
This looks bad. Would you show me the verbose boot messages related to
msk(4) and the PHY driver, as well as the "vmstat -i" output?

  
Here are the values from the freshly booted laptop. If msk halts again 
today, I'll resend.


tdevil% vmstat -i
interrupt                    total       rate
irq1: atkbd0                  3275          1
irq12: psm0                  11157          6
irq14: ata0                  22500         13
irq15: ata1                     85          0
irq16: mskc0 uhci+           17334         10
irq18: uhci2                     1          0
irq22: pcm0                  46530         27
irq23: uhci0 ehci0           95882         57
cpu0: timer                3322705       1999
Total                      3519469       2117
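
Since shared interrupts were suggested as a possible culprit, it can help to pick the shared lines out of the "vmstat -i" output mechanically. The Python sketch below treats any irq line that names more than one device as shared. The function name is made up for this note, and the sample rows are taken from the output above.

```python
SAMPLE = """\
interrupt                    total       rate
irq1: atkbd0                  3275          1
irq14: ata0                  22500         13
irq16: mskc0 uhci+           17334         10
irq22: pcm0                  46530         27
irq23: uhci0 ehci0           95882         57
cpu0: timer                3322705       1999
"""


def shared_irqs(vmstat_text):
    """Map each irq line that lists more than one device to its device names."""
    shared = {}
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Expect: "irqN:", one or more device names, total, rate.
        if len(fields) < 4 or not fields[0].startswith("irq"):
            continue
        devices = fields[1:-2]
        if len(devices) > 1:
            shared[fields[0].rstrip(":")] = devices
    return shared


for irq, devices in shared_irqs(SAMPLE).items():
    print(irq, "is shared by", ", ".join(devices))
```

On this laptop it reports irq16 (mskc0 and uhci+) and irq23 (uhci0 and ehci0), so msk does share its interrupt line with a USB controller.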


tdevil% grep -iE "msk|phy" /var/run/dmesg.boot
pci0: domain=0, physical bus=0
pci2: domain=0, physical bus=2
mskc0:  port 0x2000-0x20ff mem 
0xd010-0xd0103fff irq 16 at device 0.0 on pci2

mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd010
mskc0: MSI count : 2
mskc0: RAM buffer size : 16KB
mskc0: Port 0 : Rx Queue 10KB(0x:0x27ff)
mskc0: Port 0 : Tx Queue 10KB(0x2800:0x4fff)
msk0:  on mskc0
msk0: bpf attached
msk0: Ethernet address: 00:1b:24:0e:bc:26
miibus0:  on msk0
e1000phy0:  PHY 0 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ukphy0:  PHY 3 on miibus0
ukphy0: OUI 0x001000, model 0x0004, rev. 0
ukphy0:  no media present
ukphy1:  PHY 6 on miibus0
ukphy1: OUI 0x004400, model 0x0011, rev. 0
ukphy1:  no media present
mskc0: [MPSAFE]
mskc0: [FILTER]
pci3: domain=0, physical bus=3
pci4: domain=0, physical bus=4
pci5: domain=0, physical bus=5
pci10: domain=0, physical bus=10


 > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > dev.mskc.0.%desc: Marvell Yukon 88E8038 Gigabit Ethernet
 > dev.mskc.0.%driver: mskc
 > dev.mskc.0.%location: slot=0 function=0
 > dev.mskc.0.%pnpinfo: vendor=0x11ab device=0x4352 subvendor=0x1025 
 > subdevice=0x0110 class=0x02

 > dev.mskc.0.%parent: pci2
 > dev.mskc.0.process_limit: 128
 > dev.msk.0.%desc: Marvell Technology Group Ltd. Yukon FE Id 0xb7 Rev 0x01
 > dev.msk.0.%driver: msk
 > dev.msk.0.%parent: mskc0
 > dev.miibus.0.%parent: msk0
 > 
 > Not sure if it is connected to previous issue.
 > 
 > uname -a
 > FreeBSD tdevil.lomaka.org.ua 7.0-BETA1 FreeBSD 7.0-BETA1 #0: Mon Oct 22 
 > 18:32:01 EEST 2007 
 > [EMAIL PROTECTED]:/usr/obj/usr/src/sys/TDEVIL-7.kernconf  i386
 > 

  




Re: any hope for nfe/msk?

2007-10-24 Thread Oleg Lomaka

Pyun YongHyeon wrote:

On Wed, Oct 24, 2007 at 09:33:48AM +0200, Danny Braniss wrote:
 > Hi,
 >   these drivers don't work under 7.0
 > As soon as some mild pressure is applied, they start losing interrupts, and
 > in my case the hosts come to a total stand-still, since they are diskless
 > and rely on the network.
 > This happens at 1Gb and at 100Mb.
 > 
 > Maybe the problem is with the shared interrupts?

 >   
 >   irq16: mskc0 uhci0   3308351 13
 > or
 >   irq21: nfe0 ohci0   1584415 24
 > 
 > but I have no idea how to uncouple this
 > 


If you see watchdog timeout errors on your console, a shared interrupt
would be the culprit.
For msk(4) set hw.msk.legacy_intr="1" in loader.conf or use kenv(1)
to set it before loading msk(4) kernel module.
For nfe(4) you can switch to polling(4).

  
I have some msk troubles too. On my laptop (acer travelmate 2483wxmi) 
under heavy cpu & network load msk periodically stops working for a few 
minutes.

sysctl -a|grep msk
<118>msk0: no link ...
<118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
<118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
<118>DHCPDISCOVER on msk0 to 255.255.255.255 port 67 interval 3
<118>DHCPREQUEST on msk0 to 255.255.255.255 port 67
<118>msk0: flags=8843 metric 0 
mtu 1500

msk0: watchdog timeout (missed Tx interrupts) -- recovering
msk0: watchdog timeout (missed Tx interrupts) -- recovering
msk0: Rx FIFO overrun!
msk0: watchdog timeout (missed Tx interrupts) -- recovering
msk0: watchdog timeout (missed Tx interrupts) -- recovering
msk0: watchdog timeout (missed Tx interrupts) -- recovering
dev.mskc.0.%desc: Marvell Yukon 88E8038 Gigabit Ethernet
dev.mskc.0.%driver: mskc
dev.mskc.0.%location: slot=0 function=0
dev.mskc.0.%pnpinfo: vendor=0x11ab device=0x4352 subvendor=0x1025 
subdevice=0x0110 class=0x02

dev.mskc.0.%parent: pci2
dev.mskc.0.process_limit: 128
dev.msk.0.%desc: Marvell Technology Group Ltd. Yukon FE Id 0xb7 Rev 0x01
dev.msk.0.%driver: msk
dev.msk.0.%parent: mskc0
dev.miibus.0.%parent: msk0

Not sure if it is connected to previous issue.

uname -a
FreeBSD tdevil.lomaka.org.ua 7.0-BETA1 FreeBSD 7.0-BETA1 #0: Mon Oct 22 
18:32:01 EEST 2007 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/TDEVIL-7.kernconf  i386

