malloc(M_WAITOK) of g_bio, forcing M_NOWAIT with non-sleepable locks held:

2006-08-31 Thread Václav Haisman
Hi,
I found this in the logs of a 6.1 box that I admin this morning. The machine
kept running after that.

--
wilx
+malloc(M_WAITOK) of g_bio, forcing M_NOWAIT with the following non-sleepable locks held:
+exclusive sleep mutex inp (tcpinp) r = 0 (0xc50c5d38) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1029
+KDB: stack backtrace:
+kdb_backtrace(c08eef84,e78be89c,1,c45752c0,c1035380) at kdb_backtrace+0x2f
+witness_warn(5,0,c0819ce0,c07fc905,c05eaca1) at witness_warn+0x1ac
+uma_zalloc_arg(c1035380,0,102,e78be8e4,c07562a1) at uma_zalloc_arg+0x3d
+g_alloc_bio(8,c0819777,c4575400,c45752c0,d367aac8) at g_alloc_bio+0x23
+swapgeom_strategy(d367aac8,c45752c0,c0819777,271) at swapgeom_strategy+0x3a
+swp_pager_strategy(d367aac8,0,c0819777,437,c0760ee7) at swp_pager_strategy+0x88
+swap_pager_getpages(c728e948,e78be9ec,1,0,e78be9b0) at swap_pager_getpages+0x382
+vm_fault(c4f9b128,806b000,1,0,c4c0b300) at vm_fault+0xb13
+trap_pfault(e78beaa8,0,806b2c0,c08a9c60,806b2c0) at trap_pfault+0xf4
+trap(c088,28,c4c00028,e78beb28,806b2c0) at trap+0x33e
+calltrap() at calltrap+0x5
+--- trap 0xc, eip = 0xc07b9b16, esp = 0xe78beae8, ebp = 0xe78beb08 ---
+generic_copyin(e78bec84,e78beb28,4,4,c50c5ca8) at generic_copyin+0x32
+tcp_ctloutput(c4d1dc84,e78bec84,0,c589b400,e78bec68) at tcp_ctloutput+0x182
+sosetopt(c4d1dc84,e78bec84,e78bec80,c08eef80,c4bd15a0) at sosetopt+0x38
+kern_setsockopt(c4c0b300,7,6,1,806b2c0) at kern_setsockopt+0xd6
+setsockopt(c4c0b300,e78bed04,14,28279000,5) at setsockopt+0x3e
+syscall(3b,2808003b,bfbf003b,1,806b2c0) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x2827959b, esp = 0xbf9fea3c, ebp = 0xbf9fea68 ---
+Sleeping on swread with the following non-sleepable locks held:
+exclusive sleep mutex inp (tcpinp) r = 0 (0xc50c5d38) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1029
+KDB: stack backtrace:
+kdb_backtrace(c08eef84,e78be8e0,1,1,0) at kdb_backtrace+0x2f
+witness_warn(5,c08fe2a0,c0802eb1,c0819833,c08fe2a0) at witness_warn+0x1ac
+msleep(c1bb6fe8,c08fe2a0,40,c0819833,4e20) at msleep+0x58
+swap_pager_getpages(c728e948,e78be9ec,1,0,e78be9b0) at swap_pager_getpages+0x400
+vm_fault(c4f9b128,806b000,1,0,c4c0b300) at vm_fault+0xb13
+trap_pfault(e78beaa8,0,806b2c0,c08a9c60,806b2c0) at trap_pfault+0xf4
+trap(c088,28,c4c00028,e78beb28,806b2c0) at trap+0x33e
+calltrap() at calltrap+0x5
+--- trap 0xc, eip = 0xc07b9b16, esp = 0xe78beae8, ebp = 0xe78beb08 ---
+generic_copyin(e78bec84,e78beb28,4,4,c50c5ca8) at generic_copyin+0x32
+tcp_ctloutput(c4d1dc84,e78bec84,0,c589b400,e78bec68) at tcp_ctloutput+0x182
+sosetopt(c4d1dc84,e78bec84,e78bec80,c08eef80,c4bd15a0) at sosetopt+0x38
+kern_setsockopt(c4c0b300,7,6,1,806b2c0) at kern_setsockopt+0xd6
+setsockopt(c4c0b300,e78bed04,14,28279000,5) at setsockopt+0x3e
+syscall(3b,2808003b,bfbf003b,1,806b2c0) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x2827959b, esp = 0xbf9fea3c, ebp = 0xbf9fea68 ---



Suspend freezes system on my Thinkpad X41

2006-08-31 Thread Jordan Sissel

Subject says it all. Attempting FN+F4 (suspend) or acpiconf -s 3 or -s 4 all
cause the system to halt.

Specs:
Thinkpad X41
6.1-RELEASE GENERIC
Atheros 5212
Broadcom BCM5751M
SATA (ICH6 SATA150)

Freezes are non-recoverable. Even hardware functions (such as lcd dimming)
cease. Seems like the system has half suspended, but the LCD is still on and
displaying. Sleep light doesn't blink indicating sleepiness.

Any help on this would be greatly appreciated :)
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


gjournal questions

2006-08-31 Thread Kevin Kramer

Pavel,

Running 6.1-STABLE; kernel and world were rebuilt as of 8/28 @ 2pm CST with
these patches:

gjournal6_20060808.patch
vfs_subr.c.3.patch

The backend RAID presents 4 LUNs; this is how we configured it:
da1 - 8G
da2 - ~897G
da3 - 8G
da4 - ~897G

da2 and da4 have been partitioned in FreeBSD; then we did the following:

gjournal label -v /dev/da2 /dev/da1
gjournal label -v /dev/da4 /dev/da3
newfs -U -L scr09 /dev/da2.journal
newfs -U -L scr10 /dev/da4.journal

So there is one 8G journal for each data device.

Now that the server is under load, I'm seeing NFS "not responding" messages
on my clients. The messages correspond to the gjournal suspend/copy
operation and cause my clients to hang or report "No such file or directory".


We copied 137G to /scr10 and it just finished; could this be leftover
writes from the journal?


here is the time correlation

Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002798s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 14.030198s

Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:55:33 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.13s
Aug 31 13:55:44 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.13s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.10s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.09s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.07s

Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002302s

Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr10: 0.029769s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr10: 0.035259s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s

Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002756s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 10.182759s

Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:14 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.12s
Aug 31 13:56:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.11s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.10s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.09s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.07s

Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002364s

Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.

from syslog server

Aug 31 13:55:23 user.notice bowltest4 kernel: nfs: server donkey not responding, still trying

Aug 31 13:55:23 user.notice bowltest4 kernel: nfs: server donkey OK
Aug 31 13:55:23 user.notice laybox32 kernel: nfs: server donkey OK
Aug 31 13:55:29 user.notice b-115-4 kernel: nfs: server donkey not responding, still trying

Aug 31 13:55:29 user.notice b-115-4 kernel: nfs: server donkey OK
Aug 31 13:55:56 user.notice b-116-16 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:56 user.notice b-204-40 kernel: nfs: server donkey not responding, still trying

Aug 31 13:55:57 user.notice b-116-16 kernel: nfs: server donkey OK
Aug 31 13:55:57 user.notice lic2 kernel: nfs: server donkey not responding, still trying

Aug 31 13:55:57 user.notice b-204-40 kernel: nfs: server donkey OK
Aug 31 13:55:57 user.notice lic2 kernel: nfs: server donkey OK
Aug 31 13:55:57 user.notice laybox29 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 user.notice laybox26 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 user.notice laybox19 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 user.notice laybox37 kernel: nfs: server donkey not responding, still trying

Aug 31 13:56:00 user.notice laybox19 kernel: nfs: server donkey OK
Aug 31 13:56:00 user.notice laybox26 kernel: nfs: server donkey OK
Aug 31 13:56:00 user.notice laybox37 kernel: nfs: server donkey OK
Aug 31 13:56:00 user.notice laybox29 kernel: nfs: server donkey OK
Aug 31 13:56:05 daemon.info ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state not responding
Aug 31 13:56:05 daemon.info ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state ok
Aug 31 13:56:36 user.notice b-116-17 kernel: nfs: server donkey not responding, still trying

Aug 31 13:56:36 user.notice b-116-17 kernel: nfs: server donkey OK
Aug 31 13:56:40 user.notice b-210-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 
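
The correlation described in this message can be checked mechanically. The following is only a rough sketch, not part of the original post: it assumes each GEOM_JOURNAL message is logged when the operation finishes (so the stall covers the interval ending at the timestamp), that server and client clocks are in sync, and that wrapped log lines have been joined. The function names, the 1-second threshold, and the year 2006 (taken from the thread date) are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# A few lines copied from the logs in this message (wrapped lines joined).
SERVER_LOG = """\
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 14.030198s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr10: 0.035259s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s
"""

CLIENT_LOG = """\
Aug 31 13:55:23 user.notice bowltest4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:56 user.notice b-116-16 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:36 user.notice b-116-17 kernel: nfs: server donkey not responding, still trying
"""

STAMP = r"(\w{3} +\d+ \d\d:\d\d:\d\d)"
JOURNAL_RE = re.compile(STAMP + r" .*GEOM_JOURNAL\[\d+\]: (.*time.*): ([\d.]+)s$")
NFS_RE = re.compile(STAMP + r" \S+ (\S+) kernel: nfs: server \S+ not responding")


def parse_stamp(text, year=2006):
    # syslog timestamps carry no year; 2006 is an assumption from the thread date.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def slow_operations(log, threshold=1.0):
    """Yield (start, end, label) for journal operations taking >= threshold seconds."""
    for line in log.splitlines():
        m = JOURNAL_RE.match(line)
        if m and float(m.group(3)) >= threshold:
            end = parse_stamp(m.group(1))
            # Assumption: the message is written when the operation completes.
            yield end - timedelta(seconds=float(m.group(3))), end, m.group(2)


def client_timeouts(log):
    """Yield (time, client host) for each NFS 'not responding' message."""
    for line in log.splitlines():
        m = NFS_RE.match(line)
        if m:
            yield parse_stamp(m.group(1)), m.group(2)


def correlate(server_log, client_log, threshold=1.0):
    """Return (client, operation) pairs where a timeout falls inside a slow operation."""
    ops = list(slow_operations(server_log, threshold))
    return [(host, label)
            for when, host in client_timeouts(client_log)
            for start, end, label in ops
            if start <= when <= end]


if __name__ == "__main__":
    for host, label in correlate(SERVER_LOG, CLIENT_LOG):
        print(f"{host}: timeout during '{label}'")
```

On these sample lines, the bowltest4 and b-116-16 timeouts fall inside the 14-second switch and the 10-second suspend respectively, while the 13:56:36 timeout matches no slow operation in the sample.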

Re: suggestions for SATA RAID cards

2006-08-31 Thread Albert Chin
On Thu, Aug 24, 2006 at 01:39:43AM -0500, Nikolas Britton wrote:
 Their new hardware (coming soon) will have a 800MHz XScale with
 DDR2-533 cache.

According to http://www.areca.com.tw/products/html/pcietosata1280.htm,
it's already here.

-- 
albert chin ([EMAIL PROTECTED])


Re: Suspend freezes system on my Thinkpad X41

2006-08-31 Thread Sam Leffler
Jordan Sissel wrote:
 Subject says it all. Attempting FN+F4 (suspend) or acpiconf -s 3 or -s 4 all
 cause the system to halt.
 
 Specs:
 Thinkpad X41
 6.1-RELEASE GENERIC
 Atheros 5212
 Broadcom BCM5751M
 SATA (ICH6 SATA150)
 
 Freezes are non-recoverable. Even hardware functions (such as lcd dimming)
 cease. Seems like the system has half suspended, but the LCD is still on and
 displaying. Sleep light doesn't blink indicating sleepiness.
 
 Any help on this would be greatly appreciated :)

Atheros 5212 is too imprecise; please give mac+phy rev's from
dmesg|grep ath.  Otherwise, try taking ath out of your kernel config to
see if you can suspend+resume.  I've got an outstanding issue with ath
in how suspend+resume is handled--for cardbus cards at least (never seen
it with minipci).

Sam
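
For readers who want to pull the requested revisions out of saved dmesg output, here is a minimal sketch. It assumes the driver prints a line of the form "ath0: mac X phy Y radio Z", as in the dmesg excerpt Jordan posts later in this thread; the function name and regular expression are illustrative and not part of any FreeBSD tool.

```python
import re

# Matches lines such as "ath0: mac 5.9 phy 4.3 radio 3.6".
# The radio field is treated as optional (an assumption).
ATH_REV_RE = re.compile(r"^(ath\d+): mac (\S+) phy (\S+)(?: radio (\S+))?")

# Sample taken from the dmesg output posted later in this thread.
SAMPLE = """\
ath0: Ethernet address: 00:16:ce:40:7e:41
ath0: mac 5.9 phy 4.3 radio 3.6
"""


def ath_revisions(dmesg_text):
    """Return {device: {"mac": ..., "phy": ..., "radio": ...}} from dmesg text."""
    revs = {}
    for line in dmesg_text.splitlines():
        m = ATH_REV_RE.match(line)
        if m:
            revs[m.group(1)] = {
                "mac": m.group(2),
                "phy": m.group(3),
                "radio": m.group(4),
            }
    return revs


if __name__ == "__main__":
    print(ath_revisions(SAMPLE))
```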


Re: NFS locking: lockf freezes (rpc.lockd problem?)

2006-08-31 Thread Greg Byshenk
On Tue, Aug 29, 2006 at 05:05:26PM +, Michael Abbott wrote:

[I wrote]
 An alternative would be to update to RELENG_6 (or at least RELENG_6_1)
 and then try again.
 
 So.  I have done this.  And I can't reproduce the problem.

 # uname -a
 FreeBSD venus.araneidae.co.uk 6.1-STABLE FreeBSD 6.1-STABLE #1: Mon Aug 28 
 18:32:17 UTC 2006 
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386
 
 Hmm.  Hopefully this is a *good* thing, ie, the problem really has been 
 fixed, rather than just going into hiding.
 
 So, as far as I can tell, lockf works properly in this release.


Just as an interesting side note, I just experienced rpc.lockd crashing.
The server is not running RELENG_6, but RELENG_5 (FreeBSD 5.5-STABLE
#15: Thu Aug 24 18:47:20 CEST 2006).  Due to user error, someone ended
up with over 1000 processes trying to lock the same NFS mounted file at
the same time.  The result was over 1000 "Cannot allocate memory" errors
followed by rpc.lockd crashing.

I guess the server is telling me it wants an update...


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL


Re: Suspend freezes system on my Thinkpad X41

2006-08-31 Thread Jordan Sissel

ath_hal: 0.9.16.16 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
ath0: Atheros 5212 mem 0xa020-0xa020 irq 21 at device 2.0 on pci4
ath0: Ethernet address: 00:16:ce:40:7e:41
ath0: mac 5.9 phy 4.3 radio 3.6

It's minipci.

As for the video. The LCD display still shows text. If I am in X, it
switches to a vty before attempting sleep. The backlight stays on and the
display does not turn off.

I think when I was testing earlier, I did not have X running nor if_ath
loaded. I'll post more later tonight (tomorrow?) when I get a chance.

To further clarify: The laptop does *NOT* successfully go to sleep. It
returns to vty and locks hard. This is not a resume problem, this is a
suspend problem. Attempting to hit ctrl+alt+del to reboot does nothing.

-Jordan
