Re: requesting vinum help

2003-11-26 Thread Cosmin Stroe
On Thu, 27 Nov 2003, Greg 'groggy' Lehey wrote:

> On Wednesday, 26 November 2003 at 12:04:52 -0600, Cosmin Stroe wrote:
> >
> > I am using vinum atm, and I am having serious problems with it.  After
> > about 16 hrs of writing data to a vinum volume via NFS at a constant data
> > stream of 200k/sec and reading at 400k/sec at the same time, the whole
> > machine just freezes, hard.  The only thing I can do is reboot.  This
> > behavior appears in 4.8 and 5-CURRENT.  I have no indication of what is
> > wrong, or how to go about finding it out.  The problem is either with NFS
> > or Vinum, and I'm leaning towards Vinum (because of the failure in both
> > -STABLE and -CURRENT).
> >
> > I'm not the kind of person that relies on other people, and I like to fix
> > my own problems, but this is a problem which I cannot fix at this time.
> > So, I'm planning to look through the code of vinum and start messing with
> > it to figure out how it works and how to debug it.
> 
> This is unlikely to get you very far.  Some more details (offline if
> you prefer) would be handy, but as you say, you can't even be sure
> that it's Vinum.  The best thing would be to get the system into the
> kernel debugger at the point of freeze, if that's possible, and try to
> work out what has happened.
> 

Quick question: if this is a software problem with vinum, there should be 
no way it can hard-lock a machine.  Is this assumption correct?  I should 
be able to invoke the kernel debugger by pressing the hotkey 
(ctrl+alt+esc) while the machine is locked and get a backtrace (although I'd 
be in an ISR servicing the hotkey, so I'm not sure it'd do much good).

Any special suggestions on debugging this kind of freeze?  The 
hardware (CPU, RAM, HDs) has been tested and it's good.  Would some kind of 
software watchdog help?


> > What would also be appreciated is an overall "map" of how vinum is
> > organized and how it works.
> 
> You've read the documentation on http://www.vinumvm.org/, right?  If
> you have any questions, I'm sure it can be improved on.
> 

Yes :).

> Greg
> --
> See complete headers for address and phone numbers.
> 


Cosmin Stroe.

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: requesting vinum help

2003-11-26 Thread Cosmin Stroe
On Wed, 26 Nov 2003, Poul-Henning Kamp wrote:

> In message <[EMAIL PROTECTED]>, "Joel M. Baldwin" writes:
> 
> >I was trying to use some restraint and not rant and rave in public like
> >I wanted to do.  I'm rather miffed that nothing appeared in UPDATING.
> >Rather than an unproductive public RANT I thought I'd ask for private assistance.
> >I can post a summary afterwards if you like, or even better write a better
> >FAQ/tutorial on vinum.
> 
> Joel,
> 
> The problem is that vinum is a hot political potato in the project.
> 
> In the eyes of a fair number of competent people, vinum has never
> quite "made it".  I think most of them have given it a shot and
> lost data to it.  Some of them, after looking in the code to "fix
> the problem", said "never again!" and now hate vinum of a good
> heart.
> 
> Greg disclaimed maintainership of vinum some time ago for reasons
> of politics, and he is now of the opinion that it is everybody
> else's task to maintain vinum.  Everybody else disagrees and believes
> that "vinum is very much Greg's own problem".
> 
> With Greg being a core@ member, and well known for his ability to
> talk an Arcturan megadonkey into taking a stroll after first having
> talked its legs off about procedural issues, "Doing something about
> vinum" is permanently on the "we should really..." list and everybody
> hopes somebody else will "deal with it".  Of course, in the end
> nobody does.
> 
> As matters stand, we are doing our users a disservice by continuing
> to pretend everything is OK when in fact it is not at all.
> 
> Personally, I think vinum(8) should not be in our 5-STABLE featureset
> if it is not brought up to current standards and actively maintained.
> 
> But at the very least we should have the release notes reflect that
> vinum is unmaintained and believed to be unreliable, and have vinum(8)
> issue a very stern warning to people along those lines.
> 
> I'm sure that a major bikeshed will now ensue and people will argue
> that there is a lot more to this dispute than what I've said above.
> 
> They're right of course, this is a very short summary :-)
> 
> Poul-Henning
> 
> 


I am using vinum atm, and I am having serious problems with it.  After 
about 16 hrs of writing data to a vinum volume via NFS at a constant data 
stream of 200k/sec and reading at 400k/sec at the same time, the whole 
machine just freezes, hard.  The only thing I can do is reboot.  This 
behavior appears in 4.8 and 5-CURRENT.  I have no indication of what is 
wrong, or how to go about finding it out.  The problem is either with NFS 
or Vinum, and I'm leaning towards Vinum (because of the failure in both 
-STABLE and -CURRENT).

I'm not the kind of person that relies on other people, and I like to fix 
my own problems, but this is a problem which I cannot fix at this time.  
So, I'm planning to look through the code of vinum and start messing with 
it to figure out how it works and how to debug it.  This is how important 
Vinum is to me at the moment.

I'm not a kernel coder, or an intense coder in general (but I'm proficient 
in C/C++, and have used FreeBSD for quite some years now), so I'm reading  
the Kernel Developer's Handbook as a starting point.  If anyone has other 
online documentation on FreeBSD Kernel programming, it would be much 
appreciated.  

What would also be appreciated is an overall "map" of how vinum is 
organized and how it works.  Otherwise, I'll have to painstakingly 
go through the code and figure everything out little by little 
(which I plan to do, but if you know how Vinum works, everything is much 
easier, makes sense right away, and takes less time).

Thank you in advance.

Cosmin Stroe.



LOR (swap_pager.c:1323, swap_pager.c:1838, uma_core.c:876) (current:Nov17)

2003-11-18 Thread Cosmin Stroe
Here is the stack backtrace:

lock order reversal
 1st 0xc1da318c vm object (vm object) @ /usr/src/sys/vm/swap_pager.c:1323
 2nd 0xc0724900 swap_pager swhash (swap_pager swhash) @ /usr/src/sys/vm/swap_pager.c:1838
 3rd 0xc0c358c4 vm object (vm object) @ /usr/src/sys/vm/uma_core.c:876
Stack backtrace:
backtrace(c0692be9,c0c358c4,c06a376c,c06a376c,c06a464d) at backtrace+0x17
witness_lock(c0c358c4,8,c06a464d,36c,1) at witness_lock+0x672
_mtx_lock_flags(c0c358c4,0,c06a464d,36c,1) at _mtx_lock_flags+0xba
obj_alloc(c0c22480,1000,c976f9db,101,c06f3f50) at obj_alloc+0x3f
slab_zalloc(c0c22480,1,c06a464d,68c,c0c22494) at slab_zalloc+0xb3
uma_zone_slab(c0c22480,1,c06a464d,68c,c0c22520) at uma_zone_slab+0xd6
uma_zalloc_internal(c0c22480,0,1,5c1,72e,c06f55a8) at uma_zalloc_internal+0x3e
uma_zalloc_arg(c0c22480,0,1,72e,2) at uma_zalloc_arg+0x3ab
swp_pager_meta_build(c1da318c,7,0,2,0) at swp_pager_meta_build+0x174
swap_pager_putpages(c1da318c,c976fbb8,8,0,c976fb20) at swap_pager_putpages+0x32d
default_pager_putpages(c1da318c,c976fbb8,8,0,c976fb20) at default_pager_putpages+0x2e
vm_pageout_flush(c976fbb8,8,0,0,c06f36a0) at vm_pageout_flush+0x17a
vm_pageout_clean(c0dae2d8,0,c06a4468,32a,0) at vm_pageout_clean+0x305
vm_pageout_scan(0,0,c06a4468,5a9,1f4) at vm_pageout_scan+0x65f
vm_pageout(0,c976fd48,c068d4ed,311,0) at vm_pageout+0x31b
fork_exit(c0625250,0,c976fd48) at fork_exit+0xb4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc976fd7c, ebp = 0 ---
Debugger("witness_lock")
Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db>

I'm running the sources from yesterday, Nov 17:

FreeBSD 5.1-CURRENT #0: Mon Nov 17 06:40:05 CST 2003 
root@:/usr/obj/usr/src/sys/GALAXY 
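
For readers new to reports like the one above: a lock order reversal means locks were taken in an order that conflicts with an order seen earlier, which can lead to deadlock. What follows is only a toy user-space sketch in Python of that general idea, not the kernel's witness code. The class name OrderChecker and its methods are hypothetical, and treating every lock with the same name as a single class is a simplifying assumption. The lock names are borrowed from the report purely as labels.

```python
class OrderChecker:
    """Toy lock order checker (hypothetical; not how WITNESS is implemented)."""

    def __init__(self):
        self.before = set()    # (a, b) means: a was held when b was acquired
        self.held = []         # names of locks currently held, oldest first
        self.reversals = []    # (held_lock, new_lock) pairs that conflict

    def acquire(self, name):
        for h in self.held:
            if h == name:
                # Assumption: same-named locks are one class; skip self pairs.
                continue
            if (name, h) in self.before:
                # Earlier we saw `name` held before `h`; now it is the reverse.
                self.reversals.append((h, name))
            self.before.add((h, name))
        self.held.append(name)

    def release(self, name):
        # Removes the oldest held entry with this name.
        self.held.remove(name)


checker = OrderChecker()
checker.acquire("vm object")
checker.acquire("swap_pager swhash")
checker.acquire("vm object")       # same class taken again under swhash
print(checker.reversals)           # [('swap_pager swhash', 'vm object')]
```

The sketch only records direct pairs. A real checker would presumably also need to follow chains of orderings across more than two locks, which is left out here to keep the example short.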



Re: checking stopevent 2!

2003-11-15 Thread Cosmin Stroe
On Sat, Nov 15, 2003 at 09:38:37AM -0500, Robert Watson wrote:
> 
> On Sat, 15 Nov 2003, Andy Farkas wrote:
> 
> would probably be useful if you could drop to DDB and generate a trace for
> the event.
> 

I've done that, in this email message:

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=2157067+0+current/freebsd-current

> > 
> > ...
> > Nov 15 16:05:44  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:44  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/kern_condvar.c:289
> > Nov 15 16:05:44  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:44  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > Nov 15 16:05:44  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:45  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > Nov 15 16:05:45  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:45  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4663aa8) locked @ /hummer/src-current/src/sys/kern/kern_synch.c:293
> > Nov 15 16:05:45  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:45  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4663aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > Nov 15 16:05:45  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:45  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4663aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > Nov 15 16:05:45  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:45  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/kern_condvar.c:289
> > Nov 15 16:05:45  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:46  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > Nov 15 16:05:46  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:46  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/kern_condvar.c:289
> > Nov 15 16:05:46  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:46  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > Nov 15 16:05:46  hummer kernel: checking stopevent 2 with the following 
> > non-sleepable locks held:
> > Nov 15 16:05:46  hummer kernel: exclusive sleep mutex sigacts r = 0 
> > (0xc4656aa8) locked @ /hummer/src-current/src/sys/kern/subr_trap.c:260
> > ...
> > 
> > 
> > 
> > This is latest -current (cvsup'd a few hours ago)
> > 
> > 
> > --
> > 
> >  :{ [EMAIL PROTECTED]
> > 
> > Andy Farkas
> > System Administrator
> >Speednet Communications
> >  http://www.speednet.com.au/
> > 
> > 


Cosmin Stroe


exclusive sleep mutex ... /usr/src/sys/kern/kern_synch.c:293

2003-11-14 Thread Cosmin Stroe
:  port 
0xe800-0xe80f,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd807 mem 
0xdc00-0xdc003fff irq 11 at device 10.0 on pci0
atapci1: [MPSAFE]
ata2: at 0xd800 on atapci1
ata2: [MPSAFE]
ata3: at 0xe000 on atapci1
ata3: [MPSAFE]
orm0:  at iomem 0xc8000-0xca7ff,0xc-0xc7fff on isa0
atkbdc0:  at port 0x64,0x60 on isa0
atkbd0:  flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: failed to get data.
psm0:  irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
fdc0:  at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: parallel port not found.
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown:  can't assign resources (port)
unknown:  can't assign resources (irq)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
Timecounter "TSC" frequency 1100046119 Hz quality 800
Timecounters tick every 10.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to accept, logging disabled
GEOM: create disk ad0 dp=0xc1c53c60
ad0: 19092MB  [38792/16/63] at ata0-master UDMA66
acd0: DVDROM  at ata1-slave PIO4
GEOM: create disk ad4 dp=0xc1c53a60
ad4: 29196MB  [59320/16/63] at ata2-master UDMA66
Mounting root from ufs:/dev/ad0s1a
Loading configuration files.
00400 reject tcp from any to any dst-port 161 via sis0
Entropy harvesting: interrupts ethernet point_to_point.
kernel dumps on /dev/ad0s1b
swapon: adding /dev/ad0s1b as swap device
Starting file system checks:
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 9028 free (372 frags, 1082 blocks, 0.6% fragmentation)
/dev/ad0s1e: DEFER FOR BACKGROUND CHECKING
/dev/ad0s1d: DEFER FOR BACKGROUND CHECKING
/dev/ad4s1: DEFER FOR BACKGROUND CHECKING
/dev/ad0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1f: clean, 1965768 free (79008 frags, 235845 blocks, 0.9% fragmentation)
WARNING: /tmp was not properly dismounted
WARNING: /var was not properly dismounted
/var: superblock summary recomputed
WARNING: /mnt/ftp was not properly dismounted
debug.witness_ddb: 0 -> 1
Setting hostname: cosmin.phy.uic.edu.
nge0: gigabit link up
lo0: flags=8049 mtu 16384
inet 127.0.0.1 netmask 0xff00 
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
nge0: flags=8843 mtu 1500
options=13
inet 131.193.192.26 netmask 0xff00 broadcast 131.193.192.255
inet6 fe80::250:baff:fe39:6d6%nge0 prefixlen 64 tentative scopeid 0x1 
ether 00:50:ba:39:06:d6
media: Ethernet autoselect (none)
status: no carrier
00400 reject tcp from any to any dst-port 161 via sis0
db> panic
panic: from debugger

syncing disks, buffers remaining... 453 453 453 453 453 453 453 453 453 453 453 453 
453 453 453 453 453 453 453 453 
giving up on 409 buffers
Uptime: 7s
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
Flushed all rules.
00100 allow ip from any to any via lo0
00200 deny ip from any to 127.0.0.0/8
00300 deny ip from 127.0.0.0/8 to any
65000 allow ip from any to any
Firewall rules loaded, starting divert daemons:.
net.inet.ip.fw.enable: 1 -> 1
add net default: gateway 131.193.192.1
Additional routing options:.
hw.bus.devctl_disable: 0 -> 1
Mounting NFS file systems:.
Starting syslogd.
Nov 14 19:38:26  syslogd: /var/log/debug.log: No such file or directory
Nov 14 19:38:26 cosmin syslogd: kernel boot file is /boot/kernel/kernel
checking stopevent 2 with the following non-sleepable locks held:
exclusive sleep mutex sigacts r = 0 (0xc1cb8aa8) locked @ /usr/src/sys/kern/kern_synch.c:293
Debugger("witness_warn")
Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db> trace
Debugger(c0675228,c93f4b88,1,c93f4b84,0) at Debugger+0x54
witness_warn(5,c1c0fcc8,c068d494,2,c06f37a0) at witness_warn+0x19f
issignal(c1bb8dc0,2,c068fc5b,bd,c1c0fcc8) at issignal+0x16b
cursig(c1bb8dc0,0,c0690152,125,1) at cursig+0xe8
msleep(c1c0fc5c,c1c0fcc8,15c,c068fb80,0) at msleep+0x631
wait1(c1bb8dc0,c93f4d10,0,c93f4d40,c065bca0) at wait1+0x990
wait4(c1bb8dc0,c93f4d10,c06a868e,3ee,4) at wait4+0x20
syscall(2f,2f,2f,bfbfeec0,bfbfeec0) at syscall+0x2e0
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (7, FreeBSD ELF32, wait4), eip = 0x280d0b1f, esp = 0xbfbfe84c, ebp = 0xbfbfe868 ---
db> 



Cosmin Stroe
