Re: Replicable file-system corruption due to fsck/ufs

2019-04-13 Thread Peter Holm
On Fri, Apr 12, 2019 at 04:13:00PM -0700, Kirk McKusick wrote:
> > Peter Holm  wrote:
> > 
> >> I see this even with a single truncate on HEAD.
> >>
> >> $ ./truncate10.sh
> >> 96 -rw-r--r--  1 root  wheel  1073741824 11 apr. 06:33 test
> >> ** /dev/md10a
> >> ** Last Mounted on /mnt
> >> ** Phase 1 - Check Blocks and Sizes
> >> INODE 3: FILE SIZE 1073741824 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 268435456
> >> ADJUST? yes
> > 
> > Thanks. I should have tested that myself, doh! I was trying to
> > more closely replicate my real file that triggered the problem, which
> > contained a number of sparse areas.
> > 
> > And thanks for adding Kirk to the discussion. I wanted to first be
> > sure it wasn't just me :-)
> > 
> > Cheers, Jamie
> 
> This is indeed a bug in the calculation of the location of the last
> block of a file. I believe that the following patch to head will
> fix it.
> 
> Peter, can you please test and let me know.
> 
> If Peter confirms that it fixes the bug, I will check it into head
> and MFC it to 12-stable and 11-stable after a 2-week settle-in time.
> 
>   Kirk McKusick
> 

Yes, this patch works for me.

-- 
Peter
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
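
The reproduction in this thread comes down to a file whose logical size extends past its last allocated block, which is a legal state for a sparse file. The following is a minimal Python sketch of that condition, not the test script used above: the function names, the temporary file, and the reduced hole size are assumptions made for illustration, and `st_blocks` is only available on Unix-like systems.

```python
import os
import tempfile

LINE = b"testing 1...2...3...\n"  # 21 bytes, as written by echo in the thread


def append_and_extend(path, rounds, hole):
    """Append LINE and then grow the file by `hole` bytes, `rounds` times.

    Mirrors `echo ... >> test ; truncate -s +SIZE test`; the growth is done
    with truncate, so no data blocks need to be written for the added range.
    """
    for _ in range(rounds):
        with open(path, "ab") as f:
            f.write(LINE)
        os.truncate(path, os.path.getsize(path) + hole)


def size_and_allocation(path):
    """Return (logical size, bytes allocated) for the file at `path`."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test")
        # Hypothetical, smaller hole than the 1g used in the thread.
        append_and_extend(path, rounds=9, hole=1 << 20)
        size, allocated = size_and_allocation(path)
        print("logical size:", size)
        print("allocated   :", allocated)
```

With the values from the original report (nine rounds, 1 GiB per round) the logical size works out to 9 * (21 + 2^30) = 9663676605 bytes, which matches the size fsck printed for inode 4. On a file system that supports holes, the allocated figure should stay far below that.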


Re: Replicable file-system corruption due to fsck/ufs

2019-04-10 Thread Peter Holm
On Thu, Apr 11, 2019 at 04:47:43AM +0100, Jamie Landeg-Jones wrote:
> I've noticed a replicable disk corruption by fsck_ufs/ffs on sparse files.
> 
> This is on amd64 12-stable-20190409, but I first noticed it on
> 12-stable-20190326.
> 
> I didn't notice it on my previous build of 12-stable-20190107, but I
> may not have had any relevant sparse files at the time, so I don't know
> if that version was affected. 12-release worked OK.
> 
> Here is a simplified replicable example. Thinking about it just now, I
> suspect it's triggered by files which end in sparseness.
> 
> Can anyone else replicate this, or has my machine gone nuts?
> 
> Cheers, Jamie
> 
>  | root@thompson# l
>  | total 12
>  | 4 drwxr-x---   2 root  wheel  -   512 11 Apr 04:08 ./
>  | 4 drwxr-xr-x  16 root  wheel  - 1,024 11 Apr 04:08 ../
>  | 4 -rw-r-----   1 root  wheel  -    43 11 Apr 04:08 typescript
>  |
>  | root@thompson# dd if=/dev/zero bs=1m count=2048 of=test.img
>  | 2048+0 records in
>  | 2048+0 records out
>  | 2147483648 bytes transferred in 4.127411 secs (520298036 bytes/sec)
>  |
>  | root@thompson# l
>  | total 2097708
>  |       4 drwxr-x---   2 root  wheel  -           512 11 Apr 04:08 ./
>  |       4 drwxr-xr-x  16 root  wheel  -         1,024 11 Apr 04:08 ../
>  | 2097696 -rw-r-----   1 root  wheel  - 2,147,483,648 11 Apr 04:08 test.img
>  |       4 -rw-r-----   1 root  wheel  -            43 11 Apr 04:08 typescript
>  |
>  | root@thompson# mdconfig test.img
>  | md1
>  |
>  | root@thompson# newfs /dev/md1
>  | /dev/md1: 2048.0MB (4194304 sectors) block size 32768, fragment size 4096
>  | using 4 cylinder groups of 512.03MB, 16385 blks, 65664 inodes.
>  | super-block backups (for fsck_ffs -b #) at:
>  |  192, 1048832, 2097472, 3146112
>  |
>  | root@thompson# md mnt
>  | mnt
>  |
>  | root@thompson# mount /dev/md1 mnt
>  |
>  | root@thompson# cd mnt/
>  | ~/x/mnt ~/x
>  |
>  | root@thompson# df .
>  | Filesystem 1K-blocks Used     Avail Capacity  Mounted on
>  | /dev/md1   2,031,132    8 1,868,636     0%    /root/x/mnt
>  |
>  | root@thompson# l
>  | total 12
>  | 4 drwxr-xr-x  3 root  wheel - 512 11 Apr 04:09 ./
>  | 4 drwxr-x---  3 root  wheel - 512 11 Apr 04:09 ../
>  | 4 drwxrwxr-x  2 root  operator  - 512 11 Apr 04:09 .snap/
>  |
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  | root@thompson# echo "testing 1...2...3..." >> test ; truncate -s +1g test
>  |
>  | root@thompson# l
>  | total 652
>  |   4 drwxr-xr-x  3 root  wheel     -           512 11 Apr 04:14 ./
>  |   4 drwxr-x---  3 root  wheel     -           512 11 Apr 04:09 ../
>  |   4 drwxrwxr-x  2 root  operator  -           512 11 Apr 04:09 .snap/
>  | 640 -rw-r-----  1 root  wheel     - 9,663,676,605 11 Apr 04:14 test
>  |
>  | root@thompson# sha256 -r test > sha256.out
>  |
>  | root@thompson# cd ..
>  | ~/x ~/x/mnt
>  |
>  | root@thompson# umount mnt
>  |
>  | root@thompson# fsck /dev/md1
>  | ** /dev/md1
>  | ** Last Mounted on /root/x/mnt
>  | ** Phase 1 - Check Blocks and Sizes
>  | INODE 4: FILE SIZE 9663676605 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 1342210048
>  | ADJUST? [yn] y
>  |
>  | ** Phase 2 - Check Pathnames
>  | ** Phase 3 - Check Connectivity
>  | ** Phase 4 - Check Reference Counts
>  | ** Phase 5 - Check Cyl groups
>  | 4 files, 163 used, 507620 free (20 frags, 63450 blocks, 0.0% fragmentation)
>  |
>  | ***** FILE SYSTEM IS CLEAN *****
>  |
>  | ***** FILE SYSTEM WAS MODIFIED *****
>  |
>  | root@thompson# fsck /dev/md1
>  | ** /dev/md1
>  | ** Last Mounted on /root/x/mnt
>  | ** Phase 1 - Check Blocks and Sizes
>  | PARTIALLY TRUNCATED INODE I=4
>  | SALVAGE? [yn] y
>  |
>  | INCORRECT BLOCK COUNT I=4 (1280 should be 256)
>  | CORRECT? [yn] y
>  |
>  | INODE 4: FILE SIZE 1342210048 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 268468224
>  | ADJUST? [yn] y
>  |
>  | ** Phase 2 - Check Pathnames
>  | ** Phase 3 - Check Connectivity
>  | ** Phase 4 - Check Reference Counts
>  | ** Phase 5 - Check Cyl groups
>  | FREE BLK COUNT(S) WRONG IN SUPERBLK
>  | SALVAGE? [yn] y
>  |
>  | SUMMARY INFORMATION BAD
>  | SALVAGE? [yn] y
>  |
>  | BLK(S) MISSING IN BIT MAPS
>  | SALVAGE? [yn] y
>  |
>  | 4 files, 35 used, 507748 free (20 frags, 63466 blocks, 0.0% fragmentation)
>  |
>  | ***** FILE SYSTEM IS CLEAN *****
>  |
>  | ***** FILE SYSTEM WAS MODIFIED *****
>  |
>  | root@thompson# fsck /dev/md1
>  | ** /dev/md1
>  | ** Last Mounted on /roo

Re: Unkillable process in "vm map (user)"

2017-12-22 Thread Peter Holm
On Fri, Dec 22, 2017 at 02:45:21PM +0200, Konstantin Belousov wrote:
> On Fri, Dec 22, 2017 at 10:26:07AM +0100, Peter Holm wrote:
> > Here's some more info, using the original scenario:
> > https://people.freebsd.org/~pho/stress/log/kostik1070.txt
> 
> This is somewhat weird but also not too puzzling.
> 
> The vmdaemon (pid 41) is running; it tries to reduce the count of resident
> pages in some pmap, most likely the one from pid 20655.  This process
> seems to be huge: according to the v_stats, there are 15681264 inactive pages,
> and the pagedaemon tries to obtain a vm object lock which is owned by
> vmdaemon; the resident count for that object is 15897170 (~64GB).
> 
> So basically almost all memory belongs to a single object, and vmdaemon
> is processing it.  Since the object's queue is huge, the map and the object
> locks are held for a long time, preventing other processes touching them
> from making progress.
> 
> Maybe try this (it combines new changes with the OOM patch). I am not
> sure that should_yield() in the vm_swapout_object_deactivate_pages() is
> a good idea unconditionally, but it might be better than the current
> situation.
> 
> diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
> index ece496407c2..ce6208569c6 100644
> --- a/sys/vm/vm_fault.c

The patch fixes the problem I got with this scenario.

- Peter


Re: Unkillable process in "vm map (user)"

2017-12-22 Thread Peter Holm
On Sun, Dec 10, 2017 at 10:42:17PM +0200, Konstantin Belousov wrote:
> On Mon, Dec 11, 2017 at 07:09:31AM +1100, Peter Jeremy wrote:
> > I was experimenting with ports/devel/libmill (which is a library that
> > provides Go-style functionality for C programs) and managed to create
> > an unkillable process by spawning 100 "goroutines" (think very
> > cheap "thread" or "coroutine") joined by "channels" (think message
> > passing pipes).  (The program ran basically instantaneously with 1
> > or 10 "goroutines", and the Go version has no problems with 100
> > goroutines on a much smaller system).
> > 
> > According to SIGINFO, it's blocked on "vm map (user)" but I can't kill
> > it.  Can anyone suggest a way to unwedge it?
> > 
> > This is on a system running FreeBSD/amd64 11.1-STABLE r324494.
> Ensure that you use at least r326188.
> 
> > 
> > server% procstat -kk 452
> >   PID    TID COMM             TDNAME           KSTACK
> >   452 102382 chain            -                mi_switch+0x17c sleepq_switch+0x118 sleepq_wait+0x43 _sx_slock_hard+0x34e _sx_slock+0xd4 vm_map_lookup+0xbd vm_fault_hold+0x194b vm_fault+0x75 trap_pfault+0x107 trap+0x382 calltrap+0x8
> 
> There is another thread owning the map lock, and seeing what that thread
> does is the next step.
> 
> Can you provide a binary to reproduce which does not depend on any
> library except the base libs ?

Here's some more info, using the original scenario:
https://people.freebsd.org/~pho/stress/log/kostik1070.txt

- Peter


Re: Nullfs leaks i-nodes

2013-05-09 Thread Peter Holm
On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has 
> > any ideas or patches?
> > 
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 
> > and still have the problem.
> 
> The patch below should fix the issue for you, at least it did so in my
> limited testing.
> 
> What it does:
> 1. When inactivating a nullfs vnode, check if the lower vnode is
>    unlinked, and reclaim upper vnode if so. [This fixes your case].
> 
> 2. Besides a callback to the upper filesystems for the lower vnode
>    reclamation, it also calls the upper fs for lower vnode unlink.
>    This allows nullfs to purge cached vnodes for the unlinked lower.
>    [This fixes an opposite case, when the vnode is removed from the
>    lower mount, but upper aliases prevent the vnode from being
>    recycled].
> 
> 3. Fix a wart which existed from the introduction of the nullfs caching,
>    do not unlock lower vnode in the nullfs_reclaim_lowervp().  It should
>    be completely innocent, but now it is also formally safe.
> 
> 4. Fix vnode reference leak in nullfs_reclaim_lowervp().
> 
> Please note that the patch is basically not tested, I only verified your
> scenario and a mirror of it as described in item 2.
> 
> diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> index 4f37020..a624be6 100644
> --- a/sys/fs/nullfs/null.h

The page fault seen in fifo_close() seems unrelated to this patch,
which I will continue testing some more.

The scenario triggering the page fault is the "rm":

mdconfig -a -t swap -s 1g -u 5
bsdlabel -w md5 auto
newfs -U md5a
mount /dev/md5a /mnt
mount -t nullfs /mnt /mnt2
mkfifo /mnt2/fifo
rm /mnt/fifo

Not a problem on 8.3-STABLE r247938.

- Peter


Re: Nullfs leaks i-nodes

2013-05-08 Thread Peter Holm
On Wed, May 08, 2013 at 07:58:56PM +0200, Peter Holm wrote:
> On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> > On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > > I created a PR, kern/178238, on this but would like to know if anyone has 
> > > any ideas or patches?
> > > 
> > > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 
> > > and still have the problem.
> > 
> > The patch below should fix the issue for you, at least it did so in my
> > limited testing.
> > 
> > What it does:
> > 1. When inactivating a nullfs vnode, check if the lower vnode is
> >    unlinked, and reclaim upper vnode if so. [This fixes your case].
> > 
> > 2. Besides a callback to the upper filesystems for the lower vnode
> >    reclamation, it also calls the upper fs for lower vnode unlink.
> >    This allows nullfs to purge cached vnodes for the unlinked lower.
> >    [This fixes an opposite case, when the vnode is removed from the
> >    lower mount, but upper aliases prevent the vnode from being
> >    recycled].
> > 
> > 3. Fix a wart which existed from the introduction of the nullfs caching,
> >    do not unlock lower vnode in the nullfs_reclaim_lowervp().  It should
> >    be completely innocent, but now it is also formally safe.
> > 
> > 4. Fix vnode reference leak in nullfs_reclaim_lowervp().
> > 
> > Please note that the patch is basically not tested, I only verified your
> > scenario and a mirror of it as described in item 2.
> > 
> > diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> > index 4f37020..a624be6 100644
> 
> I got this page fault after interrupting a nullfs test that had been
> running for three hours:
> 
> http://people.freebsd.org/~pho/stress/log/kostik562.txt
> 

Seems to be easily reproduced, so I compiled null_vnops.c and
fifo_vnops.c without "-O" in order to get some more info:

http://people.freebsd.org/~pho/stress/log/kostik563.txt

- Peter


Re: Nullfs leaks i-nodes

2013-05-08 Thread Peter Holm
On Wed, May 08, 2013 at 12:13:17PM +0300, Konstantin Belousov wrote:
> On Tue, May 07, 2013 at 08:30:06AM +0200, Göran Löwkrantz wrote:
> > I created a PR, kern/178238, on this but would like to know if anyone has 
> > any ideas or patches?
> > 
> > Have updated the system where I see this to FreeBSD 9.1-STABLE #0 r250229 
> > and still have the problem.
> 
> The patch below should fix the issue for you, at least it did so in my
> limited testing.
> 
> What it does:
> 1. When inactivating a nullfs vnode, check if the lower vnode is
>    unlinked, and reclaim upper vnode if so. [This fixes your case].
> 
> 2. Besides a callback to the upper filesystems for the lower vnode
>    reclamation, it also calls the upper fs for lower vnode unlink.
>    This allows nullfs to purge cached vnodes for the unlinked lower.
>    [This fixes an opposite case, when the vnode is removed from the
>    lower mount, but upper aliases prevent the vnode from being
>    recycled].
> 
> 3. Fix a wart which existed from the introduction of the nullfs caching,
>    do not unlock lower vnode in the nullfs_reclaim_lowervp().  It should
>    be completely innocent, but now it is also formally safe.
> 
> 4. Fix vnode reference leak in nullfs_reclaim_lowervp().
> 
> Please note that the patch is basically not tested, I only verified your
> scenario and a mirror of it as described in item 2.
> 
> diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
> index 4f37020..a624be6 100644

I got this page fault after interrupting a nullfs test that had been
running for three hours:

http://people.freebsd.org/~pho/stress/log/kostik562.txt

- Peter


Re: fsck_ufs out of swapspace

2011-12-20 Thread Peter Holm
On Tue, Dec 20, 2011 at 11:48:33AM +0200, Kostik Belousov wrote:
> On Tue, Dec 20, 2011 at 09:51:43AM +1100, Peter Jeremy wrote:
> > On 2011-Dec-19 22:27:49 +0100, Michiel Boland  wrote:
> > >Problem solved - it was indeed an endian thing.
> > >The problem is that fsck uses a real_dev_bsize variable that is declared
> > >long, but the DIOCGSECTORSIZE ioctl takes a u_int argument.
> > 
> > To be accurate, this isn't an endian problem, it's a general problem
> > of passing a pointer to an incorrectly sized object.  The bug is
> > masked on amd64 & ia64 because real_dev_bsize is statically allocated
> > and therefore initialised to zero.  This means the failure to assign
> > the top 32 bits in the ioctl doesn't affect the final result.
> > 
> > >A PR has been submitted.
> > 
> > sparc64/163460 for the record.  Thank you for tracking that down.
> 
> The easier fix is to change the type of real_dev_bsize. I used long only
> because the other variables keeping the sector size are long, but there
> is not much reason to use long there.
> 
> Peter, would you please retest SU+J on non-512 byte sectors with the
> patch attached?
> 

No problems seen while testing on both i386 and amd64 with a malloc MD
disk, sector size of 4k and SUJ.

- Peter
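
The size mismatch described above can be modelled without a sparc64 machine. The sketch below is an illustration only, written in Python rather than taken from the fsck sources: `store_u32` is a hypothetical stand-in for a callee that writes a 32-bit value through the pointer it is given, and the 8-byte buffer stands in for a 64-bit `long`.

```python
def store_u32(buf, value, byteorder):
    """Hypothetical callee: writes a 32-bit unsigned value at the start of buf."""
    buf[0:4] = value.to_bytes(4, byteorder)


def read_long_after_store(sector_size, byteorder, initial=0):
    """Model a 64-bit long holding `initial`, then let the callee write into it.

    Returns the value the caller sees when it reads the full 8 bytes back.
    """
    buf = bytearray(initial.to_bytes(8, byteorder, signed=True))
    store_u32(buf, sector_size, byteorder)
    return int.from_bytes(buf, byteorder, signed=True)


if __name__ == "__main__":
    for order in ("little", "big"):
        for initial in (0, -1):
            seen = read_long_after_store(4096, order, initial)
            print(f"{order:6} initial={initial:2} -> {seen}")
```

Only the little-endian, zero-initialised case reads back as 4096, which is consistent with the observation that static initialisation masked the bug on amd64 and ia64. Giving the variable the same width as the value being stored, as the proposed type change does, removes the dependency on both byte order and initial contents.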




Re: BTX loader hangs after version info

2008-05-25 Thread Peter Holm
On Mon, May 26, 2008 at 02:23:11PM +1200, Mark Kirkwood wrote:
> Peter Holm wrote:
> >On Sun, May 25, 2008 at 08:33:01PM +1200, Mark Kirkwood wrote:
> >  
> >>I wrote:
> >>
> >>>John Baldwin wrote:
> >>>  
> >>>>Try this patch.  I'm not 100% certain this will fix it as I can't
> >>>>reproduce the issue, but I think it might help.  Specifically, when
> >>>>the boot code makes a v86 call, the loader/boot2/whatever swaps
> >>>>in/out a new set of registers via the v86 structure including the
> >>>>eflags register.  However, none of the boot programs actually
> >>>>initialized the v86 structure.  Thus, the BIOS routines would start
> >>>>off running with whatever garbage was in v86.efl when each boot
> >>>>program started.  This meant that we could end up invoking BIOS
> >>>>routines with interrupts disabled, and I think this might explain a
> >>>>hard hang (if a BIOS routine was waiting for an interrupt the
> >>>>interrupt would never fire).  The patch fixes all the boot programs
> >>>>to initialize v86 to a better known state.  At the least it sets
> >>>>v86.efl to a sane value (0x202) rather than random.  (The random
> >>>>might have always been 0x0 BTW, not sure on that one.)
> >>>>
> >>>> 
> >>>>
> >>>Thanks John,
> >>>
> >>>Unfortunately this patch does *not* cure the issue for my old 
> >>>Supermicro P3TDDE, it still hangs just before presenting the menu. I 
> >>>had to boot off the livefs and copy /boot/loader.old -> /boot/loader 
> >>>to get back to being bootable again - but at least the old fella is on 
> >>>a more up-to-date 7-STABLE now :-)
> >>>  
> >>Given that the patch *did* cure Peter's Tyan S2720, I'll double check I
> >>didn't fat finger applying the patch (mind you the Tyan has AMI BIOS -
> >>same as my Supermicro P3TDERs that *do* work ok with current 7-STABLE,
> >>whereas the P3TDDE has Award BIOS).
> >>
> >>Anyway, I'll double check and report back...
> >>
> >>Cheers
> >>
> >>Mark
> >>
> >
> >I did 18 boots with and without John's patch. With the patch I got 6
> >actual boots and 12 hangs in the loader's progress bar.
> >
> >Without the patch I got 10 boots and 8 hangs.
> >
> >But, my Tyan M/B is old and with known ACPI issues so I'm not sure if
> >this is of much value.
> >
> >Mark, it would be nice if you could also check whether a sequence of
> >reboots eventually boots your system. My longest bad streak was 8 reboots.
> >
> >  
> 
> Yeah, I see the same thing - with John's patch applied, out of 9 reboots 
> I got 7 hangs and 2 actual boots.  (didn't try without the patch).
> 
> Mark

OK, at least it is nice to see consistency across different HW.

-- 
Peter
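
For reference, the value 0x202 mentioned in the quoted patch description can be decoded against the x86 EFLAGS layout: bit 1 is reserved and always reads as 1, and bit 9 is the interrupt enable flag (IF). The sketch below is an illustration in Python, not part of the patch; the function and table names are assumptions.

```python
# A few x86 EFLAGS bits by position; bit 1 is reserved and always set.
EFLAGS_BITS = {
    0: "CF",
    1: "reserved(1)",
    2: "PF",
    6: "ZF",
    7: "SF",
    9: "IF",
    10: "DF",
    11: "OF",
}

IF_MASK = 1 << 9


def decode_eflags(value):
    """Return the names of the known flag bits that are set in `value`."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]


def interrupts_enabled(value):
    """True if the interrupt enable flag (IF) is set in `value`."""
    return bool(value & IF_MASK)


if __name__ == "__main__":
    for efl in (0x202, 0x0):
        print(hex(efl), decode_eflags(efl), "IF set:", interrupts_enabled(efl))
```

A starting value of 0x0, which the description suggests may have been the usual "random" content, has IF clear, so a BIOS routine entered with it would run with interrupts disabled.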


Re: BTX loader hangs after version info

2008-05-25 Thread Peter Holm
On Sun, May 25, 2008 at 08:33:01PM +1200, Mark Kirkwood wrote:
> I wrote:
> >John Baldwin wrote:
> >>
> >>Try this patch.  I'm not 100% certain this will fix it as I can't
> >>reproduce the issue, but I think it might help.  Specifically, when
> >>the boot code makes a v86 call, the loader/boot2/whatever swaps
> >>in/out a new set of registers via the v86 structure including the
> >>eflags register.  However, none of the boot programs actually
> >>initialized the v86 structure.  Thus, the BIOS routines would start
> >>off running with whatever garbage was in v86.efl when each boot
> >>program started.  This meant that we could end up invoking BIOS
> >>routines with interrupts disabled, and I think this might explain a
> >>hard hang (if a BIOS routine was waiting for an interrupt the
> >>interrupt would never fire).  The patch fixes all the boot programs
> >>to initialize v86 to a better known state.  At the least it sets
> >>v86.efl to a sane value (0x202) rather than random.  (The random
> >>might have always been 0x0 BTW, not sure on that one.)
> >>
> >>  
> >Thanks John,
> >
> >Unfortunately this patch does *not* cure the issue for my old 
> >Supermicro P3TDDE, it still hangs just before presenting the menu. I 
> >had to boot off the livefs and copy /boot/loader.old -> /boot/loader 
> >to get back to being bootable again - but at least the old fella is on 
> >a more up-to-date 7-STABLE now :-)
> 
> Given that the patch *did* cure Peter's Tyan S2720, I'll double check I
> didn't fat finger applying the patch (mind you the Tyan has AMI BIOS -
> same as my Supermicro P3TDERs that *do* work ok with current 7-STABLE,
> whereas the P3TDDE has Award BIOS).
> 
> Anyway, I'll double check and report back...
> 
> Cheers
> 
> Mark

I did 18 boots with and without John's patch. With the patch I got 6
actual boots and 12 hangs in the loader's progress bar.

Without the patch I got 10 boots and 8 hangs.

But, my Tyan M/B is old and with known ACPI issues so I'm not sure if
this is of much value.

Mark, it would be nice if you could also check whether a sequence of
reboots eventually boots your system. My longest bad streak was 8 reboots.

- Peter


Re: BTX loader hangs after version info

2008-05-25 Thread Peter Holm
On Sun, May 25, 2008 at 08:33:01PM +1200, Mark Kirkwood wrote:
> I wrote:
> >John Baldwin wrote:
> >>
> >>Try this patch.  I'm not 100% certain this will fix it as I can't
> >>reproduce the issue, but I think it might help.  Specifically, when
> >>the boot code makes a v86 call, the loader/boot2/whatever swaps
> >>in/out a new set of registers via the v86 structure including the
> >>eflags register.  However, none of the boot programs actually
> >>initialized the v86 structure.  Thus, the BIOS routines would start
> >>off running with whatever garbage was in v86.efl when each boot
> >>program started.  This meant that we could end up invoking BIOS
> >>routines with interrupts disabled, and I think this might explain a
> >>hard hang (if a BIOS routine was waiting for an interrupt the
> >>interrupt would never fire).  The patch fixes all the boot programs
> >>to initialize v86 to a better known state.  At the least it sets
> >>v86.efl to a sane value (0x202) rather than random.  (The random
> >>might have always been 0x0 BTW, not sure on that one.)
> >>
> >>  
> >Thanks John,
> >
> >Unfortunately this patch does *not* cure the issue for my old 
> >Supermicro P3TDDE, it still hangs just before presenting the menu. I 
> >had to boot off the livefs and copy /boot/loader.old -> /boot/loader 
> >to get back to being bootable again - but at least the old fella is on 
> >a more up-to-date 7-STABLE now :-)
> 
> Given that the patch *did* cure Peter's Tyan S2720, I'll double check I
> didn't fat finger applying the patch (mind you the Tyan has AMI BIOS -
> same as my Supermicro P3TDERs that *do* work ok with current 7-STABLE,
> whereas the P3TDDE has Award BIOS).
> 
> Anyway, I'll double check and report back...
> 
> Cheers
> 
> Mark

I now have booted some more with this patch and it would seem that the
problem is still there, from time to time! Most of the time I now boot
without any problems but once in a while the loader still hangs.

I'll try to gather some statistics...
-- 
Peter


Re: BTX loader hangs after version info

2008-05-24 Thread Peter Holm
On Fri, May 23, 2008 at 06:11:01PM -0400, John Baldwin wrote:
> On Friday 23 May 2008 09:26:45 am Kostik Belousov wrote:
> > On Fri, May 23, 2008 at 08:29:09AM -0400, John Baldwin wrote:
> > > On Friday 23 May 2008 07:53:11 am Kostik Belousov wrote:
> > > > On Fri, May 23, 2008 at 01:22:55PM +1200, Mark Kirkwood wrote:
> > > > > James Seward wrote:
> > > > > >Hello,
> > > > > >
> > > > > >Two days ago I csup'd my desktop at home, which was running RELENG_7
> > > > > >from about 7.0-RELEASE time, to bring it up-to-date (still on
> > > > > >RELENG_7). I followed my usual buildkernel/world procedure (the usual
> > > > > >one) which has worked fine all the way since 5.x. After installing
> > > > > >kernel and restarting in single user, it was working fine. However,
> > > > > >following installworld it will not boot.
> > > > > >
> > > > > >It stops immediately after "BTX loader 1.00 BTX version 1.02", but
> > > > > >with the cursor on the line *above* the first "B". Nothing further
> > > > > >happens, but the system responds to Ctrl-Alt-Del.
> > > > > >
> > > > > >I have managed to start it using the install CD and csup'd back to a
> > > > > >version just before the commit to BTX that moved it to 1.02 (March
> > > > > >18th, I think). However, that version too hangs after "BTX loader
> > > > > >1.00 BTX version 1.01".
> > > > > >
> > > > > >My desktop is currently building RELENG_7_0 to see if that will work,
> > > > > >but I won't know that until later as I'm at work and it is at home :)
> > > > > >
> > > > > >The install CD (BTX 1.00/1.01) boots fine. Nothing else changed on my
> > > > > >system between the last successful boot and the unsuccessful one.
> > > > > >
> > > > > >Any suggestions/advice for what I can try next, or what I can do to
> > > > > >help the troubleshooting process?
> > > > > >
> > > > > >My desktop is an Athlon64 but I am using i386, on an Asus A8V-E
> > > > > >Deluxe board.
> > > > >
> > > > > FWIW - I am seeing this too, on a Supermicro P3TDDE. 7-STABLE src from
> > > > > 28-Feb is fine, but Mar, Apr, May code all hangs after printing
> > > > > "loading /boot/defaults/loader.conf" - presumably reading my
> > > > > /boot/loader.conf?
> > > > >
> > > > > Interestingly I can usually get it to boot by escaping to the loader
> > > > > prompt and then just pressing return.
> > > > >
> > > > > Oddly some other machines (Supermicro P3TDER and Asus PRO31J Laptop)
> > > > > behave normally with src from Mar->May.
> > > > >
> > > > > In all cases the canonical procedure from UPDATING was used
> > > > > (buildworld, kernel, reboot single, mergemaster -p, installworld,
> > > > > delete-old, mergemaster, reboot).
> > > > >
> > > > > I'm happy to help collect some debug info (how do you switch this on for
> > > > > the loader?), tho the machine exhibiting the problem is my workstation
> > > > > (of course)!
> > > >
> > > > Try to install new bootblock.
> > > 
> > > I would be wary of that as it might make things worse?  These problems
> > > are all from starting /boot/loader.  boot2 is still working fine and
> > > thus there is still the possibility of using boot2 to load
> > > /boot/loader.old as a workaround.  If you update boot2 and it breaks
> > > you can't fix that w/o booting off of some other media such as a CD.
> > > 
> > > Debugging these hangs is not easy to do remotely.  If you know assembly
> > > then there are some things you can play with.  For example, in the case
> > > where it hangs after printing out the BTX version (from btxldr.S) you
> > > could start adding debugging to btx.S to print out '.' characters in
> > > various places and see how many get printed out before it hangs.
> > > However, doing this requires familiarity with assembly and is a lot
> > > easier with physical access to a box.
> > 
> > When I worked on my version of the realbtx, I sometimes experienced hangs
> > when vm86 btx ran before real-mode btx. I did not investigate it then,
> > only noted the issue.
> > 
> 
> Try this patch.  I'm not 100% certain this will fix it as I can't reproduce
> the issue, but I think it might help.  Specifically, when the boot code makes
> a v86 call, the loader/boot2/whatever swaps in/out a new set of registers via
> the v86 structure including the eflags register.  However, none of the boot
> programs actually initialized the v86 structure.   Thus, the BIOS routines
> would start off running with whatever garbage was in v86.efl when each boot
> program started.  This meant that we could end up invoking BIOS routines with
> interrupts disabled, and I think this might explain a hard hang (if a BIOS
> routine was waiting for an interrupt the interrupt would never fire).  The
> patch fixes all the boot programs to initialize v86 to a better known state. 
> At the least it sets v86.efl to a sane value (0x202) rather than random.  (The
> random might have always been 0x0 BTW, no

Re: Minidumps in -STABLE and "smaller than physical memory"

2008-01-31 Thread Peter Holm
On Wed, Jan 30, 2008 at 08:06:40PM +0300, Ruslan Ermilov wrote:
> On Thu, Sep 21, 2006 at 05:16:36PM +0200, Peter Holm wrote:
> > On Thu, Sep 21, 2006 at 05:15:47PM +0400, Ruslan Ermilov wrote:
> > > On Thu, Sep 21, 2006 at 11:14:33AM +0100, Gavin Atkinson wrote:
> > > > On Thu, 2006-09-21 at 11:44 +0300, Dmitry Pryanishnikov wrote:
> > > > > Hello!
> > > > > 
> > > > > I've noticed (with the 2-day old RELENG_6) that I still can't
> > > > > configure my 256MB swap partition as a dump device for an i386 machine
> > > > > with 1GB RAM despite having minidumps enabled:
> > > > > 
> > > > > [EMAIL PROTECTED] sysctl debug.minidump
> > > > > debug.minidump: 1
> > > > > [EMAIL PROTECTED] dumpon -v /dev/ad0s3b
> > > > > /dev/ad0s3b is smaller than physical memory
> > > > > 
> > > > > Do I understand correctly that minidumps should lift the restriction
> > > > > 
> > > > >   sizeof(dumpdev) >= sizeof(RAM)
> > > > 
> > > > Yes.
> > > > 
> > > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sbin/dumpon/dumpon.c.diff?r1=1.22&r2=1.23
> > > > needs to be MFC'd.
> > > > 
> > > I sent an MFC request to [EMAIL PROTECTED]
> > > 
> > 
> > From time to time I've had problems with minidumps on HEAD. Calling
> > doadump() seems to work ok, but after a reset there's no dump. I
> > haven't had time to test this systematically. Has anybody else seen
> > this problem?
> > 
> I think this was the same problem as was diagnosed (minidumps + amd64 +
> SMP).  Though it's not fixed yet, it's understood and its damage is
> avoided.
> 
> 
> Cheers,
> -- 
> Ruslan Ermilov
> [EMAIL PROTECTED]
> FreeBSD committer

No, this was and is i386. After changing "doadump;reset" to
"doadump;continue" it would seem that the dumps have a higher chance
of making it to the disk.

-- 
Peter


Missing parameter validation for syscall(57)

2006-11-07 Thread Peter Holm
While stress testing GENERIC RELENG_6 from Nov 2 18:46 UTC on an NFS
loopback-mounted filesystem, I came across this problem:
http://people.freebsd.org/~pho/stress/log/cons220.html
-- 
Peter Holm


Re: ffs snapshot lockup

2006-10-04 Thread Peter Holm
On Wed, Oct 04, 2006 at 03:41:48PM -0400, Kris Kennaway wrote:
> On Wed, Oct 04, 2006 at 01:06:37PM -0400, Vivek Khera wrote:
> > 
> > On Oct 4, 2006, at 12:39 PM, Kris Kennaway wrote:
> > 
> > >>>
> > >>>The only thing I think was running at the time would be a large file
> > >>>copy from a remote system to this one using rsync.
> > >>
> > >>As I understand, you got the panic. Then, you shall post the panic  
> > >>message.
> > >>If you have core file, then running kgdb on the core may show  
> > >>required
> > >>information.
> > >>(it shall be on the console exactly before en
> > >>and backtrace (using the bt command of ddb) of the panicked thread.
> > >
> > >You can also do 'show msgbuf' from DDB.
> > >
> > 
> > i ran kgdb on the vmcore file.  since the dump was generated by  
> > calling doadump from DDB, the backtrace was showing the call stack of  
> > that.
> > 
> > from what i read in the output from kgdb, it seems that something  
> > locked the kernel and we broke to debugger from the watchdog timeout  
> > (I enable software watchdog).
> 
> Hmm, be careful with that - if you set the timeout too low (and note
> that for some workloads O(minutes) may even be too low) then you'll
> get a lot of false positives.
> 


Oh, yes. I've been using this with success:

watchdogd -t 3600 -e 'ls /tmp /dev > /dev/null; true' -s 60
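
A note for anyone reusing this: as far as I understand watchdogd(8),
-t 3600 sets a one-hour timeout, -s 60 runs the check once a minute,
and the watchdog is patted whenever the -e command exits with 0. The
trailing "; true" makes the check exit 0 whatever ls returns, so the
timeout should only expire if the command hangs. Below is a minimal
sketch for trying a check string by hand before giving it to
watchdogd. The helper name run_check is made up for this example, and
running the string through "sh -c" is an assumption about how
watchdogd invokes it.

```shell
#!/bin/sh
# Hypothetical helper, not part of watchdogd(8): run a check string
# through sh and report whether it exited with status 0.
run_check() {
    if sh -c "$1"; then
        echo "exit 0"
    else
        echo "non-zero exit"
    fi
}

# The check from the command above: ls may fail, but "; true" hides it.
run_check 'ls /tmp /dev > /dev/null; true'

# Without "; true" a failing ls shows up in the exit status.
run_check 'ls /nonexistent-example-path > /dev/null 2>&1'
```

If a failing ls should also count as a failed check, dropping the
"; true" would be the obvious variation.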

- Peter
> Kris



Re: Minidumps in -STABLE and "smaller than physical memory"

2006-09-21 Thread Peter Holm
On Thu, Sep 21, 2006 at 05:15:47PM +0400, Ruslan Ermilov wrote:
> On Thu, Sep 21, 2006 at 11:14:33AM +0100, Gavin Atkinson wrote:
> > On Thu, 2006-09-21 at 11:44 +0300, Dmitry Pryanishnikov wrote:
> > > Hello!
> > > 
> > > I've noticed (with a 2-day-old RELENG_6) that I still can't configure
> > > my 256 MB swap partition as a dump device for an i386 machine with
> > > 1 GB of RAM, despite having minidumps enabled:
> > > 
> > > [EMAIL PROTECTED] sysctl debug.minidump
> > > debug.minidump: 1
> > > [EMAIL PROTECTED] dumpon -v /dev/ad0s3b
> > > /dev/ad0s3b is smaller than physical memory
> > > 
> > > Do I understand correctly that minidumps should lift the restriction
> > > 
> > >   sizeof(dumpdev) >= sizeof(RAM)
> > 
> > Yes.
> > 
> > http://www.freebsd.org/cgi/cvsweb.cgi/src/sbin/dumpon/dumpon.c.diff?r1=1.22&r2=1.23
> > needs to be MFC'd.
> > 
> I sent an MFC request to [EMAIL PROTECTED]
> 

From time to time I've had problems with minidumps on HEAD. Calling
doadump() seems to work ok, but after a reset there's no dump. I
haven't had time to test this systematically. Has anybody else seen
this problem?
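
As a rough illustration of the restriction quoted above, here is a
minimal shell sketch of the decision dumpon appears to make. It
assumes the referenced dumpon.c change simply skips the size
comparison when debug.minidump is 1. The function name dump_device_ok
and its arguments are made up for this example and are not part of
dumpon.

```shell
#!/bin/sh
# Hypothetical helper, not part of dumpon(8).
# Usage: dump_device_ok RAM_BYTES DEVICE_BYTES MINIDUMP_FLAG
dump_device_ok() {
    ram=$1
    dev=$2
    minidump=$3
    # Assumption: with minidumps enabled the size comparison is skipped.
    if [ "$minidump" -eq 1 ] || [ "$dev" -ge "$ram" ]; then
        echo "accepted"
    else
        echo "smaller than physical memory"
    fi
}

# 1 GB of RAM and a 256 MB swap partition, as in the report above.
dump_device_ok 1073741824 268435456 0
dump_device_ok 1073741824 268435456 1
```

On a live system the first argument could come from
"sysctl -n hw.physmem" and the third from "sysctl -n debug.minidump".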

- Peter


Panic: spin lock held too long

2004-11-07 Thread Peter Holm
I just got a panic with GENERIC 5.3-RELEASE:

Script started on Mon Nov  8 08:42:44 2004
$ cd /usr/src/sys/i386/compile/GENERIC
$ kgdb kernel /var/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
(no debugging symbols found)...0xc060bcea in doadump ()
(kgdb) bt
#0  0xc060bcea in doadump ()
#1  0xc060c2bd in boot ()
#2  0xc060c579 in panic ()
#3  0xc0603a68 in _mtx_lock_spin ()
#4  0xc06288e2 in sleepq_lookup ()
#5  0xc0612818 in msleep ()
#6  0xc05fa3b7 in kse_release ()
#7  0xc07b641b in syscall ()
#8  0xc07a5bcf in Xint0x80_syscall ()
#9  0x002f in ?? ()
#10 0x002f in ?? ()
:
#36 0xc192b000 in ?? ()
#37 0xc061c617 in sched_switch ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit
$ cat /var/crash/info.1
Good dump found on device /dev/ad0s3b
  Architecture: i386
  Architecture version: 1
  Dump length: 268419072B (255 MB)
  Blocksize: 512
  Dumptime: Mon Nov  8 08:00:32 2004
  Hostname: peter.osted.lan
  Versionstring: FreeBSD 5.3-RELEASE #0: Fri Nov  5 21:59:27 CET 2004
[EMAIL PROTECTED]:/usr/src/sys/i386/compile/GENERIC
  Panicstring: spin lock held too long
  Bounds: 1
$ exit
exit
peter# exit
$ exit

Script done on Mon Nov  8 08:44:25 2004

I'm compiling a new kernel with "-g" in case it happens again.
-- 
Peter Holm