Re: panic: ASan: Invalid access, 1-byte read at ...

2023-06-21 Thread Peter Holm
On Wed, Jun 21, 2023 at 10:06:28AM -0400, Mark Johnston wrote:
> On Wed, Jun 21, 2023 at 11:53:44AM +0200, Peter Holm wrote:
> > Just got this panic:
> > 
> > 20230621 11:15:23 all (37/912): linux.sh
> > panic: ASan: Invalid access, 1-byte read at 0xfe020bd78e9f, 
> > RedZonePartial(2)
> > cpuid = 1
> > time = 1687338930
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 
> > 0xfe01f16abc10
> > kdb_backtrace() at kdb_backtrace+0xc7/frame 0xfe01f16abd70
> > vpanic() at vpanic+0x1d7/frame 0xfe01f16abe30
> > panic() at panic+0xb5/frame 0xfe01f16abf00
> > kasan_report() at kasan_report+0xdc/frame 0xfe01f16abfd0
> > pfs_lookup() at pfs_lookup+0x2c2/frame 0xfe01f16ac0f0
> > VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x91/frame 0xfe01f16ac130
> > vfs_cache_lookup() at vfs_cache_lookup+0x1f7/frame 0xfe01f16ac210
> > VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x91/frame 0xfe01f16ac250
> > vfs_lookup() at vfs_lookup+0xa0f/frame 0xfe01f16ac510
> > namei() at namei+0x679/frame 0xfe01f16ac690
> > vn_open_cred() at vn_open_cred+0xa94/frame 0xfe01f16aca10
> > kern_openat() at kern_openat+0x50d/frame 0xfe01f16acc70
> > linux_common_open() at linux_common_open+0x141/frame 0xfe01f16acd30
> > amd64_syscall() amd64_syscall+0x30f/frame 0xfast_syscall_common() at 
> > fast_syscall_common+0xf8/frame 0xfe01f16acf30
> > --- syscall (2, Linux ELF64, linux_open), rip = 0x8012ef7f0, rsp = 
> > 0x7fffa238, rbp = 0x7fffa290 ---
> > KDB: enter: panic
> > [ thread pid 31838 tid 100363 ]
> > Stopped at  kdb_enter+0x34: movq $0,0x1e3f7c1(%rip)
> > db>
> > 
> > Details @ https://people.freebsd.org/~pho/stress/log/log0450.txt
> 
> Hi Peter,
> 
> Thanks for the report.  I believe this would be fixed by
> https://reviews.freebsd.org/D40692 .

Hi Mark,

This works for me.

- Peter



panic: ASan: Invalid access, 1-byte read at ...

2023-06-21 Thread Peter Holm
Just got this panic:

20230621 11:15:23 all (37/912): linux.sh
panic: ASan: Invalid access, 1-byte read at 0xfe020bd78e9f, 
RedZonePartial(2)
cpuid = 1
time = 1687338930
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfe01f16abc10
kdb_backtrace() at kdb_backtrace+0xc7/frame 0xfe01f16abd70
vpanic() at vpanic+0x1d7/frame 0xfe01f16abe30
panic() at panic+0xb5/frame 0xfe01f16abf00
kasan_report() at kasan_report+0xdc/frame 0xfe01f16abfd0
pfs_lookup() at pfs_lookup+0x2c2/frame 0xfe01f16ac0f0
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x91/frame 0xfe01f16ac130
vfs_cache_lookup() at vfs_cache_lookup+0x1f7/frame 0xfe01f16ac210
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x91/frame 0xfe01f16ac250
vfs_lookup() at vfs_lookup+0xa0f/frame 0xfe01f16ac510
namei() at namei+0x679/frame 0xfe01f16ac690
vn_open_cred() at vn_open_cred+0xa94/frame 0xfe01f16aca10
kern_openat() at kern_openat+0x50d/frame 0xfe01f16acc70
linux_common_open() at linux_common_open+0x141/frame 0xfe01f16acd30
amd64_syscall() amd64_syscall+0x30f/frame 0xfast_syscall_common() at 
fast_syscall_common+0xf8/frame 0xfe01f16acf30
--- syscall (2, Linux ELF64, linux_open), rip = 0x8012ef7f0, rsp = 
0x7fffa238, rbp = 0x7fffa290 ---
KDB: enter: panic
[ thread pid 31838 tid 100363 ]
Stopped at  kdb_enter+0x34: movq $0,0x1e3f7c1(%rip)
db>

Details @ https://people.freebsd.org/~pho/stress/log/log0450.txt

- Peter



panic: general protection fault

2021-04-15 Thread Peter Holm
I just got this one:

20210415 17:01:45 all (458/755): callout_reset_on.sh
kernel trap 9 with interrupts disabled

Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 0a
instruction pointer = 0x20:0x80c7c286
stack pointer = 0x0:0xfe00e49b0730
frame pointer = 0x0:0xfe00e49b0770
code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process  = 12 (swi1: netisr 0)
trap number  = 9
panic: general protection fault
cpuid = 10
time = 1618498958
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e49b0440
vpanic() at vpanic+0x181/frame 0xfe00e49b0490
panic() at panic+0x43/frame 0xfe00e49b04f0
trap_fatal() at trap_fatal+0x387/frame 0xfe00e49b0550
trap() at trap+0xa4/frame 0xfe00e49b0660
calltrap() at calltrap+0x8/frame 0xfe00e49b0660
--- trap 0x9, rip = 0x80c7c286, rsp = 0xfe00e49b0730, rbp = 
0xfe00e49b0770 ---
turnstile_wait() at turnstile_wait+0x46/frame 0xfe00e49b0770
__rw_wlock_hard() at __rw_wlock_hard+0x464/frame 0xfe00e49b0820
_rw_wlock_cookie() at _rw_wlock_cookie+0xb7/frame 0xfe00e49b0860
in_pcblookup_hash() at in_pcblookup_hash+0x76/frame 0xfe00e49b0890
in_pcblookup_mbuf() at in_pcblookup_mbuf+0x24/frame 0xfe00e49b08b0
tcp_input() at tcp_input+0x6e8/frame 0xfe00e49b0a10
ip_input() at ip_input+0x194/frame 0xfe00e49b0aa0
swi_net() at swi_net+0x1a1/frame 0xfe00e49b0b20
ithread_loop() at ithread_loop+0x279/frame 0xfe00e49b0bb0
fork_exit() at fork_exit+0x80/frame 0xfe00e49b0bf0

https://people.freebsd.org/~pho/stress/log/log0092.txt

- Peter
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic: malloc(M_WAITOK) with sleeping prohibited with main-n245383-15565e0a2177

2021-03-11 Thread Peter Holm
On Thu, Mar 11, 2021 at 12:56:02PM -0500, Mark Johnston wrote:
> On Thu, Mar 11, 2021 at 06:32:13PM +0100, Peter Holm wrote:
> > I just got this panic:
> > 
> > panic: malloc(M_WAITOK) with sleeping prohibited
> > cpuid = 0
> > time = 1615472733
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe00e49748b0
> > vpanic() at vpanic+0x181/frame 0xfe00e4974900
> > panic() at panic+0x43/frame 0xfe00e4974960
> > malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4974980
> > malloc() at malloc+0x34/frame 0xfe00e49749e0
> > g_mirror_event_send() at g_mirror_event_send+0x30/frame 0xfe00e4974a30
> > softclock_call_cc() at softclock_call_cc+0x15d/frame 0xfe00e4974b00
> > softclock() at softclock+0x66/frame 0xfe00e4974b20
> > ithread_loop() at ithread_loop+0x279/frame 0xfe00e4974bb0
> > fork_exit() at fork_exit+0x80/frame 0xfe00e4974bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4974bf0
> > 
> > https://people.freebsd.org/~pho/stress/log/log0078.txt
> 
> Hi Peter,
> 
> Could you try the patch here? https://reviews.freebsd.org/D29223

This fixed the problem for me. I ran the problem test for an hour and
then the rest of the g_mirror tests. No problems seen.

- Peter


panic: malloc(M_WAITOK) with sleeping prohibited with main-n245383-15565e0a2177

2021-03-11 Thread Peter Holm
I just got this panic:

panic: malloc(M_WAITOK) with sleeping prohibited
cpuid = 0
time = 1615472733
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e49748b0
vpanic() at vpanic+0x181/frame 0xfe00e4974900
panic() at panic+0x43/frame 0xfe00e4974960
malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4974980
malloc() at malloc+0x34/frame 0xfe00e49749e0
g_mirror_event_send() at g_mirror_event_send+0x30/frame 0xfe00e4974a30
softclock_call_cc() at softclock_call_cc+0x15d/frame 0xfe00e4974b00
softclock() at softclock+0x66/frame 0xfe00e4974b20
ithread_loop() at ithread_loop+0x279/frame 0xfe00e4974bb0
fork_exit() at fork_exit+0x80/frame 0xfe00e4974bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4974bf0

https://people.freebsd.org/~pho/stress/log/log0078.txt

- Peter


Re: panic: malloc(M_WAITOK) with sleeping prohibited

2021-03-10 Thread Peter Holm
On Wed, Mar 10, 2021 at 01:33:34PM +0100, Hans Petter Selasky wrote:
> On 3/10/21 12:41 PM, Peter Holm wrote:
> > On Wed, Mar 10, 2021 at 10:52:53AM +0100, Hans Petter Selasky wrote:
> >> On 3/10/21 10:15 AM, Peter Holm wrote:
> >>> I just got this panic:
> >>>
> >>> igb0:  port 0xd020-0xd03f 
> >>> mem 0xfb32-0xfb33,0xfb344000-0xfb347fff irq 16 at device 0.0 on 
> >>> pci8
> >>> igb0: Using 1024 TX descriptors and 1024 RX descriptors
> >>> igb0: queue equality override not set, capping rx_queues at 6 and 
> >>> tx_queues at 6
> >>> igb0: Using 6 RX queues 6 TX queues
> >>> igb0: Using MSI-X interrupts with 7 vectors
> >>> igb0:
> >>> db>
> >>> db> show panic
> >>> panic: malloc(M_WAITOK) with sleeping prohibited
> >>> db> bt
> >>> Tracing pid 12 tid 100172 td 0xfe010dce2100
> >>> kdb_enter() at kdb_enter+0x37/frame 0xfe00e4f72980
> >>> vpanic() at vpanic+0x1b2/frame 0xfe00e4f729d0
> >>> panic() at panic+0x43/frame 0xfe00e4f72a30
> >>> malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4f72a50
> >>> malloc() at malloc+0x34/frame 0xfe00e4f72ab0
> >>> linux_alloc_current() at linux_alloc_current+0x3d/frame 0xfe00e4f72b00
> >>> linux_irq_handler() at linux_irq_handler+0x3a/frame 0xfe00e4f72b20
> >>> ithread_loop() at ithread_loop+0x279/frame 0xfe00e4f72bb0
> >>> fork_exit() at fork_exit+0x80/frame 0xfe00e4f72bf0
> >>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4f72bf0
> >>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> >>> db> x/s version
> >>> version:FreeBSD 14.0-CURRENT #0 main-n245371-ce53f92e6c81: Wed 
> >>> Mar 10 10:00:29 CET 2021\012
> >>> p...@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
> >>> db>
> >>
> >> This should fix it:
> >> https://cgit.freebsd.org/src/commit/?id=d1cbe79089868226625c12ef49f51214d79aa427
> >>
> >> --HPS
> > 
> > Yes, thank you. Now I see this:
> > 
> > ugen0.3:  at usbus0
> > ukbd0 on uhub3
> > ukbd0:  on 
> > usbus0
> > kbd2 at ukbd0
> > panic: malloc(M_WAITOK) with sleeping prohibited
> > cpuid = 0
> > time = 1615375651
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe00e4974890
> > vpanic() at vpanic+0x181/frame 0xfe00e49748e0
> > panic() at panic+0x43/frame 0xfe00e4974940
> > malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4974960
> > malloc() at malloc+0x34/frame 0xfe00e49749c0
> > linux_alloc_current() at linux_alloc_current+0x3d/frame 0xfe00e4974a10
> > linux_timer_callback_wrapper() at linux_timer_callback_wrapper+0x37/frame 
> > 0xfe00e4974a30
> > softclock_call_cc() at softclock_call_cc+0x15d/frame 0xfe00e4974b00
> > softclock() at softclock+0x66/frame 0xfe00e4974b20
> > ithread_loop() at ithread_loop+0x279/frame 0xfe00e4974bb0
> > fork_exit() at fork_exit+0x80/frame 0xfe00e4974bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4974bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> > [ thread pid 12 tid 100088 ]
> 
> Try this:
> https://cgit.freebsd.org/src/commit/?id=dfb33cb0ef48084da84072244e8ca486dfcf3a96
> 

Works for me. Thank you!

- Peter

> There will be a more comprehensive fix coming:
> https://reviews.freebsd.org/D29183
> 
> --HPS


Re: panic: malloc(M_WAITOK) with sleeping prohibited

2021-03-10 Thread Peter Holm
On Wed, Mar 10, 2021 at 10:52:53AM +0100, Hans Petter Selasky wrote:
> On 3/10/21 10:15 AM, Peter Holm wrote:
> > I just got this panic:
> > 
> > igb0:  port 0xd020-0xd03f mem 
> > 0xfb32-0xfb33,0xfb344000-0xfb347fff irq 16 at device 0.0 on pci8
> > igb0: Using 1024 TX descriptors and 1024 RX descriptors
> > igb0: queue equality override not set, capping rx_queues at 6 and tx_queues 
> > at 6
> > igb0: Using 6 RX queues 6 TX queues
> > igb0: Using MSI-X interrupts with 7 vectors
> > igb0:
> > db>
> > db> show panic
> > panic: malloc(M_WAITOK) with sleeping prohibited
> > db> bt
> > Tracing pid 12 tid 100172 td 0xfe010dce2100
> > kdb_enter() at kdb_enter+0x37/frame 0xfe00e4f72980
> > vpanic() at vpanic+0x1b2/frame 0xfe00e4f729d0
> > panic() at panic+0x43/frame 0xfe00e4f72a30
> > malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4f72a50
> > malloc() at malloc+0x34/frame 0xfe00e4f72ab0
> > linux_alloc_current() at linux_alloc_current+0x3d/frame 0xfe00e4f72b00
> > linux_irq_handler() at linux_irq_handler+0x3a/frame 0xfe00e4f72b20
> > ithread_loop() at ithread_loop+0x279/frame 0xfe00e4f72bb0
> > fork_exit() at fork_exit+0x80/frame 0xfe00e4f72bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4f72bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > db> x/s version
> > version:FreeBSD 14.0-CURRENT #0 main-n245371-ce53f92e6c81: Wed Mar 
> > 10 10:00:29 CET 2021\012
> > p...@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
> > db>
> 
> This should fix it:
> https://cgit.freebsd.org/src/commit/?id=d1cbe79089868226625c12ef49f51214d79aa427
> 
> --HPS

Yes, thank you. Now I see this:

ugen0.3:  at usbus0
ukbd0 on uhub3
ukbd0:  on 
usbus0
kbd2 at ukbd0
panic: malloc(M_WAITOK) with sleeping prohibited
cpuid = 0
time = 1615375651
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00e4974890
vpanic() at vpanic+0x181/frame 0xfe00e49748e0
panic() at panic+0x43/frame 0xfe00e4974940
malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4974960
malloc() at malloc+0x34/frame 0xfe00e49749c0
linux_alloc_current() at linux_alloc_current+0x3d/frame 0xfe00e4974a10
linux_timer_callback_wrapper() at linux_timer_callback_wrapper+0x37/frame 
0xfe00e4974a30
softclock_call_cc() at softclock_call_cc+0x15d/frame 0xfe00e4974b00
softclock() at softclock+0x66/frame 0xfe00e4974b20
ithread_loop() at ithread_loop+0x279/frame 0xfe00e4974bb0
fork_exit() at fork_exit+0x80/frame 0xfe00e4974bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4974bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100088 ]
Stopped at  kdb_enter+0x37: movq $0,0x12891fe(%rip)
db> x/s version
version:FreeBSD 14.0-CURRENT #0 main-n245372-d1cbe7908986: Wed Mar 10 
12:25:03 CET 2021\012
p...@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

- Peter


panic: malloc(M_WAITOK) with sleeping prohibited

2021-03-10 Thread Peter Holm
I just got this panic:

igb0:  port 0xd020-0xd03f mem 
0xfb32-0xfb33,0xfb344000-0xfb347fff irq 16 at device 0.0 on pci8
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: queue equality override not set, capping rx_queues at 6 and tx_queues at 6
igb0: Using 6 RX queues 6 TX queues
igb0: Using MSI-X interrupts with 7 vectors
igb0:
db>
db> show panic
panic: malloc(M_WAITOK) with sleeping prohibited
db> bt
Tracing pid 12 tid 100172 td 0xfe010dce2100
kdb_enter() at kdb_enter+0x37/frame 0xfe00e4f72980
vpanic() at vpanic+0x1b2/frame 0xfe00e4f729d0
panic() at panic+0x43/frame 0xfe00e4f72a30
malloc_dbg() at malloc_dbg+0xd4/frame 0xfe00e4f72a50
malloc() at malloc+0x34/frame 0xfe00e4f72ab0
linux_alloc_current() at linux_alloc_current+0x3d/frame 0xfe00e4f72b00
linux_irq_handler() at linux_irq_handler+0x3a/frame 0xfe00e4f72b20
ithread_loop() at ithread_loop+0x279/frame 0xfe00e4f72bb0
fork_exit() at fork_exit+0x80/frame 0xfe00e4f72bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00e4f72bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db> x/s version
version:FreeBSD 14.0-CURRENT #0 main-n245371-ce53f92e6c81: Wed Mar 10 
10:00:29 CET 2021\012
p...@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

- Peter


Re: r367672 broke the NFS server

2020-12-30 Thread Peter Holm
On Wed, Dec 30, 2020 at 07:27:08PM +0200, Konstantin Belousov wrote:
> On Wed, Dec 30, 2020 at 04:48:27PM +, Rick Macklem wrote:
> > Kostik wrote:
> > >On Wed, Dec 30, 2020 at 02:02:48AM +, Rick Macklem wrote:
> > >> Hi,
> > >>
> > >> Post r367671...
> > >> When multiple files are being created by an NFS client in the same
> > >> directory, the VOP_CREATE()/ufs_create() can fail with ERELOOKUP.
> > >> This results in a EIO return to the NFS client.
> > >> --> This causes "nfsv4 client/server protocol prob err=10026"
> > >>   on the client for NFSv4.0 mounts.
> > >>   --> This explains why this error has been reported by
> > >> several people lately, although it should "never happen".
> > >>
> > >> Unfortunately, for the NFS server, the Lookup call is done separately
> > >> and it will not be easy to redo it, given the current NFS code structure.
> > >>
> > >> Is there another way to deal with the problem r367672 was fixing that
> > >> avoids ufs_create() returning ERELOOKUP?
> > >
> > >Idea of the change is to restart the syscall at top level.  So for NFS
> > >server the right approach is to not send a response and also to not
> > >free the request mbuf chain, but to restart processing.
> > Yes. I took a look and I think restarting the operation by rolling the
> > working position in the mbuf lists back and redoing the operation
> > is feasible and easier than fixing the individual operations.
> > 
> > For NFSv4, you cannot redo the entire compound, since non-idempotent
> > operations like exclusive open may have already been completed.
> > However, rolling back to the beginning of the operation should be
> > doable.
> > --> It will serve as a good test, in that it may expose bugs in the
> >   RPC/operation code where failure (ERELOOKUP) doesn't clean
> >   things up correctly.
> >   --> In NFSv4, there is the open/lock state that cannot be updated
> > > for this error case. (The seqid stuff in NFSv4.0 Open can be fun.
> > > It's used to serialize the operations and the number must be
> > > incremented for some errors, but not for others. The 10026
> > > error occurs when you don't get this right.)
> Note that the ERELOOKUP error can only show up from VOPs that modify
> the volume.
> Otherwise we simply do not call into SU.  In particular, I believe that opens
> in the sense of NFS are safe.
> 
> Regardless, there should be either a catch-all check for ERELOOKUP
> or an assertion that ERELOOKUP did not leak, as is done for syscalls.
> 
> > 
> > I'll start working on this to-day, but I have no idea how long it might
> > take?
> > 
> > >I am sorry I forgot about NFS server when designing this fix, the only
> > >mild excuse I can provide is that the change was quite complicated as is.
> > >I will start looking at the fix.
> > No problem. Sometimes I'd like to forget about NFS too;-).
> > 
> > For the rollback/redo the RPC/operation case, it's probably easier for me
> > to do it. As above, I'll start on it, but...
> > 
> > My main concern is how long it will take, given the FreeBSD13 release
> > starts soon.
> For sure I will help you if needed, and I believe that we could ask for
> testing from Peter.

Absolutely.
Not sure how I missed running the NFS tests the first time around.

- Peter



Re: panic: general protection fault from uipc_sockaddr+0x4c

2020-12-08 Thread Peter Holm
On Tue, Dec 08, 2020 at 10:30:41AM -0500, Mark Johnston wrote:
> On Tue, Dec 08, 2020 at 12:47:18PM +0100, Peter Holm wrote:
> > I just got this panic:
> > 
> > Fatal trap 9: general protection fault while in kernel mode
> > cpuid = 9; apic id = 09
> > instruction pointer = 0x20:0x80bc6e22
> > stack pointer = 0x28:0xfe0698887630
> > frame pointer = 0x28:0xfe06988876b0
> > code segment  = base 0x0, limit 0xf, type 0x1b
> >    = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process  = 45966 (fstat)
> > trap number  = 9
> > panic: general protection fault
> > cpuid = 9
> > time = 1607416693
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe0698887340
> > vpanic() at vpanic+0x181/frame 0xfe0698887390
> > panic() at panic+0x43/frame 0xfe06988873f0
> > trap_fatal() at trap_fatal+0x387/frame 0xfe0698887450
> > trap() at trap+0xa4/frame 0xfe0698887560
> > calltrap() at calltrap+0x8/frame 0xfe0698887560
> > --- trap 0x9, rip = 0x80bc6e22, rsp = 0xfe0698887630, rbp = 
> > 0xfe06988876b0 ---
> > __mtx_lock_sleep() at __mtx_lock_sleep+0xd2/frame 0xfe06988876b0
> > __mtx_lock_flags() at __mtx_lock_flags+0xe5/frame 0xfe0698887700
> > uipc_sockaddr() at uipc_sockaddr+0x4c/frame 0xfe0698887730
> > soo_fill_kinfo() at soo_fill_kinfo+0x11e/frame 0xfe0698887770
> > kern_proc_filedesc_out() at kern_proc_filedesc_out+0xb57/frame 
> > 0xfe0698887810
> > sysctl_kern_proc_filedesc() at sysctl_kern_proc_filedesc+0x7d/frame 
> > 0xfe0698887890
> > sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 
> > 0xfe06988878e0
> > sysctl_root() at sysctl_root+0x20d/frame 0xfe0698887960
> > userland_sysctl() at userland_sysctl+0x180/frame 0xfe0698887a10
> > sys___sysctl() at sys___sysctl+0x5f/frame 0xfe0698887ac0
> > amd64_syscall() at amd64_syscall+0x147/frame 0xfe0698887bf0
> > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0698887bf0
> > --- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8003948ea, rsp = 
> > 0x7fffc138, rbp = 0x7fffc170 ---
> > 
> > https://people.freebsd.org/~pho/stress/log/log0004.txt
> 
> So here the unpcb is freed, and indeed the file itself has been closed:
> 
> $3 = {f_flag = 0x3, f_count = 0x0, f_data = 0x0, f_ops = 0x81901f50 
> ,
>   f_vnode = 0x0, f_cred = 0xf80248beb600, f_type = 0x2, 
> f_vnread_flags = 0x0,
>   {f_seqcount = {0x0, 0x0}, f_pipegen = 0x0}, f_nextoff = {0x0, 0x0},
>   f_vnun = {fvn_cdevpriv = 0x0, fvn_advice = 0x0}, f_offset = 0x0}
> 
> However, it must have happened very recently because soo_fill_kinfo()
> dereferences fp->f_data and yet we did not panic due to a null
> dereference.
> 
> kern_proc_filedesc_out() holds the fdtable shared lock throughout all of
> this, which is supposed to prevent the table entry from being freed
> since that requires the exclusive lock.
> 
> Could you show fdp->fd_ofiles[3] and fdp->fd_map[0] from frame 26?

Sure:

(kgdb) p fdp->fd_files->fdt_ofiles[3]
$1 = {fde_file = 0xf807306fd0f0, fde_caps = {fc_rights = {cr_rights = {0x0, 
0x0}}, fc_ioctls = 0x0, fc_nioctls = 0x0, fc_fcntls = 0x0}, fde_flags = 0x0, 
fde_seqc = 0x2}
(kgdb) p fdp->fd_map[0]
$2 = 0x1f
(kgdb) 

- Peter


panic: general protection fault from uipc_sockaddr+0x4c

2020-12-08 Thread Peter Holm
I just got this panic:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 9; apic id = 09
instruction pointer = 0x20:0x80bc6e22
stack pointer = 0x28:0xfe0698887630
frame pointer = 0x28:0xfe06988876b0
code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process  = 45966 (fstat)
trap number  = 9
panic: general protection fault
cpuid = 9
time = 1607416693
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0698887340
vpanic() at vpanic+0x181/frame 0xfe0698887390
panic() at panic+0x43/frame 0xfe06988873f0
trap_fatal() at trap_fatal+0x387/frame 0xfe0698887450
trap() at trap+0xa4/frame 0xfe0698887560
calltrap() at calltrap+0x8/frame 0xfe0698887560
--- trap 0x9, rip = 0x80bc6e22, rsp = 0xfe0698887630, rbp = 
0xfe06988876b0 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xd2/frame 0xfe06988876b0
__mtx_lock_flags() at __mtx_lock_flags+0xe5/frame 0xfe0698887700
uipc_sockaddr() at uipc_sockaddr+0x4c/frame 0xfe0698887730
soo_fill_kinfo() at soo_fill_kinfo+0x11e/frame 0xfe0698887770
kern_proc_filedesc_out() at kern_proc_filedesc_out+0xb57/frame 
0xfe0698887810
sysctl_kern_proc_filedesc() at sysctl_kern_proc_filedesc+0x7d/frame 
0xfe0698887890
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 
0xfe06988878e0
sysctl_root() at sysctl_root+0x20d/frame 0xfe0698887960
userland_sysctl() at userland_sysctl+0x180/frame 0xfe0698887a10
sys___sysctl() at sys___sysctl+0x5f/frame 0xfe0698887ac0
amd64_syscall() at amd64_syscall+0x147/frame 0xfe0698887bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0698887bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8003948ea, rsp = 
0x7fffc138, rbp = 0x7fffc170 ---

https://people.freebsd.org/~pho/stress/log/log0004.txt

- Peter


Re: r340343 triggers kernel assertion if file is opened with O_BENEATH flag set through symlink

2018-11-28 Thread Peter Holm
On Wed, Nov 28, 2018 at 01:46:17AM +0200, Konstantin Belousov wrote:
> On Wed, Nov 28, 2018 at 12:54:21AM +0300, Vladimir Kondratyev wrote:
> > Following test case triggers assertion after r340343:
> > 
> > 
> > #include <fcntl.h>
> > 
> > int
> > main(int argc, char **argv)
> > {
> >     openat(open("/etc", O_RDONLY), "termcap", O_RDONLY | O_BENEATH);
> > }
> > 
> > It results in:
> > 
> > panic: Assertion (ndp->ni_lcf & NI_LCF_LATCH) != 0 failed at
> > /usr/src/sys/kern/vfs_lookup.c:182
> > 
> 
> The following should fix it. The problem was that the top directory was
> only latched when the initial path was absolute. Since your example
> switched from the relative argument to the absolute symlink, the BENEATH
> tracker rightfully complained that there was no recorded top.
> 
> I also added some asserts I used during the debugging.
> 
> diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c
> index 78893c4f2bd..7a80775d91d 100644
> --- a/sys/kern/vfs_lookup.c

With this patch I got:

$ ./beneath.sh
open("a/b") succeeded
stat("a/b
panic: Assertion (ndp->ni_lcf & NI_LCF_LATCH) != 0 failed at 
../../../kern/vfs_lookup.c:269
cpuid = 4
time = 1543397647
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c71881a0
vpanic() at vpanic+0x1a3/frame 0xfe00c7188200
panic() at panic+0x43/frame 0xfe00c7188260
namei_handle_root() at namei_handle_root+0xf7/frame 0xfe00c71882b0
namei() at namei+0x617/frame 0xfe00c71884f0
vn_open_cred() at vn_open_cred+0x526/frame 0xfe00c7188640
vn_open() at vn_open+0x4c/frame 0xfe00c7188680
kern_openat() at kern_openat+0x2e9/frame 0xfe00c71888e0
sys_openat() at sys_openat+0x69/frame 0xfe00c7188910
syscallenter() at syscallenter+0x4e3/frame 0xfe00c71889f0
amd64_syscall() at amd64_syscall+0x4d/frame 0xfe00c7188ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00c7188ab0
--- syscall (499, FreeBSD ELF64, sys_openat), rip = 0x8003a215a, rsp = 
0x7fffe4f8, rbp = 0x7fffe5e0 ---

https://people.freebsd.org/~pho/stress/log/kostik1127.txt

- Peter


panic: mtx_lock() by idle thread ... mutex igb0 @ ../../../net/iflib.c:2084

2018-10-23 Thread Peter Holm
Feeding entropy: .
lo0: link state changed to UP
panic: mtx_lock() by idle thread 0xf800036ac000 on sleep mutex igb0 @ 
../../../net/iflib.c:2084
cpuid = 4
time = 1540286062
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe007876e610
vpanic() at vpanic+0x1a3/frame 0xfe007876e670
panic() at panic+0x43/frame 0xfe007876e6d0
__mtx_lock_flags() at __mtx_lock_flags+0x15a/frame 0xfe007876e720
iflib_admin_intr_deferred() at iflib_admin_intr_deferred+0x2a/frame 
0xfe007876e750
em_msix_link() at em_msix_link+0x84/frame 0xfe007876e780
iflib_fast_intr_ctx() at iflib_fast_intr_ctx+0x21/frame 0xfe007876e7a0
intr_event_handle() at intr_event_handle+0xbb/frame 0xfe007876e7f0
intr_execute_handlers() at intr_execute_handlers+0x58/frame 0xfe007876e820
lapic_handle_intr() at lapic_handle_intr+0x5f/frame 0xfe007876e840
Xapic_isr1() at Xapic_isr1+0xd9/frame 0xfe007876e840
--- interrupt, rip = 0x811e6be6, rsp = 0xfe007876e910, rbp = 
0xfe007876e910 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfe007876e910
acpi_cpu_idle() at acpi_cpu_idle+0x23d/frame 0xfe007876e960
cpu_idle_acpi() at cpu_idle_acpi+0x3f/frame 0xfe007876e980
cpu_idle() at cpu_idle+0xa7/frame 0xfe007876e9a0
sched_idletd() at sched_idletd+0x517/frame 0xfe007876ea70
fork_exit() at fork_exit+0x84/frame 0xfe007876eab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe007876eab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 11 tid 17 ]
Stopped at  kdb_enter+0x3b: movq $0,kdb_why
db> x/s version
version:FreeBSD 13.0-CURRENT r339638 PHO-GENERIC\012
db> 
-- 
Peter


Re: Page fault in midi/sequencer.c

2018-10-22 Thread Peter Holm
On Mon, Oct 22, 2018 at 11:00:41AM +0200, Hans Petter Selasky wrote:
> On 10/20/18 6:56 PM, Peter Holm wrote:
> > I can trigger this on 13.0-CURRENT r339445 with a non-root test program:
> > 
> 
> Hi,
> 
> The following commits should fix the issues you experience:
> 
> https://svnweb.freebsd.org/changeset/base/339581
> https://svnweb.freebsd.org/changeset/base/339582
> https://svnweb.freebsd.org/changeset/base/339583
> 
> --HPS

Thank you for fixing this!

-- 
Peter


Page fault in midi/sequencer.c

2018-10-20 Thread Peter Holm
I can trigger this on 13.0-CURRENT r339445 with a non-root test program:

Calling uiomove() with the following non-sleepable locks held:
exclusive sleep mutex seqflq (seqflq) r = 0 (0xf80003860c08) locked @ 
dev/sound/midi/sequencer.c:952
stack backtrace:
#0 0x80bfe263 at witness_debugger+0x73
#1 0x80bff1b8 at witness_warn+0x448
#2 0x80bf6a91 at uiomove_faultflag+0x71
#3 0x809439e6 at mseq_write+0x4c6
#4 0x80a4f725 at devfs_write_f+0x185
#5 0x80c02a87 at dofilewrite+0x97
#6 0x80c0287f at kern_pwritev+0x5f
#7 0x80c0277d at sys_pwrite+0x8d
#8 0x81070af7 at amd64_syscall+0x2a7
#9 0x8104a4ad at fast_syscall_common+0x101
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex seqflq (seqflq) r = 0 (0xf80003860c08) locked @ 
dev/sound/midi/sequencer.c:952
stack backtrace:
#0 0x80bfe263 at witness_debugger+0x73
#1 0x80bff1b8 at witness_warn+0x448
#2 0x810700d3 at trap_pfault+0x53
#3 0x8106f70a at trap+0x2ba
#4 0x81049bc5 at calltrap+0x8
#5 0x80bf6b42 at uiomove_faultflag+0x122
#6 0x809439e6 at mseq_write+0x4c6
#7 0x80a4f725 at devfs_write_f+0x185
#8 0x80c02a87 at dofilewrite+0x97
#9 0x80c0287f at kern_pwritev+0x5f
#10 0x80c0277d at sys_pwrite+0x8d
#11 0x81070af7 at amd64_syscall+0x2a7
#12 0x8104a4ad at fast_syscall_common+0x101


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address = 0x20ea6b
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x8106d32d
stack pointer = 0x28:0xfe00a844a660
frame pointer = 0x28:0xfe00a844a660
code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process  = 2356 (xxx)
[ thread pid 2356 tid 100278 ]
Stopped at  copyin_nosmap_erms+0xdd: movl (%rsi),%edx
db>

-- 
Peter


Re: Page fault in udp_usrreq.c:823

2018-06-21 Thread Peter Holm
On Thu, Jun 21, 2018 at 06:42:41PM -0700, Matthew Macy wrote:
> I made changes this morning / early afternoon.
> -M
> 

With r335501 I no longer see any of the issues I reported. Thank
you for the fix!

- Peter

> On Thu, Jun 21, 2018 at 6:41 PM, Cy Schubert wrote:
> > Like as of now?
> >
> > The last panic occurred this morning after a build last night.
> >
> >
> > --
> > Cheers,
> > Cy Schubert 
> > FreeBSD UNIX: Web:  http://www.FreeBSD.org
> >
> > The need of the many outweighs the greed of the few.
> >
> >
> > In message  > il.com>
> > , Matthew Macy writes:
> >> Try updating. It should be fixed.
> >>
> >> On Thu, Jun 21, 2018 at 6:14 PM, Cy Schubert wrote:
> >> > In message <20180620090957.ga...@x2.osted.lan>, Peter Holm writes:
> >> >> 20180620 10:32:47 all (1/1): udp.sh
> >> >> Kernel page fault with the following non-sleepable locks held:
> >> >> shared rw udpinp (udpinp) r = 0 (0xf80bbc808d78) locked @ 
> >> >> netinet/in_pcb.c:2398
> >> >> stack backtrace:
> >> >> #0 0x80c00733 at witness_debugger+0x73
> >> >> #1 0x80c01b11 at witness_warn+0x461
> >> >> #2 0x81075763 at trap_pfault+0x53
> >> >> #3 0x81074d7a at trap+0x2ba
> >> >> #4 0x8105076c at calltrap+0x8
> >> >> #5 0x80dd21b0 at udp_ctlinput+0x50
> >> >> #6 0x80d3081d at icmp_input+0x96d
> >> >> #7 0x80d316d7 at ip_input+0x3f7
> >> >> #8 0x80cc0a92 at netisr_dispatch_src+0xa2
> >> >> #9 0x80ca3ebe at ether_demux+0x16e
> >> >> #10 0x80ca5377 at ether_nh_input+0x427
> >> >> #11 0x80cc0a92 at netisr_dispatch_src+0xa2
> >> >> #12 0x80ca437f at ether_input+0x8f
> >> >> #13 0x80cbc500 at iflib_rxeof+0xc90
> >> >> #14 0x80cb6b6f at _task_fn_rx+0x7f
> >> >> #15 0x80bdd209 at gtaskqueue_run_locked+0x139
> >> >> #16 0x80bdcf88 at gtaskqueue_thread_loop+0x88
> >> >> #17 0x80b54514 at fork_exit+0x84
> >> >>
> >> >>
> >> >> Fatal trap 12: page fault while in kernel mode
> >> >> cpuid = 10; apic id = 0a
> >> >> fault virtual address = 0x8
> >> >> fault code  = supervisor read data, page not present
> >> >> instruction pointer = 0x20:0x80dd2423
> >> >> stack pointer = 0x0:0xfe4a5500
> >> >> frame pointer = 0x0:0xfe4a55a0
> >> >> code segment  = base 0x0, limit 0xf, type 0x1b
> >> >>= DPL 0, pres 1, long 1, def32 0, gran 1
> >> >> processor eflags = interrupt enabled, resume, IOPL = 0
> >> >> current process  = 0 (if_io_tqg_10)
> >> >> [ thread pid 0 tid 100069 ]
> >> >> Stopped at  udp_common_ctlinput+0x263:  cmpq  $0,0x8(%rax)
> >> >> db>
> >> >>
> >> >> Details @ https://people.freebsd.org/~pho/stress/log/udp_usrreq.txt
> >> >>
> >> >> --
> >> >> Peter
> >> >> ___
> >> >> freebsd-current@freebsd.org mailing list
> >> >> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> >> >> To unsubscribe, send any mail to 
> >> >> "freebsd-current-unsubscr...@freebsd.org"
> >> >>
> >> >
> >> > This is surprisingly similar to my panic. Twice since June 19.
> >> >
> >> > slippy# kgdb /boot/kernel/kernel vmcore.3
> >> > GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
> >> > Copyright (C) 2018 Free Software Foundation, Inc.
> >> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >> > This is free software: you are free to change and redistribute it.
> >> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> >> > and "show warranty" for details.
> >> > This GDB was configured as "x86_64-portbld-freebsd12.0".
> >> > Type "show configuration" for configuration details.
> >> > For bug reporting instructions, please see:
> >> > <http

Page fault in udp_usrreq.c:823

2018-06-20 Thread Peter Holm
20180620 10:32:47 all (1/1): udp.sh
Kernel page fault with the following non-sleepable locks held:
shared rw udpinp (udpinp) r = 0 (0xf80bbc808d78) locked @ 
netinet/in_pcb.c:2398
stack backtrace:
#0 0x80c00733 at witness_debugger+0x73
#1 0x80c01b11 at witness_warn+0x461
#2 0x81075763 at trap_pfault+0x53
#3 0x81074d7a at trap+0x2ba
#4 0x8105076c at calltrap+0x8
#5 0x80dd21b0 at udp_ctlinput+0x50
#6 0x80d3081d at icmp_input+0x96d
#7 0x80d316d7 at ip_input+0x3f7
#8 0x80cc0a92 at netisr_dispatch_src+0xa2
#9 0x80ca3ebe at ether_demux+0x16e
#10 0x80ca5377 at ether_nh_input+0x427
#11 0x80cc0a92 at netisr_dispatch_src+0xa2
#12 0x80ca437f at ether_input+0x8f
#13 0x80cbc500 at iflib_rxeof+0xc90
#14 0x80cb6b6f at _task_fn_rx+0x7f
#15 0x80bdd209 at gtaskqueue_run_locked+0x139
#16 0x80bdcf88 at gtaskqueue_thread_loop+0x88
#17 0x80b54514 at fork_exit+0x84


Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 0a
fault virtual address = 0x8
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80dd2423
stack pointer = 0x0:0xfe4a5500
frame pointer = 0x0:0xfe4a55a0
code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process  = 0 (if_io_tqg_10)
[ thread pid 0 tid 100069 ]
Stopped at  udp_common_ctlinput+0x263:  cmpq  $0,0x8(%rax)
db>

Details @ https://people.freebsd.org/~pho/stress/log/udp_usrreq.txt

-- 
Peter


Re: g_handleattr: md0 bio_length 24 len 31 -> EFAULT

2018-04-08 Thread Peter Holm
On Sun, Apr 08, 2018 at 02:36:08AM -0700, Michael Dexter wrote:
> On 3/24/18 2:35 AM, O. Hartmann wrote:
> > Writing out memory (md) backed images of UFS2 filesystems (NanoBSD images, 
> > created via
> > the classical manual way, no makefs), my recent CURRENT system dumps the
> > console full of these error messages:
> > 
> > g_handleattr: md0 bio_length 24 len 31 -> EFAULT
> > 
> > I do not know what they are supposed to mean and I'd like to ask whether 
> > someone could
> > shed some light on this.
> 
> I am seeing this on the latest snapshot when attempting to run 
> option_survey.sh which creates an md-attached disk image.
> 
> Anyone else seeing this?
> 
> Michael

Yes, I have:

[pho@freefall ~/public_html/stress/log]$ grep -a g_handleattr: `ls -rt`
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
numa025.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
kostik1104.txt:g_handleattr: md10 bio_length 24 len 31 -> EFAULT
[pho@freefall ~/public_html/stress/log]$ 

-- 
Peter


panic: softclock_call_cc: act 0xfffff80c036eec00 0

2016-07-19 Thread Peter Holm
Got this while testing a patch:

panic: softclock_call_cc: act 0xf80c036eec00 0
cpuid = 22
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0f940c97d0
vpanic() at vpanic+0x182/frame 0xfe0f940c9850
kassert_panic() at kassert_panic+0x126/frame 0xfe0f940c98c0
softclock_call_cc() at softclock_call_cc+0x5ad/frame 0xfe0f940c99c0
softclock() at softclock+0x47/frame 0xfe0f940c99e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x96/frame 
0xfe0f940c9a20
ithread_loop() at ithread_loop+0xa6/frame 0xfe0f940c9a70
fork_exit() at fork_exit+0x84/frame 0xfe0f940c9ab0

https://people.freebsd.org/~pho/stress/log/kostik920.txt

- Peter


Re: panic: bogus refcnt 0 on lle 0xfffff80121a13a00

2016-07-12 Thread Peter Holm
On Tue, Jul 12, 2016 at 10:55:31AM +0200, Hans Petter Selasky wrote:
> On 07/12/16 10:37, Peter Holm wrote:
> > Exiting from single-user mode triggers this:
> >
> > ifa_maintain_loopback_route: deletion failed for interface igb0: 3
> > panic: bogus refcnt 0 on lle 0xf80121a13a00
> > cpuid = 9
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe1048c63470
> > vpanic() at vpanic+0x182/frame 0xfe1048c634f0
> > kassert_panic() at kassert_panic+0x126/frame 0xfe1048c63560
> > llentry_free() at llentry_free+0x136/frame 0xfe1048c63590
> > in_lltable_free_entry() at in_lltable_free_entry+0xb0/frame 
> > 0xfe1048c635c0
> > htable_prefix_free() at htable_prefix_free+0xce/frame 0xfe1048c63620
> > lltable_prefix_free() at lltable_prefix_free+0x5d/frame 0xfe1048c63660
> > in_scrubprefix() at in_scrubprefix+0x290/frame 0xfe1048c63700
> > in_difaddr_ioctl() at in_difaddr_ioctl+0x285/frame 0xfe1048c63750
> > in_control() at in_control+0x96/frame 0xfe1048c637d0
> > ifioctl() at ifioctl+0xda1/frame 0xfe1048c63860
> > kern_ioctl() at kern_ioctl+0x246/frame 0xfe1048c638c0
> > sys_ioctl() at sys_ioctl+0x171/frame 0xfe1048c639a0
> > amd64_syscall() at amd64_syscall+0x2f6/frame 0xfe1048c63ab0
> > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe1048c63ab0
> > --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fd2eba, rsp = 
> > 0x7fffe468, rbp = 0x7fffe4b0 -
> >
> > Details @ https://people.freebsd.org/~pho/stress/log/bogus_refcnt.txt
> >
> 
> FYI:
> https://reviews.freebsd.org/D4605
> 
> Might be related.
> 
> --HPS

No difference with this patch (- netinet6/nd6.c).

BTW The problem is really easy to reproduce: init 1 followed by
"exit" in the single-user shell.

-- 
Peter


panic: bogus refcnt 0 on lle 0xfffff80121a13a00

2016-07-12 Thread Peter Holm
Exiting from single-user mode triggers this:

ifa_maintain_loopback_route: deletion failed for interface igb0: 3
panic: bogus refcnt 0 on lle 0xf80121a13a00
cpuid = 9
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1048c63470
vpanic() at vpanic+0x182/frame 0xfe1048c634f0
kassert_panic() at kassert_panic+0x126/frame 0xfe1048c63560
llentry_free() at llentry_free+0x136/frame 0xfe1048c63590
in_lltable_free_entry() at in_lltable_free_entry+0xb0/frame 0xfe1048c635c0
htable_prefix_free() at htable_prefix_free+0xce/frame 0xfe1048c63620
lltable_prefix_free() at lltable_prefix_free+0x5d/frame 0xfe1048c63660
in_scrubprefix() at in_scrubprefix+0x290/frame 0xfe1048c63700
in_difaddr_ioctl() at in_difaddr_ioctl+0x285/frame 0xfe1048c63750
in_control() at in_control+0x96/frame 0xfe1048c637d0
ifioctl() at ifioctl+0xda1/frame 0xfe1048c63860
kern_ioctl() at kern_ioctl+0x246/frame 0xfe1048c638c0
sys_ioctl() at sys_ioctl+0x171/frame 0xfe1048c639a0
amd64_syscall() at amd64_syscall+0x2f6/frame 0xfe1048c63ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe1048c63ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fd2eba, rsp = 
0x7fffe468, rbp = 0x7fffe4b0 -

Details @ https://people.freebsd.org/~pho/stress/log/bogus_refcnt.txt

-- 
Peter


Re: Kqueue races causing crashes

2016-06-15 Thread Peter Holm
On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes occur 
> > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we 
> > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has 
> > been cleared by another thread. Thus we are unable to unlock the 
> > previously acquired lock and hold it until something causes us to crash 
> > (such as the witness code noticing that we're returning to userland with 
> > the lock still held).
> ...
> > I believe there's also a small window where the KN_LIST_LOCK macro 
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time it 
> > actually dereferences it, it has become NULL. This would produce the 
> > "page fault while in kernel mode" crash.
> > 
> > If someone familiar with this code sees an obvious fix, I'll be happy to 
> > test it. Otherwise, I'd appreciate any advice on fixing this. My first 
> > thought is that a "struct knote" ought to have its own mutex for 
> > controlling access to the flag fields and ideally the "kn_knlist" 
> > field. 
> > I.e., you would first acquire a knote's lock and then the knlist lock, 
> > thus ensuring that no one could clear the kn_knlist variable while you 
> > hold the knlist lock. The knlist lock, however, usually comes from 
> > whichever event producing entity the knote tracks, so getting lock 
> > ordering right between the per-knote mutex and this other lock seems 
> > potentially hard. (Sometimes we call into functions in kern_event.c with 
> > the knlist lock already held, having been acquired in code outside of 
> > kern_event.c. Consider, for example, calling KNOTE_LOCKED from 
> > kern_exit.c; the PROC_LOCK macro has already been used to acquire the 
> > process lock, also serving as the knlist lock).
> This sounds like a good and correct analysis. I tried your test program
> for around an hour on an 8-thread machine, but was not able to trigger the
> issue. Peter might have better luck reproducing it. Still, I think
> that the problem is there.
> 
> IMO we should simply avoid clearing kn_knlist in knlist_remove().  The
> member is only used to get the locking function pointers, otherwise
> code relies on KN_DETACHED flag to detect on-knlist condition.  See
> the patch below.
> 
> > 
> > Apropos of the knlist lock and its provenance: why is a lock from the 
> > event producing entity used to control access to the knlist and knote? 
> > Is it generally desirable to, for example, hold the process lock while 
> > operating on a knlist attached to that process? It's not obvious to me 
> > that this is required or even desirable. This might suggest that a 
> > knlist should have its own lock rather than using a lock from the event 
> > producing entity, which might make addressing this problem more 
> > straightforward.
> 
> Consider the purpose of knlist. It serves as a container for all knotes
> registered on the given subsystem object, like all knotes of the socket,
> process etc which must be fired on event. See the knote() code. The
> consequence is that the subsystem which fires knote() typically already
> holds a lock protecting its own state. As result, it is natural to
> protect the list of the knotes to activate on subsystem event, by the
> subsystem lock.
> 
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index 0614903..3f45dca 100644
> --- a/sys/kern/kern_event.c

There is not much gdb info here; I'll try to rebuild kgdb.

https://people.freebsd.org/~pho/stress/log/kostik900.txt

The number of CPUs seems important to this test. Four works for me.

- Peter
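
Note on the approach quoted above (leave kn_knlist in place in
knlist_remove() and rely on the KN_DETACHED flag): the idea can be
pictured with a small single-threaded userland sketch. Everything named
toy_* is hypothetical and only models the shape of the problem; it is not
the kernel's code, and the "lock" is a plain flag rather than a real mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real definitions. */
struct toy_knlist {
	int held;			/* 1 while the toy lock is held */
};

struct toy_knote {
	struct toy_knlist *kn_knlist;	/* never cleared once set */
	int detached;			/* plays the role of KN_DETACHED */
};

/* Take the list lock through the knote's list pointer. */
void
toy_kn_list_lock(struct toy_knote *kn)
{
	assert(kn->kn_knlist != NULL && kn->kn_knlist->held == 0);
	kn->kn_knlist->held = 1;
}

/* Drop the list lock; this only works if the pointer is still there. */
void
toy_kn_list_unlock(struct toy_knote *kn)
{
	assert(kn->kn_knlist != NULL && kn->kn_knlist->held == 1);
	kn->kn_knlist->held = 0;
}

/* Removal marks the knote detached but leaves kn_knlist untouched. */
void
toy_knlist_remove(struct toy_knote *kn)
{
	kn->detached = 1;
}
```

If toy_knlist_remove() is called between the lock and unlock calls (as
another thread might do in the real race), the unlock still finds the list
and the lock is not leaked; the detached flag alone records that the knote
is off the list.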


Re: Kqueue races causing crashes

2016-06-15 Thread Peter Holm
On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes occur 
> > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we 
> > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has 
> > been cleared by another thread. Thus we are unable to unlock the 
> > previously acquired lock and hold it until something causes us to crash 
> > (such as the witness code noticing that we're returning to userland with 
> > the lock still held).
> ...
> > I believe there's also a small window where the KN_LIST_LOCK macro 
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time it 
> > actually dereferences it, it has become NULL. This would produce the 
> > "page fault while in kernel mode" crash.
> > 
> > If someone familiar with this code sees an obvious fix, I'll be happy to 
> > test it. Otherwise, I'd appreciate any advice on fixing this. My first 
> > thought is that a "struct knote" ought to have its own mutex for 
> > controlling access to the flag fields and ideally the "kn_knlist" 
> > field. 
> > I.e., you would first acquire a knote's lock and then the knlist lock, 
> > thus ensuring that no one could clear the kn_knlist variable while you 
> > hold the knlist lock. The knlist lock, however, usually comes from 
> > whichever event producing entity the knote tracks, so getting lock 
> > ordering right between the per-knote mutex and this other lock seems 
> > potentially hard. (Sometimes we call into functions in kern_event.c with 
> > the knlist lock already held, having been acquired in code outside of 
> > kern_event.c. Consider, for example, calling KNOTE_LOCKED from 
> > kern_exit.c; the PROC_LOCK macro has already been used to acquire the 
> > process lock, also serving as the knlist lock).
> This sounds like a good and correct analysis. I tried your test program
> for around an hour on an 8-thread machine, but was not able to trigger the
> issue. Peter might have better luck reproducing it. Still, I think
> that the problem is there.
> 

I got this after 10 runs:

userret: returning with the following locks held:
exclusive sleep mutex process lock (process lock) r = 0 (0xcb714758) locked @ 
kern/kern_event.c:2125
panic: witness_warn
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c15b7f5c,c1844da8,0,c158b3fc,f3b29af8,...) at 
db_trace_self_wrapper+0x2a/frame 0xf3b29ac8
kdb_backtrace(c17a92d1,0,c1228287,f3b29b94,0,...) at kdb_backtrace+0x2d/frame 
0xf3b29b30
vpanic(c1228287,f3b29b94,c1228287,f3b29b94,f3b29b94,...) at vpanic+0x115/frame 
0xf3b29b64
kassert_panic(c1228287,c15bc2c4,cb714758,c15aa7c1,84d,...) at 
kassert_panic+0xd9/frame 0xf3b29b88
witness_warn(2,0,c15ba937,f3b29ca8,c0c018d0,...) at witness_warn+0x32a/frame 
0xf3b29bdc
userret(cc2e1340,f3b29ce8,c15aadd7,4,0,...) at userret+0x92/frame 0xf3b29c20
syscall(f3b29ce8) at syscall+0x50e/frame 0xf3b29cdc

I'll apply the patch and test.

- Peter


Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Peter Holm
On Tue, Jan 05, 2016 at 01:07:40PM +0200, Konstantin Belousov wrote:
> On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> > Hi Konstantin,
> > 
> > I recently updated my dev box to r292962. After doing this I attempted to 
> > set up PostgreSQL 9.4. When I ran initdb the last phase hung. Using 
> > procstat -kk I found it appeared to be stuck in a loop inside a 
> > posix_fadvise syscall. I could not ^C or ^Z the initdb process. I could 
> > kill it but a subsequent attempt to rm -rf the /usr/local/pgsql/data 
> > directory also got stuck and was unkillable by any means. Rebooting allowed 
> > me to remove the directory but the initdb process still hung when I re-ran 
> > it.
> > 
> > I tried PostgreSQL 9.3 with similar results.
> > 
> > Looking at the source code for initdb I found that it calls posix_fadvise 
> > like so[1]:
> > 
> >  /*
> >   * We do what pg_flush_data() would do in the backend: prefer to use
> >   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
> >   * because this is only a hint.
> >   */
> >  #if defined(HAVE_SYNC_FILE_RANGE)
> >  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
> >  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
> >  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
> >  #else
> >  #error PG_FLUSH_DATA_WORKS should not have been defined
> >  #endif
> > 
> > Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> > 
> > https://svnweb.freebsd.org/changeset/base/292326 
> > 
> > 
> > Backing this revision out allowed the initdb process to complete.
> > 
> > My current theory is that somehow we're getting ENOLCK or EAGAIN from 
> > the BUF_TIMELOCK call in bnoreuselist:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676
> >  
> > 
> > 
> > Leading to an infinite loop in vop_stdadvise:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
> >  
> > 
> > 
> > I haven't managed to dig any deeper than that yet.
> > 
> > Is there any other information I could give you to help narrow this down?
> 
> I do not see this issue locally.
> 

I do:

(kgdb) f 9
#9  0x80ac7956 in vop_stdadvise (ap=0xfe081dc6d930) at 
../../../kern/vfs_default.c:1087
1087            error = bnoreuselist(&bo->bo_dirty, bo, startn, endn);
(kgdb) l
1082            endn = ap->a_end / bsize;
1083            for (;;) {
1084                    error = bnoreuselist(&bo->bo_clean, bo, startn, endn);
1085                    if (error == EAGAIN)
1086                            continue;
1087                    error = bnoreuselist(&bo->bo_dirty, bo, startn, endn);
1088                    if (error == EAGAIN)
1089                            continue;
1090                    break;
1091            }
(kgdb) info loc
vp = (struct vnode *) 0xf8008bdaa9c0
bo = (struct bufobj *) 0xf8008bdaab28
startn = 0x0
endn = 0x
start = 0x0
end = 0x8000
bsize = 0x8000
error = 0x0
(kgdb)

https://people.freebsd.org/~pho/stress/log/kostik855.txt

- Peter
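
Note on the loop in the listing above: it restarts on every EAGAIN, so it
can only exit once a scan stops returning EAGAIN. The userland sketch
below models that with a hypothetical callback and an iteration budget.
The budget is an assumption added purely to make the behaviour observable;
it is not the fix that was applied to the tree, and none of these names
exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a bnoreuselist()-like scan. */
typedef int (*scan_fn)(void *arg);

/*
 * Retry the scan while it reports EAGAIN, but give up after max_tries.
 * Returns the last error from the scan, or EAGAIN if the budget ran out.
 */
int
retry_scan(scan_fn scan, void *arg, int max_tries, int *tries_out)
{
	int error = EAGAIN;
	int tries = 0;

	while (tries < max_tries) {
		tries++;
		error = scan(arg);
		if (error != EAGAIN)
			break;
	}
	if (tries_out != NULL)
		*tries_out = tries;
	return (error);
}

/* A scan that never makes progress, like the hang described above. */
int
always_eagain(void *arg)
{
	(void)arg;
	return (EAGAIN);
}

/* A scan that succeeds once its counter reaches zero. */
int
eagain_then_ok(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (EAGAIN);
	}
	return (0);
}
```

With always_eagain() the budget is always exhausted, which is the
unbounded case when no budget exists; with eagain_then_ok() the loop ends
as soon as the scan stops asking for a retry.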


Re: isofs kernel panic

2015-12-27 Thread Peter Holm
On Sun, Dec 27, 2015 at 12:09:27PM +0200, Konstantin Belousov wrote:
> On Sun, Dec 27, 2015 at 03:38:48AM -0500, Shawn Webb wrote:
> > Hey All,
> > 
> > This is from booting a new installer ISO, generated today from a very
> > recent commit:
> > 
> > === Begin Log ===
> > Trying to mount root from cd9660:/dev/iso9660/11_0__HBSD_AMD64_CD [ro]...
> > lock order reversal:
> >  1st 0xfe00f6222e40 bufwait (bufwait) @ 
> > /jenkins/workspace/HardenedBSD-master-amd64/sys/vm/vm_pager.c:380
> >  2nd 0xf800063785f0 isofs (isofs) @ 
> > /jenkins/workspace/HardenedBSD-master-amd64/sys/kern/imgact_elf.c:883
> > stack backtrace:
> > #0 0x80a7ce70 at witness_debugger+0x70
> > #1 0x80a7cd71 at witness_checkorder+0xe71
> > #2 0x80a0033b at __lockmgr_args+0xd3b
> > #3 0x80ac2fdc at vop_stdlock+0x3c
> > #4 0x80fc0fc0 at VOP_LOCK1_APV+0x100
> > #5 0x80ae397a at _vn_lock+0x9a
> > #6 0x809c4b01 at exec_elf64_imgact+0xa91
> > #7 0x809e35e9 at kern_execve+0x4b9
> > #8 0x809e2ddc at sys_execve+0x4c
> > #9 0x809c760a at start_init+0x26a
> > #10 0x809eadb4 at fork_exit+0x84
> > #11 0x80e4e9ae at fork_trampoline+0xe
> > userret: returning with the following locks held:
> > exclusive lockmgr bufwait (bufwait) r = 0 (0xfe00f6222c10) locked @ 
> > /jenkins/workspace/HardenedBSD-master-amd64/sys/vm/vm_pager.c:380
> > exclusive lockmgr bufwait (bufwait) r = 0 (0xfe00f6222e40) locked @ 
> > /jenkins/workspace/HardenedBSD-master-amd64/sys/vm/vm_pager.c:380
> > panic: witness_warn
> > === End Log ===
> > 
> > This is 11-CURRENT/amd64, based on HardenedBSD commit
> > f0a4c61a2e9e2433db632d70d5764e79c5b84b7a. I booted the ISO up in bhyve
> > using vmrun.sh. I haven't ruled out anything on HardenedBSD's side, yet,
> > but we don't have any changes that would cause a panic'ing LOR in
> > vm_pager.c. I'll do some more investigative work when I get some more
> > sleep. But if anyone has any ideas, please let me know. I kinda wonder
> > if this is related to the recent VFS changes by FreeBSD.
> 
> The following change would fix your problem; I have not tested it.
> 
> diff --git a/sys/vm/vnode_pager.c b/sys/vm/vnode_pager.c
> index ff30f4d..66dd29d 100644
> --- a/sys/vm/vnode_pager.c
> +++ b/sys/vm/vnode_pager.c
> @@ -806,6 +806,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_page_t 
> *m, int count,
>* than a page size, then use special small filesystem code.
>*/
>   if (pagesperblock == 0) {
> + relpbuf(bp, freecnt);
>   for (i = 0; i < count; i++) {
>   PCPU_INC(cnt.v_vnodein);
>   PCPU_INC(cnt.v_vnodepgsin);
> ___

This works for me when running programs from an isofs file system.
-- 
Peter
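
Note on the quoted one-line patch: it adds a relpbuf() call at the top of
the pagesperblock == 0 branch. Why a missing release on an early branch
shows up as "returning with the following locks held" can be pictured with
a toy counter. The toy_* names are hypothetical and do not reflect the
real pbuf API; this is only a sketch of the pattern.

```c
#include <assert.h>

/* Toy pool counter; stands in for buffer accounting, not the real API. */
struct toy_pool {
	int in_use;
};

void
toy_getpbuf(struct toy_pool *p)
{
	p->in_use++;
}

void
toy_relpbuf(struct toy_pool *p)
{
	assert(p->in_use > 0);
	p->in_use--;
}

/*
 * Read path with an early "small filesystem" branch.  release_early
 * selects whether that branch gives the buffer back before returning.
 * Returns 1 if the special branch was taken, 0 otherwise.
 */
int
toy_getpages(struct toy_pool *p, int pagesperblock, int release_early)
{
	toy_getpbuf(p);
	if (pagesperblock == 0) {
		if (release_early)
			toy_relpbuf(p);
		return (1);
	}
	toy_relpbuf(p);
	return (0);
}
```

Calling toy_getpages() with pagesperblock == 0 and release_early == 0
leaves in_use at 1 after the call returns, which is the leak; with
release_early == 1 the counter goes back to 0.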


Re: r269147: NULL mp in getnewvnode() via kern_proc_filedesc_out()

2014-08-07 Thread Peter Holm
On Wed, Aug 06, 2014 at 09:15:12PM +0300, Konstantin Belousov wrote:
 On Wed, Aug 06, 2014 at 12:48:28PM -0500, Bryan Drewery wrote:
  On 8/5/2014 10:56 PM, Bryan Drewery wrote:
   On 8/5/2014 10:19 PM, Konstantin Belousov wrote:
   On Tue, Aug 05, 2014 at 09:47:57PM -0500, Bryan Drewery wrote:
   Has anyone else encountered this? Got it while running poudriere.
  
   NULL mp in getnewvnode()
   [...]
   vn_fullpath1() at vn_fullpath1+0x19d/frame 0xfe1247d8e540
   vn_fullpath() at vn_fullpath+0xc1/frame 0xfe1247d8e590
   export_fd_to_sb() at export_fd_to_sb+0x489/frame 0xfe1247d8e7c0
   kern_proc_filedesc_out() at kern_proc_filedesc_out+0x234/frame
   0xfe1247d8e840
   sysctl_kern_proc_filedesc() at sysctl_kern_proc_filedesc+0x84/frame
   0xfe1247d8e900
   sysctl_root_handler_locked() at
   sysctl_root_handler_locked+0x68/frame 0xfe1247d8e940
   sysctl_root() at sysctl_root+0x18e/frame 0xfe1247d8e990
   userland_sysctl() at userland_sysctl+0x192/frame 0xfe1247d8ea30
   sys___sysctl() at sys___sysctl+0x74/frame 0xfe1247d8eae0
   amd64_syscall() at amd64_syscall+0x25a/frame 0xfe1247d8ebf0
   Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe1247d8ebf0
  
   Unfortunately I have no dump as the kmem was too large compared to my
   swap, and I didn't get to the console before some of the text was
   overwritten. Perhaps it will hit it again soon after reboot and I'll get
   a core.
  
   NULL mp in getnewvnode() is only a printf(); it is not a panic or
   KASSERT.  The event does not stop the machine, nor does it print a
   backtrace.
  
   You mentioned that you were unable to dump, so did the system panic?
   Without the full log of the panic messages and the backtrace, it is impossible
   to start guessing what the problem is.
  
   That said, the printf seemingly outlived its usefulness.
  
   
   Got it. I've set debug.debugger_on_panic=1 to not auto reboot on panic
   next time this happens. I had it at 0 which was causing the lack of
   information in these.
  
  Here is the full trace:
  
  
   NULL mp in getnewvnode()
   VNASSERT failed
   0xf806071dc760: tag null, type VDIR
   usecount 1, writecount 0, refcount 1 mountedhere 0
   flags ()
   lock type zfs: EXCL by thread 0xf8009a53f490 (pid 1028, tmux, tid 
   100881)
   vp=0xf806071dc760, lowervp=0xf8013157f588
   panic: Don't call insmntque(foo, NULL)
   cpuid = 5
   KDB: stack backtrace:
   db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
   0xfe1247e76b50
   kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1247e76c00
   vpanic() at vpanic+0x126/frame 0xfe1247e76c40
   kassert_panic() at kassert_panic+0x139/frame 0xfe1247e76cb0
   insmntque1() at insmntque1+0x230/frame 0xfe1247e76cf0
   null_nodeget() at null_nodeget+0x158/frame 0xfe1247e76d60
   null_lookup() at null_lookup+0xeb/frame 0xfe1247e76dd0
   VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xf1/frame 0xfe1247e76e00
   lookup() at lookup+0x5ad/frame 0xfe1247e76e90
   namei() at namei+0x4e4/frame 0xfe1247e76f50
   vn_open_cred() at vn_open_cred+0x27a/frame 0xfe1247e770a0
   vop_stdvptocnp() at vop_stdvptocnp+0x161/frame 0xfe1247e773e0
   null_vptocnp() at null_vptocnp+0x2b/frame 0xfe1247e77440
   VOP_VPTOCNP_APV() at VOP_VPTOCNP_APV+0xf7/frame 0xfe1247e77470
   vn_vptocnp_locked() at vn_vptocnp_locked+0x118/frame 0xfe1247e774e0
   vn_fullpath1() at vn_fullpath1+0x19d/frame 0xfe1247e77540
   vn_fullpath() at vn_fullpath+0xc1/frame 0xfe1247e77590
   export_fd_to_sb() at export_fd_to_sb+0x489/frame 0xfe1247e777c0
   kern_proc_filedesc_out() at kern_proc_filedesc_out+0x234/frame 
   0xfe1247e77840
   sysctl_kern_proc_filedesc() at sysctl_kern_proc_filedesc+0x84/frame 
   0xfe1247e77900
   sysctl_root_handler_locked() at sysctl_root_handler_locked+0x68/frame 
   0xfe1247e77940
   sysctl_root() at sysctl_root+0x18e/frame 0xfe1247e77990
   userland_sysctl() at userland_sysctl+0x192/frame 0xfe1247e77a30
   sys___sysctl() at sys___sysctl+0x74/frame 0xfe1247e77ae0
   amd64_syscall() at amd64_syscall+0x25a/frame 0xfe1247e77bf0
   Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe1247e77bf0
   --- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x801041fca, rsp = 
   0x7fffd878, rbp = 0x7fffd8b0 ---
   KDB: enter: panic
   [ thread pid 1028 tid 100881 ]
   Stopped at  kdb_enter+0x3e:  movq  $0,kdb_why
   db> call doadump()
   
   Dump failed. Partition too small.
   = 0
  
 
 Try this.
 
 diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
 index 481644c..e803c24 100644
 --- a/sys/fs/nullfs/null_vnops.c

With this patch I get a 
panic: Lock (lockmgr) null not locked @ kern/vfs_default.c:523.

Details @ http://people.freebsd.org/~pho/stress/log/kostik698.txt

- Peter

Re: nanobsd / dd problem?

2013-12-09 Thread Peter Holm
On Mon, Dec 09, 2013 at 06:42:39AM +0200, Konstantin Belousov wrote:
 On Sun, Dec 08, 2013 at 06:31:36PM +0100, Stefan Hegnauer wrote:
  Hi,
  
   
  
  I am using freebsd-current (FreeBSD BUILDMASTER 11.0-CURRENT FreeBSD
  11.0-CURRENT #0 r259095: Sun Dec  8 10:20:40 CET 2013
  root@BUILDMASTER:/usr/obj/usr/src/sys/ASUS  i386) in a VirtualBox as a build
  machine for nanobsd images to be used on pc-engines.ch alix boards. The only
  difference to GENERIC is the inclusion of 'march=geode' and disabling of
  most debugging switches (malloc, Witness etc). Worked like a charm in the
  past.
  
   
  
  Since late summer - sorry, no exact date / svn revision - nanobsd.sh fails
  at the last stage when building the disk image, e.g. with
  
  ...
  
  00:00:25 ### log: /usr/obj/nanobsd.alixpf//_.di
  
  #
  
   
  
  Looking a bit closer it seems that dd(1) returns with an I/O error whenever
  the input is a file created with mdconfig(8):
  
  # dd if=/dev/zero of=somebackingfile bs=1k count=5k
  
  # mdconfig -f somebackingfile -u md0
  
  # newfs -U /dev/md0
  
  # dd if=/dev/md0 of=/dev/null
  
  dd: /dev/md0: Input/output error
  
  10241+0 records in
  
  10241+0 records out
  
  5243392 bytes transferred in 3.240345 secs (1618159 bytes/sec)
  
   
  
  The outputfile in nanobsd.sh seems to be error-free.
 It should be one block larger than the right size.
 \
  
  Anyone else seen similar behaviour? How to proceed/fix it?
  
 
 The following patch should clear the error.
 
 The issue is that kern_physio() incorrectly detects EOF due to incorrect
 calculation of the bio's bio_resid after the bio_length was clipped by the
 'excess' code in g_io_check(). Both bio_length and bio_resid appear
 to be 0 in the pre-last dd transfer, which starts exactly at the
 mediasize, and kern_physio() thinks that it transferred one more block
 than was transferred.
 
 I _suspect_ that it was caused by 'excess' code moving in r256880,
 but I am really not in the right condition to analyze it.  If somebody
 could try the same dd experiment to confirm or deny my suspicion, it
 would be useful.
 
 The patch below should be the right thing to do anyway.
 
 diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
 index c23a74b..b7c4d60 100644
 --- a/sys/kern/vfs_bio.c
 +++ b/sys/kern/vfs_bio.c
 @@ -3679,7 +3679,6 @@ bufdonebio(struct bio *bip)
  
   bp = bip->bio_caller2;
   bp->b_resid = bp->b_bcount - bip->bio_completed;
 - bp->b_resid = bip->bio_resid;   /* XXX: remove */
   bp->b_ioflags = bip->bio_flags;
   bp->b_error = bip->bio_error;
   if (bp->b_error)

I have tested this patch with a buildworld + selected other tests. No
problems seen (and problem fixed, of course).

- Peter
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: panic: double fault with 11.0-CURRENT r258504

2013-11-30 Thread Peter Holm
On Thu, Nov 28, 2013 at 09:56:10AM +0200, Konstantin Belousov wrote:
 On Wed, Nov 27, 2013 at 01:11:35PM -0800, Don Lewis wrote:
  On 27 Nov, Konstantin Belousov wrote:
   On Wed, Nov 27, 2013 at 11:35:19AM -0800, Don Lewis wrote:
   On 27 Nov, Konstantin Belousov wrote:
On Wed, Nov 27, 2013 at 11:02:57AM -0800, Don Lewis wrote:
On 27 Nov, Konstantin Belousov wrote:
 On Wed, Nov 27, 2013 at 10:33:30AM -0800, Don Lewis wrote:
 On 27 Nov, Konstantin Belousov wrote:
  On Wed, Nov 27, 2013 at 09:41:36AM -0800, Don Lewis wrote:
  On 27 Nov, Konstantin Belousov wrote:
   On Wed, Nov 27, 2013 at 02:49:12AM -0800, Don Lewis wrote:
   http://people.freebsd.org/~truckman/doublefault2.JPG
   
   What is the instruction at cpu_switch+0x9b ?
  
  movl 0x8(%edx),%eax
  So it is line 176 in swtch.s. Is the machine still in ddb, or did you
  obtain a core ? If yes, please print out the content of the words at
  0xe4f62bb0 + 4, +8 (*), +16. Please print the content of the word at
  address (*) + 8.
 
 It is still in ddb.
 
 http://people.freebsd.org/~truckman/doublefault3.JPG, though not in
 the above order.
 Uhm, sorry, I mistyped the last part of the instructions.
 
 The new thread pointer is 0xd2f4e000, there is nothing incriminating.
 Please print the word at 0xd2f4e000+0x254 == 0xd2f4e254, which would be
 the address of the new thread pcb. It is the load from the pcb + 8 which
 faults.

0xf3d44d60
Again, the pointer looks fine, and its tail is 0xd60, which is correct for
the pcb offset in the last page of the thread stack.

Please do 'show thread 0xd2f4e000' before trying below instructions.
   
   Ok, see below:

What happens if you try to read word at 0xf3d44d68 ?
   
   Nothing bad ...
   
   http://people.freebsd.org/~truckman/doublefault4.JPG
   
   So the thread structure looks sane, the stack region is in place where
   it is supposed to be, all the gathered data looks self-consistent. And,
   the access to the faulted address from ddb does not fault.
   
   Thread stacks can only be invalidated when the process is swapped out and
   kernel stack is written to swap.  Your thread flags indicate that it is
   in memory, and TDF_CANSWAP is not set.  I do not believe that our swapout
   code would invalidate stack mapping in such situation, otherwise we would
   have too many complaints already.
   
   Just in case, do you use swap on this box ?
  
  I do.
  
   And, as the last resort, I do understand that this sounds like giving up,
   do you monitor the temperature of the CPUs ? BTW, which CPUs are those,
   please show the cpu identification lines from the boot dmesg.
  
  I don't monitor the temperature, but I do hear the CPU fan speed ramping
  up and down when I'm building ports like this.  Even though I'm pretty
  much keeping one core busy the whole time, the temperature must drop
  enough at times to let the fan speed drop.
  
  I can run math/mprime on this machine for a while to see if anything
  shows up.  I also have a very similar machine (same motherboard but
  different CPU) that I can move the drive over to and test.
  
  Here's the full dmesg.boot:
  
  Copyright (c) 1992-2013 The FreeBSD Project.
  Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
  FreeBSD is a registered trademark of The FreeBSD Foundation.
  FreeBSD 11.0-CURRENT #63 r258614M: Tue Nov 26 00:29:01 PST 2013
  d...@scratch.catspoiler.org:/usr/obj/usr/src/sys/GENERICSMB i386
  FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
  WARNING: WITNESS option enabled, expect reduced performance.
  CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (2500.06-MHz 686-class CPU)
    Origin = AuthenticAMD  Id = 0x60fb1  Family = 0xf  Model = 0x6b  Stepping = 1
    Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
    Features2=0x2001<SSE3,CX16>
    AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
    AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
 
 The errata list for the Athlon 64 X2 is quite long.  Do you have the latest
 BIOS ?  I am not sure if AMD provides standalone firmware update blocks
 for their CPUs.  If any Linux distribution ships updates for AMD CPUs,
 it might be useful to load the update with cpucontrol(8).  Even if we
 do not hit a CPU bug, it would provide me with more certainty that we
 are not chasing a ghost.
 
 Another thing to try, possibly in vain, is to compile the kernel with gcc
 or disable SMP.
 
 Peter, could you, please, try to reproduce the issue ?  It does not look
 like a random hardware failure, since in all cases, it is curthread access
 which is faulting.  The issue is only reported by Don, and so far only
 for i386 SMP.

I'm not seeing this 

Re: panic: double fault with 11.0-CURRENT r258504

2013-11-28 Thread Peter Holm
On Thu, Nov 28, 2013 at 09:56:10AM +0200, Konstantin Belousov wrote:
 On Wed, Nov 27, 2013 at 01:11:35PM -0800, Don Lewis wrote:
  On 27 Nov, Konstantin Belousov wrote:
   On Wed, Nov 27, 2013 at 11:35:19AM -0800, Don Lewis wrote:
   On 27 Nov, Konstantin Belousov wrote:
On Wed, Nov 27, 2013 at 11:02:57AM -0800, Don Lewis wrote:
On 27 Nov, Konstantin Belousov wrote:
 On Wed, Nov 27, 2013 at 10:33:30AM -0800, Don Lewis wrote:
 On 27 Nov, Konstantin Belousov wrote:
  On Wed, Nov 27, 2013 at 09:41:36AM -0800, Don Lewis wrote:
  On 27 Nov, Konstantin Belousov wrote:
   On Wed, Nov 27, 2013 at 02:49:12AM -0800, Don Lewis wrote:
   http://people.freebsd.org/~truckman/doublefault2.JPG
   
   What is the instruction at cpu_switch+0x9b ?
  
  movl 0x8(%edx),%eax
  So it is line 176 in swtch.s. Is the machine still in ddb, or did you
  obtain a core ? If yes, please print out the content of the words at
  0xe4f62bb0 + 4, +8 (*), +16. Please print the content of the word at
  address (*) + 8.
 
 It is still in ddb.
 
 http://people.freebsd.org/~truckman/doublefault3.JPG, though not in
 the above order.
 Uhm, sorry, I mistyped the last part of the instructions.
 
 The new thread pointer is 0xd2f4e000, there is nothing incriminating.
 Please print the word at 0xd2f4e000+0x254 == 0xd2f4e254, which would be
 the address of the new thread pcb. It is the load from the pcb + 8 which
 faults.

0xf3d44d60
Again, the pointer looks fine, and its tail is 0xd60, which is correct for
the pcb offset in the last page of the thread stack.

Please do 'show thread 0xd2f4e000' before trying below instructions.
   
   Ok, see below:

What happens if you try to read word at 0xf3d44d68 ?
   
   Nothing bad ...
   
   http://people.freebsd.org/~truckman/doublefault4.JPG
   
   So the thread structure looks sane, the stack region is in place where
   it is supposed to be, all the gathered data looks self-consistent. And,
   the access to the faulted address from ddb does not fault.
   
   Thread stacks can only be invalidated when the process is swapped out and
   kernel stack is written to swap.  Your thread flags indicate that it is
   in memory, and TDF_CANSWAP is not set.  I do not believe that our swapout
   code would invalidate stack mapping in such situation, otherwise we would
   have too many complaints already.
   
   Just in case, do you use swap on this box ?
  
  I do.
  
   And, as the last resort, I do understand that this sounds like giving up,
   do you monitor the temperature of the CPUs ? BTW, which CPUs are those,
   please show the cpu identification lines from the boot dmesg.
  
  I don't monitor the temperature, but I do hear the CPU fan speed ramping
  up and down when I'm building ports like this.  Even though I'm pretty
  much keeping one core busy the whole time, the temperature must drop
  enough at times to let the fan speed drop.
  
  I can run math/mprime on this machine for a while to see if anything
  shows up.  I also have a very similar machine (same motherboard but
  different CPU) that I can move the drive over to and test.
  
  Here's the full dmesg.boot:
  
  Copyright (c) 1992-2013 The FreeBSD Project.
  Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
  FreeBSD is a registered trademark of The FreeBSD Foundation.
  FreeBSD 11.0-CURRENT #63 r258614M: Tue Nov 26 00:29:01 PST 2013
  d...@scratch.catspoiler.org:/usr/obj/usr/src/sys/GENERICSMB i386
  FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
  WARNING: WITNESS option enabled, expect reduced performance.
  CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (2500.06-MHz 686-class CPU)
    Origin = AuthenticAMD  Id = 0x60fb1  Family = 0xf  Model = 0x6b  Stepping = 1
    Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
    Features2=0x2001<SSE3,CX16>
    AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
    AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
 
 The errata list for the Athlon 64 X2 is quite long.  Do you have the latest
 BIOS ?  I am not sure if AMD provides standalone firmware update blocks
 for their CPUs.  If any Linux distribution ships updates for AMD CPUs,
 it might be useful to load the update with cpucontrol(8).  Even if we
 do not hit a CPU bug, it would provide me with more certainty that we
 are not chasing a ghost.
 
 Another thing to try, possibly in vain, is to compile the kernel with gcc
 or disable SMP.
 
 Peter, could you, please, try to reproduce the issue ?  It does not look
 like a random hardware failure, since in all cases, it is curthread access
 which is faulting.  The issue is only reported by Don, and so far only
 for i386 SMP.

I'm running tests 

Re: NewNFS vs. oldNFS for 10.0?

2013-03-18 Thread Peter Holm
On Mon, Mar 18, 2013 at 01:43:24PM +0100, Andre Oppermann wrote:
 On 15.03.2013 15:08, Rick Macklem wrote:
  Lars Eggert wrote:
  Hi,
 
  this reminds me that I ran into an issue lately with the new NFS and
  locking for NFSv3 mounts on a client that ran -CURRENT and a server
  that ran -STABLE.
 
  When I ran portmaster -a on the client, which mounted /usr/ports and
  /usr/local, as well as the location of the respective sqlite databases
  over NFSv3, the client network stack became unresponsive on all
  interfaces for 30 or so seconds and e.g. SSH connections broke. The
  serial console remained active throughout, and the system didn't
  crash. About a minute after the wedgie I could SSH into the box again,
  too.
 
  The issue went away when I killed lockd on the client, but that caused
  the sqlite database to become corrupted over time. The workaround for
  me was to move to NFSv4, which has been working fine. (One more reason
  to make it the default...)
 
  I've mentioned limitations w.r.t. the design of the NLM protocol (rpc.lockd)
  before. Any time there is any kind of network topology issue, it will run
  into difficulties. There may also be other issues.
 
  However, since both the old and new client use the same rpc.lockd in the
  same way (the new one just cribbed the code from the old one), I think
  the same problem would exist for the old one. As such, I don't believe
  this is a regression.
 
 Maybe we can talk Peter Holm into periodically running his file system
 stress test suite against NFS too?  :-)  Peter?
 

I'll add this to my work queue :)

- Peter


Re: panic: vputx: missed vn_close

2013-01-10 Thread Peter Holm
On Thu, Jan 10, 2013 at 01:40:07AM +0200, Konstantin Belousov wrote:
 On Wed, Jan 09, 2013 at 07:52:43PM +0100, Florian Smeets wrote:
  Hi,
  
  I got this while building packages with poudriere. I'm running r245188.
  
  Let me know if you need anything else from the dump.
  
  Florian
  
  VNASSERT failed
  0xfe04fda5bba0: tag zfs, type VREG
  usecount 1, writecount 1, refcount 1 mountedhere 0
  flags (VI_ACTIVE)
   VI_LOCKed v_object 0xfe062f6479f8 ref 0 pages 0
  lock type zfs: EXCL by thread 0xfe00bd683480 (pid 34602, umount,
  tid 100578)
  panic: vputx: missed vn_close
  cpuid = 3
  Uptime: 9h25m23s
  Dumping 13255 out of 32647
  MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
  
  [...]
  
  (kgdb) where
  #0  doadump (textdump=1) at pcpu.h:229
  #1  0x804c4ab7 in kern_reboot (howto=260) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:446
  #2  0x804c4fc6 in vpanic (fmt=<value optimized out>, ap=<value
  optimized out>) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:753
  #3  0x804c4e56 in kassert_panic (fmt=<value optimized out>) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:641
  #4  0x8055714d in vputx (vp=0xfe04fda5bba0, func=2) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2243
  #5  0x80d6b42f in null_reclaim (ap=<value optimized out>) at
  /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:743
  #6  0x8070aee8 in VOP_RECLAIM_APV (vop=<value optimized out>,
  a=<value optimized out>) at vnode_if.c:1959
  #7  0x8055844c in vgonel (vp=0xfe04fda5b7c0) at vnode_if.h:830
  #8  0x80557a7f in vflush (mp=0xfe0533ce3cc0, rootrefs=1,
  flags=2, td=0xfe00bd683480) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2625
  #9  0x80d6aa4e in nullfs_unmount (mp=0xfe0533ce3cc0,
  mntflags=<value optimized out>)
  at
  /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vfsops.c:250
  #10 0x805502cf in dounmount (mp=0xfe0533ce3cc0,
  flags=134742016, td=<value optimized out>) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_mount.c:1314
  #11 0x8054ff8b in sys_unmount (td=0xfe00bd683480,
  uap=0xff90d2c87a40) at
  /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_mount.c:1211
  #12 0x806b4845 in amd64_syscall (td=0xfe00bd683480,
  traced=0) at subr_syscall.c:134
  #13 0x8069d04b in Xfast_syscall () at exception.S:387
  #14 0x000800882ffa in ?? ()
  Previous frame inner to this frame (corrupt stack?)
  
 
 I was able to reproduce it locally. I think that you need to have a file
 opened for write on the nullfs mount, and then do a forced unmount of
 the mount while the file is still open.
 
 The patch below fixed it for me.
 
 diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
 index cc35d81..3be7366 100644
 --- a/sys/fs/nullfs/null_vnops.c

I've verified the scenario and am now testing with your patch.

- Peter


Re: panic, seems related to r234386

2012-05-13 Thread Peter Holm
On Sun, May 13, 2012 at 12:49:38AM +0200, Mateusz Guzik wrote:
 On Thu, May 10, 2012 at 12:39:00PM +0200, Peter Holm wrote:
  On Thu, May 10, 2012 at 12:21:18PM +0200, Mateusz Guzik wrote:
   On Tue, May 08, 2012 at 09:45:14PM +0200, Peter Holm wrote:
On Mon, May 07, 2012 at 10:11:53PM +0200, Mateusz Guzik wrote:
 On Mon, May 07, 2012 at 12:28:41PM -0700, Doug Barton wrote:
  On 05/06/2012 15:19, Sergey Kandaurov wrote:
   On 7 May 2012 01:54, Doug Barton do...@freebsd.org wrote:
   I got this with today's current, previous (working) kernel is r232719.
  
   panic: _mtx_lock_sleep: recursed on non-recursive mutex struct mount mtx
   @ /frontier/svn/head/sys/kern/vfs_subr.c:4595
  
  ...
  
   Please try this patch.
   
   Index: fs/ext2fs/ext2_vfsops.c
   ===
   --- fs/ext2fs/ext2_vfsops.c (revision 235108)
   +++ fs/ext2fs/ext2_vfsops.c (working copy)
   @@ -830,7 +830,6 @@
   /*
* Write back each (modified) inode.
*/
   -   MNT_ILOCK(mp);
loop:
   MNT_VNODE_FOREACH_ALL(vp, mp, mvp) {
   if (vp->v_type == VNON) {
   
  
  Didn't help, sorry. I put 234385 through some pretty heavy load
  yesterday, and everything was fine. As soon as I move up to 234386, the
  panic triggered again. So I cleaned everything up, applied your patch,
  built a kernel from scratch, and rebooted. It was Ok for a few seconds
  after boot, then panic'ed again, I think in a different place, but I'm
  not sure because subsequent attempts to fsck the file systems caused new
  panics which overwrote the old ones before they could be saved.
  
 
 Another MNT_ILOCK was hiding a few lines below, try this patch:
 
 http://student.agh.edu.pl/~mjguzik/patches/ext2fs-ilock.patch
 
 I've tested this a bit and I believe this fixes your problem.
 

Gave this a spin and found what looks like a deadlock:

http://people.freebsd.org/~pho/stress/log/ext2fs.txt

Not a new problem, it would seem. Same issue with 8.3-PRERELEASE r232656M.

   
   pid 2680 (fts) holds lock for vnode cb4be414 and tries to lock cc0ac15c
   pid 2581 (openat) holds lock for vnode cc0ac15c and tries to lock cb4be414
   
   openat calls rmdir foo/bar and ext2_rmdir unlocks and tries to lock
   again foo's vnode.
   
   This is fairly easily reproducible with concurrently running mkdir and fts
   testcase programs that are provided by stress2.
   
   I'll try to come up with a patch by the end of the week.
   
  
 
 Easier way to reproduce: mkdir from stress2 and
 while true; do find /mnt > /dev/null; done on another terminal.
 
 Assuming foo/bar directory tree, deadlock happens during removal of bar
 with simultaneous lookup of .. in bar.
 
 Proposed trivial patch:
 http://student.agh.edu.pl/~mjguzik/patches/ext2fs_rmdir-deadlock.patch
 
 If the lock cannot be acquired immediately, it unlocks the 'bar' vnode and
 then locks both vnodes in order.
 
 After patching this I ran into another issue - wrong vnode type panics
 from cache_enter_time after calls by ext2_lookup. (It takes some time to
 reproduce this, testcase as before.)
 
 It looks like ext2_lookup is actually an adapted version of ufs_lookup and
 lacks some bugfixes present in the current ufs_lookup. I believe those
 bugfixes address this bug.
 
 Here is my attempt to fix the problem (based on ufs_lookup changes):
 http://student.agh.edu.pl/~mjguzik/patches/ext2fs_lookup-relookup.patch
 

I have tested these two patches for a few hours and they do indeed
seem to fix the problem I had seen before.

Regards,

- Peter


Re: panic, seems related to r234386

2012-05-10 Thread Peter Holm
On Thu, May 10, 2012 at 12:21:18PM +0200, Mateusz Guzik wrote:
 On Tue, May 08, 2012 at 09:45:14PM +0200, Peter Holm wrote:
  On Mon, May 07, 2012 at 10:11:53PM +0200, Mateusz Guzik wrote:
   On Mon, May 07, 2012 at 12:28:41PM -0700, Doug Barton wrote:
On 05/06/2012 15:19, Sergey Kandaurov wrote:
 On 7 May 2012 01:54, Doug Barton do...@freebsd.org wrote:
 I got this with today's current, previous (working) kernel is r232719.

 panic: _mtx_lock_sleep: recursed on non-recursive mutex struct mount mtx
 @ /frontier/svn/head/sys/kern/vfs_subr.c:4595

...

 Please try this patch.
 
 Index: fs/ext2fs/ext2_vfsops.c
 ===
 --- fs/ext2fs/ext2_vfsops.c (revision 235108)
 +++ fs/ext2fs/ext2_vfsops.c (working copy)
 @@ -830,7 +830,6 @@
 /*
  * Write back each (modified) inode.
  */
 -   MNT_ILOCK(mp);
  loop:
 MNT_VNODE_FOREACH_ALL(vp, mp, mvp) {
 if (vp->v_type == VNON) {
 

Didn't help, sorry. I put 234385 through some pretty heavy load
yesterday, and everything was fine. As soon as I move up to 234386, the
panic triggered again. So I cleaned everything up, applied your patch,
built a kernel from scratch, and rebooted. It was Ok for a few seconds
after boot, then panic'ed again, I think in a different place, but I'm
not sure because subsequent attempts to fsck the file systems caused new
panics which overwrote the old ones before they could be saved.

   
   Another MNT_ILOCK was hiding a few lines below, try this patch:
   
   http://student.agh.edu.pl/~mjguzik/patches/ext2fs-ilock.patch
   
   I've tested this a bit and I believe this fixes your problem.
   
  
  Gave this a spin and found what looks like a deadlock:
  
  http://people.freebsd.org/~pho/stress/log/ext2fs.txt
  
  Not a new problem, it would seem. Same issue with 8.3-PRERELEASE r232656M.
  
 
 pid 2680 (fts) holds lock for vnode cb4be414 and tries to lock cc0ac15c
 pid 2581 (openat) holds lock for vnode cc0ac15c and tries to lock cb4be414
 
 openat calls rmdir foo/bar and ext2_rmdir unlocks and tries to lock
 again foo's vnode.
 
 This is fairly easily reproducible with concurrently running mkdir and fts
 testcase programs that are provided by stress2.
 
 I'll try to come up with a patch by the end of the week.
 

Great. Thank you for looking at this.

- Peter


Re: panic, seems related to r234386

2012-05-08 Thread Peter Holm
On Mon, May 07, 2012 at 10:11:53PM +0200, Mateusz Guzik wrote:
 On Mon, May 07, 2012 at 12:28:41PM -0700, Doug Barton wrote:
  On 05/06/2012 15:19, Sergey Kandaurov wrote:
   On 7 May 2012 01:54, Doug Barton do...@freebsd.org wrote:
   I got this with today's current, previous (working) kernel is r232719.
  
   panic: _mtx_lock_sleep: recursed on non-recursive mutex struct mount mtx
   @ /frontier/svn/head/sys/kern/vfs_subr.c:4595
  
  ...
  
   Please try this patch.
   
   Index: fs/ext2fs/ext2_vfsops.c
   ===
   --- fs/ext2fs/ext2_vfsops.c (revision 235108)
   +++ fs/ext2fs/ext2_vfsops.c (working copy)
   @@ -830,7 +830,6 @@
   /*
* Write back each (modified) inode.
*/
   -   MNT_ILOCK(mp);
loop:
   MNT_VNODE_FOREACH_ALL(vp, mp, mvp) {
   if (vp->v_type == VNON) {
   
  
  Didn't help, sorry. I put 234385 through some pretty heavy load
  yesterday, and everything was fine. As soon as I move up to 234386, the
  panic triggered again. So I cleaned everything up, applied your patch,
  built a kernel from scratch, and rebooted. It was Ok for a few seconds
  after boot, then panic'ed again, I think in a different place, but I'm
  not sure because subsequent attempts to fsck the file systems caused new
  panics which overwrote the old ones before they could be saved.
  
 
 Another MNT_ILOCK was hiding a few lines below, try this patch:
 
 http://student.agh.edu.pl/~mjguzik/patches/ext2fs-ilock.patch
 
 I've tested this a bit and I believe this fixes your problem.
 

Gave this a spin and found what looks like a deadlock:

http://people.freebsd.org/~pho/stress/log/ext2fs.txt

Not a new problem, it would seem. Same issue with 8.3-PRERELEASE r232656M.

- Peter


Re: Thoughts on TMPFS no longer being considered highly experimental

2011-06-25 Thread Peter Holm
On Fri, Jun 24, 2011 at 10:57:16PM +0300, Kostik Belousov wrote:
 On Fri, Jun 24, 2011 at 06:20:03PM +0200, Peter Holm wrote:
  Got a panic: Not a vnode object quite fast:
  
  http://people.freebsd.org/~pho/stress/log/kostik441.txt
 
 Ah, yes, this is an assertion that was added in the r209702.
 http://people.freebsd.org/~kib/misc/tmpfs.7.patch

Looks good. The mmap(2) test doesn't panic any more, nor do any of
the other TMPFS tests I have.

- Peter


Re: Thoughts on TMPFS no longer being considered highly experimental

2011-06-24 Thread Peter Holm
On Thu, Jun 23, 2011 at 11:21:53PM +0300, Kostik Belousov wrote:
 On Thu, Jun 23, 2011 at 09:31:09AM -0700, David O'Brien wrote:
  Does anyone object to this patch?
  
  David Wolfskill and I have run TMPFS on a number of machines for two
  years with no problems.
  
  I may have missed something, but I'm not aware of any serious PRs on
  TMPFS either.
  
  
  Index: tmpfs_vfsops.c
  ===
  --- tmpfs_vfsops.c  (revision 221113)
  +++ tmpfs_vfsops.c  (working copy)
  @@ -155,9 +155,6 @@ tmpfs_mount(struct mount *mp)
  return EOPNOTSUPP;
  }
   
  -   printf("WARNING: TMPFS is considered to be a highly experimental "
  -   "feature in FreeBSD.\n");
  -
  vn_lock(mp->mnt_vnodecovered, LK_SHARED | LK_RETRY);
  error = VOP_GETATTR(mp->mnt_vnodecovered, &va, mp->mnt_cred);
  VOP_UNLOCK(mp->mnt_vnodecovered, 0);
 
 The things I am aware of:
 - there are races on the lookup. They were papered over in r212305,
 but the bug was not really fixed, AFAIR.
 
 - the tmpfs does double-buffering for the mapped vnodes. This is quite
 insulting for the memory-backed fs, isn't it ? I have a patch, but it is
 still under review.
 
 - I believe Peter Holm has more test cases that fail with tmpfs. He
 would have more details. I somewhat remember some panic on execve(2) of a
 binary located on tmpfs.
 

I ran the TMPFS tests I have and so far I only spotted the mmap(2)
problem:

http://people.freebsd.org/~pho/stress/log/tmpfs/

 Removing the warning will not make the issues go away.



-- 
Peter


Re: Thoughts on TMPFS no longer being considered highly experimental

2011-06-24 Thread Peter Holm
On Fri, Jun 24, 2011 at 02:06:27PM +0300, Kostik Belousov wrote:
 On Fri, Jun 24, 2011 at 12:30:16PM +0200, Peter Holm wrote:
  On Thu, Jun 23, 2011 at 11:21:53PM +0300, Kostik Belousov wrote:
   On Thu, Jun 23, 2011 at 09:31:09AM -0700, David O'Brien wrote:
Does anyone object to this patch?

David Wolfskill and I have run TMPFS on a number of machines for two
years with no problems.

I may have missed something, but I'm not aware of any serious PRs on
TMPFS either.


Index: tmpfs_vfsops.c
===
--- tmpfs_vfsops.c  (revision 221113)
+++ tmpfs_vfsops.c  (working copy)
@@ -155,9 +155,6 @@ tmpfs_mount(struct mount *mp)
return EOPNOTSUPP;
}
 
-   printf("WARNING: TMPFS is considered to be a highly experimental "
-   "feature in FreeBSD.\n");
-
vn_lock(mp->mnt_vnodecovered, LK_SHARED | LK_RETRY);
error = VOP_GETATTR(mp->mnt_vnodecovered, &va, mp->mnt_cred);
VOP_UNLOCK(mp->mnt_vnodecovered, 0);
   
   The things I am aware of:
   - there are races on the lookup. They were papered over in r212305,
   but the bug was not really fixed, AFAIR.
   
   - the tmpfs does double-buffering for the mapped vnodes. This is quite
   insulting for the memory-backed fs, isn't it ? I have a patch, but it is
   still under review.
   
   - I believe Peter Holm has more test cases that fail with tmpfs. He
   would have more details. I somewhat remember some panic on execve(2) of a
   binary located on tmpfs.
   
  
  I ran the TMPFS tests I have and so far I only spotted the mmap(2)
  problem:
  
  http://people.freebsd.org/~pho/stress/log/tmpfs/
 It would be indeed good if the issue was the only remaining problem.

Well, more testing is needed for sure.

 The deadlock in tmpfs6.txt is caused by doing copyin() while having
 a page busied. This should be fixed indirectly by the patch to
 avoid double-buffering, I uploaded the latest version at
 http://people.freebsd.org/~kib/misc/tmpfs.5.patch
 
  
   Removing the warning will not make the issues go away.
  

This doesn't compile:

===> tmpfs (all)
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   
-DHAVE_KERNEL_OPTION_HEADERS -include 
/usr/src/sys/i386/compile/PHO/opt_global.h -I. -I@ -I@/contrib/altq
-finline-limit=8000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-common -g -I/usr/src/sys/i386/compile/PHO  
-mno-align-long-strings -mpreferred-stack-boundary=2 -mno-sse
-mno-mmx -msoft-float -ffreestanding -fstack-protector -std=iso9899:1999 
-fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline
-Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option -c 
/usr/src/sys/modules/tmpfs/../../fs/tmpfs/tmpfs_subr.c
cc1: warnings being treated as errors
/usr/src/sys/modules/tmpfs/../../fs/tmpfs/tmpfs_subr.c: In function 
'tmpfs_reg_resize':
/usr/src/sys/modules/tmpfs/../../fs/tmpfs/tmpfs_subr.c:911: warning: 'uobj' is 
used uninitialized in this function
*** Error code 1

 886 int
 887 tmpfs_reg_resize(struct vnode *vp, off_t newsize)
 888 {
 889 struct tmpfs_mount *tmp;
 890 struct tmpfs_node *node;
 891 vm_object_t uobj;
 892 vm_page_t m;
 893 vm_pindex_t newpages, oldpages;
 894 off_t oldsize;
 895 size_t zerolen;
 896 
 897 MPASS(vp->v_type == VREG);
 898 MPASS(newsize >= 0);
 899 
 900 node = VP_TO_TMPFS_NODE(vp);
 901 tmp = VFS_TO_TMPFS(vp->v_mount);
 902 
 903 /*
 904  * Convert the old and new sizes to the number of pages needed to
 905  * store them.  It may happen that we do not need to do anything
 906  * because the last allocated page can accommodate the change on
 907  * its own.
 908  */
 909 oldsize = node->tn_size;
 910 oldpages = OFF_TO_IDX(oldsize + PAGE_MASK);
 911 MPASS(oldpages == uobj->size);
 912 newpages = OFF_TO_IDX(newsize + PAGE_MASK);
 913 if (newpages > oldpages &&
 914 newpages - oldpages > TMPFS_PAGES_AVAIL(tmp))
 915 return (ENOSPC);
 916 
 917 TMPFS_LOCK(tmp);
 918 tmp->tm_pages_used += (newpages - oldpages);
 919 TMPFS_UNLOCK(tmp);
 920 
 921 node->tn_size = newsize;
 922 VM_OBJECT_LOCK(uobj);

- Peter


Re: Thoughts on TMPFS no longer being considered highly experimental

2011-06-24 Thread Peter Holm
On Fri, Jun 24, 2011 at 05:50:43PM +0300, Kostik Belousov wrote:
 On Fri, Jun 24, 2011 at 03:21:05PM +0200, Peter Holm wrote:
  On Fri, Jun 24, 2011 at 02:06:27PM +0300, Kostik Belousov wrote:
   On Fri, Jun 24, 2011 at 12:30:16PM +0200, Peter Holm wrote:
On Thu, Jun 23, 2011 at 11:21:53PM +0300, Kostik Belousov wrote:
 On Thu, Jun 23, 2011 at 09:31:09AM -0700, David O'Brien wrote:
  Does anyone object to this patch?
  
  David Wolfskill and I have run TMPFS on a number of machines for two
  years with no problems.
  
  I may have missed something, but I'm not aware of any serious PRs on
  TMPFS either.
  
  
  Index: tmpfs_vfsops.c
  ===
  --- tmpfs_vfsops.c  (revision 221113)
  +++ tmpfs_vfsops.c  (working copy)
  @@ -155,9 +155,6 @@ tmpfs_mount(struct mount *mp)
  return EOPNOTSUPP;
  }
   
  -   printf("WARNING: TMPFS is considered to be a highly experimental "
  -   "feature in FreeBSD.\n");
  -
  vn_lock(mp->mnt_vnodecovered, LK_SHARED | LK_RETRY);
  error = VOP_GETATTR(mp->mnt_vnodecovered, &va, mp->mnt_cred);
  VOP_UNLOCK(mp->mnt_vnodecovered, 0);
 
 The things I am aware of:
 - there is a races on the lookup. They were papered over in r212305,
 but the bug was not really fixed, AFAIR.
 
 - the tmpfs does double-buffering for the mapped vnodes. This is quite
 insulting for the memory-backed fs, isn't it ? I have a patch, but it 
 is
 still under review.
 
 - I believe Peter Holm has more test cases that fails with tmpfs. He
 would have more details. I somewhat remember some panic on execve(2) 
 the
 binary located on tmpfs.
 

I ran the TMPFS tests I have and so far I only spotted the mmap(2)
problem:

http://people.freebsd.org/~pho/stress/log/tmpfs/
   It would be indeed good if the issue was the only remaining problem.
  
  Well, more testing is needed for sure.
  
   The deadlock in tmpfs6.txt is caused by doing copyin() while having
   a page busied. This should be fixed indirectly by the patch to
   avoid double-buffering, I uploaded the latest version at
   http://people.freebsd.org/~kib/misc/tmpfs.5.patch
   

 Removing the warning will not make the issues coming away.

  
  This doesn't compile:
  
  === tmpfs (all)
  cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc  
   -DHAVE_KERNEL_OPTION_HEADERS -include 
  /usr/src/sys/i386/compile/PHO/opt_global.h -I. -I@ -I@/contrib/altq
  -finline-limit=8000 --param inline-unit-growth=100 --param 
  large-function-growth=1000 -fno-common -g -I/usr/src/sys/i386/compile/PHO  
  -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-sse
  -mno-mmx -msoft-float -ffreestanding -fstack-protector -std=iso9899:1999 
  -fstack-protector -Wall -Wredundant-decls -Wnested-externs 
  -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline
  -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
  -Wmissing-include-dirs -fdiagnostics-show-option -c 
  /usr/src/sys/modules/tmpfs/../../fs/tmpfs/tmpfs_subr.c
  cc1: warnings being treated as errors
  /usr/src/sys/modules/tmpfs/../../fs/tmpfs/tmpfs_subr.c: In function 
  'tmpfs_reg_resize':
  /usr/src/sys/modules/tmpfs/../../fs/tmpfs/tmpfs_subr.c:911: warning: 'uobj' 
  is used uninitialized in this function
  *** Error code 1
 
 Yes, the patch has rotten. Please try
 http://people.freebsd.org/~kib/misc/tmpfs.6.patch

Got a "panic: Not a vnode object" quite fast:

http://people.freebsd.org/~pho/stress/log/kostik441.txt

- Peter


Re: SU+J: negative used diskspace (for a while)

2011-06-17 Thread Peter Holm
On Fri, Jun 17, 2011 at 05:34:15PM +0200, Hans Ottevanger wrote:
> Hi,
>
> I found a possible issue with SU+J on recent versions of -CURRENT.
>
> After deleting a large file hierarchy (copy of /usr/src, ~1.5 Gbyte),
> df reports a negative number of blocks Used for a while.
>

Yes, thank you. This is a known issue and I believe that it is on
Jeff's to-do list.

- Peter


Re: How a full fsck screwed up my SU+J filesystem

2010-12-01 Thread Peter Holm
On Wed, Dec 01, 2010 at 01:28:06AM -0800, Garrett Cooper wrote:
>   So... I was doing a portmaster -af today because vlc stopped playing
> audio (for some reason ... I kind of went on a pkg_cutleaves rampage and
> probably deinstalled too much stuff), and the machine hardlocked during an
> upgrade. I did a soft reboot and saw messages along the lines of "your
> journal and filesystem mount time mismatched; running a full fsck". I figured
> ok, sure... and let it do its thing. Problem was that it pruned a lot of
> stuff from my /usr partition -- including the .sujournal !!! So now it's
> stuck at "Mounting local file systems:" stating:
>
> Failed to find journal.   Use tunefs to create one
> Failed to start journal: 2
>
>   (I assume the 2 means ENOENT). All of the above were printf(9)'s from
> the kernel.
>   Now the machine won't continue in multiuser mode (doesn't respond to
> interrupts, no panic, etc). Going into ddb, I don't see anything in
> info_threads (just a bunch of references to sched_switch, a few to
> fork_trampoline, cpustop_handler, and kdb_enter). I'm going to try and
> massage the machine back to life from single user mode, but the fact that
> this died in this way (i.e. .sujournal getting nuked by a full fsck) is a bit
> disheartening for SU+J :(... It would be nice if at least the fsck aborted
> before going and nuking the journal :/... (or at the very least if the file
> wasn't removable -- i.e. SF_NOUNLINK).
>   Here's to hoping I can resuscitate the filesystem...
> Thanks,
> -Garrett

Thank you for reporting this.

I was able to reproduce the problem by:

tunefs -j enable /dev/md5a
mount /dev/md5a /mnt
chflags 0 /mnt/.sujournal
rm -f /mnt/.sujournal
umount /mnt
mount /dev/md5a /mnt

The mount(1) is now stuck in mntref.

http://people.freebsd.org/~pho/stress/log/kostik404.txt

A sequence of tunefs -j disable + tunefs -j enable should get
you going.

-- 
Peter


Re: Corruption of UFS filesystems after using md(4)

2010-11-03 Thread Peter Holm
On Tue, Nov 02, 2010 at 07:33:50PM +0000, Bruce Cran wrote:
> On Tuesday 02 November 2010 19:12:14 Bruce Cran wrote:
> > I've noticed in recent months that I appear to be getting silent corruption
> > of my UFS filesystems - and I think it may be linked to using md(4) or
> > creating sparse files.
>
> I've confirmed this is a UFS bug related to sparse files: "truncate -s20G f1
> && rm f1" is enough to trigger the error and start generating .viminfo files
> that appear to be 20GB. When running fsck I get an "Invalid block count" error
> if I just reboot without removing the .viminfo file; if I do remove it, I get
> a "Partially allocated inode" error.
>
 

I'm able to verify this by:

"m.sh" 49L, 1917C written
$ ./m.sh
Local config: x4
+ mdconfig -a -t swap -s 1g -u 5
+ bsdlabel -w md5 auto
+ newfs -U md5a
+ mount /dev/md5a /mnt
+ truncate -s20G /mnt/f1
+ rm /mnt/f1
+ umount /mnt
+ fsck -t ufs -y /dev/md5a
** /dev/md5a
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
PARTIALLY ALLOCATED INODE I=4
UNEXPECTED SOFT UPDATE INCONSISTENCY

CLEAR? yes

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

2 files, 2 used, 506481 free (25 frags, 63307 blocks, 0.0% fragmentation)

* FILE SYSTEM IS CLEAN *

* FILE SYSTEM WAS MODIFIED *
+ mdconfig -d -u 5
$ 
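
As background to the run above: a 20G truncate succeeds on a 1g md(4) device because the file is sparse, with a large logical size and almost no allocated blocks. Here is a minimal sketch of that idea in C, assuming a POSIX system with a writable /tmp; the helper name sparse_probe() is invented for this example and has nothing to do with the m.sh script.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file, extend it to `size` bytes without writing any
 * data, store its stat(2) information in *st, then remove the file.
 * Returns 0 on success and -1 on failure.
 */
int
sparse_probe(off_t size, struct stat *st)
{
	char path[] = "/tmp/sparse_demo.XXXXXX";
	int fd, rc = -1;

	assert(size >= 0 && st != NULL);
	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	/* ftruncate(2) only sets the length; no data blocks are written. */
	if (ftruncate(fd, size) == 0 && fstat(fd, st) == 0)
		rc = 0;
	close(fd);
	unlink(path);
	return (rc);
}
```

After a successful call, st_size holds the requested length, while st_blocks (typically counted in 512-byte units) is expected to be far smaller on a file system that supports holes. How many blocks actually get allocated depends on the file system, so the sketch does not assume a particular value.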

- Peter


Re: softupdate with journal panic

2010-08-23 Thread Peter Holm
On Tue, Aug 24, 2010 at 12:12:57AM +0300, Kostik Belousov wrote:
> On Sun, Aug 22, 2010 at 03:21:04PM +0200, Peter Holm wrote:
> > On Sat, Aug 21, 2010 at 01:49:45PM -0400, Michael Butler wrote:
> > > While updating sysutils/coreutils port on -current as of this morning
> > > (SVN r211550), I noted a panic during the directory rename config test.
> > >
> >
> > Your problem seems identical to this report:
> >
> > http://docs.freebsd.org/cgi/mid.cgi?AANLkTinPjiOV21kDLZYV5WScrhLMN7DY8E8jVHWPU5mC
> >
> I believe that dotdotremref in this case is legitimately NULL. With this
> assumption, the following patch would help.
>
> diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
> index b666c0f..65e5255 100644
> --- a/sys/ufs/ffs/ffs_softdep.c

Yes, works for me.

- Peter


Re: softupdate with journal panic

2010-08-22 Thread Peter Holm
On Sat, Aug 21, 2010 at 01:49:45PM -0400, Michael Butler wrote:
> While updating sysutils/coreutils port on -current as of this morning
> (SVN r211550), I noted a panic during the directory rename config test.
>

Your problem seems identical to this report:

http://docs.freebsd.org/cgi/mid.cgi?AANLkTinPjiOV21kDLZYV5WScrhLMN7DY8E8jVHWPU5mC

- Peter


Re: SUJ panic on new directory rename

2010-07-04 Thread Peter Holm
On Sun, Jul 04, 2010 at 09:42:08PM +0200, Attilio Rao wrote:
> Is this core uploaded somewhere? (With, possibly, a copy of your kernel
> binaries?)
>
> Attilio
>

Some more info here:

http://people.freebsd.org/~pho/stress/log/attilio039.txt

- Peter


Deadlock [PATCH]

2003-08-14 Thread Peter Holm
I have tracked down what I believe to be the cause of several
deadlock situations I have encountered, like
http://people.freebsd.org/~pho/stress/cons40.html.

The problem seems to be the 4bsd scheduler, that does not preempt correctly.

I've included a patch that fixes the problem for me.
-- 
Peter Holm
--- sched_4bsd.c~   Sun Jun 15 16:57:17 2003
+++ sched_4bsd.c    Sun Aug 10 08:41:06 2003
@@ -448,7 +448,8 @@
 
        ke->ke_sched->ske_cpticks++;
        kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
-       if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
+       if (((kg->kg_estcpu + 1) % INVERSE_ESTCPU_WEIGHT) == 0) {
+               curthread->td_flags |= TDF_NEEDRESCHED;
                resetpriority(kg);
                if (td->td_priority >= PUSER)
                        td->td_priority = kg->kg_user_pri;
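
One possible reading of why the modulus change in the diff matters is sketched below as a small userland simulation. It rests on an assumption that is not stated in the mail: that the clamp applied by ESTCPULIM() leaves kg_estcpu stuck at a value one below a multiple of INVERSE_ESTCPU_WEIGHT. DEMO_WEIGHT, DEMO_LIMIT, demo_estcpulim() and count_resets() are illustrative stand-ins, not kernel symbols.

```c
#include <assert.h>

/* Assumed values for illustration; the real ones live in sched_4bsd.c. */
#define DEMO_WEIGHT 8                                    /* stands in for INVERSE_ESTCPU_WEIGHT */
#define DEMO_LIMIT  (36 * DEMO_WEIGHT + DEMO_WEIGHT - 1) /* assumed clamp value, 295 */

/* Clamp in the spirit of ESTCPULIM(): never exceed DEMO_LIMIT. */
int
demo_estcpulim(int e)
{
	return (e < DEMO_LIMIT ? e : DEMO_LIMIT);
}

/*
 * Simulate `ticks` clock ticks and count how often the branch that
 * recomputes the priority is taken.  patched == 0 uses the original
 * test, patched != 0 uses the (estcpu + 1) test from the diff.
 */
int
count_resets(int ticks, int patched)
{
	int estcpu = 0, resets = 0;

	assert(ticks >= 0);
	for (int i = 0; i < ticks; i++) {
		estcpu = demo_estcpulim(estcpu + 1);
		if (((patched ? estcpu + 1 : estcpu) % DEMO_WEIGHT) == 0)
			resets++;
	}
	return (resets);
}
```

Under these assumptions the original test stops firing once the counter saturates (36 hits in 1000 ticks, and still 36 in 2000), whereas the patched test fires on every tick after saturation. That would be consistent with a CPU-bound thread never being asked to reschedule, but it is only a model of the condition, not of the scheduler.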
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Deadlock

2003-07-09 Thread Peter Holm
Here's a trace from a deadlock in a kernel from Jul 8 13:51 UTC:

http://people.freebsd.org/~pho/stress/cons36.html

-- 
Peter Holm


LOR in vm/swap_pager.c:1166 + vm/vm_kern.c:325

2003-07-07 Thread Peter Holm
In current from Jul 3 18:36 UTC:

lock order reversal
1st 0xc0836ae4 vm object (vm object) @ vm/swap_pager.c:1166
2nd 0xc082e120 system map (system map) @ vm/vm_kern.c:325
Stack backtrace:
backtrace(c050c617,c082e120,c051eacf,c051eacf,c051e977) at backtrace+0x17
witness_lock(c082e120,8,c051e977,145,c05da600) at witness_lock+0x697
_mtx_lock_flags(c082e120,0,c051e96e,145,cccab904) at _mtx_lock_flags+0xb1
_vm_map_lock(c082e0c0,c051e96e,145,c083a214,cccab95c) at _vm_map_lock+0x36
kmem_malloc(c082e0c0,1000,101,cccab98c,c046c860) at kmem_malloc+0x39
page_alloc(c083a200,1000,cccab97f,101,c055fb80) at page_alloc+0x27
slab_zalloc(c083a200,101,c052030c,664,c0508c33) at slab_zalloc+0x150
uma_zone_slab(c083a200,101,c0520303,664,0) at uma_zone_slab+0xd8
uma_zalloc_internal(c083a200,0,101,6e8,0) at uma_zalloc_internal+0x55
uma_zfree_arg(c1991300,ccf0,0,1,0) at uma_zfree_arg+0x2d7
swp_pager_meta_ctl(c0836ae4,1f,0,2,cccabb6c) at swp_pager_meta_ctl+0x1bf
swap_pager_unswapped(c08f14d8,1,c05072cf,b4,cccabad0) at swap_pager_unswapped+0x2a
vm_fault(c0bab770,bfbff000,2,8,c1982390) at vm_fault+0x1181
trap_pfault(cccabbfc,0,bfbffa68,c0507326,bfbffa68) at trap_pfault+0x10f
trap(18,10,10,bfbffa68,cccabc6c) at trap+0x3cd
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc04a243c, esp = 0xcccabc3c, ebp = 0xcccabccc ---
slow_copyout(c1982390,cccabd10,0,cccabd40,c04a471e) at slow_copyout+0x4
wait4(c1982390,cccabd10,c052441d,3fd,4) at wait4+0x20
syscall(2f,2f,2f,0,3a) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (7), eip = 0x807b26b, esp = 0xbfbffa0c, ebp = 0xbfbffa28 ---
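
For readers unfamiliar with the term, a lock order reversal means two locks were taken in the opposite order to one seen earlier. The following is a minimal sketch of that kind of bookkeeping, not witness(4)'s actual implementation; the lock ids and the function order_check() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 8

/* Hypothetical lock ids, named after the two locks in the report. */
enum { LK_VM_OBJECT = 0, LK_SYSTEM_MAP = 1 };

/* seen_before[a][b] is true once lock a has been held while taking b. */
static bool seen_before[MAX_LOCKS][MAX_LOCKS];

/*
 * Record that `next` is being acquired while `held` is already held.
 * Returns true if the opposite order was recorded earlier (a reversal),
 * otherwise remembers this order and returns false.
 */
bool
order_check(int held, int next)
{
	assert(held >= 0 && held < MAX_LOCKS);
	assert(next >= 0 && next < MAX_LOCKS);
	if (seen_before[next][held])
		return (true);
	seen_before[held][next] = true;
	return (false);
}
```

In this model the report above corresponds to order_check(LK_SYSTEM_MAP, LK_VM_OBJECT) having been recorded at some earlier point, followed by order_check(LK_VM_OBJECT, LK_SYSTEM_MAP) returning true in the vm_fault() path.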
-- 
Peter Holm


Re: Disk/FS I/O issues in -CURRENT

2003-06-30 Thread Peter Holm
On Mon, Jun 30, 2003 at 01:42:15PM +0200, Eirik Oeverby wrote:
> Hi folks,
>
> I am having some very weird problems on my laptop,

I can repeat the problem (noticed with savecore) on a
kernel from Jun 30 05:23 UTC:

current# df -h .
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1f   8.2G   1.9G   5.6G    25%    /usr
current# dd if=/dev/zero of=100mb bs=1024 count=102400
load: 4.04  cmd: dd 25063 [running] 0.33u 28.67s 4% 100k
97657+0 records in
97657+0 records out
100000768 bytes transferred in 48.549837 secs (2059755 bytes/sec)

db> ps
  pid   proc     addr    uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
25063 c1e5d3c8 cd19d000    0 25060 25063 0004002 [RUNQ] dd
    8 c197ed3c ccc9e000    0     0     0 0000204 [CPU 0] pagedaemon

db> t 8
siointr1(c0b6c800,0,c051fcc5,693,c867abf0) at siointr1+0xd5
siointr(c0b6c800) at siointr+0x35
Xfastintr4() at Xfastintr4+0x63
--- interrupt, eip = 0xc0377480, esp = 0xc867abdc, ebp = 0xc867abf0 ---
strncmp(c051e785,c051df68,123,0,477b) at strncmp
witness_unlock(c059c280,8,c051e785,35c,1) at witness_unlock+0x5a
_mtx_unlock_flags(c059c280,0,c051e785,35c,c0ba69ec) at _mtx_unlock_flags+0x80
vm_pageout_scan(0,0,c051e785,5dd,1f4) at vm_pageout_scan+0x40c
vm_pageout(0,c867ad48,c0505bfc,312,0) at vm_pageout+0x2ce
fork_exit(c04688d0,0,c867ad48) at fork_exit+0xc0
fork_trampoline() at fork_trampoline+0x1a
--- trap 0x1, eip = 0, esp = 0xc867ad7c, ebp = 0 ---
db>



panic: softdep_flushfiles: looping

1999-07-29 Thread Peter Holm

Kirk seems to be out of touch :-), so I created PR kern/12869.
--
Peter Holm | mailto:[EMAIL PROTECTED] | http://login.dknet.dk/~pho/
  -[ Member of the BSD-Dk User Group / http://www.bsd-dk.dk/ ] -


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message