Re: ktrace on NFSroot failing?
Rick Macklem wrote: > John Baldwin wrote: > > On 3/10/22 8:14 AM, Mateusz Guzik wrote: > > > On 3/10/22, Bjoern A. Zeeb wrote: > > >> Hi, > > >> > > >> I am having a weird issue with ktrace on an nfsroot machine: > > >> > > >> root:/tmp # ktrace sleep 1 > > >> root:/tmp # kdump > > >> -559038242 Events dropped. > > >> kdump: bogus length 0xdeadc0de > > >> > > >> Anyone seen something like this before? > > >> > > > > > > I just did a quick check and it definitely fails on nfs mounts: > > > # ktrace pwd > > > /root/mjg > > > # kdump > > > -559038242 Events dropped. > > > kdump: bogus length 0xdeadc0de > > > > > > I don't have time to look into it this week though. > > > > Possibly related: core dumps are no longer working for me on NFS > > mounts. I get a 0 byte foo.core instead of a valid core dump. > I just tried a core dump for a kernel built from main sources as of > to-day and it worked ok. > > However my userland is several months old and I can't easily upgrade > it for now. > > There was a recent ZFS problem that found its way into 13.0-p8 that > I'm pretty sure is fixed now. If you had a fairly recent server exporting > ZFS, that *might* explain it? > > In particular, an NFSroot uses NFSv3 and nothing has changed for > NFSv3 in a looonnggg time. Oops, I did make a change that affected NFSv3 as well as NFSv4. Last December I committed a path to head that made IO_APPEND writes use nfs_directio_write() and avoid the buffer cache. Turns out nfs_directio_write() only worked for UIO_USERSPACE. The attached trivial patch fixes ktrace for me. I have no idea if core dumping ever does IO_APPEND VOP_WRITE()s? Please test the attached patch. Thanks, rick rick -- John Baldwin ktrace.patch Description: ktrace.patch
Re: ktrace on NFSroot failing?
John Baldwin wrote: > On 3/10/22 8:14 AM, Mateusz Guzik wrote: > > On 3/10/22, Bjoern A. Zeeb wrote: > >> Hi, > >> > >> I am having a weird issue with ktrace on an nfsroot machine: > >> > >> root:/tmp # ktrace sleep 1 > >> root:/tmp # kdump > >> -559038242 Events dropped. > >> kdump: bogus length 0xdeadc0de > >> > >> Anyone seen something like this before? > >> > > > > I just did a quick check and it definitely fails on nfs mounts: > > # ktrace pwd > > /root/mjg > > # kdump > > -559038242 Events dropped. > > kdump: bogus length 0xdeadc0de > > > > I don't have time to look into it this week though. > > Possibly related: core dumps are no longer working for me on NFS > mounts. I get a 0 byte foo.core instead of a valid core dump. I just tried a core dump for a kernel built from main sources as of to-day and it worked ok. However my userland is several months old and I can't easily upgrade it for now. There was a recent ZFS problem that found its way into 13.0-p8 that I'm pretty sure is fixed now. If you had a fairly recent server exporting ZFS, that *might* explain it? In particular, an NFSroot uses NFSv3 and nothing has changed for NFSv3 in a looonnggg time. rick -- John Baldwin
Re: ktrace on NFSroot failing?
On 3/10/22 8:14 AM, Mateusz Guzik wrote: On 3/10/22, Bjoern A. Zeeb wrote: Hi, I am having a weird issue with ktrace on an nfsroot machine: root:/tmp # ktrace sleep 1 root:/tmp # kdump -559038242 Events dropped. kdump: bogus length 0xdeadc0de Anyone seen something like this before? I just did a quick check and it definitely fails on nfs mounts: # ktrace pwd /root/mjg # kdump -559038242 Events dropped. kdump: bogus length 0xdeadc0de I don't have time to look into it this week though. Possibly related: core dumps are no longer working for me on NFS mounts. I get a 0 byte foo.core instead of a valid core dump. -- John Baldwin
Re: Interrupted fputc followed by fprintf in _IOLBF mode causes core dump
On Fri, Mar 25, 2022 at 5:50 PM Mark Johnston wrote: > > On Fri, Mar 25, 2022 at 03:18:40PM +0300, Guy Yur wrote: > > Hi, > > > > dhcpcd on head (Mar 24) and 13.1-BETA2 crashes in fprintf/__sfvwrite. > > It doesn't crash if If I revert the __sflush/__sfvwrite commits: > > 86a16ada1ea608408cec370171d9f59353e97c77 and > > bafaa70b6f9098d83d074968c8e6747ecec1e118. > > > > ... > > Thanks for the reproducer. The bug is akin to that fixed by > bafaa70b6f9098d83d074968c8e6747ecec1e118. Could you please verify that > the patch below fixes the original bug? > > commit 11f817e847b2f06bd543d1bd6cfdff53d69842dc > Author: Mark Johnston > Date: Fri Mar 25 10:46:24 2022 -0400 > > libc: Restore fp state upon flush error in fputc > > diff --git a/lib/libc/stdio/wbuf.c b/lib/libc/stdio/wbuf.c > index e1aa70243e94..6cd75145a271 100644 > --- a/lib/libc/stdio/wbuf.c > +++ b/lib/libc/stdio/wbuf.c > @@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$"); > int > __swbuf(int c, FILE *fp) > { > + unsigned char *old_p; > int n; > > /* > @@ -87,8 +88,15 @@ __swbuf(int c, FILE *fp) > } > fp->_w--; > *fp->_p++ = c; > - if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n')) > - if (__fflush(fp)) > + old_p = fp->_p; > + if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n')) { > + if (__fflush(fp)) { > + if (fp->_p == old_p) { > + fp->_p--; > + fp->_w++; > + } > return (EOF); > + } > + } > return (c); > } Hi, With this patch applied, dhcpcd doesn't crash. Thanks, Guy
Re: Interrupted fputc followed by fprintf in _IOLBF mode causes core dump
On Fri, Mar 25, 2022 at 03:18:40PM +0300, Guy Yur wrote: > Hi, > > dhcpcd on head (Mar 24) and 13.1-BETA2 crashes in fprintf/__sfvwrite. > It doesn't crash if If I revert the __sflush/__sfvwrite commits: > 86a16ada1ea608408cec370171d9f59353e97c77 and > bafaa70b6f9098d83d074968c8e6747ecec1e118. > > Stack trace: > 0 memcpy () at /usr/src/lib/libc/amd64/string/memmove.S:314 > #1 0x0008221c300a in __sfvwrite (fp=, > uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182 > #2 0x0008221cc631 in __sprint (fp=0x26fffe, uio=0x8207ad2d7, > locale=) at /usr/src/lib/libc/stdio/vfprintf.c:166 > #3 io_flush (iop=0x8207ad330, locale=) at > /usr/src/lib/libc/stdio/printfcommon.h:157 > #4 __vfprintf (fp=fp@entry=0x8222892d0, locale=locale@entry=0x822288ab8 > <__xlocale_global_locale>, fmt0=, fmt0@entry=0x204182 "%s", > ap=, ap@entry=0x8207ad4e0) at > /usr/src/lib/libc/stdio/vfprintf.c:1033 > #5 0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8 > <__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0) > at /usr/src/lib/libc/stdio/vfprintf.c:285 > #6 0x00222efa in vlogprintf_r (ctx=0x270820 <_logctx>, > stream=0x8222892d0, fmt=0x204182 "%s", args=0x8207adad0) at logerr.c:186 > ... > > (gdb) frame 5 > #5 0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8 > <__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0) > at /usr/src/lib/libc/stdio/vfprintf.c:285 > 285 ret = __vfprintf(fp, locale, fmt0, ap); > (gdb) p *fp > $1 = {_p = 0x27084f <_logctx+47> "e21:3e7c\n42a/64\n", _r = 0, _w = > -1025, _flags = 2057, _file = 2, _bf = {_base = 0x270820 <_logctx> > "*\"", _size = 1024}, > _lbfsize = -1024, _cookie = 0x8222892d0, _close = 0x8221c7b40 > <__sclose>, _read = 0x8221c7af0 <__sread>, _seek = 0x8221c7b30 > <__sseek>, > _write = 0x8221c7b10 <__swrite>, _ub = {_base = 0x0, _size = 0}, _up > = 0x0, _ur = 0, _ubuf = "\000\000", _nbuf = "", _lb = {_base = 0x0, > _size = 0}, > _blksize = 4096, _offset = 0, _fl_mutex = 0x0, _fl_owner = 0x0, > _fl_count = 0, _orientation = -1, _mbstate = {__mbstate8 = '\000' > , > _mbstateL = 0}, _flags2 = 0} > > (gdb) frame 1 > #1 0x0008221c300a in __sfvwrite (fp=, > uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182 > 182 COPY(w); > (gdb) p w > $4 = -1 > > > The dhcpcd flow leading to the crash: > 1. init with setvbuf _IOLBF on stderr > https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L453 > > 2. fputc with newline called on stderr but is interrupted > https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L187 > > 3. next event received, vfprintf is called on stderr and crashes > https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L186 > > > Simple program that eventually crashes: Thanks for the reproducer. The bug is akin to that fixed by bafaa70b6f9098d83d074968c8e6747ecec1e118. Could you please verify that the patch below fixes the original bug? commit 11f817e847b2f06bd543d1bd6cfdff53d69842dc Author: Mark Johnston Date: Fri Mar 25 10:46:24 2022 -0400 libc: Restore fp state upon flush error in fputc diff --git a/lib/libc/stdio/wbuf.c b/lib/libc/stdio/wbuf.c index e1aa70243e94..6cd75145a271 100644 --- a/lib/libc/stdio/wbuf.c +++ b/lib/libc/stdio/wbuf.c @@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$"); int __swbuf(int c, FILE *fp) { + unsigned char *old_p; int n; /* @@ -87,8 +88,15 @@ __swbuf(int c, FILE *fp) } fp->_w--; *fp->_p++ = c; - if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n')) - if (__fflush(fp)) + old_p = fp->_p; + if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n')) { + if (__fflush(fp)) { + if (fp->_p == old_p) { + fp->_p--; + fp->_w++; + } return (EOF); + } + } return (c); }
Interrupted fputc followed by fprintf in _IOLBF mode causes core dump
Hi, dhcpcd on head (Mar 24) and 13.1-BETA2 crashes in fprintf/__sfvwrite. It doesn't crash if If I revert the __sflush/__sfvwrite commits: 86a16ada1ea608408cec370171d9f59353e97c77 and bafaa70b6f9098d83d074968c8e6747ecec1e118. Stack trace: 0 memcpy () at /usr/src/lib/libc/amd64/string/memmove.S:314 #1 0x0008221c300a in __sfvwrite (fp=, uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182 #2 0x0008221cc631 in __sprint (fp=0x26fffe, uio=0x8207ad2d7, locale=) at /usr/src/lib/libc/stdio/vfprintf.c:166 #3 io_flush (iop=0x8207ad330, locale=) at /usr/src/lib/libc/stdio/printfcommon.h:157 #4 __vfprintf (fp=fp@entry=0x8222892d0, locale=locale@entry=0x822288ab8 <__xlocale_global_locale>, fmt0=, fmt0@entry=0x204182 "%s", ap=, ap@entry=0x8207ad4e0) at /usr/src/lib/libc/stdio/vfprintf.c:1033 #5 0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8 <__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0) at /usr/src/lib/libc/stdio/vfprintf.c:285 #6 0x00222efa in vlogprintf_r (ctx=0x270820 <_logctx>, stream=0x8222892d0, fmt=0x204182 "%s", args=0x8207adad0) at logerr.c:186 ... (gdb) frame 5 #5 0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8 <__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0) at /usr/src/lib/libc/stdio/vfprintf.c:285 285 ret = __vfprintf(fp, locale, fmt0, ap); (gdb) p *fp $1 = {_p = 0x27084f <_logctx+47> "e21:3e7c\n42a/64\n", _r = 0, _w = -1025, _flags = 2057, _file = 2, _bf = {_base = 0x270820 <_logctx> "*\"", _size = 1024}, _lbfsize = -1024, _cookie = 0x8222892d0, _close = 0x8221c7b40 <__sclose>, _read = 0x8221c7af0 <__sread>, _seek = 0x8221c7b30 <__sseek>, _write = 0x8221c7b10 <__swrite>, _ub = {_base = 0x0, _size = 0}, _up = 0x0, _ur = 0, _ubuf = "\000\000", _nbuf = "", _lb = {_base = 0x0, _size = 0}, _blksize = 4096, _offset = 0, _fl_mutex = 0x0, _fl_owner = 0x0, _fl_count = 0, _orientation = -1, _mbstate = {__mbstate8 = '\000' , _mbstateL = 0}, _flags2 = 0} (gdb) frame 1 #1 0x0008221c300a in __sfvwrite (fp=, uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182 182 COPY(w); (gdb) p w $4 = -1 The dhcpcd flow leading to the crash: 1. init with setvbuf _IOLBF on stderr https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L453 2. fputc with newline called on stderr but is interrupted https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L187 3. next event received, vfprintf is called on stderr and crashes https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L186 Simple program that eventually crashes: #include #include #include static void alrm(int signo __unused) { alarm(1); } char buf[1024]; /* use global to not corrupt stack trace in core dump */ int main() { struct sigaction sa; sa.sa_handler = alrm; sigemptyset(_mask); sa.sa_flags = 0; sigaction(SIGALRM, , NULL); setvbuf(stderr, buf, _IOLBF, sizeof(buf)); alarm(1); while (!ferror(stderr)) { fputc('\n', stderr); } fprintf(stderr, "%s", "a"); return 0; } Regards, Guy Yur