Re: ktrace on NFSroot failing?

2022-03-25 Thread Rick Macklem
Rick Macklem  wrote:
> John Baldwin  wrote:
> > On 3/10/22 8:14 AM, Mateusz Guzik wrote:
> > > On 3/10/22, Bjoern A. Zeeb  wrote:
> > >> Hi,
> > >>
> > >> I am having a weird issue with ktrace on an nfsroot machine:
> > >>
> > >> root:/tmp # ktrace sleep 1
> > >> root:/tmp # kdump
> > >> -559038242  Events dropped.
> > >> kdump: bogus length 0xdeadc0de
> > >>
> > >> Anyone seen something like this before?
> > >>
> > >
> > > I just did a quick check and it definitely fails on nfs mounts:
> > > # ktrace pwd
> > > /root/mjg
> > > # kdump
> > > -559038242  Events dropped.
> > > kdump: bogus length 0xdeadc0de
> > >
> > > I don't have time to look into it this week though.
> >
> > Possibly related: core dumps are no longer working for me on NFS
> > mounts.  I get a 0 byte foo.core instead of a valid core dump.
> I just tried a core dump for a kernel built from main sources as of
> to-day and it worked ok.
> 
> However my userland is several months old and I can't easily upgrade
> it for now.
>
> There was a recent ZFS problem that found its way into 13.0-p8 that
> I'm pretty sure is fixed now. If you had a fairly recent server exporting
> ZFS, that *might* explain it?
> 
> In particular, an NFSroot uses NFSv3 and nothing has changed for
> NFSv3 in a looonnggg time.
Oops, I did make a change that affected NFSv3 as well as NFSv4.
Last December I committed a path to head that made IO_APPEND
writes use nfs_directio_write() and avoid the buffer cache.

Turns out nfs_directio_write() only worked for UIO_USERSPACE.
The attached trivial patch fixes ktrace for me.
I have no idea if core dumping ever does IO_APPEND VOP_WRITE()s?

Please test the attached patch.

Thanks, rick

rick

--
John Baldwin



ktrace.patch
Description: ktrace.patch


Re: ktrace on NFSroot failing?

2022-03-25 Thread Rick Macklem
John Baldwin  wrote:
> On 3/10/22 8:14 AM, Mateusz Guzik wrote:
> > On 3/10/22, Bjoern A. Zeeb  wrote:
> >> Hi,
> >>
> >> I am having a weird issue with ktrace on an nfsroot machine:
> >>
> >> root:/tmp # ktrace sleep 1
> >> root:/tmp # kdump
> >> -559038242  Events dropped.
> >> kdump: bogus length 0xdeadc0de
> >>
> >> Anyone seen something like this before?
> >>
> >
> > I just did a quick check and it definitely fails on nfs mounts:
> > # ktrace pwd
> > /root/mjg
> > # kdump
> > -559038242  Events dropped.
> > kdump: bogus length 0xdeadc0de
> >
> > I don't have time to look into it this week though.
>
> Possibly related: core dumps are no longer working for me on NFS
> mounts.  I get a 0 byte foo.core instead of a valid core dump.
I just tried a core dump for a kernel built from main sources as of
to-day and it worked ok.

However my userland is several months old and I can't easily upgrade
it for now.

There was a recent ZFS problem that found its way into 13.0-p8 that
I'm pretty sure is fixed now. If you had a fairly recent server exporting
ZFS, that *might* explain it?

In particular, an NFSroot uses NFSv3 and nothing has changed for
NFSv3 in a looonnggg time.

rick

--
John Baldwin



Re: ktrace on NFSroot failing?

2022-03-25 Thread John Baldwin

On 3/10/22 8:14 AM, Mateusz Guzik wrote:

On 3/10/22, Bjoern A. Zeeb  wrote:

Hi,

I am having a weird issue with ktrace on an nfsroot machine:

root:/tmp # ktrace sleep 1
root:/tmp # kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

Anyone seen something like this before?



I just did a quick check and it definitely fails on nfs mounts:
# ktrace pwd
/root/mjg
# kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

I don't have time to look into it this week though.


Possibly related: core dumps are no longer working for me on NFS
mounts.  I get a 0 byte foo.core instead of a valid core dump.

--
John Baldwin



Re: Interrupted fputc followed by fprintf in _IOLBF mode causes core dump

2022-03-25 Thread Guy Yur
On Fri, Mar 25, 2022 at 5:50 PM Mark Johnston  wrote:
>
> On Fri, Mar 25, 2022 at 03:18:40PM +0300, Guy Yur wrote:
> > Hi,
> >
> > dhcpcd on head (Mar 24) and 13.1-BETA2 crashes in fprintf/__sfvwrite.
> > It doesn't crash if If I revert the __sflush/__sfvwrite commits:
> > 86a16ada1ea608408cec370171d9f59353e97c77 and
> > bafaa70b6f9098d83d074968c8e6747ecec1e118.
> >
> > ...
>
> Thanks for the reproducer.  The bug is akin to that fixed by
> bafaa70b6f9098d83d074968c8e6747ecec1e118.  Could you please verify that
> the patch below fixes the original bug?
>
> commit 11f817e847b2f06bd543d1bd6cfdff53d69842dc
> Author: Mark Johnston 
> Date:   Fri Mar 25 10:46:24 2022 -0400
>
> libc: Restore fp state upon flush error in fputc
>
> diff --git a/lib/libc/stdio/wbuf.c b/lib/libc/stdio/wbuf.c
> index e1aa70243e94..6cd75145a271 100644
> --- a/lib/libc/stdio/wbuf.c
> +++ b/lib/libc/stdio/wbuf.c
> @@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
>  int
>  __swbuf(int c, FILE *fp)
>  {
> +   unsigned char *old_p;
> int n;
>
> /*
> @@ -87,8 +88,15 @@ __swbuf(int c, FILE *fp)
> }
> fp->_w--;
> *fp->_p++ = c;
> -   if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n'))
> -   if (__fflush(fp))
> +   old_p = fp->_p;
> +   if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n')) {
> +   if (__fflush(fp)) {
> +   if (fp->_p == old_p) {
> +   fp->_p--;
> +   fp->_w++;
> +   }
> return (EOF);
> +   }
> +   }
> return (c);
>  }

Hi,

With this patch applied, dhcpcd doesn't crash.

Thanks,
Guy



Re: Interrupted fputc followed by fprintf in _IOLBF mode causes core dump

2022-03-25 Thread Mark Johnston
On Fri, Mar 25, 2022 at 03:18:40PM +0300, Guy Yur wrote:
> Hi,
> 
> dhcpcd on head (Mar 24) and 13.1-BETA2 crashes in fprintf/__sfvwrite.
> It doesn't crash if If I revert the __sflush/__sfvwrite commits:
> 86a16ada1ea608408cec370171d9f59353e97c77 and
> bafaa70b6f9098d83d074968c8e6747ecec1e118.
> 
> Stack trace:
> 0  memcpy () at /usr/src/lib/libc/amd64/string/memmove.S:314
> #1  0x0008221c300a in __sfvwrite (fp=,
> uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182
> #2  0x0008221cc631 in __sprint (fp=0x26fffe, uio=0x8207ad2d7,
> locale=) at /usr/src/lib/libc/stdio/vfprintf.c:166
> #3  io_flush (iop=0x8207ad330, locale=) at
> /usr/src/lib/libc/stdio/printfcommon.h:157
> #4  __vfprintf (fp=fp@entry=0x8222892d0, locale=locale@entry=0x822288ab8
> <__xlocale_global_locale>, fmt0=, fmt0@entry=0x204182 "%s",
>  ap=, ap@entry=0x8207ad4e0) at
> /usr/src/lib/libc/stdio/vfprintf.c:1033
> #5  0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8
> <__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0)
>  at /usr/src/lib/libc/stdio/vfprintf.c:285
> #6  0x00222efa in vlogprintf_r (ctx=0x270820 <_logctx>,
> stream=0x8222892d0, fmt=0x204182 "%s", args=0x8207adad0) at logerr.c:186
> ...
> 
> (gdb) frame 5
> #5  0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8
> <__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0)
>  at /usr/src/lib/libc/stdio/vfprintf.c:285
> 285 ret = __vfprintf(fp, locale, fmt0, ap);
> (gdb) p *fp
> $1 = {_p = 0x27084f <_logctx+47> "e21:3e7c\n42a/64\n", _r = 0, _w =
> -1025, _flags = 2057, _file = 2, _bf = {_base = 0x270820 <_logctx>
> "*\"", _size = 1024},
>   _lbfsize = -1024, _cookie = 0x8222892d0, _close = 0x8221c7b40
> <__sclose>, _read = 0x8221c7af0 <__sread>, _seek = 0x8221c7b30
> <__sseek>,
>   _write = 0x8221c7b10 <__swrite>, _ub = {_base = 0x0, _size = 0}, _up
> = 0x0, _ur = 0, _ubuf = "\000\000", _nbuf = "", _lb = {_base = 0x0,
> _size = 0},
>   _blksize = 4096, _offset = 0, _fl_mutex = 0x0, _fl_owner = 0x0,
> _fl_count = 0, _orientation = -1, _mbstate = {__mbstate8 = '\000'
> ,
> _mbstateL = 0}, _flags2 = 0}
> 
> (gdb) frame 1
> #1  0x0008221c300a in __sfvwrite (fp=,
> uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182
> 182 COPY(w);
> (gdb) p w
> $4 = -1
> 
> 
> The dhcpcd flow leading to the crash:
> 1. init with setvbuf _IOLBF on stderr
> https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L453
> 
> 2. fputc with newline called on stderr but is interrupted
> https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L187
> 
> 3. next event received, vfprintf is called on stderr and crashes
> https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L186
> 
> 
> Simple program that eventually crashes:

Thanks for the reproducer.  The bug is akin to that fixed by
bafaa70b6f9098d83d074968c8e6747ecec1e118.  Could you please verify that
the patch below fixes the original bug?

commit 11f817e847b2f06bd543d1bd6cfdff53d69842dc
Author: Mark Johnston 
Date:   Fri Mar 25 10:46:24 2022 -0400

libc: Restore fp state upon flush error in fputc

diff --git a/lib/libc/stdio/wbuf.c b/lib/libc/stdio/wbuf.c
index e1aa70243e94..6cd75145a271 100644
--- a/lib/libc/stdio/wbuf.c
+++ b/lib/libc/stdio/wbuf.c
@@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
 int
 __swbuf(int c, FILE *fp)
 {
+   unsigned char *old_p;
int n;
 
/*
@@ -87,8 +88,15 @@ __swbuf(int c, FILE *fp)
}
fp->_w--;
*fp->_p++ = c;
-   if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n'))
-   if (__fflush(fp))
+   old_p = fp->_p;
+   if (++n == fp->_bf._size || (fp->_flags & __SLBF && c == '\n')) {
+   if (__fflush(fp)) {
+   if (fp->_p == old_p) {
+   fp->_p--;
+   fp->_w++;
+   }
return (EOF);
+   }
+   }
return (c);
 }



Interrupted fputc followed by fprintf in _IOLBF mode causes core dump

2022-03-25 Thread Guy Yur
Hi,

dhcpcd on head (Mar 24) and 13.1-BETA2 crashes in fprintf/__sfvwrite.
It doesn't crash if If I revert the __sflush/__sfvwrite commits:
86a16ada1ea608408cec370171d9f59353e97c77 and
bafaa70b6f9098d83d074968c8e6747ecec1e118.

Stack trace:
0  memcpy () at /usr/src/lib/libc/amd64/string/memmove.S:314
#1  0x0008221c300a in __sfvwrite (fp=,
uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182
#2  0x0008221cc631 in __sprint (fp=0x26fffe, uio=0x8207ad2d7,
locale=) at /usr/src/lib/libc/stdio/vfprintf.c:166
#3  io_flush (iop=0x8207ad330, locale=) at
/usr/src/lib/libc/stdio/printfcommon.h:157
#4  __vfprintf (fp=fp@entry=0x8222892d0, locale=locale@entry=0x822288ab8
<__xlocale_global_locale>, fmt0=, fmt0@entry=0x204182 "%s",
 ap=, ap@entry=0x8207ad4e0) at
/usr/src/lib/libc/stdio/vfprintf.c:1033
#5  0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8
<__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0)
 at /usr/src/lib/libc/stdio/vfprintf.c:285
#6  0x00222efa in vlogprintf_r (ctx=0x270820 <_logctx>,
stream=0x8222892d0, fmt=0x204182 "%s", args=0x8207adad0) at logerr.c:186
...

(gdb) frame 5
#5  0x0008221c8aea in vfprintf_l (fp=0x8222892d0, locale=0x822288ab8
<__xlocale_global_locale>, fmt0=0x204182 "%s", ap=0x8207ad4e0)
 at /usr/src/lib/libc/stdio/vfprintf.c:285
285 ret = __vfprintf(fp, locale, fmt0, ap);
(gdb) p *fp
$1 = {_p = 0x27084f <_logctx+47> "e21:3e7c\n42a/64\n", _r = 0, _w =
-1025, _flags = 2057, _file = 2, _bf = {_base = 0x270820 <_logctx>
"*\"", _size = 1024},
  _lbfsize = -1024, _cookie = 0x8222892d0, _close = 0x8221c7b40
<__sclose>, _read = 0x8221c7af0 <__sread>, _seek = 0x8221c7b30
<__sseek>,
  _write = 0x8221c7b10 <__swrite>, _ub = {_base = 0x0, _size = 0}, _up
= 0x0, _ur = 0, _ubuf = "\000\000", _nbuf = "", _lb = {_base = 0x0,
_size = 0},
  _blksize = 4096, _offset = 0, _fl_mutex = 0x0, _fl_owner = 0x0,
_fl_count = 0, _orientation = -1, _mbstate = {__mbstate8 = '\000'
,
_mbstateL = 0}, _flags2 = 0}

(gdb) frame 1
#1  0x0008221c300a in __sfvwrite (fp=,
uio=0x8207ad338) at /usr/src/lib/libc/stdio/fvwrite.c:182
182 COPY(w);
(gdb) p w
$4 = -1


The dhcpcd flow leading to the crash:
1. init with setvbuf _IOLBF on stderr
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L453

2. fputc with newline called on stderr but is interrupted
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L187

3. next event received, vfprintf is called on stderr and crashes
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/logerr.c#L186


Simple program that eventually crashes:

#include 
#include 
#include 

static void
alrm(int signo __unused)
{
alarm(1);
}

char buf[1024]; /* use global to not corrupt stack trace in core dump */

int main()
{
struct sigaction sa;

sa.sa_handler = alrm;
sigemptyset(_mask);
sa.sa_flags = 0;
sigaction(SIGALRM, , NULL);

setvbuf(stderr, buf, _IOLBF, sizeof(buf));
alarm(1);
while (!ferror(stderr)) {
fputc('\n', stderr);
}
fprintf(stderr, "%s", "a");
return 0;
}


Regards,
Guy Yur