Re: strmode should take a mode_t instead of int.

2024-06-22 Thread Otto Moerbeek
On Thu, Jun 20, 2024 at 09:17:38AM +0200, Otto Moerbeek wrote:

> On Wed, Jun 19, 2024 at 06:44:56PM +0200, Theo Buehler wrote:
> 
> > These are the ports using strmode.
> > 
> > archivers/libarchive
> > archivers/libtar
> > editors/emacs
> > games/gemrb
> > math/octave
> > misc/findutils
> > net/lftp
> > security/ssh-ldap-helper
> > shells/ksh93
> > sysutils/bfs
> > sysutils/colorls
> > sysutils/coreutils
> > sysutils/lnav
> > sysutils/tarsnap
> > 
> > Given the short list and the nature of the change, I don't think it's
> > necessary to run a bulk, but inspecting a few of them would be good,
> > especially libarchive and coreutils are depended upon by a lot of ports.
> > And there's emacs in this list.
> 
> New diff, taking the suggestion (but not all of it, the implementation
> can use mode_t as it includes sys/types.h).
> 
> I tested base + coreutils + emacs builds with this.

ping...

> 
>   -Otto
> 
> Index: include/string.h
> ===
> RCS file: /home/cvs/src/include/string.h,v
> diff -u -p -r1.32 string.h
> --- include/string.h  5 Sep 2017 03:16:13 -   1.32
> +++ include/string.h  20 Jun 2024 07:13:03 -
> @@ -37,7 +37,7 @@
>  
>  #include 
>  #include 
> -#include <machine/_types.h>
> +#include <sys/_types.h>
>  
>  /*
>   * POSIX mandates that certain string functions not present in ISO C
> @@ -128,7 +128,7 @@ size_t strlcat(char *, const char *, si
>   __attribute__ ((__bounded__(__string__,1,3)));
>  size_tstrlcpy(char *, const char *, size_t)
>   __attribute__ ((__bounded__(__string__,1,3)));
> -void  strmode(int, char *);
> +void  strmode(__mode_t, char *);
>  char *strsep(char **, const char *);
>  int   timingsafe_bcmp(const void *, const void *, size_t);
>  int   timingsafe_memcmp(const void *, const void *, size_t);
> Index: lib/libc/string/strmode.c
> ===
> RCS file: /home/cvs/src/lib/libc/string/strmode.c,v
> diff -u -p -r1.8 strmode.c
> --- lib/libc/string/strmode.c 31 Aug 2015 02:53:57 -  1.8
> +++ lib/libc/string/strmode.c 20 Jun 2024 07:13:03 -
> @@ -32,10 +32,8 @@
>  #include 
>  #include 
>  
> -/* XXX mode should be mode_t */
> -
>  void
> -strmode(int mode, char *p)
> +strmode(mode_t mode, char *p)
>  {
>/* print type */
>   switch (mode & S_IFMT) {
> 



Re: strmode should take a mode_t instead of int.

2024-06-20 Thread Otto Moerbeek
On Wed, Jun 19, 2024 at 06:44:56PM +0200, Theo Buehler wrote:

> These are the ports using strmode.
> 
> archivers/libarchive
> archivers/libtar
> editors/emacs
> games/gemrb
> math/octave
> misc/findutils
> net/lftp
> security/ssh-ldap-helper
> shells/ksh93
> sysutils/bfs
> sysutils/colorls
> sysutils/coreutils
> sysutils/lnav
> sysutils/tarsnap
> 
> Given the short list and the nature of the change, I don't think it's
> necessary to run a bulk, but inspecting a few of them would be good,
> especially libarchive and coreutils are depended upon by a lot of ports.
> And there's emacs in this list.

New diff, taking the suggestion (but not all of it, the implementation
can use mode_t as it includes sys/types.h).

I tested base + coreutils + emacs builds with this.

-Otto

Index: include/string.h
===
RCS file: /home/cvs/src/include/string.h,v
diff -u -p -r1.32 string.h
--- include/string.h5 Sep 2017 03:16:13 -   1.32
+++ include/string.h20 Jun 2024 07:13:03 -
@@ -37,7 +37,7 @@
 
 #include 
 #include 
-#include <machine/_types.h>
+#include <sys/_types.h>
 
 /*
  * POSIX mandates that certain string functions not present in ISO C
@@ -128,7 +128,7 @@ size_t   strlcat(char *, const char *, si
__attribute__ ((__bounded__(__string__,1,3)));
 size_t  strlcpy(char *, const char *, size_t)
__attribute__ ((__bounded__(__string__,1,3)));
-voidstrmode(int, char *);
+voidstrmode(__mode_t, char *);
 char   *strsep(char **, const char *);
 int timingsafe_bcmp(const void *, const void *, size_t);
 int timingsafe_memcmp(const void *, const void *, size_t);
Index: lib/libc/string/strmode.c
===
RCS file: /home/cvs/src/lib/libc/string/strmode.c,v
diff -u -p -r1.8 strmode.c
--- lib/libc/string/strmode.c   31 Aug 2015 02:53:57 -  1.8
+++ lib/libc/string/strmode.c   20 Jun 2024 07:13:03 -
@@ -32,10 +32,8 @@
 #include 
 #include 
 
-/* XXX mode should be mode_t */
-
 void
-strmode(int mode, char *p)
+strmode(mode_t mode, char *p)
 {
 /* print type */
switch (mode & S_IFMT) {
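
For readers unfamiliar with strmode(3), here is a rough sketch of the kind
of work the function does once it receives a mode_t. This is not the libc
implementation: demo_strmode is a hypothetical, simplified stand-in that
only handles directories, symlinks and regular files and ignores the
set-id and sticky bits. It assumes a POSIX system that provides S_IFMT.

```c
#define _XOPEN_SOURCE 700	/* ask for S_IFMT and friends where hidden */
#include <sys/types.h>
#include <sys/stat.h>

/*
 * demo_strmode: hypothetical, simplified stand-in for strmode(3).
 * Writes 11 characters plus a terminating NUL into p, so p must
 * have room for at least 12 bytes.
 */
void
demo_strmode(mode_t mode, char *p)
{
	static const char rwx[] = "rwxrwxrwx";
	int i;

	/* print type */
	switch (mode & S_IFMT) {
	case S_IFDIR:
		*p++ = 'd';
		break;
	case S_IFLNK:
		*p++ = 'l';
		break;
	case S_IFREG:
		*p++ = '-';
		break;
	default:
		*p++ = '?';
		break;
	}
	/* print the nine permission bits, most significant first */
	for (i = 0; i < 9; i++)
		*p++ = (mode & (mode_t)(0400 >> i)) ? rwx[i] : '-';
	*p++ = ' ';	/* trailing slot, always blank in this sketch */
	*p = '\0';
}
```

A caller would typically pass st_mode from stat(2) directly, which is
where a mode_t parameter avoids an implicit conversion to int.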



Re: strmode should take a mode_t instead of int.

2024-06-19 Thread Otto Moerbeek
On Tue, Jun 18, 2024 at 10:00:20PM -0700, Collin Funk wrote:

> Hi,
> 
> I noticed that strmode(3) says that the first argument should be
> mode_t. OpenBSD declares it with int which is not compatible since
> mode_t appears to be unsigned, from what I can tell.
> 
> NetBSD fixed this a long time ago and FreeBSD did the same before the
> 14.0 release.
> 
> Apologies for the lack of diff, I don't have access to an OpenBSD
> machine at the moment. I think something like this would work though:
> 
> In sys/_types.h:

I think this snippet should be in sys/types.h.

> 
> #ifndef _MODE_T_DEFINED_
> #define _MODE_T_DEFINED_
> typedef __mode_t  mode_t;
> #endif
> 
> and then in string.h:

This part is not going to work as string.h includes machine/_types.h
but not sys/_types.h (or sys/types.h for that matter). FreeBSD
modified it to include sys/_types.h.

> #ifndef _MODE_T_DEFINED_
> #define _MODE_T_DEFINED_
> typedef __mode_t  mode_t;
> #endif
> void   strmode(mode_t, char *);
> 
> Thanks,
> Collin
> 

Additionally, the implementation in src/lib/libc/string/strmode.c
needs to start using mode_t.

Building base now with the diff below. So far so good.

But this is more tricky than you would think. Modifying string.h to include
more could have unwanted side effects for applications.

-Otto

Index: include/string.h
===
RCS file: /home/cvs/src/include/string.h,v
diff -u -p -r1.32 string.h
--- include/string.h5 Sep 2017 03:16:13 -   1.32
+++ include/string.h19 Jun 2024 13:11:42 -
@@ -37,7 +37,7 @@
 
 #include 
 #include 
-#include <machine/_types.h>
+#include <sys/_types.h>
 
 /*
  * POSIX mandates that certain string functions not present in ISO C
@@ -128,7 +128,11 @@ size_t  strlcat(char *, const char *, si
__attribute__ ((__bounded__(__string__,1,3)));
 size_t  strlcpy(char *, const char *, size_t)
__attribute__ ((__bounded__(__string__,1,3)));
-voidstrmode(int, char *);
+#ifndef _MODE_T_DEFINED_
+#define _MODE_T_DEFINED_
+typedef __mode_t   mode_t;
+#endif
+voidstrmode(mode_t, char *);
 char   *strsep(char **, const char *);
 int timingsafe_bcmp(const void *, const void *, size_t);
 int timingsafe_memcmp(const void *, const void *, size_t);
Index: lib/libc/string/strmode.c
===
RCS file: /home/cvs/src/lib/libc/string/strmode.c,v
diff -u -p -r1.8 strmode.c
--- lib/libc/string/strmode.c   31 Aug 2015 02:53:57 -  1.8
+++ lib/libc/string/strmode.c   19 Jun 2024 13:11:42 -
@@ -32,10 +32,8 @@
 #include 
 #include 
 
-/* XXX mode should be mode_t */
-
 void
-strmode(int mode, char *p)
+strmode(mode_t mode, char *p)
 {
 /* print type */
switch (mode & S_IFMT) {
Index: sys/sys/types.h
===
RCS file: /home/cvs/src/sys/sys/types.h,v
diff -u -p -r1.49 types.h
--- sys/sys/types.h 6 Aug 2022 13:31:13 -   1.49
+++ sys/sys/types.h 19 Jun 2024 13:11:43 -
@@ -140,7 +140,10 @@ typedef__gid_t gid_t;  /* group id */
 typedef__id_t  id_t;   /* may contain pid, uid or gid */
 typedef__ino_t ino_t;  /* inode number */
 typedef__key_t key_t;  /* IPC key (for Sys V IPC) */
+#ifndef _MODE_T_DEFINED_
+#define _MODE_T_DEFINED_
 typedef__mode_tmode_t; /* permissions */
+#endif
 typedef__nlink_t   nlink_t;/* link count */
 typedef__rlim_trlim_t; /* resource limit */
 typedef__segsz_t   segsz_t;/* segment size */
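
The guard used in this diff can be illustrated in isolation. The sketch
below is not taken from the OpenBSD headers; all demo_ and DEMO_ names are
made up for illustration. It shows two "headers" that both want to provide
the same typedef, with the guard macro ensuring it is only emitted once
(repeating a typedef is an error in C before C11).

```c
/* Hypothetical stand-in for the underlying type in <sys/_types.h>. */
typedef unsigned int demo_base_mode_t;

/* What a first header (think <sys/types.h>) would contain. */
#ifndef DEMO_MODE_T_DEFINED_
#define DEMO_MODE_T_DEFINED_
typedef demo_base_mode_t demo_mode_t;	/* permissions */
#endif

/* What a second header (think <string.h>) would contain. */
#ifndef DEMO_MODE_T_DEFINED_
#define DEMO_MODE_T_DEFINED_
typedef demo_base_mode_t demo_mode_t;	/* skipped: already defined */
#endif

/* A prototype that only needs the type, whichever header supplied it. */
int demo_is_owner_writable(demo_mode_t mode);

int
demo_is_owner_writable(demo_mode_t mode)
{
	return (mode & 0200) != 0;
}
```

The cost, as noted above, is that the second header now exposes one more
name to every program that includes it.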



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-03 Thread Otto Moerbeek
And committed, will be in 7.6

Thanks,

-Otto

On Sun, Jun 02, 2024 at 08:32:28AM -0500, Don Wilburn wrote:

> Oops.  I'll try sending this to the bugs list for posterity.
> 
> Thanks again,  DW
> 
> 
> On 6/2/24 3:22 AM, Otto Moerbeek wrote:
> > Thanks, but please reply to the list.
> > 
> > -Otto
> > 
> > On Sat, Jun 01, 2024 at 09:25:26PM -0500, Don Wilburn wrote:
> > 
> > > Thank you Otto!
> > > 
> > > I followed your advice and successfully built a patched cribbage game.  I
> > > played several times and it looks right.  I'd say go ahead and incorporate
> > > the patch in all new releases.
> > > 
> > > Apparently I'm the only person on earth who plays this game.  I consider
> > > this game a small part of BSD history, so I'm glad you kept it alive.
> > > 
> > > Adios,  DW
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On 6/1/24 7:21 AM, Otto Moerbeek wrote:
> > > > On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:
> > > > 
> > > > > On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> > > > > 
> > > > > > Dear OpenBSD,
> > > > > > 
> > > > > > I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> > > > > > game.  This is included with OpenBSD, if you choose to install the games.
> > > > > > 
> > > > > > I'm not a programmer, but I promise you this happened because ncurses was
> > > > > > updated from version 5.7 to 6.4
> > > > > > 
> > > > > > The problem:
> > > > > > 
> > > > > > Normally the game gives prompts for play options and cards.  It's supposed
> > > > > > to leave the prompt after the response, then advance to a new line.  This
> > > > > > gives a brief history of selections
> > > > > > 
> > > > > > Now, starting with  the third prompt (cut the cards), the prompts disappear
> > > > > > when a response key is pressed.  This ruins the game. The effect is obvious,
> > > > > > even if you don't know how to play cribbage.
> > > > > > 
> > > > > > It would be even more obvious if you have an older system to compare with a
> > > > > > current v7.5 system.
> > > > > > 
> > > > > > This happened to linux bsd-games many years ago.  A search will indicate
> > > > > > that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> > > > > > classic bsd-games has been unmaintained since before that time.  This is
> > > > > > where I observed that the bug happened with a ncurses update.  Nobody
> > > > > > pursued the solution.
> > > > > > 
> > > > > > I don't have the skills to butcher the game code to work with the
> > > > > > update of ncurses.  Likewise, I don't know how to use a debugger or write a
> > > > > > sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> > > > > > the problem.  Maybe it's the C compiler's fault?
> > > > > > 
> > > > > > I still play this obsolete command line game.  It's nostalgia, I guess.  I
> > > > > > know OpenBSD developers have really important things to maintain.   If
> > > > > > someone could spare some time for this little bug, I'd be happy.  Maybe it
> > > > > > could be delegated to a student?
> > > > > > Thanks for reading,  DW
> > > > > > 
> > > > > One remains a student forever.
> > > > > 
> > > > > Try this, it does not try to cut corners with switching windows.
> > > > No response from the original reporter.
> > > > 

Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-01 Thread Otto Moerbeek
On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:

> On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> 
> > Dear OpenBSD,
> > 
> > I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> > game.  This is included with OpenBSD, if you choose to install the games.
> > 
> > I'm not a programmer, but I promise you this happened because ncurses was
> > updated from version 5.7 to 6.4
> > 
> > The problem:
> > 
> > Normally the game gives prompts for play options and cards.  It's supposed
> > to leave the prompt after the response, then advance to a new line.  This
> > gives a brief history of selections
> > 
> > Now, starting with  the third prompt (cut the cards), the prompts disappear
> > when a response key is pressed.  This ruins the game. The effect is obvious,
> > even if you don't know how to play cribbage.
> > 
> > It would be even more obvious if you have an older system to compare with a
> > current v7.5 system.
> > 
> > This happened to linux bsd-games many years ago.  A search will indicate
> > that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> > classic bsd-games has been unmaintained since before that time.  This is
> > where I observed that the bug happened with a ncurses update.  Nobody
> > pursued the solution.
> > 
> > I don't have the skills to butcher the game code to work with the
> > update of ncurses.  Likewise, I don't know how to use a debugger or write a
> > sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> > the problem.  Maybe it's the C compiler's fault?
> > 
> > I still play this obsolete command line game.  It's nostalgia, I guess.  I
> > know OpenBSD developers have really important things to maintain.   If
> > someone could spare some time for this little bug, I'd be happy.  Maybe it
> > could be delegated to a student?
> > 
> > Thanks for reading,  DW
> > 
> 
> One remains a student forever.
> 
> Try this, it does not try to cut corners with switching windows.

No response from the original reporter.

Is anybody else interested in testing/reviewing?

-Otto

> 
> Index: io.c
> ===
> RCS file: /home/cvs/src/games/cribbage/io.c,v
> diff -u -p -r1.22 io.c
> --- io.c  10 Jan 2016 13:35:09 -  1.22
> +++ io.c  29 May 2024 06:00:03 -
> @@ -505,14 +505,11 @@ get_line(void)
>  {
>   size_t pos;
>   int c, oy, ox;
> - WINDOW *oscr;
>  
> - oscr = stdscr;
> - stdscr = Msgwin;
> - getyx(stdscr, oy, ox);
> - refresh();
> + getyx(Msgwin, oy, ox);
> + wrefresh(Msgwin);
>   /* loop reading in the string, and put it in a temporary buffer */
> - for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
> + for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), wrefresh(Msgwin)) {
>   if (c == -1)
>   continue;
>   if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
> @@ -522,13 +519,13 @@ get_line(void)
>   int i;
>   pos--;
>   for (i = strlen(unctrl(linebuf[pos])); i; i--)
> - addch('\b');
> + waddch(Msgwin, '\b');
>   }
>   continue;
>   }
>   if (c == killchar()) {
>   pos = 0;
> - move(oy, ox);
> + wmove(Msgwin, oy, ox);
>   continue;
>   }
>   if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
> @@ -538,12 +535,11 @@ get_line(void)
>   if (islower(c))
>   c = toupper(c);
>   linebuf[pos++] = c;
> - addstr(unctrl(c));
> + waddstr(Msgwin, unctrl(c));
>   Mpos++;
>   }
>   while (pos < sizeof(linebuf))
>   linebuf[pos++] = '\0';
> - stdscr = oscr;
>   return (linebuf);
>  }
>  
> 



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-05-28 Thread Otto Moerbeek
On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:

> Dear OpenBSD,
> 
> I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> game.  This is included with OpenBSD, if you choose to install the games.
> 
> I'm not a programmer, but I promise you this happened because ncurses was
> updated from version 5.7 to 6.4
> 
> The problem:
> 
> Normally the game gives prompts for play options and cards.  It's supposed
> to leave the prompt after the response, then advance to a new line.  This
> gives a brief history of selections
> 
> Now, starting with  the third prompt (cut the cards), the prompts disappear
> when a response key is pressed.  This ruins the game. The effect is obvious,
> even if you don't know how to play cribbage.
> 
> It would be even more obvious if you have an older system to compare with a
> current v7.5 system.
> 
> This happened to linux bsd-games many years ago.  A search will indicate
> that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> classic bsd-games has been unmaintained since before that time.  This is
> where I observed that the bug happened with a ncurses update.  Nobody
> pursued the solution.
> 
> I don't have the skills to butcher the game code to work with the
> update of ncurses.  Likewise, I don't know how to use a debugger or write a
> sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> the problem.  Maybe it's the C compiler's fault?
> 
> I still play this obsolete command line game.  It's nostalgia, I guess.  I
> know OpenBSD developers have really important things to maintain.   If
> someone could spare some time for this little bug, I'd be happy.  Maybe it
> could be delegated to a student?
> 
> Thanks for reading,  DW
> 

One remains a student forever.

Try this, it does not try to cut corners with switching windows.

-Otto

Index: io.c
===
RCS file: /home/cvs/src/games/cribbage/io.c,v
diff -u -p -r1.22 io.c
--- io.c10 Jan 2016 13:35:09 -  1.22
+++ io.c29 May 2024 06:00:03 -
@@ -505,14 +505,11 @@ get_line(void)
 {
size_t pos;
int c, oy, ox;
-   WINDOW *oscr;
 
-   oscr = stdscr;
-   stdscr = Msgwin;
-   getyx(stdscr, oy, ox);
-   refresh();
+   getyx(Msgwin, oy, ox);
+   wrefresh(Msgwin);
/* loop reading in the string, and put it in a temporary buffer */
-   for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
+   for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), wrefresh(Msgwin)) {
if (c == -1)
continue;
if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
@@ -522,13 +519,13 @@ get_line(void)
int i;
pos--;
for (i = strlen(unctrl(linebuf[pos])); i; i--)
-   addch('\b');
+   waddch(Msgwin, '\b');
}
continue;
}
if (c == killchar()) {
pos = 0;
-   move(oy, ox);
+   wmove(Msgwin, oy, ox);
continue;
}
if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
@@ -538,12 +535,11 @@ get_line(void)
if (islower(c))
c = toupper(c);
linebuf[pos++] = c;
-   addstr(unctrl(c));
+   waddstr(Msgwin, unctrl(c));
Mpos++;
}
while (pos < sizeof(linebuf))
linebuf[pos++] = '\0';
-   stdscr = oscr;
return (linebuf);
 }
 



Re: ntpd NULL deref

2024-03-21 Thread Otto Moerbeek
On Tue, Mar 19, 2024 at 02:06:18PM +0100, Alexander Bluhm wrote:

> Hi,
> 
> ntpd crashed on my laptop.  cstr->addr is NULL.  According to
> accounting it was running for a while.
> 
> ntpd[43355]  -   _ntp  __ 0.06 secs Thu Mar 14 10:57 (41:41:32.00)
> ntpd[81566]  -F  root  __ 0.28 secs Thu Mar 14 10:57 (41:39:28.00)
> ntpd[5567]   -DXT_ntp  __ 0.02 secs Thu Mar 14 10:57 (41:39:28.00)
> 
> -rw-r--r--   1 root  wheel  1583504 Mar 16 03:36 5567.core
> 
> constraint.c
>204  cstr->last = now;
>205  cstr->state = STATE_QUERY_SENT;
>206
>207  memset(&am, 0, sizeof(am));
> *  208  memcpy(&am.a, cstr->addr, sizeof(am.a));
>209  am.synced = synced;
>210
>211  iov[iov_cnt].iov_base = &am;
>212  iov[iov_cnt++].iov_len = sizeof(am);
> 
> Core was generated by `ntpd'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x06db7eb7fea0 in memcpy (dst0=0x7b224d08a0e8, src0=<optimized out>, length=272) at /usr/src/lib/libc/string/memcpy.c:103
> 103 TLOOP(*(word *)dst = *(word *)src; src += wsize; dst += wsize);
> (gdb) bt
> #0  0x06db7eb7fea0 in memcpy (dst0=0x7b224d08a0e8, src0=<optimized out>, length=272) at /usr/src/lib/libc/string/memcpy.c:103
> #1  0x06d915308864 in constraint_query (cstr=0x6db756f4000, synced=0) at /usr/src/usr.sbin/ntpd/constraint.c:208
> #2  0x06d9152ff753 in ntp_main (nconf=<optimized out>, pw=<optimized out>, argc=<optimized out>, argv=<optimized out>) at /usr/src/usr.sbin/ntpd/ntp.c:330
> #3  0x06d9152fd07a in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/usr.sbin/ntpd/ntpd.c:224
> (gdb) frame 1
> #1  0x06d915308864 in constraint_query (cstr=0x6db756f4000, synced=0) at /usr/src/usr.sbin/ntpd/constraint.c:208
> 208 memcpy(&am.a, cstr->addr, sizeof(am.a));
> 
> (gdb) print *cstr
> value of type `constraint' requires 65704 bytes, which is more than max-value-size
> (gdb) print cstr->entry
> $3 = {tqe_next = 0x0, tqe_prev = 0x6dba8b72000}
> (gdb) print cstr->addr_head
> $4 = {name = 0x6db60004850 "www.google.com", path = 0x6db600041c0 "/", a = 0x0, pool = 2 '\002'}
> (gdb) print cstr->addr
> $5 = (struct ntp_addr *) 0x0
> (gdb) print cstr->senderrors
> $6 = 0
> (gdb) print cstr->state
> $7 = STATE_QUERY_SENT
> (gdb) print cstr->id
> $11 = 209
> (gdb) print cstr->fd
> $12 = -1
> (gdb) print cstr->pid
> $13 = 0
> (gdb) print cstr->ibuf
> value of type `imsgbuf' requires 65600 bytes, which is more than max-value-size
> (gdb) print cstr->last
> $14 = 146373
> (gdb) print cstr->constraint
> $15 = 0
> (gdb) print cstr->dnstries
> $16 = 0
> 
> bluhm
> 

I'll try to look into this, but the constraint state engine is very
hard to follow...

-Otto



Re: repeated NTP peers in OpenNTPD

2024-01-01 Thread Otto Moerbeek
On Wed, Dec 27, 2023 at 06:04:22PM +, Guilherme Janczak wrote:

> On Wed, Dec 27, 2023 at 09:54:14AM +0100, Otto Moerbeek wrote:
> > On Mon, Dec 18, 2023 at 04:42:09PM +, Guilherme Janczak wrote:
> > 
> > > On Mon, Dec 18, 2023 at 07:31:37AM +0100, Otto Moerbeek wrote:
> > > > On Sun, Dec 17, 2023 at 05:37:50PM +0100, Otto Moerbeek wrote:
> > > >
> > > > > On Fri, Dec 15, 2023 at 02:54:19PM +0100, Otto Moerbeek wrote:
> > > > >
> > > > > > On Sun, Dec 10, 2023 at 09:57:08AM +0100, Otto Moerbeek wrote:
> > > > > >
> > > > > > > On Fri, Dec 01, 2023 at 09:18:32PM +, 
> > > > > > > guilherme.janc...@yandex.com wrote:
> > > > > > >
> > > > > > > > >Synopsis:  Repeated NTP peers in OpenNTPD
> > > > > > > > >Category:  user
> > > > > > > > >Environment:
> > > > > > > > System  : OpenBSD 7.4
> > > > > > > > Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
> > > > > > > >  r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > > > > >
> > > > > > > > Architecture: OpenBSD.amd64
> > > > > > > > Machine : amd64
> > > > > > > > >Description:
> > > > > > > > If the same address/domain is specified multiple times in
> > > > > > > > OpenNTPD's configuration file, or if multiple domains resolve
> > > > > > > > to the same IP address, OpenNTPD will treat the same IP address
> > > > > > > > as if it was multiple peers.
> > > > > > > > >How-To-Repeat:
> > > > > > > > This can be tested by appending `server 127.0.0.1` multiple
> > > > > > > > times to the configuration file.
> > > > > > > >
> > > > > > > > Alternatively, assuming a default OpenNTPD configuration file
> > > > > > > > from OpenBSD 7.4, the following entries can be added to
> > > > > > > > /etc/hosts:
> > > > > > > > 127.0.0.1   time.cloudflare.com
> > > > > > > > 127.0.0.1   pool.ntp.org
> > > > > > > >
> > > > > > > > I noticed this bug using the default 7.4 configuration file. It
> > > > > > > > can happen because time.cloudflare.com is part of pool.ntp.org:
> > > > > > > > https://www.ntppool.org/scores/162.159.200.1
> > > > > > > > https://www.ntppool.org/scores/162.159.200.123
> > > > > > > > >Fix:
> > > > > > > > Removing the `server time.cloudflare.com` line from the
> > > > > > > > configuration file is a simple fix the user can make, but
> > > > > > > > OpenNTPD should check if an IP address it tries to add to the
> > > > > > > > list of peers is already a peer, and ignore it if so. If a
> > > > > > > > server is added with the `server` (not `servers`) keyword in the
> > > > > > > > configuration file, OpenNTPD should try the next IP the domain
> > > > > > > > resolves to if applicable.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks for the report, I'll take a look.
> > > > > > >
> > > > > > >   -Otto
> > > > > > >
> > > > > >
> > > > > > Due to various reasons this is all a bit complicated, I did not 
> > > > > > 

Re: repeated NTP peers in OpenNTPD

2023-12-27 Thread Otto Moerbeek
On Mon, Dec 18, 2023 at 04:42:09PM +, Guilherme Janczak wrote:

> On Mon, Dec 18, 2023 at 07:31:37AM +0100, Otto Moerbeek wrote:
> > On Sun, Dec 17, 2023 at 05:37:50PM +0100, Otto Moerbeek wrote:
> >
> > > On Fri, Dec 15, 2023 at 02:54:19PM +0100, Otto Moerbeek wrote:
> > >
> > > > On Sun, Dec 10, 2023 at 09:57:08AM +0100, Otto Moerbeek wrote:
> > > >
> > > > > On Fri, Dec 01, 2023 at 09:18:32PM +, 
> > > > > guilherme.janc...@yandex.com wrote:
> > > > >
> > > > > > >Synopsis:  Repeated NTP peers in OpenNTPD
> > > > > > >Category:  user
> > > > > > >Environment:
> > > > > > System  : OpenBSD 7.4
> > > > > > Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
> > > > > >  r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > > >
> > > > > > Architecture: OpenBSD.amd64
> > > > > > Machine : amd64
> > > > > > >Description:
> > > > > > If the same address/domain is specified multiple times in
> > > > > > OpenNTPD's configuration file, or if multiple domains resolve
> > > > > > to the same IP address, OpenNTPD will treat the same IP address
> > > > > > as if it was multiple peers.
> > > > > > >How-To-Repeat:
> > > > > > This can be tested by appending `server 127.0.0.1` multiple
> > > > > > times to the configuration file.
> > > > > >
> > > > > > Alternatively, assuming a default OpenNTPD configuration file
> > > > > > from OpenBSD 7.4, the following entries can be added to
> > > > > > /etc/hosts:
> > > > > > 127.0.0.1   time.cloudflare.com
> > > > > > 127.0.0.1   pool.ntp.org
> > > > > >
> > > > > > I noticed this bug using the default 7.4 configuration file. It
> > > > > > can happen because time.cloudflare.com is part of pool.ntp.org:
> > > > > > https://www.ntppool.org/scores/162.159.200.1
> > > > > > https://www.ntppool.org/scores/162.159.200.123
> > > > > > >Fix:
> > > > > > Removing the `server time.cloudflare.com` line from the
> > > > > > configuration file is a simple fix the user can make, but
> > > > > > OpenNTPD should check if an IP address it tries to add to the
> > > > > > list of peers is already a peer, and ignore it if so. If a
> > > > > > server is added with the `server` (not `servers`) keyword in the
> > > > > > configuration file, OpenNTPD should try the next IP the domain
> > > > > > resolves to if applicable.
> > > > > >
> > > > >
> > > > > Thanks for the report, I'll take a look.
> > > > >
> > > > >   -Otto
> > > > >
> > > >
> > > > Due to various reasons this is all a bit complicated, I did not find a
> > > > nice solution yet. Some patience required.
> > > >
> > > > -Otto
> > > >
> > >
> > > Found some more time. Try this. The approach is: we assume the
> > > addresses of a pool (from the "servers" keyword) vary over time. So if
> > > one of the pool addresses is in the address list of a peer ("server")
> > > we skip it and try to re-resolve the pool, looking for a new address, as
> > > we already did earlier when a pool member does not respond.
> > >
> > > I decided to not implement "the move to another address of the list"
> > > in the "server" case. The address logic is already complex enough.
> > >
> > >   -Otto
> > >
> > > Index: client.c
> > > ===
> > > RCS file: /home/cvs/src/usr.sbin/ntpd/client.c,v
> > > diff -u -p -r1.117 client.c
> > > --- client.c  24 Mar 2022 07:37:19 -  1.117
> > > +++ client.c  17 Dec 2023 09:13:16 -
> > > @@ -108,6 +108,7 @@ client_nextaddr(struct ntp_peer *p)
> > >   return (-1);
> > >
> > >   if (p->addr_head.a == NULL) 
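
The diff is cut off in this archive, so the actual change is not visible
here. The core check described in the quoted text ("is this pool address
already used by a peer?") could be sketched roughly as follows. This is
an illustration only: struct demo_peer, demo_set_v4, demo_same_addr and
demo_addr_in_use are hypothetical names, and ntpd's real data structures
are linked lists of struct ntp_addr rather than a flat array.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical, pared-down peer record: just the resolved address. */
struct demo_peer {
	struct sockaddr_storage ss;
};

/* Fill ss with an IPv4 address given as dotted-quad text; 0 on success. */
int
demo_set_v4(struct sockaddr_storage *ss, const char *text)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)ss;

	memset(ss, 0, sizeof(*ss));
	sin->sin_family = AF_INET;
	return inet_pton(AF_INET, text, &sin->sin_addr) == 1 ? 0 : -1;
}

/* Return 1 if both storages hold the same IPv4 or IPv6 address. */
static int
demo_same_addr(const struct sockaddr_storage *a,
    const struct sockaddr_storage *b)
{
	if (a->ss_family != b->ss_family)
		return 0;
	if (a->ss_family == AF_INET)
		return ((const struct sockaddr_in *)a)->sin_addr.s_addr ==
		    ((const struct sockaddr_in *)b)->sin_addr.s_addr;
	if (a->ss_family == AF_INET6)
		return memcmp(&((const struct sockaddr_in6 *)a)->sin6_addr,
		    &((const struct sockaddr_in6 *)b)->sin6_addr,
		    sizeof(struct in6_addr)) == 0;
	return 0;
}

/* Return 1 if cand is already used by one of the npeers peers. */
int
demo_addr_in_use(const struct demo_peer *peers, size_t npeers,
    const struct sockaddr_storage *cand)
{
	size_t i;

	for (i = 0; i < npeers; i++)
		if (demo_same_addr(&peers[i].ss, cand))
			return 1;
	return 0;
}
```

With the two addresses from the report (162.159.200.1 and 162.159.200.123)
as existing peers, a pool candidate of 162.159.200.123 would be reported
as in use, which under the approach described above would cause the pool
entry to be skipped and re-resolved.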

Re: ntpd crash in constraint_msg_close log_sockaddr

2023-12-18 Thread Otto Moerbeek
On Mon, Dec 18, 2023 at 07:35:07PM +0100, Otto Moerbeek wrote:

> On Mon, Dec 18, 2023 at 06:38:47PM +0100, Alexander Bluhm wrote:
> 
> > Hi,
> > 
> > for some days or weeks I see crashes of ntpd in accounting log on
> > my laptop.
> > 
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  log_sockaddr (sa=0x8) at /usr/src/usr.sbin/ntpd/util.c:159
> > 159 if (getnameinfo(sa, SA_LEN(sa), buf, sizeof(buf), NULL, 0,
> > (gdb) bt
> > #0  log_sockaddr (sa=0x8) at /usr/src/usr.sbin/ntpd/util.c:159
> > #1  0x0b02fb57fc32 in constraint_msg_close (id=<optimized out>,
> > data=0xb058f8f3770 "\001", len=4)
> > at /usr/src/usr.sbin/ntpd/constraint.c:714
> > #2  0x0b02fb575f8a in ntp_dispatch_imsg ()
> > at /usr/src/usr.sbin/ntpd/ntp.c:516
> > #3  0x0b02fb5758b8 in ntp_main (nconf=<optimized out>, pw=<optimized out>,
> > argc=<optimized out>, argv=<optimized out>)
> > at /usr/src/usr.sbin/ntpd/ntp.c:378
> > #4  0x0b02fb57304a in main (argc=<optimized out>, argv=<optimized out>)
> > at /usr/src/usr.sbin/ntpd/ntpd.c:224
> > 
> > (gdb) frame 1
> > #1  0x0b02fb57fc32 in constraint_msg_close (id=<optimized out>,
> > data=0xb058f8f3770 "\001", len=4)
> > at /usr/src/usr.sbin/ntpd/constraint.c:714
> > 714 log_sockaddr((struct sockaddr *)
> > (gdb) print cstr
> > $2 = (struct constraint *) 0xb05b96ac000
> > (gdb) print cstr->addr
> > $3 = (struct ntp_addr *) 0x0
> > 
> > Logging a null pointer address does not work.
> > 
> >711  if (fail) {
> >712  log_debug("no constraint reply from %s"
> >713  " received in time, next query %ds",
> >714  log_sockaddr((struct sockaddr *)
> >715  &cstr->addr->ss), CONSTRAINT_SCAN_INTERVAL);
> > 
> > bluhm
> > 
> 
> This should prevent that and a few potential similar cases.

Though this is a more fundamental approach and is easier on the eyes imo.

-Otto

Index: client.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/client.c,v
diff -u -p -r1.117 client.c
--- client.c24 Mar 2022 07:37:19 -  1.117
+++ client.c18 Dec 2023 19:05:41 -
@@ -351,8 +351,7 @@ client_dispatch(struct ntp_peer *p, u_in
interval = error_interval();
set_next(p, interval);
log_info("reply from %s: not synced (%s), next query %llds",
-   log_sockaddr((struct sockaddr *)&p->addr->ss), s,
-   (long long)interval);
+   log_ntp_addr(p->addr), s, (long long)interval);
return (0);
}
 
@@ -379,7 +378,7 @@ client_dispatch(struct ntp_peer *p, u_in
if (!p->trusted && conf->constraint_median != 0 &&
(constraint_check(T2) != 0 || constraint_check(T3) != 0)) {
log_info("reply from %s: constraint check failed",
-   log_sockaddr((struct sockaddr *)&p->addr->ss));
+   log_ntp_addr(p->addr));
set_next(p, error_interval());
return (0);
}
@@ -392,7 +391,7 @@ client_dispatch(struct ntp_peer *p, u_in
set_next(p, interval);
log_info("reply from %s: negative delay %fs, "
"next query %llds",
-   log_sockaddr((struct sockaddr *)&p->addr->ss),
+   log_ntp_addr(p->addr),
p->reply[p->shift].delay, (long long)interval);
return (0);
}
@@ -431,7 +430,7 @@ client_dispatch(struct ntp_peer *p, u_in
if (p->trustlevel < TRUSTLEVEL_BADPEER &&
p->trustlevel + 1 >= TRUSTLEVEL_BADPEER)
log_info("peer %s now valid",
-   log_sockaddr((struct sockaddr *)&p->addr->ss));
+   log_ntp_addr(p->addr));
p->trustlevel++;
}
 
@@ -506,7 +505,7 @@ client_log_error(struct ntp_peer *peer, 
 {
const char *address;
 
-   address = log_sockaddr((struct sockaddr *)&peer->addr->ss);
+   address = log_ntp_addr(peer->addr);
if (peer->lasterror == error) {
log_debug("%s %s: %s", operation, address, strerror(error));
return;
Index: constraint.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/constraint.c,v
diff -u -p -r1.55 constraint.c
--- constraint.c6 Dec 2023 15:51:53 -   1.55
+++ constraint.c  
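
The diff above replaces the log_sockaddr((struct sockaddr *)&p->addr->ss)
calls with log_ntp_addr(p->addr); the body of log_ntp_addr is not shown in
this excerpt. A minimal sketch of what such a helper might look like,
assuming it simply checks for NULL before formatting, is given below. The
struct layout and the demo_ names are assumptions for illustration, and
inet_ntop is used here in place of ntpd's own log_sockaddr.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical layout: a list link followed by the address storage. */
struct demo_addr {
	struct demo_addr	*next;
	struct sockaddr_storage	 ss;
};

/*
 * Offset of ss inside the struct.  With a NULL struct pointer,
 * &addr->ss typically evaluates to this small non-NULL value; on a
 * common LP64 platform it is 8, which would be consistent with the
 * sa=0x8 seen in the backtrace if ntpd's struct is laid out like this.
 */
size_t
demo_ss_offset(void)
{
	return offsetof(struct demo_addr, ss);
}

/* Format an address for logging; never dereferences a NULL pointer. */
const char *
demo_log_addr(const struct demo_addr *addr)
{
	static char buf[INET6_ADDRSTRLEN];
	const struct sockaddr *sa;
	const void *src;

	if (addr == NULL)
		return "(unknown)";
	sa = (const struct sockaddr *)&addr->ss;
	switch (sa->sa_family) {
	case AF_INET:
		src = &((const struct sockaddr_in *)sa)->sin_addr;
		break;
	case AF_INET6:
		src = &((const struct sockaddr_in6 *)sa)->sin6_addr;
		break;
	default:
		return "(unknown)";
	}
	if (inet_ntop(sa->sa_family, src, buf, sizeof(buf)) == NULL)
		return "(unknown)";
	return buf;
}
```

Putting the NULL check in one helper means each log call site stays a
single expression, rather than every caller needing its own
"if (addr != NULL)" guard.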

Re: ntpd crash in constraint_msg_close log_sockaddr

2023-12-18 Thread Otto Moerbeek
On Mon, Dec 18, 2023 at 06:38:47PM +0100, Alexander Bluhm wrote:

> Hi,
> 
> for some days or weeks I see crashes of ntpd in accounting log on
> my laptop.
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  log_sockaddr (sa=0x8) at /usr/src/usr.sbin/ntpd/util.c:159
> 159 if (getnameinfo(sa, SA_LEN(sa), buf, sizeof(buf), NULL, 0,
> (gdb) bt
> #0  log_sockaddr (sa=0x8) at /usr/src/usr.sbin/ntpd/util.c:159
> #1  0x0b02fb57fc32 in constraint_msg_close (id=<optimized out>,
> data=0xb058f8f3770 "\001", len=4)
> at /usr/src/usr.sbin/ntpd/constraint.c:714
> #2  0x0b02fb575f8a in ntp_dispatch_imsg ()
> at /usr/src/usr.sbin/ntpd/ntp.c:516
> #3  0x0b02fb5758b8 in ntp_main (nconf=<optimized out>, pw=<optimized out>,
> argc=<optimized out>, argv=<optimized out>)
> at /usr/src/usr.sbin/ntpd/ntp.c:378
> #4  0x0b02fb57304a in main (argc=<optimized out>, argv=<optimized out>)
> at /usr/src/usr.sbin/ntpd/ntpd.c:224
> 
> (gdb) frame 1
> #1  0x0b02fb57fc32 in constraint_msg_close (id=<optimized out>,
> data=0xb058f8f3770 "\001", len=4)
> at /usr/src/usr.sbin/ntpd/constraint.c:714
> 714 log_sockaddr((struct sockaddr *)
> (gdb) print cstr
> $2 = (struct constraint *) 0xb05b96ac000
> (gdb) print cstr->addr
> $3 = (struct ntp_addr *) 0x0
> 
> Logging a null pointer address does not work.
> 
>711  if (fail) {
>712  log_debug("no constraint reply from %s"
>713  " received in time, next query %ds",
>714  log_sockaddr((struct sockaddr *)
>715  &cstr->addr->ss), CONSTRAINT_SCAN_INTERVAL);
> 
> bluhm
> 

This should prevent that and a few potential similar cases.

-Otto

Index: constraint.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/constraint.c,v
diff -u -p -r1.54 constraint.c
--- constraint.c27 Nov 2022 13:19:00 -  1.54
+++ constraint.c18 Dec 2023 18:34:19 -
@@ -467,10 +467,9 @@ priv_constraint_check_child(pid_t pid, i
if (sig != SIGTERM) {
signame = strsignal(sig) ?
strsignal(sig) : "unknown";
-   log_warnx("constraint %s; "
+   log_warnx("constraint "
"terminated with signal %d (%s)",
-   log_sockaddr((struct sockaddr *)
-   &cstr->addr->ss), sig, signame);
+   sig, signame);
}
fail = 1;
}
@@ -679,9 +678,10 @@ constraint_msg_result(u_int32_t id, u_in
offset = gettime_from_timeval(&tv[0]) -
gettime_from_timeval(&tv[1]);
 
-   log_info("constraint reply from %s: offset %f",
-   log_sockaddr((struct sockaddr *)&cstr->addr->ss),
-   offset);
+   if (cstr->addr != NULL)
+   log_info("constraint reply from %s: offset %f",
+   log_sockaddr((struct sockaddr *)&cstr->addr->ss),
+   offset);
 
cstr->state = STATE_REPLY_RECEIVED;
cstr->last = getmonotime();
@@ -710,10 +710,11 @@ constraint_msg_close(u_int32_t id, u_int
memcpy(&fail, data, len);
 
if (fail) {
-   log_debug("no constraint reply from %s"
-   " received in time, next query %ds",
-   log_sockaddr((struct sockaddr *)
-   &cstr->addr->ss), CONSTRAINT_SCAN_INTERVAL);
+   if (cstr->addr != NULL)
+   log_debug("no constraint reply from %s"
+   " received in time, next query %ds",
+   log_sockaddr((struct sockaddr *)
+   &cstr->addr->ss), CONSTRAINT_SCAN_INTERVAL);

cnt = 0;
TAILQ_FOREACH(tmp, &conf->constraints, entry)
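
The sa=0x8 in the backtrace is consistent with taking the address of a
member through a NULL struct pointer. As a minimal sketch of the guard,
assuming hypothetical stand-in types (demo_addr, demo_constraint) and
helper names that are not part of ntpd, the NULL check can also be
folded into the formatting helper so callers cannot forget it:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not ntpd's real structures. */
struct demo_addr {
	const char *text;	/* printable form of the address */
};

struct demo_constraint {
	struct demo_addr *addr;	/* may be NULL, e.g. nothing resolved yet */
};

/* Return a printable address, or a placeholder when none is set. */
const char *
demo_log_addr(const struct demo_addr *a)
{
	if (a == NULL || a->text == NULL)
		return "(unknown)";
	return a->text;
}

/* Format the timeout message into buf without dereferencing a NULL addr. */
int
demo_format_close(char *buf, size_t len, const struct demo_constraint *c,
    int interval)
{
	return snprintf(buf, len,
	    "no constraint reply from %s received in time, next query %ds",
	    demo_log_addr(c->addr), interval);
}
```

Compared with guarding each call site, this variant still logs the
event when the address is missing, at the cost of a less informative
message.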



Re: repeated NTP peers in OpenNTPD

2023-12-18 Thread Otto Moerbeek
On Mon, Dec 18, 2023 at 04:42:09PM +, Guilherme Janczak wrote:

> On Mon, Dec 18, 2023 at 07:31:37AM +0100, Otto Moerbeek wrote:
> > On Sun, Dec 17, 2023 at 05:37:50PM +0100, Otto Moerbeek wrote:
> >
> > > On Fri, Dec 15, 2023 at 02:54:19PM +0100, Otto Moerbeek wrote:
> > >
> > > > On Sun, Dec 10, 2023 at 09:57:08AM +0100, Otto Moerbeek wrote:
> > > >
> > > > > On Fri, Dec 01, 2023 at 09:18:32PM +, 
> > > > > guilherme.janc...@yandex.com wrote:
> > > > >
> > > > > > >Synopsis:  Repeated NTP peers in OpenNTPD
> > > > > > >Category:  user
> > > > > > >Environment:
> > > > > > System  : OpenBSD 7.4
> > > > > > Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 
> > > > > > MDT 2023
> > > > > >  
> > > > > > r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > > >
> > > > > > Architecture: OpenBSD.amd64
> > > > > > Machine : amd64
> > > > > > >Description:
> > > > > > If the same address/domain is specified multiple times in
> > > > > > OpenNTPD's configuration file, or if multiple domains resolve
> > > > > > to the same IP address, OpenNTPD will treat the same IP address
> > > > > > as if it was multiple peers.
> > > > > > >How-To-Repeat:
> > > > > > This can be tested by appending `server 127.0.0.1` multiple
> > > > > > times to the configuration file.
> > > > > >
> > > > > > Alternatively, assuming a default OpenNTPD configuration file
> > > > > > from OpenBSD 7.4, the following entries can be added to
> > > > > > /etc/hosts:
> > > > > > 127.0.0.1   time.cloudflare.com
> > > > > > 127.0.0.1   pool.ntp.org
> > > > > >
> > > > > > I noticed this bug using the default 7.4 configuration file. It
> > > > > > can happen because time.cloudflare.com is part of pool.ntp.org:
> > > > > > https://www.ntppool.org/scores/162.159.200.1
> > > > > > https://www.ntppool.org/scores/162.159.200.123
> > > > > > >Fix:
> > > > > > Removing the `server time.cloudflare.com` line from the
> > > > > > configuration file is a simple fix the user can make, but
> > > > > > OpenNTPD should check if an IP address it tries to add to the
> > > > > > list of peers is already a peer, and ignore it if so. If a
> > > > > > server is added with the `server` (not `servers`) keyword in the
> > > > > > configuration file, OpenNTPD should try the next IP the domain
> > > > > > resolves to if applicable.
> > > > > >
> > > > >
> > > > > Thanks for the report, I'll take a look.
> > > > >
> > > > >   -Otto
> > > > >
> > > >
> > > > Due to various reasons this is all a bit complicated, I did not find a
> > > > nice solution yet. Some patience required.
> > > >
> > > > -Otto
> > > >
> > >
> > > Found some more time. Try this. The approach is: we assume the
> > > addresses of a pool (from the "servers" keyword) vary over time. So if
> > > one of the pool addresses is in the address list of a peer ("server")
> > > we skip it and try to re-resolve the pool, looking for a new address, as
> > > we already did earlier when a pool member does not respond.
> > >
> > > I decided to not implement "the move to another address of the list"
> > > in the "server" case. The address logic is already complex enough.
> > >
> > >   -Otto
> > >
> > > Index: client.c
> > > ===
> > > RCS file: /home/cvs/src/usr.sbin/ntpd/client.c,v
> > > diff -u -p -r1.117 client.c
> > > --- client.c  24 Mar 2022 07:37:19 -  1.117
> > > +++ client.c  17 Dec 2023 09:13:16 -
> > > @@ -108,6 +108,7 @@ client_nextaddr(struct ntp_peer *p)
> > >   return (-1);
> > >
> > >   if (p->addr_head.a == NULL) 

Re: OpenBSD installer partitioning has a problem!

2023-12-18 Thread Otto Moerbeek
On Mon, Dec 18, 2023 at 01:15:57PM -0300, spnti wrote:

> Hello guys!
> 
> The OpenBSD installer partitioning does not accept more than 15 partitions
> with GPT (partitions 'c' and 'i' are already defined in advance by the
> installer). It limits from partition 'a' to 'p', but GPT partitions must
> support up to 128 partitions.
> 
> Thank you for your attention and help.

It would be nice if it would, but nobody did the work (which is not
easy, that certainly plays a role). And "must" is a strong word to
use. There is no authority that can force us.

-Otto



Re: dump(8) SEGVs in searchdir() on 7.4-stable

2023-12-17 Thread Otto Moerbeek
On Sun, Dec 17, 2023 at 10:02:49PM +0100, Alexander Klimov wrote:

> Actually backup runs the whole script via doas as root.
> 
> tower# su -ls /bin/ksh backup
> tower$ ulimit -a
> time(cpu-seconds)unlimited
> file(blocks) unlimited
> coredump(blocks) unlimited
> data(kbytes) 3149824
> stack(kbytes)4096
> lockedmem(kbytes)87381
> memory(kbytes)   15898400
> nofiles(descriptors) 512
> processes128
> tower$ ^D
> tower# ulimit -a
> time(cpu-seconds)unlimited
> file(blocks) unlimited
> coredump(blocks) unlimited
> data(kbytes) 4194304
> stack(kbytes)8192
> lockedmem(kbytes)87381
> memory(kbytes)   15898400
> nofiles(descriptors) 128
> processes1310
> tower#

OK, not *very* small, a bit puzzling. Can you run with the patch I
posted?  I'll do a full review of all the allocations done by
dump.

-Otto



Re: repeated NTP peers in OpenNTPD

2023-12-17 Thread Otto Moerbeek
On Sun, Dec 17, 2023 at 05:37:50PM +0100, Otto Moerbeek wrote:

> On Fri, Dec 15, 2023 at 02:54:19PM +0100, Otto Moerbeek wrote:
> 
> > On Sun, Dec 10, 2023 at 09:57:08AM +0100, Otto Moerbeek wrote:
> > 
> > > On Fri, Dec 01, 2023 at 09:18:32PM +, guilherme.janc...@yandex.com 
> > > wrote:
> > > 
> > > > >Synopsis:  Repeated NTP peers in OpenNTPD
> > > > >Category:  user
> > > > >Environment:
> > > > System  : OpenBSD 7.4
> > > > Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 
> > > > MDT 2023
> > > >  
> > > > r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > >Description:
> > > > If the same address/domain is specified multiple times in
> > > > OpenNTPD's configuration file, or if multiple domains resolve
> > > > to the same IP address, OpenNTPD will treat the same IP address
> > > > as if it was multiple peers.
> > > > >How-To-Repeat:
> > > > This can be tested by appending `server 127.0.0.1` multiple
> > > > times to the configuration file.
> > > > 
> > > > Alternatively, assuming a default OpenNTPD configuration file
> > > > from OpenBSD 7.4, the following entries can be added to
> > > > /etc/hosts:
> > > > 127.0.0.1   time.cloudflare.com
> > > > 127.0.0.1   pool.ntp.org
> > > > 
> > > > I noticed this bug using the default 7.4 configuration file. It
> > > > can happen because time.cloudflare.com is part of pool.ntp.org:
> > > > https://www.ntppool.org/scores/162.159.200.1
> > > > https://www.ntppool.org/scores/162.159.200.123
> > > > >Fix:
> > > > Removing the `server time.cloudflare.com` line from the
> > > > configuration file is a simple fix the user can make, but
> > > > OpenNTPD should check if an IP address it tries to add to the
> > > > list of peers is already a peer, and ignore it if so. If a
> > > > server is added with the `server` (not `servers`) keyword in the
> > > > configuration file, OpenNTPD should try the next IP the domain
> > > > resolves to if applicable.
> > > > 
> > > 
> > > Thanks for the report, I'll take a look.
> > > 
> > >   -Otto
> > > 
> > 
> > Due to various reasons this is all a bit complicated, I did not find a
> > nice solution yet. Some patience required.
> > 
> > -Otto
> > 
> 
> Found some more time. Try this. The approach is: we assume the
> addresses of a pool (from the "servers" keyword) vary over time. So if
> one of the pool addresses is in the address list of a peer ("server")
> we skip it and try to re-resolve the pool, looking for a new address, as
> we already did earlier when a pool member does not respond.
> 
> I decided to not implement "the move to another address of the list"
> in the "server" case. The address logic is already complex enough.
> 
>   -Otto
> 
> Index: client.c
> ===
> RCS file: /home/cvs/src/usr.sbin/ntpd/client.c,v
> diff -u -p -r1.117 client.c
> --- client.c  24 Mar 2022 07:37:19 -  1.117
> +++ client.c  17 Dec 2023 09:13:16 -
> @@ -108,6 +108,7 @@ client_nextaddr(struct ntp_peer *p)
>   return (-1);
>  
>   if (p->addr_head.a == NULL) {
> + log_debug("kicking query for %s",  p->addr_head.name);
>   priv_dns(IMSG_HOST_DNS, p->addr_head.name, p->id);
>   p->state = STATE_DNS_INPROGRESS;
>   return (-1);
> @@ -145,7 +146,15 @@ client_query(struct ntp_peer *p)
>   }
>  
>   if (p->state < STATE_DNS_DONE || p->addr == NULL)
> - return (-1);
> + return (0);
> +
> + /* if we're a pool member and a peer has taken our address, signal */
> + if (p->addr_head.pool != 0 && addr_already_used(&p->addr->ss)) {
> + log_debug("pool addr %s used by peer, will reresolve pool",
> + log_sockaddr((struct sockaddr *)&p->addr->ss))

Re: dump(8) SEGVs in searchdir() on 7.4-stable

2023-12-17 Thread Otto Moerbeek


This diff checks the allocations of a few pretty big tables.

-Otto

Index: main.c
===
RCS file: /home/cvs/src/sbin/dump/main.c,v
diff -u -p -r1.63 main.c
--- main.c  2 Jun 2022 15:35:55 -   1.63
+++ main.c  17 Dec 2023 20:30:45 -
@@ -465,6 +465,9 @@ main(int argc, char *argv[])
usedinomap = calloc((unsigned) mapsize, sizeof(char));
dumpdirmap = calloc((unsigned) mapsize, sizeof(char));
dumpinomap = calloc((unsigned) mapsize, sizeof(char));
+   if (usedinomap == NULL || dumpdirmap == NULL || dumpinomap == NULL)
+   quit("Failed to allocate tables");
+
tapesize = 3 * (howmany(mapsize * sizeof(char), TP_BSIZE) + 1);
 
nonodump = spcl.c_level < honorlevel;
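
The same check-after-calloc pattern can be sketched outside of dump. The
struct and function names below (demo_maps, demo_maps_alloc,
demo_maps_free) are hypothetical and only illustrate the idea; dump
itself simply calls quit() on failure, as in the diff above:

```c
#include <stdlib.h>

/* Hypothetical container for three equally sized byte maps. */
struct demo_maps {
	char *used;
	char *dir;
	char *ino;
};

/* Allocate three zeroed maps of mapsize bytes; 0 on success, -1 on failure. */
int
demo_maps_alloc(struct demo_maps *m, size_t mapsize)
{
	m->used = calloc(mapsize, sizeof(char));
	m->dir = calloc(mapsize, sizeof(char));
	m->ino = calloc(mapsize, sizeof(char));
	if (m->used == NULL || m->dir == NULL || m->ino == NULL) {
		/* free(NULL) is a no-op, so a partial failure unwinds safely */
		free(m->used);
		free(m->dir);
		free(m->ino);
		m->used = m->dir = m->ino = NULL;
		return -1;
	}
	return 0;
}

/* Release all three maps and reset the pointers. */
void
demo_maps_free(struct demo_maps *m)
{
	free(m->used);
	free(m->dir);
	free(m->ino);
	m->used = m->dir = m->ino = NULL;
}
```

Checking all three results in one place means a failed allocation is
reported where it happens, rather than showing up later as a NULL
pointer dereference deep in the directory traversal.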



Re: dump(8) SEGVs in searchdir() on 7.4-stable

2023-12-17 Thread Otto Moerbeek
On Sun, Dec 17, 2023 at 08:23:53PM +0100, Alexander Klimov wrote:

> 
> 
> On 17.12.23 19:19, Otto Moerbeek wrote:
> > On Sun, Dec 17, 2023 at 07:07:40PM +0100, Otto Moerbeek wrote:
> > 
> > > On Sun, Dec 17, 2023 at 06:55:27PM +0100, Alexander Klimov wrote:
> > > 
> > > > Much better!
> > > > 
> > > > Program terminated with signal 11, Segmentation fault.
> > > > #0  0x08e920ed287f in searchdir (ino=7946491, blkno=Unhandled dwarf 
> > > > expression opcode 0xa3
> > > > )
> > > >  at /usr/src/sbin/dump/traverse.c:474
> > > > 474 if (TSTINO(dp->d_ino, dumpinomap)) {
> > > > (gdb) info locals
> > > > dblk = 0x3633633264006136 
> > > > ret = 0
> > > > loc = 29445
> > > > mode = Variable "mode" is not available.
> > > > (gdb) p *dblk
> > > > Cannot access memory at address 0x3633633264006136
> > > > (gdb) p dp
> > > > No symbol "dp" in current context.
> > > > (gdb) p ip
> > > > Variable "ip" is not available.
> > > > (gdb) p ino
> > > > $7 = 7946491
> > > > (gdb) p blkno
> > > > Unhandled dwarf expression opcode 0xa3
> > > > (gdb) p size
> > > > $8 = 186
> > > > (gdb) p filesize
> > > > Unhandled dwarf expression opcode 0xa3
> > > > (gdb) p tapesize
> > > > Variable "tapesize" is not available.
> > > > (gdb) p nodump
> > > > $9 = 0
> > > > (gdb) p dp
> > > > No symbol "dp" in current context.
> > > > (gdb) p ip
> > > > Variable "ip" is not available.
> > > > (gdb) p loc
> > > > $10 = 29445
> > > > (gdb) p dblk
> > > > $11 = 0x3633633264006136 
> > > > (gdb) p mode
> > > > Variable "mode" is not available.
> > > > (gdb) p ret
> > > > $12 = 0
> > > > (gdb)
> > > > 
> > > > But, admittedly, I'm not very familiar with the code.
> > > > > So I can only guess. Too large offset? Too little memory? ...
> > > > 
> > > > On 17.12.23 17:27, Otto Moerbeek wrote:
> > > > > Rebuild with
> > > > > cd /usr/src/sbin/dump
> > > > > make obj
> > > > > make clean
> > > > > DEBUG=-g make
> > > > > 
> > > > > And then run gdb again.
> > > 
> > > Install gdb from packages and then run egdb. It understands modern
> > > DWARF expressions better.
> 
> (gdb) bt
> #0  0x08e920ed287f in ?? ()
> #1  0x007940fb in ?? ()
> #2  0x08e920f3d2c8 in ?? ()
> #3  0x74cace43cd60 in ?? ()
> #4  0x08e9 in ?? ()
> #5  0x in ?? ()
> (gdb)
> 
> > > The value of size is also interesting.
> > 
> > I meant the global mapsize. And while there, dumpinomap is interesting
> > as well.
> 
> (gdb) p dumpinomap
> $1 = 0x0

So it looks like allocation of dumpinomap failed.

> (gdb) p mapsize
> $2 = 2283
> (gdb)
> 
> > 
> > Can you also show
> > 
> > $ doas dumpfs /dev/rsd0a |head -20
> 
> tower# mount
> /dev/sd6a on / type ffs (local, wxallowed)
> /dev/sd7a on /raid1 type ffs (local, wxallowed)
> tower# dumpfs /dev/rsd6a |head -20
> magic 19540119 (FFS2) timeSun Dec 17 19:56:26 2023
> superblock location   65536   id  [ 64c9ff8f f5159e71 ]
> ncg   586 size122096568   blocks  120174461
> bsize 32768   shift   15  mask0x8000
> fsize 4096shift   12  mask0xf000
> frag  8   shift   3   fsbtodb 3
> minfree   5%  optim   timesymlinklen 120
> maxbsize 0maxbpg  4096maxcontig 1 contigsumsize 0
> nbfree14127705ndir152086  nifree  28188483nffree  
> 75665
> bpg   26062   fpg 208496  ipg 52224
> nindir4096inopb   128 maxfilesize 2252349704110079
> sbsize4096cgsize  32768   csaddr  3304cssize  12288
> sblkno24  cblkno  32  iblkno  40  dblkno  3304
> cgrotor   0   fmod0   ronly   0   clean   0
> avgfpdir 64   avgfilesize 16384
> flags none
> fsmnt /
> volname   swuid   0
> 
> cs[].cs_(nbfree,ndir,nifree,nffree):
> tower#
> 
> > 
> > (replacing rsd0a with your device), plus the exact dump command line you are
> > using?
> 
> dump -3 -auf - /
> 
> > 
> >   

Re: dump(8) SEGVs in searchdir() on 7.4-stable

2023-12-17 Thread Otto Moerbeek
On Sun, Dec 17, 2023 at 07:07:40PM +0100, Otto Moerbeek wrote:

> On Sun, Dec 17, 2023 at 06:55:27PM +0100, Alexander Klimov wrote:
> 
> > Much better!
> > 
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x08e920ed287f in searchdir (ino=7946491, blkno=Unhandled dwarf 
> > expression opcode 0xa3
> > )
> > at /usr/src/sbin/dump/traverse.c:474
> > 474 if (TSTINO(dp->d_ino, dumpinomap)) {
> > (gdb) info locals
> > dblk = 0x3633633264006136 
> > ret = 0
> > loc = 29445
> > mode = Variable "mode" is not available.
> > (gdb) p *dblk
> > Cannot access memory at address 0x3633633264006136
> > (gdb) p dp
> > No symbol "dp" in current context.
> > (gdb) p ip
> > Variable "ip" is not available.
> > (gdb) p ino
> > $7 = 7946491
> > (gdb) p blkno
> > Unhandled dwarf expression opcode 0xa3
> > (gdb) p size
> > $8 = 186
> > (gdb) p filesize
> > Unhandled dwarf expression opcode 0xa3
> > (gdb) p tapesize
> > Variable "tapesize" is not available.
> > (gdb) p nodump
> > $9 = 0
> > (gdb) p dp
> > No symbol "dp" in current context.
> > (gdb) p ip
> > Variable "ip" is not available.
> > (gdb) p loc
> > $10 = 29445
> > (gdb) p dblk
> > $11 = 0x3633633264006136 
> > (gdb) p mode
> > Variable "mode" is not available.
> > (gdb) p ret
> > $12 = 0
> > (gdb)
> > 
> > But, admittedly, I'm not very familiar with the code.
> > So I can only guess. Too large offset? Too little memory? ...
> > 
> > On 17.12.23 17:27, Otto Moerbeek wrote:
> > > Rebuild with
> > > cd /usr/src/sbin/dump
> > > make obj
> > > make clean
> > > DEBUG=-g make
> > > 
> > > And then run gdb again.
> 
> Install gdb from packages and then run egdb. It understands modern
> DWARF expressions better.
> The value of size is also interesting.

I meant the global mapsize. And while there, dumpinomap is interesting
as well.

Can you also show 

$ doas dumpfs /dev/rsd0a |head -20 

(replacing rsd0a with your device), plus the exact dump command line you are
using?

-Otto

> 
> Also: is your filesystem clean? To be sure, unmount and run fsck on it. 
> 
>   -Otto
> 



Re: dump(8) SEGVs in searchdir() on 7.4-stable

2023-12-17 Thread Otto Moerbeek
On Sun, Dec 17, 2023 at 06:55:27PM +0100, Alexander Klimov wrote:

> Much better!
> 
> Program terminated with signal 11, Segmentation fault.
> #0  0x08e920ed287f in searchdir (ino=7946491, blkno=Unhandled dwarf 
> expression opcode 0xa3
> )
> at /usr/src/sbin/dump/traverse.c:474
> 474   if (TSTINO(dp->d_ino, dumpinomap)) {
> (gdb) info locals
> dblk = 0x3633633264006136 
> ret = 0
> loc = 29445
> mode = Variable "mode" is not available.
> (gdb) p *dblk
> Cannot access memory at address 0x3633633264006136
> (gdb) p dp
> No symbol "dp" in current context.
> (gdb) p ip
> Variable "ip" is not available.
> (gdb) p ino
> $7 = 7946491
> (gdb) p blkno
> Unhandled dwarf expression opcode 0xa3
> (gdb) p size
> $8 = 186
> (gdb) p filesize
> Unhandled dwarf expression opcode 0xa3
> (gdb) p tapesize
> Variable "tapesize" is not available.
> (gdb) p nodump
> $9 = 0
> (gdb) p dp
> No symbol "dp" in current context.
> (gdb) p ip
> Variable "ip" is not available.
> (gdb) p loc
> $10 = 29445
> (gdb) p dblk
> $11 = 0x3633633264006136 
> (gdb) p mode
> Variable "mode" is not available.
> (gdb) p ret
> $12 = 0
> (gdb)
> 
> But, admittedly, I'm not very familiar with the code.
> So I can only guess. Too large offset? Too little memory? ...
> 
> On 17.12.23 17:27, Otto Moerbeek wrote:
> > Rebuild with
> > cd /usr/src/sbin/dump
> > make obj
> > make clean
> > DEBUG=-g make
> > 
> > And then run gdb again.

Install gdb from packages and then run egdb. It understands modern
DWARF expressions better.
The value of size is also interesting.

Also: is your filesystem clean? To be sure, unmount and run fsck on it. 

-Otto



Re: repeated NTP peers in OpenNTPD

2023-12-17 Thread Otto Moerbeek
On Fri, Dec 15, 2023 at 02:54:19PM +0100, Otto Moerbeek wrote:

> On Sun, Dec 10, 2023 at 09:57:08AM +0100, Otto Moerbeek wrote:
> 
> > On Fri, Dec 01, 2023 at 09:18:32PM +, guilherme.janc...@yandex.com 
> > wrote:
> > 
> > > >Synopsis:Repeated NTP peers in OpenNTPD
> > > >Category:user
> > > >Environment:
> > >   System  : OpenBSD 7.4
> > >   Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
> > >
> > > r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > > >Description:
> > >   If the same address/domain is specified multiple times in
> > >   OpenNTPD's configuration file, or if multiple domains resolve
> > >   to the same IP address, OpenNTPD will treat the same IP address
> > >   as if it was multiple peers.
> > > >How-To-Repeat:
> > >   This can be tested by appending `server 127.0.0.1` multiple
> > >   times to the configuration file.
> > > 
> > >   Alternatively, assuming a default OpenNTPD configuration file
> > >   from OpenBSD 7.4, the following entries can be added to
> > >   /etc/hosts:
> > >   127.0.0.1   time.cloudflare.com
> > >   127.0.0.1   pool.ntp.org
> > > 
> > >   I noticed this bug using the default 7.4 configuration file. It
> > >   can happen because time.cloudflare.com is part of pool.ntp.org:
> > >   https://www.ntppool.org/scores/162.159.200.1
> > >   https://www.ntppool.org/scores/162.159.200.123
> > > >Fix:
> > >   Removing the `server time.cloudflare.com` line from the
> > >   configuration file is a simple fix the user can make, but
> > >   OpenNTPD should check if an IP address it tries to add to the
> > >   list of peers is already a peer, and ignore it if so. If a
> > >   server is added with the `server` (not `servers`) keyword in the
> > >   configuration file, OpenNTPD should try the next IP the domain
> > >   resolves to if applicable.
> > > 
> > 
> > Thanks for the report, I'll take a look.
> > 
> > -Otto
> > 
> 
> Due to various reasons this is all a bit complicated, I did not find a
> nice solution yet. Some patience required.
> 
>   -Otto
> 

Found some more time. Try this. The approach is: we assume the
addresses of a pool (from the "servers" keyword) vary over time. So if
one of the pool addresses is in the address list of a peer ("server")
we skip it and try to re-resolve the pool, looking for a new address, as
we already did earlier when a pool member does not respond.

I decided to not implement "the move to another address of the list"
in the "server" case. The address logic is already complex enough.
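
The "is this address already taken by a peer" test described above can
be sketched as follows. This is only an illustration under stated
assumptions: the helper names (demo_same_addr, demo_addr_in_use,
demo_set_ipv4) and the flat array of in-use addresses are hypothetical
and do not reflect how the diff walks ntpd's peer list.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Return 1 if both hold the same IPv4 or IPv6 address (port ignored). */
int
demo_same_addr(const struct sockaddr_storage *a,
    const struct sockaddr_storage *b)
{
	if (a->ss_family != b->ss_family)
		return 0;
	switch (a->ss_family) {
	case AF_INET:
		return memcmp(&((const struct sockaddr_in *)a)->sin_addr,
		    &((const struct sockaddr_in *)b)->sin_addr,
		    sizeof(struct in_addr)) == 0;
	case AF_INET6:
		return memcmp(&((const struct sockaddr_in6 *)a)->sin6_addr,
		    &((const struct sockaddr_in6 *)b)->sin6_addr,
		    sizeof(struct in6_addr)) == 0;
	default:
		return 0;
	}
}

/* Return 1 if candidate matches any of the n addresses already in use. */
int
demo_addr_in_use(const struct sockaddr_storage *candidate,
    const struct sockaddr_storage *used, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_same_addr(candidate, &used[i]))
			return 1;
	return 0;
}

/* Example helper: fill ss from an IPv4 literal; 0 on success, -1 on error. */
int
demo_set_ipv4(struct sockaddr_storage *ss, const char *text)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)ss;

	memset(ss, 0, sizeof(*ss));
	ss->ss_family = AF_INET;
	return inet_pton(AF_INET, text, &sin->sin_addr) == 1 ? 0 : -1;
}
```

A pool member whose candidate address matches would then be skipped and
the pool re-resolved, while a non-matching address is queried as usual.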

-Otto

Index: client.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/client.c,v
diff -u -p -r1.117 client.c
--- client.c24 Mar 2022 07:37:19 -  1.117
+++ client.c17 Dec 2023 09:13:16 -
@@ -108,6 +108,7 @@ client_nextaddr(struct ntp_peer *p)
return (-1);
 
if (p->addr_head.a == NULL) {
+   log_debug("kicking query for %s",  p->addr_head.name);
priv_dns(IMSG_HOST_DNS, p->addr_head.name, p->id);
p->state = STATE_DNS_INPROGRESS;
return (-1);
@@ -145,7 +146,15 @@ client_query(struct ntp_peer *p)
}
 
if (p->state < STATE_DNS_DONE || p->addr == NULL)
-   return (-1);
+   return (0);
+
+   /* if we're a pool member and a peer has taken our address, signal */
+   if (p->addr_head.pool != 0 && addr_already_used(&p->addr->ss)) {
+   log_debug("pool addr %s used by peer, will reresolve pool",
+   log_sockaddr((struct sockaddr *)&p->addr->ss));
+   p->senderrors++;
+   return (0);
+   }
 
if (p->query.fd == -1) {
struct sockaddr *sa = (struct sockaddr *)&p->addr->ss;
Index: ntp.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/ntp.c,v
diff -u -p -r1.171 ntp.c
--- ntp.c   6 Dec 2023 15:51:53 -   1.171
+++ ntp.c   17 Dec 2023 09:13:16 -
@@ -54,8 +54,6 @@ int   ntp_dispatch_imsg(void);
 intntp_dispatch_imsg_dns(void);
 void   peer_add(struct ntp_peer *);
 void   peer_remove(struct ntp_peer *);
-intinpool(struct sockaddr_storage *,
-

Re: dump(8) SEGVs in searchdir() on 7.4-stable

2023-12-17 Thread Otto Moerbeek
On Sun, Dec 17, 2023 at 02:33:18PM +0100, Alexander Klimov wrote:

> Hello devs!
> 
> This year dump(8) already crashed three times.
> Fortunately that produced core dumps.
> But unfortunately debugging symbols are missing:
> 
> tower# find /raid1/backups/tower -name dump.core
> /raid1/backups/tower/2023/05/26/dump.core
> /raid1/backups/tower/2023/09/21/dump.core
> /raid1/backups/tower/2023/12/14/dump.core
> tower# gdb /sbin/dump /raid1/backups/tower/2023/05/26/dump.core
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-unknown-openbsd7.4"...
> (no debugging symbols found)
> 
> 
> warning: exec file is newer than core file.
> Core was generated by `dump'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x05e01d161fef in ?? ()
> (gdb) bt
> #0  0x05e01d161fef in ?? ()
> #1  0x05e01d161b67 in ?? ()
> #2  0x05e01d15cacb in ?? ()
> #3  0x05e01d15ac22 in ?? ()
> #4  0x in ?? ()
> (gdb) quit
> tower# gdb /sbin/dump /raid1/backups/tower/2023/09/21/dump.core
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-unknown-openbsd7.4"...
> (no debugging symbols found)
> 
> 
> warning: exec file is newer than core file.
> Core was generated by `dump'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0fe3f655dfef in ?? ()
> (gdb) bt
> #0  0x0fe3f655dfef in ?? ()
> #1  0x0fe3f655db67 in ?? ()
> #2  0x0fe3f6558acb in ?? ()
> #3  0x0fe3f6556c22 in ?? ()
> #4  0x in ?? ()
> (gdb) quit
> tower# gdb /sbin/dump /raid1/backups/tower/2023/12/14/dump.core
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-unknown-openbsd7.4"...
> (no debugging symbols found)
> 
> Core was generated by `dump'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x08e920ed287f in ?? ()
> (gdb) bt
> #0  0x08e920ed287f in ?? ()
> #1  0x08e920ed23f7 in ?? ()
> #2  0x08e920ecd2bb in ?? ()
> #3  0x08e920ecb3f2 in ?? ()
> #4  0x in ?? ()
> (gdb) quit
> tower#
> 
> Re-building the userland gave me a /usr/obj/sbin/dump/dump
> with some debug info at least:
> 
> mp/dump /raid1/backups/tower/2023/05/26/dump.core 
> <
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-unknown-openbsd7.4"...
> 
> warning: exec file is newer than core file.
> Core was generated by `dump'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x05e01d161fef in searchdir ()
> (gdb) bt
> #0  0x05e01d161fef in searchdir ()
> #1  0x05e01d161b67 in mapdirs ()
> #2  0x05e01d15cacb in main ()
> (gdb) info line
> No line number information available.
> (gdb) quit
> /backups/tower/2023/09/21/dump.core   
> <
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-unknown-openbsd7.4"...
> 
> warning: exec file is newer than core file.
> Core was generated by `dump'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0fe3f655dfef in searchdir ()
> (gdb) bt
> #0  0x0fe3f655dfef in searchdir ()
> #1  0x0fe3f655db67 in mapdirs ()
> #2  0x0fe3f6558acb in main ()
> (gdb) info line
> No line number information available.
> (gdb) quit
> /backups/tower/2023/12/14/dump.core   
> <
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU Ge

Re: repeated NTP peers in OpenNTPD

2023-12-15 Thread Otto Moerbeek
On Sun, Dec 10, 2023 at 09:57:08AM +0100, Otto Moerbeek wrote:

> On Fri, Dec 01, 2023 at 09:18:32PM +, guilherme.janc...@yandex.com wrote:
> 
> > >Synopsis:  Repeated NTP peers in OpenNTPD
> > >Category:  user
> > >Environment:
> > System  : OpenBSD 7.4
> > Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
> >  
> > r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > If the same address/domain is specified multiple times in
> > OpenNTPD's configuration file, or if multiple domains resolve
> > to the same IP address, OpenNTPD will treat the same IP address
> > as if it was multiple peers.
> > >How-To-Repeat:
> > This can be tested by appending `server 127.0.0.1` multiple
> > times to the configuration file.
> > 
> > Alternatively, assuming a default OpenNTPD configuration file
> > from OpenBSD 7.4, the following entries can be added to
> > /etc/hosts:
> > 127.0.0.1   time.cloudflare.com
> > 127.0.0.1   pool.ntp.org
> > 
> > I noticed this bug using the default 7.4 configuration file. It
> > can happen because time.cloudflare.com is part of pool.ntp.org:
> > https://www.ntppool.org/scores/162.159.200.1
> > https://www.ntppool.org/scores/162.159.200.123
> > >Fix:
> > Removing the `server time.cloudflare.com` line from the
> > configuration file is a simple fix the user can make, but
> > OpenNTPD should check if an IP address it tries to add to the
> > list of peers is already a peer, and ignore it if so. If a
> > server is added with the `server` (not `servers`) keyword in the
> > configuration file, OpenNTPD should try the next IP the domain
> > resolves to if applicable.
> > 
> 
> Thanks for the report, I'll take a look.
> 
>   -Otto
> 

Due to various reasons this is all a bit complicated, I did not find a
nice solution yet. Some patience required.

-Otto



Re: repeated NTP peers in OpenNTPD

2023-12-10 Thread Otto Moerbeek
On Fri, Dec 01, 2023 at 09:18:32PM +, guilherme.janc...@yandex.com wrote:

> >Synopsis:Repeated NTP peers in OpenNTPD
> >Category:user
> >Environment:
>   System  : OpenBSD 7.4
>   Details : OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
>
> r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   If the same address/domain is specified multiple times in
>   OpenNTPD's configuration file, or if multiple domains resolve
>   to the same IP address, OpenNTPD will treat the same IP address
>   as if it was multiple peers.
> >How-To-Repeat:
>   This can be tested by appending `server 127.0.0.1` multiple
>   times to the configuration file.
> 
>   Alternatively, assuming a default OpenNTPD configuration file
>   from OpenBSD 7.4, the following entries can be added to
>   /etc/hosts:
>   127.0.0.1   time.cloudflare.com
>   127.0.0.1   pool.ntp.org
> 
>   I noticed this bug using the default 7.4 configuration file. It
>   can happen because time.cloudflare.com is part of pool.ntp.org:
>   https://www.ntppool.org/scores/162.159.200.1
>   https://www.ntppool.org/scores/162.159.200.123
> >Fix:
>   Removing the `server time.cloudflare.com` line from the
>   configuration file is a simple fix the user can make, but
>   OpenNTPD should check if an IP address it tries to add to the
>   list of peers is already a peer, and ignore it if so. If a
>   server is added with the `server` (not `servers`) keyword in the
>   configuration file, OpenNTPD should try the next IP the domain
>   resolves to if applicable.
> 

Thanks for the report, I'll take a look.

-Otto



Re: ntpd constraint don't use v6 when v6 is only there after ntpd start-up

2023-12-05 Thread Otto Moerbeek
On Thu, Nov 30, 2023 at 11:43:12AM +0100, Otto Moerbeek wrote:

> On Wed, Nov 29, 2023 at 07:43:57PM +0100, Otto Moerbeek wrote:
> 
> > On Wed, Nov 29, 2023 at 11:57:15AM +0100, Otto Moerbeek wrote:
> > 
> > > On Wed, Nov 29, 2023 at 08:49:55AM +0100, Otto Moerbeek wrote:
> > > 
> > > > On Tue, Nov 28, 2023 at 04:19:07PM +0100, Paul de Weerd wrote:
> > > > 
> > > > > Hi all,
> > > > > 
> > > > > I have a few APU's I'm using to experiment with some stuff.  I found 
> > > > > all
> > > > > of them unable to sync with NTP because they don't have IPv4
> > > > > connectivity to the outside world.
> > > > > 
> > > > > Digging a bit deeper, it turns out that v6 is only configured after
> > > > > ntpd is started.  This means the constraints cannot be reached (ntpd
> > > > > logs "constraints configured but none available").  Even if v6 becomes
> > > > > available (shortly after) ntpd is started, ntpd still refuses to try
> > > > > to connect to the constraints over IPv6.
> > > > > 
> > > > > Simply restarting ntpd when an IPv6 address is configured makes
> > > > > everything go again: the constraint servers can be reached, so those
> > > > > are checked, and then the regular NTP servers also work fine.
> > > > > 
> > > > > Address configuration is dynamic:
> > > > > 
> > > > > --- cat /etc/hostname.em0 
> > > > > up
> > > > > inet autoconf
> > > > > inet6 autoconf
> > > > > --
> > > > > 
> > > > > I have confirmed the behaviour by removing all config from the
> > > > > interface, stopping ntpd and then bringing up a v4 address (ifconfig
> > > > > em0 inet autoconf), starting ntpd and bringing up a v6 address
> > > > > (ifconfig em0 inet6 autoconf).  ntpd never connects to the constraint
> > > > > servers, despite having a v6 address (and the constraint servers have
> > > > > > AAAA records, obviously).  Again, restarting ntpd when a v6 address is
> > > > > configured gets things going: constraint servers are reached just
> > > > > fine, and time is adjusted according to NTP.
> > > > > 
> > > > > Paul 'WEiRD' de Weerd
> > > > 
> > > > > I'll see if I can find the root cause of this.
> > > > 
> > > > -Otto
> > > > 
> > > 
> > > 
> > > So I tried a couple of configs--all with a v6 address coming up late--
> > > with both no v4 at all and v4 but not working, but in all cases
> > > > (though it may take a while) the constraints *did* use v6 addresses,
> > > > both for the hardcoded case and the retrieved-via-DNS case.
> > > > 
> > > > So I'd like to see your config and also -vv log files to figure out
> > > what's different in your setup.
> > > 
> > >   -Otto
> > > 
> > 
> > > With your config details I managed to reproduce.
> > 
> > What is happening is that the initial constraint DNS info which does
> > not include v6 info gets re-used. The diff below resets the constraint
> > DNS info immediately after first use and then periodically (but only
> > > after all constraint queries have been done). For constraints we do not
> > want to stick to a DNS resolve result too long anyway.
> > 
> > For NTP peers it worked already, since they redo DNS after they cycled
> > > through the list of available addresses.
> > 
> > I'm doing some more tests, but here's the diff I'm using.
> > 
> > -Otto
> > 
> 
> > Updated diff; the previous diff had the effect that constraints would
> continue to be requested. This one only does that for constraints that
> did not reply. Also including a few nits.
> 
> Please test,
> 
>   -Otto

Paul tested and is happy, I'd like to get a review.

-Otto

> 
> Index: constraint.c
> ===
> RCS file: /home/cvs/src/usr.sbin/ntpd/constraint.c,v
> diff -u -p -r1.54 constraint.c
> --- constraint.c  27 Nov 2022 13:19:00 -  1.54
> +++ constraint.c  30 Nov 2023 10:40:34 -
> @@ -554,7 +554,6 @@ constraint_close(u_int32_t id)
>   return (1);
> 

Re: makefs: sporadic segfaults with FAT32

2023-11-30 Thread Otto Moerbeek
On Fri, Dec 01, 2023 at 05:59:27AM +, Klemens Nanni wrote:

> -current amd64 sometimes dumps core when creating a FAT32 image.
> Minimal reproducer below;  other FS types, sizes or files are stable,
> FAT32 seems to be the culprit.  I don't have time to look into this.
> 
>   $ cd /usr/src/*bin/makefs
>   $ make DEBUG=-g
>   $ mkdir empty/
>   $ until ! ./obj/makefs -t msdos -o fat_type=32 -s 257M ./empty.img 
> ./empty/ ; do true ; done
>   [...]
> 
> Takes a few seconds/retries at most for me.
> 
>   Creating `./empty.img'
>   ./empty.img: 525272 sectors in 65659 FAT32 clusters (4096 bytes/cluster)
>   MBR type: 11
>   bps=512 spc=8 res=32 nft=2 mid=0xf0 spt=63 hds=255 hid=0 bsec=526336 
> bspf=513 rdcl=2 infs=1 bkbs=2
>   Segmentation fault (core dumped) 
> 
>   $ egdb -q ./obj/makefs ./makefs.core -batch -ex bt
>   [New process 372642]
>   Core was generated by `makefs'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   #0  0x08b6b4acb899 in msdosfs_mount (devvp=0x7be6c6083870, 
> flags=) at /s/usr.sbin/makefs/msdos/msdosfs_vfsops.c:287
>   287 && !memcmp(fp->fsisig4, "\0\0\125\252", 4))
>   #0  0x08b6b4acb899 in msdosfs_mount (devvp=0x7be6c6083870, 
> flags=) at /s/usr.sbin/makefs/msdos/msdosfs_vfsops.c:287
>   #1  0x08b6b4ac64fb in msdos_makefs (image=0x7be6c6083bcc 
> "./empty.img", dir=0x7be6c6083bdc "./empty/", root=0x8b927f57660, 
> fsopts=0x7be6c60838d0) at /s/usr.sbin/makefs/msdos.c:149
>   #2  0x08b6b4ab6343 in main (argc=2, argv=) at 
> /s/usr.sbin/makefs/makefs.c:211
> 
> It always chokes on fp->fsisig4.
> 

The buffer is 512 bytes, but struct fsinfo is 1024. I don't know the MSDOS
layout, but pmp->pm_BytesPerSec is probably not the right size for the bread.

-Otto

#0  0x09b048ddc8d9 in msdosfs_mount (devvp=0x79af007c6050,
flags=) at /usr/src/usr.sbin/makefs/msdos/msdosfs_vfsops.c:287
287 && !memcmp(fp->fsisig4, "\0\0\125\252", 4))
(gdb) print bp
$1 = (struct mkfsbuf *) 0x9b2cf0fcc80
(gdb) print *bp
$2 = {b_data = 0x9b2cf123e00, b_bufsize = 512, b_bcount = 512, b_blkno
= 1, b_lblkno = 1, b_fs = 0x79af007c60b0, b_tailq = {tqe_next = 0x0, 
tqe_prev = 0x9b048de2848 }}
(gdb) list
282 goto error_exit;
283 fp = (struct fsinfo *)bp->b_data;
284 if (!memcmp(fp->fsisig1, "RRaA", 4)
285 && !memcmp(fp->fsisig2, "rrAa", 4)
286 && !memcmp(fp->fsisig3, "\0\0\125\252", 4)
287 && !memcmp(fp->fsisig4, "\0\0\125\252", 4))
288 pmp->pm_nxtfree = getulong(fp->fsinxtfree);
289 else
290 pmp->pm_fsinfo = 0;
291 brelse(bp, 0);
(gdb) ptype /o struct fsinfo
/* offset  |size */  type = struct fsinfo {
/*  0  |   4 */u_int8_t fsisig1[4];
/*  4  | 480 */u_int8_t fsifill1[480];
/*484  |   4 */u_int8_t fsisig2[4];
/*488  |   4 */u_int8_t fsinfree[4];
/*492  |   4 */u_int8_t fsinxtfree[4];
/*496  |  12 */u_int8_t fsifill2[12];
/*508  |   4 */u_int8_t fsisig3[4];
/*512  | 508 */u_int8_t fsifill3[508];
/*   1020  |   4 */u_int8_t fsisig4[4];

   /* total size (bytes): 1024 */
 }
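
The mismatch can be checked mechanically. The sketch below restates the layout printed by gdb and tests whether a field lies inside a 512-byte buffer; field_in_buffer() is a hypothetical helper written for illustration and is not taken from makefs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as printed by "ptype /o struct fsinfo" above. */
struct fsinfo {
	uint8_t fsisig1[4];
	uint8_t fsifill1[480];
	uint8_t fsisig2[4];
	uint8_t fsinfree[4];
	uint8_t fsinxtfree[4];
	uint8_t fsifill2[12];
	uint8_t fsisig3[4];
	uint8_t fsifill3[508];
	uint8_t fsisig4[4];
};

static_assert(sizeof(struct fsinfo) == 1024, "struct fsinfo is 1024 bytes");

/*
 * Return 1 if the field at [off, off + len) lies inside a buffer of
 * bufsize bytes, 0 otherwise. Hypothetical helper for illustration.
 */
int
field_in_buffer(size_t off, size_t len, size_t bufsize)
{
	return off <= bufsize && len <= bufsize - off;
}
```

With b_bufsize = 512, fsisig3 (offset 508) still fits, while fsisig4 (offset 1020) lies well past the end of the buffer, which matches the crash on line 287.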



Re: ntpd constraint don't use v6 when v6 is only there after ntpd start-up

2023-11-30 Thread Otto Moerbeek
On Wed, Nov 29, 2023 at 07:43:57PM +0100, Otto Moerbeek wrote:

> On Wed, Nov 29, 2023 at 11:57:15AM +0100, Otto Moerbeek wrote:
> 
> > On Wed, Nov 29, 2023 at 08:49:55AM +0100, Otto Moerbeek wrote:
> > 
> > > On Tue, Nov 28, 2023 at 04:19:07PM +0100, Paul de Weerd wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > I have a few APU's I'm using to experiment with some stuff.  I found all
> > > > of them unable to sync with NTP because they don't have IPv4
> > > > connectivity to the outside world.
> > > > 
> > > > Digging a bit deeper, it turns out that v6 is only configured after
> > > > ntpd is started.  This means the constraints cannot be reached (ntpd
> > > > logs "constraints configured but none available").  Even if v6 becomes
> > > > available (shortly after) ntpd is started, ntpd still refuses to try
> > > > to connect to the constraints over IPv6.
> > > > 
> > > > Simply restarting ntpd when an IPv6 address is configured makes
> > > > everything go again: the constraint servers can be reached, so those
> > > > are checked, and then the regular NTP servers also work fine.
> > > > 
> > > > Address configuration is dynamic:
> > > > 
> > > > --- cat /etc/hostname.em0 
> > > > up
> > > > inet autoconf
> > > > inet6 autoconf
> > > > --
> > > > 
> > > > I have confirmed the behaviour by removing all config from the
> > > > interface, stopping ntpd and then bringing up a v4 address (ifconfig
> > > > em0 inet autoconf), starting ntpd and bringing up a v6 address
> > > > (ifconfig em0 inet6 autoconf).  ntpd never connects to the constraint
> > > > servers, despite having a v6 address (and the constraint servers have
> > > > > AAAA records, obviously).  Again, restarting ntpd when a v6 address is
> > > > configured gets things going: constraint servers are reached just
> > > > fine, and time is adjusted according to NTP.
> > > > 
> > > > Paul 'WEiRD' de Weerd
> > > 
> > > > I'll see if I can find the root cause of this.
> > > 
> > >   -Otto
> > > 
> > 
> > 
> > So I tried a couple of configs--all with a v6 address coming up late--
> > with both no v4 at all and v4 but not working, but in all cases
> > > (though it may take a while) the constraints *did* use v6 addresses,
> > > both for the hardcoded case and the retrieved-via-DNS case.
> > > 
> > > So I'd like to see your config and also -vv log files to figure out
> > what's different in your setup.
> > 
> > -Otto
> > 
> 
> > With your config details I managed to reproduce.
> 
> What is happening is that the initial constraint DNS info which does
> not include v6 info gets re-used. The diff below resets the constraint
> DNS info immediately after first use and then periodically (but only
> > after all constraint queries have been done). For constraints we do not
> want to stick to a DNS resolve result too long anyway.
> 
> For NTP peers it worked already, since they redo DNS after they cycled
> > through the list of available addresses.
> 
> I'm doing some more tests, but here's the diff I'm using.
> 
>   -Otto
> 

Updated diff; the previous diff had the effect that constraints would
continue to be requested. This one only does that for constraints that
did not reply. Also including a few nits.

Please test,

-Otto

Index: constraint.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/constraint.c,v
diff -u -p -r1.54 constraint.c
--- constraint.c27 Nov 2022 13:19:00 -  1.54
+++ constraint.c30 Nov 2023 10:40:34 -
@@ -554,7 +554,6 @@ constraint_close(u_int32_t id)
return (1);
}
 
-   /* Go on and try the next resolved address for this constraint */
return (constraint_init(cstr));
 }
 
@@ -927,7 +926,7 @@ httpsdate_init(const char *addr, const c
 * version is based on our wallclock, which may well be inaccurate...
 */
if (!synced) {
-   log_debug("constraints: skipping time in certificate 
validation");
+   log_debug("constraints: using received time in certificate 
validation");
tls_config_insecure_noverifytime(httpsdate->tls_config);
}
 
Index: ntp.c
==

Re: ntpd constraint don't use v6 when v6 is only there after ntpd start-up

2023-11-29 Thread Otto Moerbeek
On Wed, Nov 29, 2023 at 11:57:15AM +0100, Otto Moerbeek wrote:

> On Wed, Nov 29, 2023 at 08:49:55AM +0100, Otto Moerbeek wrote:
> 
> > On Tue, Nov 28, 2023 at 04:19:07PM +0100, Paul de Weerd wrote:
> > 
> > > Hi all,
> > > 
> > > I have a few APU's I'm using to experiment with some stuff.  I found all
> > > of them unable to sync with NTP because they don't have IPv4
> > > connectivity to the outside world.
> > > 
> > > Digging a bit deeper, it turns out that v6 is only configured after
> > > ntpd is started.  This means the constraints cannot be reached (ntpd
> > > logs "constraints configured but none available").  Even if v6 becomes
> > > available (shortly after) ntpd is started, ntpd still refuses to try
> > > to connect to the constraints over IPv6.
> > > 
> > > Simply restarting ntpd when an IPv6 address is configured makes
> > > everything go again: the constraint servers can be reached, so those
> > > are checked, and then the regular NTP servers also work fine.
> > > 
> > > Address configuration is dynamic:
> > > 
> > > --- cat /etc/hostname.em0 
> > > up
> > > inet autoconf
> > > inet6 autoconf
> > > --
> > > 
> > > I have confirmed the behaviour by removing all config from the
> > > interface, stopping ntpd and then bringing up a v4 address (ifconfig
> > > em0 inet autoconf), starting ntpd and bringing up a v6 address
> > > (ifconfig em0 inet6 autoconf).  ntpd never connects to the constraint
> > > servers, despite having a v6 address (and the constraint servers have
> > > > AAAA records, obviously).  Again, restarting ntpd when a v6 address is
> > > configured gets things going: constraint servers are reached just
> > > fine, and time is adjusted according to NTP.
> > > 
> > > Paul 'WEiRD' de Weerd
> > 
> > > I'll see if I can find the root cause of this.
> > 
> > -Otto
> > 
> 
> 
> So I tried a couple of configs--all with a v6 address coming up late--
> with both no v4 at all and v4 but not working, but in all cases
> > (though it may take a while) the constraints *did* use v6 addresses,
> > both for the hardcoded case and the retrieved-via-DNS case.
> > 
> > So I'd like to see your config and also -vv log files to figure out
> what's different in your setup.
> 
>   -Otto
> 

With your config details I managed to reproduce.

What is happening is that the initial constraint DNS info which does
not include v6 info gets re-used. The diff below resets the constraint
DNS info immediately after first use and then periodically (but only
after all constraint queries have been done). For constraints we do not
want to stick to a DNS resolve result too long anyway.

For NTP peers it worked already, since they redo DNS after they cycled
through the list of available addresses.

I'm doing some more tests, but here's the diff I'm using.

-Otto

Index: ntp.c
===
RCS file: /home/cvs/src/usr.sbin/ntpd/ntp.c,v
diff -u -p -r1.170 ntp.c
--- ntp.c   27 Nov 2022 13:19:00 -  1.170
+++ ntp.c   29 Nov 2023 18:31:23 -
@@ -75,6 +75,7 @@ ntp_main(struct ntpd_conf *nconf, struct
int  nullfd, pipe_dns[2], idx_clients;
int  ctls;
int  fd_ctl;
+   int  clear_cdns;
u_intpfd_elms = 0, idx2peer_elms = 0;
u_intlistener_cnt, new_cnt, sent_cnt, trial_cnt;
u_intctl_cnt;
@@ -89,7 +90,7 @@ ntp_main(struct ntpd_conf *nconf, struct
struct stat  stb;
struct ctl_conn *cc;
time_t   nextaction, last_sensor_scan = 0, now;
-   time_t   last_action = 0, interval;
+   time_t   last_action = 0, interval, last_cdns_reset = 0;
void*newp;
 
if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, PF_UNSPEC,
@@ -326,9 +327,11 @@ ntp_main(struct ntpd_conf *nconf, struct
(peer_cnt == 0 && sensors_cnt == 0)))
priv_settime(0, "no valid peers configured");
 
+   clear_cdns = 1;
TAILQ_FOREACH(cstr, &conf->constraints, entry) {
-   if (constraint_query(cstr, conf->status.synced) == -1)
-   continue;
+   constraint_

Re: ntpd constraint don't use v6 when v6 is only there after ntpd start-up

2023-11-29 Thread Otto Moerbeek
On Wed, Nov 29, 2023 at 08:49:55AM +0100, Otto Moerbeek wrote:

> On Tue, Nov 28, 2023 at 04:19:07PM +0100, Paul de Weerd wrote:
> 
> > Hi all,
> > 
> > I have a few APU's I'm using to experiment with some stuff.  I found all
> > of them unable to sync with NTP because they don't have IPv4
> > connectivity to the outside world.
> > 
> > Digging a bit deeper, it turns out that v6 is only configured after
> > ntpd is started.  This means the constraints cannot be reached (ntpd
> > logs "constraints configured but none available").  Even if v6 becomes
> > available (shortly after) ntpd is started, ntpd still refuses to try
> > to connect to the constraints over IPv6.
> > 
> > Simply restarting ntpd when an IPv6 address is configured makes
> > everything go again: the constraint servers can be reached, so those
> > are checked, and then the regular NTP servers also work fine.
> > 
> > Address configuration is dynamic:
> > 
> > --- cat /etc/hostname.em0 
> > up
> > inet autoconf
> > inet6 autoconf
> > --
> > 
> > I have confirmed the behaviour by removing all config from the
> > interface, stopping ntpd and then bringing up a v4 address (ifconfig
> > em0 inet autoconf), starting ntpd and bringing up a v6 address
> > (ifconfig em0 inet6 autoconf).  ntpd never connects to the constraint
> > servers, despite having a v6 address (and the constraint servers have
> > > AAAA records, obviously).  Again, restarting ntpd when a v6 address is
> > configured gets things going: constraint servers are reached just
> > fine, and time is adjusted according to NTP.
> > 
> > Paul 'WEiRD' de Weerd
> 
> > I'll see if I can find the root cause of this.
> 
>   -Otto
> 


So I tried a couple of configs--all with a v6 address coming up late--
with both no v4 at all and v4 but not working, but in all cases
(though it may take a while) the constraints *did* use v6 addresses,
both for the hardcoded case and the retrieved-via-DNS case.

So I'd like to see your config and also -vv log files to figure out
what's different in your setup.

-Otto



Re: ntpd constraint don't use v6 when v6 is only there after ntpd start-up

2023-11-28 Thread Otto Moerbeek
On Tue, Nov 28, 2023 at 04:19:07PM +0100, Paul de Weerd wrote:

> Hi all,
> 
> I have a few APU's I'm using to experiment with some stuff.  I found all
> of them unable to sync with NTP because they don't have IPv4
> connectivity to the outside world.
> 
> Digging a bit deeper, it turns out that v6 is only configured after
> ntpd is started.  This means the constraints cannot be reached (ntpd
> logs "constraints configured but none available").  Even if v6 becomes
> available (shortly after) ntpd is started, ntpd still refuses to try
> to connect to the constraints over IPv6.
> 
> Simply restarting ntpd when an IPv6 address is configured makes
> everything go again: the constraint servers can be reached, so those
> are checked, and then the regular NTP servers also work fine.
> 
> Address configuration is dynamic:
> 
> --- cat /etc/hostname.em0 
> up
> inet autoconf
> inet6 autoconf
> --
> 
> I have confirmed the behaviour by removing all config from the
> interface, stopping ntpd and then bringing up a v4 address (ifconfig
> em0 inet autoconf), starting ntpd and bringing up a v6 address
> (ifconfig em0 inet6 autoconf).  ntpd never connects to the constraint
> servers, despite having a v6 address (and the constraint servers have
> > AAAA records, obviously).  Again, restarting ntpd when a v6 address is
> configured gets things going: constraint servers are reached just
> fine, and time is adjusted according to NTP.
> 
> Paul 'WEiRD' de Weerd

I'll see if I can find the root cause of this.

-Otto



Re: getsockname() reports more bytes in the socket address buffer than what actually exists

2023-11-26 Thread Otto Moerbeek
On Sun, Nov 26, 2023 at 07:12:47PM -0800, Dev Email wrote:

> >Synopsis: getsockname() reports more information in the socket address
> buffer than what actually exists
> 
> >Category: library
> 
> >Environment:
> 
>     System  : OpenBSD 7.4
>     Details : OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37
> MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>     Architecture: OpenBSD.amd64
>     Machine : amd64
> 
> >Description:
> 
> When calling the "getsockname()" standard C library call, it returns more
> bytes than the actual length in the
> 
> resulting socket path. This results in sockets named, for example,
> "socket123", being represented instead as
> 
> "socket123\0\0\0\0\0\0\0...". The problem appears to stem from the C library
> implementation setting the socket
> 
> name to the maximum length allowed by the socket address rather than the
> actual name. This is not a problem
> 
> if you use traditional null-terminated C strings, but becomes a problem in
> other languages which use the returned
> 
> length to determine how to use the string.
> 
> 
> This was found during the course of investigating this issue:
> https://github.com/rust-lang/rust/issues/116523
> 
> >How-To-Repeat:
> 
> I was able to repeat the bug using this C program:
> 
> ---
> 
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <sys/un.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> 
> int main(void) {
>   int server_sock, rc, len, i;
>   struct sockaddr_un address_in, address_out;
>   char address_name[32] = { 0 };
>   char c;
> 
>   server_sock = socket(AF_UNIX, SOCK_STREAM, 0);
>   if (server_sock == -1) {
>     puts("failed to open socket");
>     exit(1);
>   }
> 
>   address_in.sun_family = AF_UNIX;
>   strcpy(address_in.sun_path, "/tmp/socket123");
>   len = strlen("/tmp/socket123") + sizeof(address_in) -
> sizeof(address_in.sun_path);
> 
>   rc = bind(server_sock, &address_in, len);
>   if (rc == -1) {
>     puts("failed to bind to socket");
>     exit(1);
>   }
> 
>   len = 1;
>   rc = getsockname(server_sock, &address_out, &len);
>   if (rc == -1) {
>     puts("failed to read socket name");
>     exit(1);
>   }
> 
>   len -= (sizeof(address_out) - sizeof(address_out.sun_path));
>   printf("socket address length is %d\n", len);
> 
>   for (i = 0; i < len; i++) {
>     c = address_out.sun_path[i];
>     if (c >= 33 && c <= 126) {
>   printf("%c", c);
>     } else {
>   printf("\\%d", c);
>     }
>   }
>   printf("\n");
> 
>   close(server_sock);
>   return 0;
> }
> 
> ---
> 
> When run on a Linux machine, it produces this output:
> 
> ---
> 
> $ cc bug.c -o bug
> 
> $ ./bug
> 
> socket address length is 15
> /tmp/socket123\0
> 
> ---
> 
> I receive the same output on FreeBSD and NetBSD.
> 
> When run on OpenBSD 7.4, it produces this output:
> 
> ---
> 
> $ cc bug.c -o bug
> 
> $ ./bug
> 
> socket address length is 104
> /tmp/socket123\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
> 
> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
> 
> 
> ---
> 
> Ideally, OpenBSD would return the same output.
> 
> >Fix:
> 
> This could be fixed by returning the actual length of the socket address in
> the "len" variable
> 
> in the "getsockname" function.
> 
> 
> I don't believe that dmesg is relevant for this bug; I can provide it if
> requested.
> 

I'm not 100% convinced it is a bug; there is something to be said
for setting len to sizeof(struct sockaddr_un).

getsockname is underspecified in POSIX for Unix domain sockets; the other
address families use fixed-size structs.

FreeBSD's man page even says it is unsupported for Unix domain
sockets. So I'm undecided, other devs might have stronger opinions.

-Otto
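
Whichever length the kernel reports, a caller that needs the actual path length can derive it itself. The sketch below is one way to do that, assuming the path is NUL-terminated or fills the reported length; sun_path_len() is a hypothetical helper, not an existing libc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Return the number of bytes in sa->sun_path before the first NUL,
 * looking at no more than the bytes covered by len (the length
 * reported by getsockname()) and never past the end of sun_path.
 * Hypothetical helper for illustration.
 */
size_t
sun_path_len(const struct sockaddr_un *sa, socklen_t len)
{
	size_t hdr = offsetof(struct sockaddr_un, sun_path);
	size_t avail;
	const char *nul;

	assert(sa != NULL);
	if ((size_t)len <= hdr)
		return 0;
	avail = (size_t)len - hdr;
	if (avail > sizeof(sa->sun_path))
		avail = sizeof(sa->sun_path);
	nul = memchr(sa->sun_path, '\0', avail);
	return nul != NULL ? (size_t)(nul - sa->sun_path) : avail;
}
```

This gives the same answer whether len covers only the name or the whole struct sockaddr_un, so it does not depend on how the question above is settled.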



Re: vxlan(4) custom destination UDP port seems not working

2023-11-15 Thread Otto Moerbeek
On Wed, Nov 15, 2023 at 12:42:46PM +0100, Luca Di Gregorio wrote:

> # uname -a
> OpenBSD X.my.domain 7.4 GENERIC#0 amd64
> 
> # ifconfig vxlan0 tunnel SOURCE_IP DEST_IP:8472 vnetid 5
> # ifconfig vxlan0 inet 192.168.5.1/30
> # ifconfig vxlan0 up
> 
>  # ifconfig vxlan0: I can't see the dest UDP port 8472 anywhere
> vxlan0: flags=8843 mtu 1500
> lladdr fe:e1:ba:d9:e4:0b
> index 18 llprio 3
> encap: vnetid 5 parent none txprio 0 rxprio outer
> groups: vxlan
> tunnel: inet  SOURCE_IP -->  DEST_IP  ttl 1 nodf
> Addresses (max cache: 100, timeout: 240):
> inet 192.168.5.1 netmask 0xfffc broadcast 192.168.5.3
> 
> # ping 192.168.5.2
> 
> In tcpdump, I see that arp packets are sent to UDP port 4789, not 8472:
> SOURCE_IP.4789 >  DEST_IP.4789: VXLAN vni 5: arp who-has 192.168.5.2 tell
> 192.168.5.1 [ttl 1]
> 
> Is this a bug?

It helps to read the vxlan(4) manpage, specifically the paragraph about ports.

-Otto



Re: installer should work around ocsp failure

2023-10-28 Thread Otto Moerbeek
On Sat, Oct 28, 2023 at 12:00:29AM +0100, Stuart Henderson wrote:

> On 2023/10/27 13:15, Theo de Raadt wrote:
> > > It occurred to me later maybe the clock was off? 
> > 
> > Oh now people want a ntp client on the installer??!?!
> 
> Don't we already have a temporary time sync in the installer from
> ftplist for exactly this situation?
> 

yes, but it's done after fetching sets

-Otto



Re: openbsd74/arm64 kernel panic on m2

2023-10-18 Thread Otto Moerbeek
On Wed, Oct 18, 2023 at 03:24:08PM +0530, Evgeniy Kozhuhovskiy wrote:

> Hello!
> 
> I just installed OpenBSD 7.4 on my M2 Air 13 (Model Number: Z15W000KZRU/A),
> but it would not boot.
> 
> I'm attaching photo of kernel panic. Unfortunately, keyboard is not working
> in ddb.[image: IMG_2604.jpg]
> 
> -- 
> With best regards, Evgeniy Kozhuhovskiy

Screenshots are often ignored and at least provide a barrier to get
answers.

If you provide the panic message and backtrace in the mail, the right
people might get triggered and text searches work for others having
the same issue.

-Otto



Re: patch crash related to remove_special_lines

2023-07-12 Thread Otto Moerbeek
On Tue, Jul 11, 2023 at 09:13:38PM +0200, Theo Buehler wrote:

> On Tue, Jul 11, 2023 at 08:35:31PM +0200, Theo Buehler wrote:
> > On Tue, Jul 11, 2023 at 02:32:48PM +0200, Theo Buehler wrote:
> > > On Tue, Jul 11, 2023 at 11:48:57AM +0100, Stuart Henderson wrote:
> > > > I ran into a segfault with patch(1) in a port, here's a test case with a
> > > > minimal reproducer.
> > > > 
> > > > $ echo foo > test
> > > > $ perl -e 'print "--- test.orig\n+++ test\n@@ -1,1 +1,2 @@\n foo\n+" . 
> > > > 'x' x 32768 . "\n\\ No newline at end of file\n"' > test.patch
> > > 
> > > patch maintains the line lengths in an array of shorts p_len[] and
> > > doesn't check for overflows. This long line overflows the length, so
> > > you get a bad buffer underrun when doing 's[p_len[filldst - 1]] = 0;'
> > 
> > The below appears to fix this and passes regress. It won't be able to
> > apply the binary patch, but it should no longer segfault.
> 
> More complete diff, thanks otto

The test case sthen posted applies now. I did not check in detail if the
result is right, though.

OK, 

-Otto
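
To illustrate why the element type of p_len matters, here is a minimal sketch of the range check that the old short-based storage lacked. Both functions are hypothetical and written for this note; they are not code from patch(1).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return 1 if a line length can be stored in a short without
 * changing its value, 0 otherwise. Hypothetical helper.
 */
int
len_fits_short(size_t len)
{
	return len <= (size_t)SHRT_MAX;
}

/*
 * Store a length the way the old code did, but refuse values that
 * would not survive the conversion. Hypothetical helper.
 */
short
store_len(size_t len)
{
	assert(len_fits_short(len));
	return (short)len;
}
```

Assuming a 16-bit short, a line of more than 32767 bytes such as the one in the test case fails this check, which is why the diff widens the type to ssize_t rather than adding a check at every store.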
> 
> Index: patch.c
> ===
> RCS file: /cvs/src/usr.bin/patch/patch.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 patch.c
> --- patch.c   3 Aug 2022 07:30:37 -   1.71
> +++ patch.c   11 Jul 2023 19:10:55 -
> @@ -99,7 +99,7 @@ static void copy_till(LINENUM, bool);
>  static void  spew_output(void);
>  static void  dump_line(LINENUM, bool);
>  static bool  patch_match(LINENUM, LINENUM, LINENUM);
> -static bool  similar(const char *, const char *, int);
> +static bool  similar(const char *, const char *, ssize_t);
>  static __dead void usage(void);
>  
>  /* true if -E was specified on command line.  */
> @@ -1012,7 +1012,7 @@ patch_match(LINENUM base, LINENUM offset
>   LINENUM pat_lines = pch_ptrn_lines() - fuzz;
>   const char  *ilineptr;
>   const char  *plineptr;
> - short   plinelen;
> + ssize_t plinelen;
>  
>   for (iline = base + offset + fuzz; pline <= pat_lines; pline++, 
> iline++) {
>   ilineptr = ifetch(iline, offset >= 0);
> @@ -1048,7 +1048,7 @@ patch_match(LINENUM base, LINENUM offset
>   * Do two lines match with canonicalized white space?
>   */
>  static bool
> -similar(const char *a, const char *b, int len)
> +similar(const char *a, const char *b, ssize_t len)
>  {
>   while (len) {
>   if (isspace((unsigned char)*b)) { /* whitespace (or \n) to 
> match? */
> Index: pch.c
> ===
> RCS file: /cvs/src/usr.bin/patch/pch.c,v
> retrieving revision 1.63
> diff -u -p -r1.63 pch.c
> --- pch.c 26 Dec 2022 19:16:02 -  1.63
> +++ pch.c 11 Jul 2023 19:08:06 -
> @@ -56,7 +56,7 @@ static LINENUM  p_end = -1; /* last line 
>  static LINENUM   p_max;  /* max allowed value of p_end */
>  static LINENUM   p_context = 3;  /* # of context lines */
>  static char  **p_line = NULL;/* the text of the hunk */
> -static short *p_len = NULL;  /* length of each line */
> +static ssize_t   *p_len = NULL;  /* length of each line */
>  static char  *p_char = NULL; /* +, -, and ! */
>  static int   hunkmax = INITHUNKMAX;  /* size of above arrays to begin with */
>  static int   p_indent;   /* indent to patch */
> @@ -127,7 +127,7 @@ set_hunkmax(void)
>   if (p_line == NULL)
>   p_line = calloc((size_t) hunkmax, sizeof(char *));
>   if (p_len == NULL)
> - p_len = calloc((size_t) hunkmax, sizeof(short));
> + p_len = calloc((size_t) hunkmax, sizeof(ssize_t));
>   if (p_char == NULL)
>   p_char = calloc((size_t) hunkmax, sizeof(char));
>  }
> @@ -140,7 +140,7 @@ grow_hunkmax(void)
>  {
>   int new_hunkmax;
>   char**new_p_line;
> - short   *new_p_len;
> + ssize_t *new_p_len;
>   char*new_p_char;
>  
>   new_hunkmax = hunkmax * 2;
> @@ -152,7 +152,7 @@ grow_hunkmax(void)
>   if (new_p_line == NULL)
>   free(p_line);
>  
> - new_p_len = reallocarray(p_len, new_hunkmax, sizeof(short));
> + new_p_len = reallocarray(p_len, new_hunkmax, sizeof(ssize_t));
>   if (new_p_len == NULL)
>   free(p_len);
>  
> @@ -1192,7 +1192,7 @@ bool
>  pch_swap(void)
>  {
>   char**tp_line;  /* the text of the hunk */
> - short   *tp_len;/* length of each line */
> + ssize_t *tp_len;/* length of each line */
>   char*tp_char;   /* +, -, and ! */
>   LINENUM i;
>   LINENUM n;
> @@ -1349,7 +1349,7 @@ pch_context(void)
>  /*
>   * Return the length of a particular patch line.
>   */
> -short
> +ssize_t
>  pch_line_len(LINENUM line)
>  {
>   return p_len[line];
> Index: pch.h
> ===
> RCS file: /cvs/src/usr.bin/patch/p

Re: dvmrpd start causes kernel panic: assertion failed

2023-06-11 Thread Otto Moerbeek
On Sun, Jun 11, 2023 at 07:08:47PM +0200, Why 42? The lists account. wrote:

> 
> On Wed, Jun 07, 2023 at 03:50:29PM +0300, Vitaliy Makkoveev wrote:
> > > Please, share your dvmrpd.conf.
> > > 
> > 
> > Also, you could try to use ktrace to provide some additional info.
> 
> Hi,
> 
> Thanks for responding, the system is some 30Km away and, er, crashed.
> But maybe I will get there tomorrow. I wasn't able to get it to react to
> input from the remote KVM system that I was using,
> 
> AFAICR, the dvmrpd.conf just contained a copy of the file from examples,
> with the interface names changed to "em0" and "ure0" i.e. the two "up"
> interfaces on the system (ure being a T-Link USB-Ethernet adaptor).
> 
> Forgive my ignorance, but does this matter? I mean the error looks (to
> me) like an attempt to catch an unexpected set of circumstances i.e.
> 
>   kernel diagnostic assertion "ident == &nowake || timo || 
> _kernel_lock_held()" failed
>  
> It seems as if none of those three things were true, therefore the
> assertion failed, so we just need to know why it was written in the first
> place and the meaning of those clauses, if you see what I mean?
> 
> Cheers,
> Robb.
> 
> P.S. I was starting the daemon manually via a terminal window, just as
> you suggested.

In general, if a developer asks for information on a setup, it's with
good reason. Details do matter. Just give the information.

-Otto



Re: dhcpd(8) does not offer fixed addresses

2023-04-10 Thread Otto Moerbeek
On Mon, Apr 10, 2023 at 05:20:58PM +0200, Otto Moerbeek wrote:

> On Mon, Apr 10, 2023 at 08:14:49AM -0700, Jillian Alana Bolton wrote:
> 
> > On Mon, Apr 10, 2023 at 09:17:01AM +0200, Otto Moerbeek wrote:
> > > On Thu, Apr 06, 2023 at 12:13:50PM -0700, Jillian Alana Bolton wrote:
> > > 
> > > > 
> > > > >Synopsis:  dhcpd(8) does not offer fixed addresses
> > > > >Category:  user
> > > > >Environment:
> > > > System  : OpenBSD 7.2
> > > > Details : OpenBSD 7.2 (GENERIC) #728: Tue Sep 27 11:49:18 
> > > > MDT 2022
> > > >  
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > > > 
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > >Description:
> > > > I want to offer fixed addresses to some clients in an otherwise
> > > > > dynamically assigned network. I tried following the example at
> > > > https://www.openbsd.org/faq/pf/example1.html#dhcp and the 
> > > > expected
> > > > behavior is that the client with MAC address 52:54:00:84:7f:c9
> > > > would receive a DHCPOFFER for address 191.168.101.21.  Instead,
> > > > the DHCPOFFER specifies 192.168.101.128.
> > > > 
> > > > If I remove the 'range' declaration, the client received no
> > > > DHCPOFFER.
> > > > >How-To-Repeat:
> > > > pwt01-gw1# cat /etc/dhcpd.conf  # THIS IS THE SERVER
> > > > subnet 192.168.101.0 netmask 255.255.255.0 {
> > > > range 192.168.101.128 192.168.101.196;
> > > > host pwt01-web01 {
> > > > fixed-address 191.168.101.20;
> > > > hardware ethernet 52:54:00:6b:b4:3e;
> > > > }
> > > > host pwt01-web02 {
> > > > fixed-address 191.168.101.21;
> > > > hardware ethernet 52:54:00:84:7f:c9;
> > > > }
> > > > }
> > > > 
> > > > pwt01-web02# ifconfig vio0  # THIS IS THE CLIENT
> > > > vio0: 
> > > > flags=808843 mtu 1500
> > > > lladdr 52:54:00:84:7f:c9
> > > > index 1 priority 0 llprio 3
> > > > groups: egress
> > > > media: Ethernet autoselect
> > > > status: active
> > > > pwt01-web02# dhcpleasectl vio0
> > > > .
> > > > vio0 [Bound]
> > > > inet 192.168.101.128 netmask 255.255.255.0
> > > > lease 12 hours
> > > > dhcp server 192.168.101.2
> > > > pwt01-web02#
> > > > 
> > > > >Fix:
> > > > >
> > > 
> > > > This is likely not a bug.  Your config looks OK and I have been using a
> > > very similar config for years.
> > > 
> > > Some questions:
> > > 
> > > - Did you restart dhcpd?
> > > - Show the logs of a lease request by pwt01-web02
> > > 
> > >   -Otto
> > 
> > I found the cause late last night!  I have an off-by-one
> > error in the first octet of my fixed-address declarations.
> > 
> > > > subnet 192.168.101.0 netmask 255.255.255.0 {
> > > > range 192.168.101.128 192.168.101.196;
> > > > host pwt01-web01 {
> > > > fixed-address 191.168.101.20;
> >   ^^^
> > > > hardware ethernet 52:54:00:6b:b4:3e;
> > > > }
> > > > host pwt01-web02 {
> > > > fixed-address 191.168.101.21;
> >   ^^^
> > > > hardware ethernet 52:54:00:84:7f:c9;
> > > > }
> > > > }
> > 
> > I did restart dhcpd with: rcctl restart dhcpd
> > 
> > If you still want the logs, do you mean from rsyslogd(8)
> > or something like tcpdump(8)?
> > 
> > Jillian
> 
> /var/log/daemon
> 
>   -Otto
> 

Ah, I see you fixed your issue. No more logs needed.

-Otto



Re: dhcpd(8) does not offer fixed addresses

2023-04-10 Thread Otto Moerbeek
On Mon, Apr 10, 2023 at 08:14:49AM -0700, Jillian Alana Bolton wrote:

> On Mon, Apr 10, 2023 at 09:17:01AM +0200, Otto Moerbeek wrote:
> > On Thu, Apr 06, 2023 at 12:13:50PM -0700, Jillian Alana Bolton wrote:
> > 
> > > 
> > > >Synopsis:dhcpd(8) does not offer fixed addresses
> > > >Category:user
> > > >Environment:
> > >   System  : OpenBSD 7.2
> > >   Details : OpenBSD 7.2 (GENERIC) #728: Tue Sep 27 11:49:18 MDT 2022
> > >
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > > >Description:
> > >   I want to offer fixed addresses to some clients in an otherwise
> > >   dynamically assigned network. I tried Following the example at
> > >   https://www.openbsd.org/faq/pf/example1.html#dhcp and the expected
> > >   behavior is that the client with MAC address 52:54:00:84:7f:c9
> > >   would receive a DHCPOFFER for address 191.168.101.21.  Instead,
> > >   the DHCPOFFER specifies 192.168.101.128.
> > > 
> > >   If I remove the 'range' declaration, the client received no
> > >   DHCPOFFER.
> > > >How-To-Repeat:
> > >   pwt01-gw1# cat /etc/dhcpd.conf  # THIS IS THE SERVER
> > >   subnet 192.168.101.0 netmask 255.255.255.0 {
> > >   range 192.168.101.128 192.168.101.196;
> > >   host pwt01-web01 {
> > >   fixed-address 191.168.101.20;
> > >   hardware ethernet 52:54:00:6b:b4:3e;
> > >   }
> > >   host pwt01-web02 {
> > >   fixed-address 191.168.101.21;
> > >   hardware ethernet 52:54:00:84:7f:c9;
> > >   }
> > >   }
> > > 
> > >   pwt01-web02# ifconfig vio0  # THIS IS THE CLIENT
> > >   vio0: flags=808843 
> > > mtu 1500
> > >   lladdr 52:54:00:84:7f:c9
> > >   index 1 priority 0 llprio 3
> > >   groups: egress
> > >   media: Ethernet autoselect
> > >   status: active
> > >   pwt01-web02# dhcpleasectl vio0
> > >   .
> > >   vio0 [Bound]
> > >   inet 192.168.101.128 netmask 255.255.255.0
> > >   lease 12 hours
> > >   dhcp server 192.168.101.2
> > >   pwt01-web02#
> > > 
> > > >Fix:
> > >   
> > 
> > This is likely not a bug.  Your config looks OK and I have been using a
> > very similar config for years.
> > 
> > Some questions:
> > 
> > - Did you restart dhcpd?
> > - Show the logs of a lease request by pwt01-web02
> > 
> > -Otto
> 
> I found the cause late last night!  I have an off-by-one
> error in the first octet of my fixed-address declarations.
> 
> > > subnet 192.168.101.0 netmask 255.255.255.0 {
> > > range 192.168.101.128 192.168.101.196;
> > > host pwt01-web01 {
> > > fixed-address 191.168.101.20;
>   ^^^
> > > hardware ethernet 52:54:00:6b:b4:3e;
> > > }
> > > host pwt01-web02 {
> > > fixed-address 191.168.101.21;
>   ^^^
> > > hardware ethernet 52:54:00:84:7f:c9;
> > > }
> > > }
> 
> I did restart dhcpd with: rcctl restart dhcpd
> 
> If you still want the logs, do you mean from rsyslogd(8)
> or something like tcpdump(8)?
> 
> Jillian

/var/log/daemon

-Otto



Re: dhcpd(8) does not offer fixed addresses

2023-04-10 Thread Otto Moerbeek
On Thu, Apr 06, 2023 at 12:13:50PM -0700, Jillian Alana Bolton wrote:

> 
> >Synopsis:dhcpd(8) does not offer fixed addresses
> >Category:user
> >Environment:
>   System  : OpenBSD 7.2
>   Details : OpenBSD 7.2 (GENERIC) #728: Tue Sep 27 11:49:18 MDT 2022
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I want to offer fixed addresses to some clients in an otherwise
>   dynamically assigned network. I tried Following the example at
>   https://www.openbsd.org/faq/pf/example1.html#dhcp and the expected
>   behavior is that the client with MAC address 52:54:00:84:7f:c9
>   would receive a DHCPOFFER for address 191.168.101.21.  Instead,
>   the DHCPOFFER specifies 192.168.101.128.
> 
>   If I remove the 'range' declaration, the client received no
>   DHCPOFFER.
> >How-To-Repeat:
>   pwt01-gw1# cat /etc/dhcpd.conf  # THIS IS THE SERVER
>   subnet 192.168.101.0 netmask 255.255.255.0 {
>   range 192.168.101.128 192.168.101.196;
>   host pwt01-web01 {
>   fixed-address 191.168.101.20;
>   hardware ethernet 52:54:00:6b:b4:3e;
>   }
>   host pwt01-web02 {
>   fixed-address 191.168.101.21;
>   hardware ethernet 52:54:00:84:7f:c9;
>   }
>   }
> 
>   pwt01-web02# ifconfig vio0  # THIS IS THE CLIENT
>   vio0: flags=808843 
> mtu 1500
>   lladdr 52:54:00:84:7f:c9
>   index 1 priority 0 llprio 3
>   groups: egress
>   media: Ethernet autoselect
>   status: active
>   pwt01-web02# dhcpleasectl vio0
>   .
>   vio0 [Bound]
>   inet 192.168.101.128 netmask 255.255.255.0
>   lease 12 hours
>   dhcp server 192.168.101.2
>   pwt01-web02#
> 
> >Fix:
>   

This is likely not a bug.  Your config looks OK and I have been using a
very similar config for years.

Some questions:

- Did you restart dhcpd?
- Show the logs of a lease request by pwt01-web02

-Otto



Re: Error in vi(1) manpage

2023-01-29 Thread Otto Moerbeek
On Sat, Jan 28, 2023 at 07:55:57PM +0100, Tomáš Rippl  wrote:

> System: OpenBSD 7.2
> Architecture: OpenBSD.amd64
> Machine: amd64
> 
> >Description
> There is a bug in vi(1) manpage in VI TEXT INPUT COMMANDS section.
> ^<control-D> is said to "Erase all of the autoindent characters, and reset 
> the autoindent level."
> Actually, it does not reset the autoindent level.
> 0<control-D> is said to "Erase all of the autoindent characters."
> Actually, it also resets the autoindent level.
> It seems that the sentence part that mentions the autoindent level reset 
> should be moved from ^<control-D> to 0<control-D>

Yes, I think you are right.

-Otto


Index: docs/USD.doc/vi.man/vi.1
===
RCS file: /home/cvs/src/usr.bin/vi/docs/USD.doc/vi.man/vi.1,v
retrieving revision 1.82
diff -u -p -r1.82 vi.1
--- docs/USD.doc/vi.man/vi.1	22 Apr 2022 21:09:48 -0000	1.82
+++ docs/USD.doc/vi.man/vi.1	29 Jan 2023 08:18:19 -0000
@@ -1593,10 +1593,10 @@ Erase to the previous
 column boundary.
 .Pp
 .It Cm ^ Ns Aq Cm control-D
-Erase all of the autoindent characters, and reset the autoindent level.
+Erase all of the autoindent characters.
 .Pp
 .It Cm 0 Ns Aq Cm control-D
-Erase all of the autoindent characters.
+Erase all of the autoindent characters, and reset the autoindent level.
 .Pp
 .It Aq Cm control-T
 Insert sufficient



Re: EOF handling does not conform to POSIX and termios(4)

2023-01-23 Thread Otto Moerbeek
On Sun, Jan 22, 2023 at 09:20:26PM +0100, Otto Moerbeek wrote:

> On Sun, Jan 22, 2023 at 02:44:47PM +0100, Sören Tempel wrote:
> 
> > Hi Otto,
> > 
> > Thanks for your fast reply. Remarks below.
> > 
> > Otto Moerbeek  wrote:
> > > Some observations:
> > >
> > > - You are reading from stdout.
> > 
> > Sorry, small mistake that slipped in while reformatting the code for the ML.
> > 
> > > - Since you are reading one char at the time, the moment the EOF is
> > > processed the buffer is empty, so no pending chars. I'll take that as
> > > a valid interpretation of Posix.
> > 
> > I am no expert on the interpretation of the POSIX standard, but Section
> > 11.1.9 explicitly talks about "bytes waiting to be read" and not about
> > bytes in the buffer of the process "the moment the EOF is processed".
> > 
> > > - If you modify your program to read 4 chars at the time (see below),
> > > you will see your expected behaviour.
> > >
> > >   -Otto
> > >
> > > > #include <stdio.h>
> > > > #include <unistd.h>
> > >
> > > int main(void)
> > > {
> > >   for (;;) {
> > >   char c[4];
> > >   ssize_t r = read(0, c, sizeof(c));
> > >   if (r == 0)
> > >   break; /* EOF */
> > >   printf("read: %zd chars\n", r);
> > >   }
> > > }
> > 
> > While this program works for the input "foo" it does (just like
> > my original program) also terminate on the input "foo1" since
> > the buffer is, once again, empty when the EOF is received. However, POSIX
> > mandates that read(2) should only return a byte count of zero if "the
> > EOF occurred at the beginning of a line" and for the input
> > "foo1" the EOF does not occur at the beginning of the line but
> > OpenBSD read(2) still returns zero.
> > 
> > Greetings,
> > Sören
> 
> I'd have to check a FreeBSD machine (I can do so tomorrow), but macOS
> follows the same logic as OpenBSD. So this may very well be
> established BSD behaviour.
> 
> The texts in our man pages are almost verbatim copies of the POSIX text
> (same for the man pages of macOS and FreeBSD). I do wonder why they do
> not describe the slightly different BSD semantics.
> 
> I'll post the FreeBSD results tomorrow.
> 
>   -Otto
> 

Confirmed, FreeBSD has the same behaviour as macOS and OpenBSD.

-Otto
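
For anyone repeating the experiment with different read sizes, the test
programs in this thread boil down to counting how many non-empty reads
happen before read(2) returns 0. A minimal sketch follows; count_reads()
is a made-up name, the 64-byte buffer limit is arbitrary, and unlike the
programs quoted above it also stops on read errors. Run against a pipe
it only shows the counting; the behaviour discussed in this thread needs
a terminal in canonical mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read fd in pieces of at most `chunk` bytes until
 * read(2) returns 0, and report how many non-empty reads were seen.
 * Returns -1 on a read error other than EINTR.
 */
int
count_reads(int fd, size_t chunk)
{
	char buf[64];
	ssize_t r;
	int n = 0;

	assert(chunk > 0 && chunk <= sizeof(buf));
	for (;;) {
		r = read(fd, buf, chunk);
		if (r == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (r == 0)
			break;	/* read(2) reported end-of-file */
		n++;
	}
	return n;
}
```

On a tty, count_reads(0, 1) corresponds to the one-char-at-a-time
program and count_reads(0, 4) to the 4-char variant.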



Re: EOF handling does not conform to POSIX and termios(4)

2023-01-22 Thread Otto Moerbeek
On Sun, Jan 22, 2023 at 02:44:47PM +0100, Sören Tempel wrote:

> Hi Otto,
> 
> Thanks for your fast reply. Remarks below.
> 
> Otto Moerbeek  wrote:
> > Some observations:
> >
> > - You are reading from stdout.
> 
> Sorry, small mistake that slipped in while reformatting the code for the ML.
> 
> > - Since you are reading one char at the time, the moment the EOF is
> > processed the buffer is empty, so no pending chars. I'll take that as
> > a valid interpretation of Posix.
> 
> I am no expert on the interpretation of the POSIX standard, but Section
> 11.1.9 explicitly talks about "bytes waiting to be read" and not about
> bytes in the buffer of the process "the moment the EOF is processed".
> 
> > - If you modify your program to read 4 chars at the time (see below),
> > you will see your expected behaviour.
> >
> > -Otto
> >
> > #include <stdio.h>
> > #include <unistd.h>
> >
> > int main(void)
> > {
> > for (;;) {
> > char c[4];
> > ssize_t r = read(0, c, sizeof(c));
> > if (r == 0)
> > break; /* EOF */
> > printf("read: %zd chars\n", r);
> > }
> > }
> 
> While this program works for the input "foo" it does (just like
> my original program) also terminate on the input "foo1" since
> the buffer is, once again, empty when the EOF is received. However, POSIX
> mandates that read(2) should only return a byte count of zero if "the
> EOF occurred at the beginning of a line" and for the input
> "foo1" the EOF does not occur at the beginning of the line but
> OpenBSD read(2) still returns zero.
> 
> Greetings,
> Sören

I'd have to check a FreeBSD machine (I can do so tomorrow), but macOS
follows the same logic as OpenBSD. So this may very well be
established BSD behaviour.

The texts in our man pages are almost verbatim copies of the POSIX text
(same for the man pages of macOS and FreeBSD). I do wonder why they do
not describe the slightly different BSD semantics.

I'll post the FreeBSD results tomorrow.

-Otto



Re: EOF handling does not conform to POSIX and termios(4)

2023-01-21 Thread Otto Moerbeek
On Sat, Jan 21, 2023 at 07:25:33PM +0100, Sören Tempel wrote:

> Hello,
> 
> Section 11.1.9 of POSIX.1-2008 mandates the following behavior regarding
> the handling of the special EOF character:
> 
>   EOF  Special character on input, which is recognized if the
>   ICANON flag is set. When received, all the bytes waiting to be
>   read are immediately passed to the process without waiting for a
>   <newline>, and the EOF is discarded. [...] If ICANON is set, the
>   EOF character shall be discarded when processed.
> 
> The "Special Characters" section of the OpenBSD termios(4) man page also
> emphasizes that EOF is discarded if there is pending input. However, it
> seems that this is not correctly implemented on -stable presently.
> Consider the following C program:
> 
>   #include <stdio.h>
>   #include <unistd.h>
> 
>   int main(void) {
>   for (;;) {
>   char c;
> 
>   int r = read(1, &c, sizeof(c));
>   if (r == 0) break; /* EOF */
>   printf("c: %c\n", c);
>   }
>   }
> 
> After compiling and executing this program (in canonical mode input
> processing), enter "foo". According to the text referenced
> above, this should cause the characters 'f', 'o', and 'o' to be printed.
> The EOF should be discarded and hence the program should prompt for more
> input. Over in Linux-land this is exactly what happens, however, on
> OpenBSD 7.2 the EOF is **not** discarded and hence the program
> terminates after printing the aforementioned characters. If my
> understanding of the POSIX standard and the termios(4) man page is
> correct, then OpenBSD's handling of the EOF character is not conforming
> to the specification, and hence I would consider this a bug.
> 
> I ran into this while using GNU ed(1) on OpenBSD where partially entered
> text in input and command mode is not handled correctly on EOF because
> of this bug.
> 
> Greetings,
> Sören
> 

Some observations:

- You are reading from stdout.

- Since you are reading one char at the time, the moment the EOF is
processed the buffer is empty, so no pending chars. I'll take that as
a valid interpretation of Posix.

- If you modify your program to read 4 chars at the time (see below),
you will see your expected behaviour.

-Otto

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		char c[4];
		ssize_t r = read(0, c, sizeof(c));
		if (r == 0)
			break; /* EOF */
		printf("read: %zd chars\n", r);
	}
}



Re: acme-client canary corrupted issue

2022-12-14 Thread Otto Moerbeek
On Wed, Dec 14, 2022 at 03:51:44PM +0100, Renaud Allard wrote:

> 
> 
> On 12/14/22 14:44, Theo de Raadt wrote:
> > sysctl kern.nosuidcoredump=3
> > 
> > mkdir /var/crash/acme-client
> > 
> > and then try to reproduce, and see if a core file is delivered there.
> > This coredump mechanism was added to capture some hard-to-capture coredumps,
> > you can see more info in core(5) and sysctl(3)
> > 
> 
> Thanks
> 
> I have been able to reproduce it reliably with the staging API, however,
> there is no core dump generated in /var/crash/acme-client.
> 
> To reproduce it, you need a certificate with alternative names using
> multiple different domains. Generate a cert, then fully remove one of the
> domains and ask for a forced reissue.
> 
> I tried with following Otto patch from today, and it seems it solves the
> issue.

Are you sure you attached the right patch?

-Otto

> 
> Index: acctproc.c
> ===
> RCS file: /cvs/src/usr.sbin/acme-client/acctproc.c,v
> retrieving revision 1.23
> diff -u -p -r1.23 acctproc.c
> --- acctproc.c14 Jan 2022 09:20:18 -  1.23
> +++ acctproc.c14 Dec 2022 11:06:45 -
> @@ -439,6 +439,7 @@ op_sign(int fd, EVP_PKEY *pkey, enum acc
> 
>   rc = 1;
>  out:
> + ECDSA_SIG_free(ec_sig);
>   EVP_MD_CTX_free(ctx);
>   free(pay);
>   free(sign);




Re: acme-client canary corrupted issue

2022-12-14 Thread Otto Moerbeek
On Wed, Dec 14, 2022 at 12:30:25PM +0100, Renaud Allard wrote:

> Hi Otto,
> 
> 
> On 12/14/22 12:01, Otto Moerbeek wrote:
> > On Tue, Dec 13, 2022 at 10:34:53AM +0100, Renaud Allard wrote:
> > 
> > > Hello,
> > > 
> > > I was force renewing some certs because I removed some domains from
> > > the cert, and got this:
> > > acme-client(53931) in free(): chunk canary corrupted 0xa06cb09db00 
> > > 0xb0@0xb0
> > > 
> > > I am using vm.malloc_conf=SUR>>
> > > 
> > > Best Regards
> > 
> > 
> > I cannot reproduce with several attempts. Please include details on
> > platform and version.
> > 
> > Can you show a run with -v on? That gives a hint where the problem
> > occurs.
> > 
> > Do you get a core dump? If so, try to get a backtrace.
> > 
> 
> 
> It's quite hard to reproduce, I only had it once when I shrank the
> alternative names involved in one certificate. There was no core dump.
> 
> This was produced on 7.2-stable amd64
> account and domain keys are ecdsa
> 
> I ran it with -vvF and could get my run log thanks to tmux back buffer.
> I will skip all the verification/certs babble
> 
> isildur# acme-client -vvF arnor.org
> 
> acme-client: /somewhere/arnor.org.key: loaded domain key
> 
> acme-client: /etc/acme/letsencrypt-privkey.pem: loaded account key
> 
> acme-client: /somewhere/arnor.org.crt: certificate valid: 74 days left
> 
> acme-client: /somewhere/arnor.org.crt: domain list changed, forcing renewal
> acme-client: https://acme-v02.api.letsencrypt.org/directory: directories
> 
> acme-client: acme-v02.api.letsencrypt.org: DNS: 172.65.32.248
> 
>  lots of standard certs/verif dialog *
> -END CERTIFICATE- ] (5800 bytes)
> 
> acme-client(53931) in free(): chunk canary corrupted 0xa06cb09db00 0xb0@0xb0
> acme-client: /somewhere/arnor.org.crt: created
> 
> acme-client: /somewhere/arnor.org.fullchain.pem: created
> 
> acme-client: signal: revokeproc(53931): Abort trap
> 
> Best Regards


Try this

-Otto

Index: revokeproc.c
===
RCS file: /home/cvs/src/usr.sbin/acme-client/revokeproc.c,v
retrieving revision 1.19
diff -u -p -r1.19 revokeproc.c
--- revokeproc.c	22 Nov 2021 08:26:08 -0000	1.19
+++ revokeproc.c	14 Dec 2022 14:16:46 -0000
@@ -239,6 +239,7 @@ revokeproc(int fd, const char *certfile,
goto out;
}
force = 2;
+   continue;
}
if (found[j]++) {
if (revocate) {
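
For readers without the source at hand, here is a reduced sketch of why
the added "continue" matters. It assumes (the hunk above does not show
this) that j is the index left over from a search loop and equals the
array size when the name was not found, so that found[j]++ would touch
memory one element past the end of found[]. All names below are made up
for illustration and are not acme-client's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reduction of the loop: count how many names in certv[]
 * are absent from wantv[]. found[] must have room for wantsz counters
 * and is incremented once per match.
 */
size_t
count_missing(const char *const *certv, size_t certsz,
    const char *const *wantv, size_t wantsz, int *found)
{
	size_t i, j, missing = 0;

	assert(certv != NULL && wantv != NULL && found != NULL);
	for (i = 0; i < certsz; i++) {
		for (j = 0; j < wantsz; j++)
			if (strcmp(certv[i], wantv[j]) == 0)
				break;
		if (j == wantsz) {
			missing++;
			/* Without this, found[wantsz] below is out of bounds. */
			continue;
		}
		found[j]++;
	}
	return missing;
}
```

With a certificate listing three names of which only two are still
wanted, the function reports one missing name and leaves the memory
after the last counter untouched.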



Re: acme-client canary corrupted issue

2022-12-14 Thread Otto Moerbeek
On Tue, Dec 13, 2022 at 10:34:53AM +0100, Renaud Allard wrote:

> Hello,
> 
> I was force renewing some certs because I removed some domains from
> the cert, and got this:
> acme-client(53931) in free(): chunk canary corrupted 0xa06cb09db00 0xb0@0xb0
> 
> I am using vm.malloc_conf=SUR>>
> 
> Best Regards


I cannot reproduce with several attempts. Please include details on
platform and version.

Can you show a run with -v on? That gives a hint where the problem
occurs.

Do you get a core dump? If so, try to get a backtrace.

-Otto



Re: how to use <> to assign entire disk to /+swap

2022-06-11 Thread Otto Moerbeek
On Fri, Jun 10, 2022 at 02:56:15PM -0400, Andrew Cagney wrote:

> On Fri, 10 Jun 2022 at 12:21, Claudio Jeker  wrote:
> 
> > > > Try:
> > > >
> > > > /		1g-*	100%
> > > > swap	1g	0%
> > >
> > > That worked:
> > >
> > > [root@openbsd root]# disklabel sd0
> > > ...
> > > #                size           offset  fstype [fsize bsize   cpg]
> > >   a:         18874240               64  4.2BSD   2048 16384 12960 # /
> > >   b:          2097152         18874304    swap                    # none
> > >   c:         20971520                0  unused
> > >
> > > Thanks!
> > >
> > > Can I suggest adding this as an example to disklabel(8).  I suspect
> > > assigning the entire disk to / is a common scenario, and would help
> > > clarify how * and % interact.
> 
> ... at least for anyone automating an install as part of a virtual
> test framework; and finding that the default partition size for
> /usr/src was too small :-( :-)
> 
> (for what it's worth, the other "disks" are NFS and are added later so
> don't appear in dmesg; and I habitually delete dmesg)
> 
> > That is a bad advice. Using single / is just bad habit and does not allow
> > to limit mountpoints with nodev, nosuid or wxallowed. For disks in the 10G
> > space I would make sure that /var, /tmp, /usr, /home are different
> > partitions.
> 
> Here's some of the text from disklabel(8)
> 
>  [...] giving mount point, min-max size range, and percentage of disk,
>  space-separated.  Max can be unlimited by specifying '*'.  If only mount
>  point and min size are given, the partition is created with that exact
>  size.
> 
> from my POV, an example clarifying this would have helped.
> 
> take care
> 

The allocation process *is* described in the section above, and below it
is an example.

-Otto



Re: Odd IPv6 ND behaviour after upgrading to OpenBSD 7.1

2022-04-30 Thread Otto Moerbeek
On Fri, Apr 29, 2022 at 04:42:25PM +0100, Ian Chilton wrote:

> Hi,
> 
> Not sure what the etiquette for this list is, so apologies if this is not 
> appropriate as it's not a confirmed bug...
> 
> I have a whole bunch of subnets which are static routed to a HSRP address, 
> provided by a pair of Cisco routers, on a linknet VLAN. Actually, there is 
> two VLANs, vlan209 and vlan409. In the case of v6, the HSRP IP is fe80::1, so 
> I have routes to fe80::1%vlan209 and fe80::1%vlan409.
> 
> This has worked fine for many weeks. On Wednesday evening I upgraded to 7.1.
> 
> On Friday morning, I woke up to nearly 2,000 alerts, because some v6 had 
> started flapping during the night.
> 
> It turns out that fe80::1%vlan409 had randomly become unreachable.
> 
> Every few minutes, it would become reachable again for 8 echo replies, then 
> goes unreachable again.
> 
> This is strange, because we use this same HSRP config / fe80::1 addresses for 
> all of our VLANs and have done for years, without issue.
> 
> Throughout this, the other OpenBSD host (still on 7.0), can access that 
> address with no problem.
> 
> Oddly, this host can still access fe80::1%vlan209 no problem.
> 
> What seems to happen is, a stale ND entry appears and 8 pings succeed...
> the-gw1# ndp -a |grep vlan409 | grep fe80
> fe80::1%vlan409  00:05:73:a0:00:01 vlan409 23h57m56s S R
> ..
> 
> Then this happens:
> the-gw1# ndp -a |grep vlan409 | grep fe80
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> fe80::1%vlan409  (incomplete)  vlan409 1sI  2
> Check again, and the entry has disappeared.
> 
> A few mins later, the process repeats - 8 pings suddenly succeed and it 
> disappears again.
> 
> As I say though, fe80::1%vlan209 continues to work fine, as does 
> fe80::1%vlan409 from the other host.
> 
> fe80::1%vlan209  00:05:73:a0:00:01 vlan209 10s   R R
> 
> Interestingly, I did see a neighbour entry for fe80::1 on vlan409 on the 
> Cisco which is the HSRP master which had a MAC address of the-gw1, which 
> implied that the-gw1 is some how responding to ND requests for that IP 
> but I am not able to find those replies in a tcpdump.
> 
> As a workaround, i've added another HSRP address, fe80::2 on the Ciscos and 
> changed the static routes on this box to use that. After a few hours, that's 
> still reachable ok.
> 
> It might be total coincidence that this is after a 7.0 -> 7.1 upgrade, but 
> thought i'd report it and see if anyone else is seeing any similar issues.
> 
> Thanks,
> 
> Ian

I had some issues with neighbour discovery lately, which started to
appear when I installed a new CPE.

The issue was that the kernel generated outgoing icmp6 messages with a
hop limit, which then got dropped by pf before even reaching the lan.

The workaround was to do

pass proto icmp6 allow-opts

In the meantime, bluhm@ has been working on a proper solution. See
https://marc.info/?l=openbsd-tech&m=165056094900572

-Otto



Re: ntpd constraint validation shows timestamp from 1899

2022-01-06 Thread Otto Moerbeek
On Thu, Jan 06, 2022 at 04:02:20PM +0100, Florian Obser wrote:

> On 2022-01-06 08:44 +01, Otto Moerbeek  wrote:
> > On Thu, Jan 06, 2022 at 08:38:37AM +0100, Otto Moerbeek wrote:
> >> Looking at the loop again and seeing the "maximum length exceeded" I
> >> think what has happened is that the loop exited without reading the
> >> Date: line and so no call to strptime() happened at all.
> >> 
> >> The code could be improved a bit by returning -1 in that case.
> >> 
> >>-Otto
> >> 
> >
> > Like this 
> >
> 
> OK florian
> 
> p.s.: I have to say I find this code rather convoluted, figuring out
> that this is actually correct took me way too long. (httpsdate is
> allocated with calloc one function up and one to the left and nothing
> touches tls_tm in between).

I thought about introducing a local var that gets set on a successful
strptime() call but decided against that as there *should* only be one
source for it to be filled in. As for the current structure,

allocate()
call some functions and cleanup() in the last one

I like this structure better:

allocate()
call some deep functions
cleanup()

Dunno if the current code allows restructuring in this way.

-Otto
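
A small sketch of the second structure, with allocation and cleanup at
the same level and the deeper function only using the state. Everything
here (struct request, request_run(), do_request()) is invented for the
example and has nothing to do with the actual ntpd code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request state; not ntpd's struct httpsdate. */
struct request {
	char	*host;
	int	 status;
};

/* A "deep" helper: it uses the state but never frees it. */
static int
request_run(struct request *r)
{
	if (r->host[0] == '\0')
		return -1;
	r->status = 200;
	return 0;
}

/* Allocation and cleanup sit at the same level, around the work. */
int
do_request(const char *host)
{
	struct request *r;
	size_t len;
	int rv = -1;

	assert(host != NULL);
	if ((r = calloc(1, sizeof(*r))) == NULL)
		return -1;
	len = strlen(host) + 1;
	if ((r->host = malloc(len)) != NULL) {
		memcpy(r->host, host, len);
		rv = request_run(r);
	}
	free(r->host);	/* free(NULL) is harmless */
	free(r);
	return rv;
}
```

Whatever request_run() returns, the caller frees what it allocated, so
no error path below it has to know about the cleanup.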

> 
> > Index: constraint.c
> > ===
> > RCS file: /cvs/src/usr.sbin/ntpd/constraint.c,v
> > retrieving revision 1.52
> > diff -u -p -r1.52 constraint.c
> > --- constraint.c16 Jul 2021 13:59:10 -  1.52
> > +++ constraint.c6 Jan 2022 07:43:52 -
> > @@ -1019,7 +1019,7 @@ httpsdate_request(struct httpsdate *http
> > &httpsdate->tls_tm) == NULL) {
> > log_warnx("unsupported date format");
> > free(line);
> > -   return (-1);
> > +   goto fail;
> > }
> >  
> > free(line);
> > @@ -1027,6 +1027,8 @@ httpsdate_request(struct httpsdate *http
> >   next:
> > free(line);
> > }
> > +   if (httpsdate->tls_tm.tm_year == 0)
> > +   goto fail;
> >  
> > /*
> >  * Now manually check the validity of the certificate presented in the
> >
> 
> -- 
> I'm not entirely sure you are real.
> 



Re: ntpd constraint validation shows timestamp from 1899

2022-01-05 Thread Otto Moerbeek
On Thu, Jan 06, 2022 at 08:38:37AM +0100, Otto Moerbeek wrote:

> On Wed, Jan 05, 2022 at 02:19:28PM -0700, Theo de Raadt wrote:
> 
> > Otto Moerbeek  wrote:
> > 
> > > On Wed, Jan 05, 2022 at 11:45:36AM +0100, Matthias Schmidt wrote:
> > > 
> > > > >Synopsis:  ntpd constraint validation shows timestamp from 1899
> > > > >Environment:
> > > > System  : OpenBSD 7.0
> > > > Details : OpenBSD 7.0-current (GENERIC.MP) #216: Mon Jan  3 
> > > > 16:04:47 MST 2022
> > > >  
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > >Description:
> > > > 
> > > > Yesterday, the following log message from OpenNTPD appeared for the 
> > > > first and
> > > > only time in my logs:
> > > > 
> > > > Jan  4 19:35:04 sigma ntpd[72304]: maximum length exceeded
> > > > Jan  4 19:35:04 sigma ntpd[72304]: tls certificate not yet valid: 
> > > > 9.9.9.9 (9.9.9.9): not before 2021-07-27 00:00:00 UTC, now 1899-12-31 
> > > > 00:00:00 UTC
> > > > 
> > > > It seems like some hiccup during constraint TLS certificate validation.
> > > > 
> > > > Here are the log messages of the last two minutes BEFORE the lines above
> > > > appeared:
> > > > 
> > > > Jan  4 19:33:08 sigma ntpd[65131]: peer 134.76.249.201 now invalid
> > > > Jan  4 19:33:16 sigma ntpd[65131]: peer 141.2.22.74 now invalid
> > > > Jan  4 19:33:20 sigma ntpd[2066]: adjusting local clock by 0.468916s
> > > > Jan  4 19:33:20 sigma ntpd[65131]: clock is now unsynced
> > > > Jan  4 19:34:04 sigma ntpd[65131]: peer 134.76.249.201 now valid
> > > > Jan  4 19:34:22 sigma ntpd[2066]: adjusting local clock by 0.153686s
> > > > Jan  4 19:34:26 sigma ntpd[65131]: peer 134.76.249.201 now invalid
> > > > Jan  4 19:34:54 sigma ntpd[65131]: clock is now synced
> > > > Jan  4 19:34:54 sigma ntpd[65131]: constraint reply from 
> > > > 82.165.229.152: offset -0.891146
> > > > Jan  4 19:35:00 sigma ntpd[65131]: constraint reply from 82.165.229.87: 
> > > > offset -0.776174
> > > > 
> > > > The next time 9.9.9.9 appeared in the logs was around 15 minutes later:
> > > > 
> > > > Jan  4 19:50:43 sigma ntpd[65131]: constraint reply from 2620:fe::fe: 
> > > > offset -0.848819
> > > > Jan  4 19:50:43 sigma ntpd[65131]: constraint reply from 9.9.9.9: 
> > > > offset -0.872381
> > > 
> > > The 1899 time is the time taken from the http reply headers received.
> > > > I suppose quad9 had a hiccup and sent garbage.  Likely the strptime()
> > > > call in ntpd/constraint.c went wrong but did not return an error in
> > > > some way.  The answer to this constraint request was rejected. Other
> > > > constraints worked apparently and after a while quad9 was reporting a
> > > correct time in its http reply headers.
> > 
> > So in other words, the code worked precisely as intended.  The constraint
> > timestamp was not consumed, and the alert is simply amusing.
> > 
> 
> Yep.
> 
> Looking at the loop again and seeing the "maximum length exceeded" I
> think what has happened is that the loop exited without reading the
> Date: line and so no call to strptime() happened at all.
> 
> The code could be improved a bit by returning -1 in that case.
> 
>   -Otto
> 

Like this 

Index: constraint.c
===
RCS file: /cvs/src/usr.sbin/ntpd/constraint.c,v
retrieving revision 1.52
diff -u -p -r1.52 constraint.c
--- constraint.c	16 Jul 2021 13:59:10 -0000	1.52
+++ constraint.c	6 Jan 2022 07:43:52 -0000
@@ -1019,7 +1019,7 @@ httpsdate_request(struct httpsdate *http
&httpsdate->tls_tm) == NULL) {
log_warnx("unsupported date format");
free(line);
-   return (-1);
+   goto fail;
}
 
free(line);
@@ -1027,6 +1027,8 @@ httpsdate_request(struct httpsdate *http
  next:
free(line);
}
+   if (httpsdate->tls_tm.tm_year == 0)
+   goto fail;
 
/*
 * Now manually check the validity of the certificate presented in the
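
To illustrate the case the diff guards against, here is a standalone
sketch of scanning header lines for Date: and failing when none is seen.
It is not the ntpd code: find_http_date() and HTTP_DATE_FMT are made-up
names, the date format is an assumption, and it uses a local "seen" flag
where the diff tests tm_year instead. Note that a zeroed struct tm has
tm_year 0 (1900), tm_mon 0 and tm_mday 0, i.e. "day 0 of January 1900",
which is consistent with the 1899-12-31 shown in the log.

```c
#define _XOPEN_SOURCE 700	/* for strptime(3) on some systems */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>
#include <time.h>

/* Assumed date format for the sketch; the real source may differ. */
#define HTTP_DATE_FMT "%a, %d %b %Y %H:%M:%S GMT"

/*
 * Hypothetical helper: scan header lines for "Date:" and parse it
 * into *tm. Returns 0 on success, -1 if no Date header was seen or
 * if it did not parse. *tm is zeroed first.
 */
int
find_http_date(const char *const *lines, size_t n, struct tm *tm)
{
	const char *p;
	size_t i;
	int seen = 0;

	assert(lines != NULL && tm != NULL);
	memset(tm, 0, sizeof(*tm));
	for (i = 0; i < n; i++) {
		if (strncasecmp(lines[i], "Date:", 5) != 0)
			continue;
		p = lines[i] + 5;
		while (*p == ' ')
			p++;
		if (strptime(p, HTTP_DATE_FMT, tm) == NULL)
			return -1;
		seen = 1;
	}
	/* No Date: line at all is an error, not "year 1900". */
	return seen ? 0 : -1;
}
```

A reply that is cut off before the Date: header then yields -1 instead
of a zeroed timestamp being compared against the certificate validity.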



Re: ntpd constraint validation shows timestamp from 1899

2022-01-05 Thread Otto Moerbeek
On Wed, Jan 05, 2022 at 02:19:28PM -0700, Theo de Raadt wrote:

> Otto Moerbeek  wrote:
> 
> > On Wed, Jan 05, 2022 at 11:45:36AM +0100, Matthias Schmidt wrote:
> > 
> > > >Synopsis:ntpd constraint validation shows timestamp from 1899
> > > >Environment:
> > >   System  : OpenBSD 7.0
> > >   Details : OpenBSD 7.0-current (GENERIC.MP) #216: Mon Jan  3 
> > > 16:04:47 MST 2022
> > >
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > > >Description:
> > > 
> > > Yesterday, the following log message from OpenNTPD appeared for the first 
> > > and
> > > only time in my logs:
> > > 
> > > Jan  4 19:35:04 sigma ntpd[72304]: maximum length exceeded
> > > Jan  4 19:35:04 sigma ntpd[72304]: tls certificate not yet valid: 9.9.9.9 
> > > (9.9.9.9): not before 2021-07-27 00:00:00 UTC, now 1899-12-31 00:00:00 UTC
> > > 
> > > It seems like some hiccup during constraint TLS certificate validation.
> > > 
> > > Here are the log messages of the last two minutes BEFORE the lines above
> > > appeared:
> > > 
> > > Jan  4 19:33:08 sigma ntpd[65131]: peer 134.76.249.201 now invalid
> > > Jan  4 19:33:16 sigma ntpd[65131]: peer 141.2.22.74 now invalid
> > > Jan  4 19:33:20 sigma ntpd[2066]: adjusting local clock by 0.468916s
> > > Jan  4 19:33:20 sigma ntpd[65131]: clock is now unsynced
> > > Jan  4 19:34:04 sigma ntpd[65131]: peer 134.76.249.201 now valid
> > > Jan  4 19:34:22 sigma ntpd[2066]: adjusting local clock by 0.153686s
> > > Jan  4 19:34:26 sigma ntpd[65131]: peer 134.76.249.201 now invalid
> > > Jan  4 19:34:54 sigma ntpd[65131]: clock is now synced
> > > Jan  4 19:34:54 sigma ntpd[65131]: constraint reply from 82.165.229.152: 
> > > offset -0.891146
> > > Jan  4 19:35:00 sigma ntpd[65131]: constraint reply from 82.165.229.87: 
> > > offset -0.776174
> > > 
> > > The next time 9.9.9.9 appeared in the logs was around 15 minutes later:
> > > 
> > > Jan  4 19:50:43 sigma ntpd[65131]: constraint reply from 2620:fe::fe: 
> > > offset -0.848819
> > > Jan  4 19:50:43 sigma ntpd[65131]: constraint reply from 9.9.9.9: offset 
> > > -0.872381
> > 
> > The 1899 time is the time taken from the http reply headers received.
> > I suppose quad9 had a hiccup and sent garbage.  Likely the strptime()
> > call in ntpd/constraint.c went wrong but did not return an error in
> > some way.  The answer to this constraint request was rejected. Other
> > constraints apparently worked, and after a while quad9 was reporting a
> > correct time in its http reply headers.
> 
> So in other words, the code worked precisely as intended.  The constraint
> timestamp was not consumed, and the alert is simply amusing.
> 

Yep.

Looking at the loop again and seeing the "maximum length exceeded" I
think what has happened is that the loop exited without reading the
Date: line and so no call to strptime() happened at all.

The code could be improved a bit by returning -1 in that case.
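
A minimal sketch of that idea, not the actual constraint.c code: scan
the header block for a Date: line and return -1 if none is seen, so the
caller never carries on with an unset struct tm. (A zero-filled struct
tm reads as day 0 of January 1900, i.e. 1899-12-31, which would match
the logged value; that ntpd ended up in that state is my assumption.)
The function name find_date_header() and the buffer handling are made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: copy the value of the first "Date:" header found
 * in a block of HTTP response headers into out.
 * Returns 0 on success, -1 if no Date: line exists or it does not fit.
 */
int
find_date_header(const char *headers, char *out, size_t outlen)
{
	const char *line = headers;

	assert(headers != NULL && out != NULL);

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol != NULL ? (size_t)(eol - line) : strlen(line);

		if (len >= 5 && strncasecmp(line, "Date:", 5) == 0) {
			const char *val = line + 5;

			len -= 5;
			/* skip blanks between the colon and the value */
			while (len > 0 && *val == ' ') {
				val++;
				len--;
			}
			/* drop the CR left over from a CRLF line ending */
			if (len > 0 && val[len - 1] == '\r')
				len--;
			if (len >= outlen)
				return -1;	/* value does not fit */
			memcpy(out, val, len);
			out[len] = '\0';
			return 0;
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}
	return -1;	/* loop ended without a Date: line: say so */
}
```

A caller would only hand the copied value to strptime() when this
returns 0, and treat -1 as a failed constraint.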

-Otto



Re: ntpd constraint validation shows timestamp from 1899

2022-01-05 Thread Otto Moerbeek
On Wed, Jan 05, 2022 at 11:45:36AM +0100, Matthias Schmidt wrote:

> >Synopsis:ntpd constraint validation shows timestamp from 1899
> >Environment:
>   System  : OpenBSD 7.0
>   Details : OpenBSD 7.0-current (GENERIC.MP) #216: Mon Jan  3 
> 16:04:47 MST 2022
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
> Yesterday, the following log message from OpenNTPD appeared for the first and
> only time in my logs:
> 
> Jan  4 19:35:04 sigma ntpd[72304]: maximum length exceeded
> Jan  4 19:35:04 sigma ntpd[72304]: tls certificate not yet valid: 9.9.9.9 
> (9.9.9.9): not before 2021-07-27 00:00:00 UTC, now 1899-12-31 00:00:00 UTC
> 
> It seems like some hiccup during constraint TLS certificate validation.
> 
> Here are the log messages of the last two minutes BEFORE the lines above
> appeared:
> 
> Jan  4 19:33:08 sigma ntpd[65131]: peer 134.76.249.201 now invalid
> Jan  4 19:33:16 sigma ntpd[65131]: peer 141.2.22.74 now invalid
> Jan  4 19:33:20 sigma ntpd[2066]: adjusting local clock by 0.468916s
> Jan  4 19:33:20 sigma ntpd[65131]: clock is now unsynced
> Jan  4 19:34:04 sigma ntpd[65131]: peer 134.76.249.201 now valid
> Jan  4 19:34:22 sigma ntpd[2066]: adjusting local clock by 0.153686s
> Jan  4 19:34:26 sigma ntpd[65131]: peer 134.76.249.201 now invalid
> Jan  4 19:34:54 sigma ntpd[65131]: clock is now synced
> Jan  4 19:34:54 sigma ntpd[65131]: constraint reply from 82.165.229.152: 
> offset -0.891146
> Jan  4 19:35:00 sigma ntpd[65131]: constraint reply from 82.165.229.87: 
> offset -0.776174
> 
> The next time 9.9.9.9 appeared in the logs was around 15 minutes later:
> 
> Jan  4 19:50:43 sigma ntpd[65131]: constraint reply from 2620:fe::fe: offset 
> -0.848819
> Jan  4 19:50:43 sigma ntpd[65131]: constraint reply from 9.9.9.9: offset 
> -0.872381

The 1899 time is the time taken from the http reply headers received.
I suppose quad9 had a hiccup and sent garbage.  Likely the strptime()
call in ntpd/constraint.c went wrong but did not return an error in
some way.  The answer to this constraint request was rejected. Other
constraints apparently worked, and after a while quad9 was reporting a
correct time in its http reply headers.

-Otto

> 
> I attached the full ntpd log in case that helps.
> 
> Here's my ntpd.conf
> 
> servers time.uni-paderborn.de
> servers ntp.1und1.de
> server ntp1.uni-ulm.de
> server ntp2.uni-ulm.de
> server ntp.uni-osnabrueck.de
> server times.tubit.tu-berlin.de
> 
> constraint from "9.9.9.9" # quad9 v4 without DNS
> constraint from "2620:fe::fe" # quad9 v6 without DNS
> constraints from "gmx.de"
> 
> >How-To-Repeat:
> 
> Sorry, no idea.  Happened for the first time.
> 
> >Fix:
> 
> Unknown as well.
> 
> dmesg:
> OpenBSD 7.0-current (GENERIC.MP) #216: Mon Jan  3 16:04:47 MST 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 12765257728 (12173MB)
> avail mem = 12362371072 (11789MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x9cbfd000 (65 entries)
> bios0: vendor LENOVO version "JBET73WW (1.37 )" date 08/14/2019
> bios0: LENOVO 20BX0049GE
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SLIC ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT 
> SSDT SSDT SSDT SSDT SSDT PCCT SSDT TCPA SSDT UEFI MSDM BATB FPDT UEFI DMAR
> acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpiec0 at acpi0
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2095.44 MHz, 06-3d-04
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2095.15 MHz, 06-3d-04
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F1

Re: Missing -- separator in grep -A

2021-12-28 Thread Otto Moerbeek
So this is the complete diff, regress parts are from Stefan.

Asking for OK's. I'll commit the usr.bin and sys part and let Stefan
do the regress part.

-Otto

Index: usr.bin/grep/util.c
===
RCS file: /cvs/src/usr.bin/grep/util.c,v
retrieving revision 1.63
diff -u -p -r1.63 util.c
--- usr.bin/grep/util.c 23 Jul 2020 20:19:27 -  1.63
+++ usr.bin/grep/util.c 28 Dec 2021 10:07:00 -
@@ -258,8 +258,8 @@ print:
 
if ((tail > 0 || c) && !cflag && !qflag) {
if (c) {
-   if (first > 0 && tail == 0 && (Bflag < linesqueued) &&
-   (Aflag || Bflag))
+   if (first > 0 && tail == 0 && (Aflag || (Bflag &&
+   Bflag < linesqueued)))
printf("--\n");
first = 1;
tail = Aflag;
Index: sys/arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.123
diff -u -p -r1.123 Makefile.amd64
--- sys/arch/amd64/conf/Makefile.amd64  17 Dec 2021 14:59:22 -  1.123
+++ sys/arch/amd64/conf/Makefile.amd64  28 Dec 2021 10:07:00 -
@@ -158,7 +158,8 @@ ioconf.o: ioconf.c
 
 locore.o: assym.h
${NORMAL_S}
-   @[[ -n `objdump -D $@ | grep -A1 doreti_iret | sort | uniq -d` ]] || \
+   @[[ -n `objdump -D $@ | grep -A1 doreti_iret | grep -v ^-- | sort | \
+uniq -d` ]] || \
 { rm -f $@; echo "ERROR: overlaid iretq instructions don't line up"; \
   echo "#GP-on-iretq fault handling would be broken"; exit 1; }
 
Index: regress/usr.bin/grep/Makefile
===
RCS file: /cvs/src/regress/usr.bin/grep/Makefile,v
retrieving revision 1.17
diff -u -p -r1.17 Makefile
--- regress/usr.bin/grep/Makefile   12 Dec 2012 15:11:25 -  1.17
+++ regress/usr.bin/grep/Makefile   28 Dec 2021 10:07:00 -
@@ -1,7 +1,7 @@
 # $OpenBSD: Makefile,v 1.17 2012/12/12 15:11:25 weerd Exp $
 
 REGRESS_TARGETS=t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 \
-   t18 t19 t20 t21 t22 t23 t24 t25 t26
+   t18 t19 t20 t21 t22 t23 t24 t25 t26 t27
 
 t1:
grep t.s ${.CURDIR}/in | diff - ${.CURDIR}/t1.out
@@ -102,8 +102,13 @@ t25:
 t26:
echo 'aaab' | grep -o 'a*' | head -n 10 | diff - ${.CURDIR}/t26.out
 
+t27:
+   grep -A1 'C' ${.CURDIR}/t27.in | diff - ${.CURDIR}/t27a.out
+   grep -B1 'C' ${.CURDIR}/t27.in | diff - ${.CURDIR}/t27b.out
+   grep -C1 'C' ${.CURDIR}/t27.in | diff - ${.CURDIR}/t27c.out
+
 
 .PHONY: t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 t20
-.PHONY: t21 t22 t23 t24 t25 t26
+.PHONY: t21 t22 t23 t24 t25 t26 t27
 
 .include 
Index: regress/usr.bin/grep/t27.in
===
RCS file: regress/usr.bin/grep/t27.in
diff -N regress/usr.bin/grep/t27.in
--- /dev/null   1 Jan 1970 00:00:00 -
+++ regress/usr.bin/grep/t27.in 28 Dec 2021 10:07:00 -
@@ -0,0 +1,12 @@
+A
+B
+C
+D
+A
+B
+C
+D
+A
+B
+C
+D
Index: regress/usr.bin/grep/t27a.out
===
RCS file: regress/usr.bin/grep/t27a.out
diff -N regress/usr.bin/grep/t27a.out
--- /dev/null   1 Jan 1970 00:00:00 -
+++ regress/usr.bin/grep/t27a.out   28 Dec 2021 10:07:00 -
@@ -0,0 +1,8 @@
+C
+D
+--
+C
+D
+--
+C
+D
Index: regress/usr.bin/grep/t27b.out
===
RCS file: regress/usr.bin/grep/t27b.out
diff -N regress/usr.bin/grep/t27b.out
--- /dev/null   1 Jan 1970 00:00:00 -
+++ regress/usr.bin/grep/t27b.out   28 Dec 2021 10:07:00 -
@@ -0,0 +1,8 @@
+B
+C
+--
+B
+C
+--
+B
+C
Index: regress/usr.bin/grep/t27c.out
===
RCS file: regress/usr.bin/grep/t27c.out
diff -N regress/usr.bin/grep/t27c.out
--- /dev/null   1 Jan 1970 00:00:00 -
+++ regress/usr.bin/grep/t27c.out   28 Dec 2021 10:07:00 -
@@ -0,0 +1,11 @@
+B
+C
+D
+--
+B
+C
+D
+--
+B
+C
+D



Re: Missing -- separator in grep -A

2021-12-26 Thread Otto Moerbeek
On Sun, Dec 26, 2021 at 12:04:48PM +0100, Stefan Hagen wrote:

> Otto Moerbeek wrote:
> > On Sat, Dec 25, 2021 at 04:44:11PM -0800, Greg Steuck wrote:
> > 
> > > The separator doesn't get printed when I use this script on OpenBSD. It
> > > does get printed on FreeBSD or if I used GNU grep. The issue appears to
> > > be an off-by-one of some sort because removing the empty line makes the
> > > separator disappear on both systems.
> > > 
> > > #!/bin/sh
> > > 
> > > grep -E -A6 '^(.w)?g' <<'EOF'
> > > $wg
> > > a
> > > b
> > > c
> > > d
> > > e
> > > f
> > > 
> > > g = \ ds ->
> > > h
> > > i
> > > j
> > > EOF
> > > 
> > 
> > Hi,
> > 
> > please include the expected output and the output seen in bug reports,
> > it makes the initial diagnosis much easier. I now had to run your
> > testcase on another system to see what you meant. 
> > 
> > BTW, on MacOS I do see the separator with your test, both with the
> > empty line and without it.
> > 
> > Anyway, here's an attempt at a fix. Without much coffee, so beware.
> > 
> > -Otto
> 
> Hi,
> 
> the fix works as intended here and is consistent with what I see on
> Linux and FreeBSD.
> 
> Unfortunately the separator dashes are also printed in non-interactive 
> mode (also on Linux and FreeBSD), so we need to check our grep usage.
> 
> I only found one in src (I haven't checked ports):
> ./sys/arch/amd64/conf/Makefile.amd64: \
> @[[ -n `objdump -D $@ | grep -A1 doreti_iret | sort | uniq -d` ]] || \
> 
> Attached is a regress test based on your fix (also for -B and -C) that
> shows the behavior we have now and I think it's correct.
> 
> Best Regards,
> Stefan
> 
> Index: regress/usr.bin/grep/Makefile
> ===
> RCS file: /home/cvs/src/regress/usr.bin/grep/Makefile,v
> retrieving revision 1.17
> diff -u -p -u -p -r1.17 Makefile
> --- regress/usr.bin/grep/Makefile 12 Dec 2012 15:11:25 -  1.17
> +++ regress/usr.bin/grep/Makefile 26 Dec 2021 10:15:02 -
> @@ -1,7 +1,7 @@
>  # $OpenBSD: Makefile,v 1.17 2012/12/12 15:11:25 weerd Exp $
>  
>  REGRESS_TARGETS=t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 \
> - t18 t19 t20 t21 t22 t23 t24 t25 t26
> + t18 t19 t20 t21 t22 t23 t24 t25 t26 t27
>  
>  t1:
>   grep t.s ${.CURDIR}/in | diff - ${.CURDIR}/t1.out
> @@ -102,8 +102,13 @@ t25:
>  t26:
>   echo 'aaab' | grep -o 'a*' | head -n 10 | diff - ${.CURDIR}/t26.out
>  
> +t27:
> + grep -A1 'C' ${.CURDIR}/t27.in | diff - ${.CURDIR}/t27a.out
> + grep -B1 'C' ${.CURDIR}/t27.in | diff - ${.CURDIR}/t27b.out
> + grep -C1 'C' ${.CURDIR}/t27.in | diff - ${.CURDIR}/t27c.out
> +
>  
>  .PHONY: t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 t20
> -.PHONY: t21 t22 t23 t24 t25 t26
> +.PHONY: t21 t22 t23 t24 t25 t26 t27
>  
>  .include 
> Index: regress/usr.bin/grep/t27.in
> ===
> RCS file: regress/usr.bin/grep/t27.in
> diff -N regress/usr.bin/grep/t27.in
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ regress/usr.bin/grep/t27.in   26 Dec 2021 10:32:02 -
> @@ -0,0 +1,12 @@
> +A
> +B
> +C
> +D
> +A
> +B
> +C
> +D
> +A
> +B
> +C
> +D
> Index: regress/usr.bin/grep/t27a.out
> ===
> RCS file: regress/usr.bin/grep/t27a.out
> diff -N regress/usr.bin/grep/t27a.out
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ regress/usr.bin/grep/t27a.out 26 Dec 2021 10:07:48 -
> @@ -0,0 +1,8 @@
> +C
> +D
> +--
> +C
> +D
> +--
> +C
> +D
> Index: regress/usr.bin/grep/t27b.out
> ===
> RCS file: regress/usr.bin/grep/t27b.out
> diff -N regress/usr.bin/grep/t27b.out
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ regress/usr.bin/grep/t27b.out 26 Dec 2021 10:10:21 -
> @@ -0,0 +1,8 @@
> +B
> +C
> +--
> +B
> +C
> +--
> +B
> +C
> Index: regress/usr.bin/grep/t27c.out
> ===
> RCS file: regress/usr.bin/grep/t27c.out
> diff -N regress/usr.bin/grep/t27c.out
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ regress/usr.bin/grep/t27c.out 26 Dec 2021 10:07:48 -
> @@ -0,0 +1,11 @@
> +B
> +C
> +D

Re: Missing -- separator in grep -A

2021-12-26 Thread Otto Moerbeek
On Sat, Dec 25, 2021 at 04:44:11PM -0800, Greg Steuck wrote:

> The separator doesn't get printed when I use this script on OpenBSD. It
> does get printed on FreeBSD or if I used GNU grep. The issue appears to
> be an off-by-one of some sort because removing the empty line makes the
> separator disappear on both systems.
> 
> #!/bin/sh
> 
> grep -E -A6 '^(.w)?g' <<'EOF'
> $wg
> a
> b
> c
> d
> e
> f
> 
> g = \ ds ->
> h
> i
> j
> EOF
> 

Hi,

please include the expected output and the output seen in bug reports,
it makes the initial diagnosis much easier. I now had to run your
testcase on another system to see what you meant. 

BTW, on MacOS I do see the separator with your test, both with the
empty line and without it.

Anyway, here's an attempt at a fix. Without much coffee, so beware.

-Otto

Test script I used:
grep -E -A6 X << 'EOF'
X
1
2
3
4
X
1
2
3
4
5
6
7
X
1
2
3
4
5
6
X
1
EOF

Output with the diff on OpenBSD, and on MacOS:
X
1
2
3
4
X
1
2
3
4
5
6
--
X
1
2
3
4
5
6
--
X
1


Index: util.c
===
RCS file: /cvs/src/usr.bin/grep/util.c,v
retrieving revision 1.63
diff -u -p -r1.63 util.c
--- util.c  23 Jul 2020 20:19:27 -  1.63
+++ util.c  26 Dec 2021 08:20:10 -
@@ -258,8 +258,7 @@ print:
 
if ((tail > 0 || c) && !cflag && !qflag) {
if (c) {
-   if (first > 0 && tail == 0 && (Bflag < linesqueued) &&
-   (Aflag || Bflag))
+   if (first > 0 && tail == 0 && (Aflag || (Bflag && Bflag < linesqueued)))
printf("--\n");
first = 1;
tail = Aflag;
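
To make the change in the condition easier to see, here is a small
standalone sketch that lifts the old and new tests out of util.c into
two functions. The function names are invented for illustration, and
the way linesqueued is counted in the real code is not modelled here;
the arguments are simply passed in.

```c
#include <assert.h>

/* Old test: needs Bflag < linesqueued even when only -A was given. */
int
separator_old(int first, int tail, int Aflag, int Bflag, int linesqueued)
{
	return first > 0 && tail == 0 && (Bflag < linesqueued) &&
	    (Aflag || Bflag);
}

/* New test: -A alone is enough; the queue check only applies to -B. */
int
separator_new(int first, int tail, int Aflag, int Bflag, int linesqueued)
{
	return first > 0 && tail == 0 &&
	    (Aflag || (Bflag && Bflag < linesqueued));
}

/*
 * A match that arrives exactly when the -A6 window has run out and no
 * line has been queued in between (Bflag == 0, linesqueued == 0).
 */
void
separator_demo(void)
{
	assert(separator_old(1, 0, 6, 0, 0) == 0);	/* separator lost */
	assert(separator_new(1, 0, 6, 0, 0) == 1);	/* separator printed */
}
```

With one extra line queued (linesqueued == 1) both versions agree,
which would fit the report that the empty line made the difference.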



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-25 Thread Otto Moerbeek
On Wed, Nov 24, 2021 at 02:48:29PM -0700, Ted Bullock wrote:

> On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> > This patch disables fchmod in the bootblock for IDE drives on sparc64
> > 
> > I can confirm that this allows my sunblade 100 to boot -current
> Hi folks,
> 
> I'm requesting to have the patch I sent in a previous mail reviewed and
> committed. I also attached the patch here.
> 
> (for reference)
> https://marc.info/?l=openbsd-bugs&m=163744498300496&w=2
> 
> This should affect only a small set of older sparc64 machines that use
> IDE drives instead of SCSI.  At least two such machine types are not
> working and are corrupting filesystems when booted with recent versions
> of OpenBSD since writing calls were introduced to the second stage boot
> block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
> older sparc64 machines will potentially re-use entropy from previous
> boot cycles if a new random seed file was not written between restarts.
> 
> -- 
> Ted Bullock 

Tests OK: using a SCSI disk I still see the correct behaviour. I do
not have IDE disks, so I cannot test that case.

One comment inline,

-Otto

> Index: arch/sparc64/stand/ofwboot/ofdev.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
> retrieving revision 1.31
> diff -u -p -u -p -r1.31 ofdev.c
> --- arch/sparc64/stand/ofwboot/ofdev.c 9 Dec 2020 18:10:19 -   1.31
> +++ arch/sparc64/stand/ofwboot/ofdev.c 20 Nov 2021 12:36:10 -
> @@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
>   char fname[256];
>   char buf[DEV_BSIZE];
>   struct disklabel label;
> - int handle, part;
> + int handle, part, parent;
>   int error = 0;
>  #ifdef SOFTRAID
>   char volno;
> @@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
>  #endif
>   if ((handle = OF_finddevice(fname)) == -1)
>   return ENOENT;
> +
> + parent = OF_parent(handle);

I think the OF_parent call can go inside the !strcmp(buf, "block") block.


> +
>   DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
>   if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
>   return ENXIO;
> @@ -685,6 +688,13 @@ devopen(struct open_file *of, const char
>  
>   of->f_dev = devsw;
>   of->f_devdata = &ofdev;
> +
> + /* Some PROMS have bugged writing code for ide block devices */

here I mean

> + if (OF_getprop(parent, "name", buf, sizeof buf) < 0)
> + return ENXIO;
> + if (!strcmp(buf, "ide"))
> + of->f_flags |= F_NOWRITE;
> +
>  #ifdef SPARC_BOOT_UFS
>   bcopy(&file_system_ufs, &file_system[nfsys++], sizeof file_system[0]);
>   bcopy(&file_system_ufs2, &file_system[nfsys++], sizeof file_system[0]);
> Index: arch/sparc64/stand/ofwboot/vers.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/vers.c,v
> retrieving revision 1.22
> diff -u -p -u -p -r1.22 vers.c
> --- arch/sparc64/stand/ofwboot/vers.c 9 Dec 2020 18:10:19 -   1.22
> +++ arch/sparc64/stand/ofwboot/vers.c 20 Nov 2021 12:36:11 -
> @@ -1 +1 @@
> -const char version[] = "1.21";
> +const char version[] = "1.22";
> Index: lib/libsa/fchmod.c
> ===
> RCS file: /cvs/src/sys/lib/libsa/fchmod.c,v
> retrieving revision 1.1
> diff -u -p -u -p -r1.1 fchmod.c
> --- lib/libsa/fchmod.c3 Aug 2019 15:22:17 -   1.1
> +++ lib/libsa/fchmod.c20 Nov 2021 12:36:12 -
> @@ -53,6 +53,11 @@ fchmod(int fd, mode_t m)
>   errno = EOPNOTSUPP;
>   return (-1);
>   }
> + /* writing is broken or unsupported */
> + if (f->f_flags & F_NOWRITE) {
> + errno = EOPNOTSUPP;
> + return (-1);
> + }
>  
>   errno = (f->f_ops->fchmod)(f, m);
>   return (0);
> Index: lib/libsa/stand.h
> ===
> RCS file: /cvs/src/sys/lib/libsa/stand.h,v
> retrieving revision 1.71
> diff -u -p -u -p -r1.71 stand.h
> --- lib/libsa/stand.h 24 Oct 2021 17:49:19 -  1.71
> +++ lib/libsa/stand.h 20 Nov 2021 12:36:12 -
> @@ -107,10 +107,11 @@ struct open_file {
>  extern struct open_file files[];
>  
>  /* f_flags values */
> -#define  F_READ  0x0001  /* file opened for reading */
> -#define  F_WRITE 0x0002  /* file opened for writing */
> -#define  F_RAW   0x0004  /* raw device open - no file system */
> -#define F_NODEV  0x0008  /* network open - no device */
> +#define F_READ  0x0001 /* file opened for reading */
> +#define F_WRITE 0x0002 /* file opened for writing */
> +#define F_RAW   0x0004 /* raw device open - no file system */
> +#define F_NODEV 0x0008 /* netwo

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-24 Thread Otto Moerbeek
On Wed, Nov 24, 2021 at 11:29:09PM +0100, Mark Kettenis wrote:

> > Date: Wed, 24 Nov 2021 14:48:29 -0700
> > From: Ted Bullock 
> 
> Hi Ted,
> 
> Yes, I think your idea is sane and the diff looks good.  But I'd like
> to test it myself before I commit it and I haven't found the time to
> do so yet.
> 
> Do remind me if I haven't done so in a week or so.

Same for me.

-Otto

> 
> Thanks,
> 
> Mark
> 
> > On 2021-11-20 2:49 p.m., Ted Bullock wrote:
> > > This patch disables fchmod in the bootblock for IDE drives on sparc64
> > > 
> > > I can confirm that this allows my sunblade 100 to boot -current
> > Hi folks,
> > 
> > I'm requesting to have the patch I sent in a previous mail reviewed and
> > committed. I also attached the patch here.
> > 
> > (for reference)
> > https://marc.info/?l=openbsd-bugs&m=163744498300496&w=2
> > 
> > This should affect only a small set of older sparc64 machines that use
> > IDE drives instead of SCSI.  At least two such machine types are not
> > working and are corrupting filesystems when booted with recent versions
> > of OpenBSD since writing calls were introduced to the second stage boot
> > block. Tested on an Ultra 5 and Sun Blade 100. The impact is that these
> > older sparc64 machines will potentially re-use entropy from previous
> > boot cycles if a new random seed file was not written between restarts.
> > 
> > -- 
> > Ted Bullock 
> > 
> > [attachment: sunblade100.patch (3kB)]
> > 



Re: getrrsetbyname() doesn't set RES_USE_DNSSEC, preventing VerifyHostKeyDNS with OpenSSH

2021-11-17 Thread Otto Moerbeek
Hi,

And here is a sketch of the AD bit approach, including the asr changes.
Largely untested, and lacking docs and the localhost vs non-localhost
distinction.

-Otto


Index: include/resolv.h
===
RCS file: /cvs/src/include/resolv.h,v
retrieving revision 1.22
diff -u -p -r1.22 resolv.h
--- include/resolv.h14 Jan 2019 06:23:06 -  1.22
+++ include/resolv.h18 Nov 2021 07:12:08 -
@@ -191,6 +191,7 @@ struct __res_state_ext {
 /* DNSSEC extensions: use higher bit to avoid conflict with ISC use */
 #define RES_USE_DNSSEC  0x2000  /* use DNSSEC using OK bit in OPT */
 #define RES_USE_CD  0x1000  /* set Checking Disabled flag */
+#define RES_USE_AD  0x8000  /* set Authentic Data flag */
 
 #define RES_DEFAULT(RES_RECURSE | RES_DEFNAMES | RES_DNSRCH)
 
Index: lib/libc/asr/res_mkquery.c
===
RCS file: /cvs/src/lib/libc/asr/res_mkquery.c,v
retrieving revision 1.13
diff -u -p -r1.13 res_mkquery.c
--- lib/libc/asr/res_mkquery.c  14 Jan 2019 06:49:42 -  1.13
+++ lib/libc/asr/res_mkquery.c  18 Nov 2021 07:12:08 -
@@ -62,6 +62,8 @@ res_mkquery(int op, const char *dname, i
h.flags |= RD_MASK;
if (ac->ac_options & RES_USE_CD)
h.flags |= CD_MASK;
+   if (ac->ac_options & RES_USE_AD)
+   h.flags |= AD_MASK;
h.qdcount = 1;
if (ac->ac_options & (RES_USE_EDNS0 | RES_USE_DNSSEC))
h.arcount = 1;
Index: lib/libc/asr/res_send_async.c
===
RCS file: /cvs/src/lib/libc/asr/res_send_async.c,v
retrieving revision 1.39
diff -u -p -r1.39 res_send_async.c
--- lib/libc/asr/res_send_async.c   28 Sep 2019 11:21:07 -  1.39
+++ lib/libc/asr/res_send_async.c   18 Nov 2021 07:12:08 -
@@ -378,6 +378,8 @@ setup_query(struct asr_query *as, const 
h.flags |= RD_MASK;
if (as->as_ctx->ac_options & RES_USE_CD)
h.flags |= CD_MASK;
+   if (as->as_ctx->ac_options & RES_USE_AD)
+   h.flags |= AD_MASK;
h.qdcount = 1;
if (as->as_ctx->ac_options & (RES_USE_EDNS0 | RES_USE_DNSSEC))
h.arcount = 1;
Index: usr.bin/ssh/dns.c
===
RCS file: /cvs/src/usr.bin/ssh/dns.c,v
retrieving revision 1.41
diff -u -p -r1.41 dns.c
--- usr.bin/ssh/dns.c   19 Jul 2021 03:13:28 -  1.41
+++ usr.bin/ssh/dns.c   18 Nov 2021 07:12:08 -
@@ -29,6 +29,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -195,7 +196,7 @@ verify_host_key_dns(const char *hostname
 struct sshkey *hostkey, int *flags)
 {
u_int counter;
-   int result;
+   int result, old_options;
struct rrsetinfo *fingerprints = NULL;
 
u_int8_t hostkey_algorithm;
@@ -218,8 +219,12 @@ verify_host_key_dns(const char *hostname
return -1;
}
 
+   old_options = _res.options;
+   _res.options |= RES_USE_AD;
result = getrrsetbyname(hostname, DNS_RDATACLASS_IN,
DNS_RDATATYPE_SSHFP, 0, &fingerprints);
+   _res.options = old_options;
+
if (result) {
verbose("DNS lookup error: %s", dns_result_totext(result));
return -1;



Re: getrrsetbyname() doesn't set RES_USE_DNSSEC, preventing VerifyHostKeyDNS with OpenSSH

2021-11-17 Thread Otto Moerbeek
On Thu, Nov 18, 2021 at 07:22:27AM +0100, Otto Moerbeek wrote:

> On Wed, Nov 17, 2021 at 01:05:05PM -0800, tho...@habets.se wrote:
> 
> > On Wed, 17 Nov 2021 20:46:46 +, Otto Moerbeek  said:
> > > Well, I should have been more clear as well, dig sets both the AD bit
> > > (by default) and the DO bit (on +dnssec). More clients do this. This
> > > is part of the can of worms.
> > 
> > Right, yeah. I misunderstood the AD bit in the query being the
> > trigger, but as you say it's the DO, and as the RFC says, queries
> > shouldn't set the AD.
> 
> I wasn't complete and not right; as Florian is saying, a more recent
> RFC does define the AD bit for queries. Sorry about that oversight.
> Sometimes it's hard to keep track of all the relevant RFCs.
> 
> So the AD bit in a query gets you a potential AD bit set in the reply,
> but without the records needed to validate the signature, you have to
> trust the resolver. Setting the DO bit does get you the DNSSEC records
> in addition to the AD bit in the reply if validation succeeded. That
> is also a reason setting the DO bit is a can of worms: it grows the
> response sizes and not all equipment handles that properly.
> 
> > 
> > > You are forcing *all* clients resolving to use dnssec. Only a solution
> > > that limits the scope to the smallest case (ssh doing a
> > > getrrsetbyname() for DNS_RDATATYPE_SSHFP) is likely acceptable. Sadly
> > > the context used by resolving is program-wide, so setting a flag in
> > > _res is also not going to work.
> > 
> > Yeah. For the machine I'm on I actually want all DNS requests system
> > wide to use DNSSEC. So personally that's working as intended. But I
> > see your point.
> > 
> > Currently there's no way to get a signed response, right?
> > 
> > With my patch for a per-program level option I just successfully
> > tested:
> > 
> >   RES_OPTIONS=dnssec ssh foo.example.com
> > 
> > But of course it doesn't limit to just getrrsetbyname().
> > 
> > Is the asr_ctx (where flags look like they live) program-wide, or just
> > thread wide? I can basically hear you cringing already, so maybe the
> > only real solution is to have getrrsetbyname_async_run() pass in flags
> > to _res_query_async_ctx()->setup_query() to OR in the option?
> > 
> > And then maybe have getrrsetbyname() call that stack twice, once with
> > and once without RES_USE_DNSSEC, in case DNSSEC is broken?
> 
> After sleeping on it, likely setting the DO bit for a single query can
> be done like ntpd is doing for the CD bit, see probe_root_ns() in
> ntp_dns.c
> 
> But I agree with Florian that setting the AD bit on the query is
> better, although afaik, there is no way of doing that yet.
> 
>   -Otto
> 

A diff setting the DO bit for just the SSHFP query could look like this.
Setting the AD bit is not possible atm, but I might take a look how to
do that.

-Otto

Index: dns.c
===
RCS file: /cvs/src/usr.bin/ssh/dns.c,v
retrieving revision 1.41
diff -u -p -r1.41 dns.c
--- dns.c   19 Jul 2021 03:13:28 -  1.41
+++ dns.c   18 Nov 2021 06:38:13 -
@@ -29,6 +29,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -195,7 +196,7 @@ verify_host_key_dns(const char *hostname
 struct sshkey *hostkey, int *flags)
 {
u_int counter;
-   int result;
+   int result, old_options;
struct rrsetinfo *fingerprints = NULL;
 
u_int8_t hostkey_algorithm;
@@ -218,8 +219,12 @@ verify_host_key_dns(const char *hostname
return -1;
}
 
+   old_options = _res.options;
+   _res.options |= RES_USE_DNSSEC;
result = getrrsetbyname(hostname, DNS_RDATACLASS_IN,
DNS_RDATATYPE_SSHFP, 0, &fingerprints);
+   _res.options = old_options;
+
if (result) {
verbose("DNS lookup error: %s", dns_result_totext(result));
return -1;



Re: getrrsetbyname() doesn't set RES_USE_DNSSEC, preventing VerifyHostKeyDNS with OpenSSH

2021-11-17 Thread Otto Moerbeek
On Wed, Nov 17, 2021 at 01:05:05PM -0800, tho...@habets.se wrote:

> On Wed, 17 Nov 2021 20:46:46 +0000, Otto Moerbeek  said:
> > Well, I should have been more clear as well, dig sets both the AD bit
> > (by default) and the DO bit (on +dnssec). More clients do this. This
> > is part of the can of worms.
> 
> Right, yeah. I misunderstood the AD bit in the query being the
> trigger, but as you say it's the DO, and as the RFC says, queries
> shouldn't set the AD.

I wasn't complete and not right; as Florian is saying, a more recent
RFC does define the AD bit for queries. Sorry about that oversight.
Sometimes it's hard to keep track of all the relevant RFCs.

So the AD bit in a query gets you a potential AD bit set in the reply,
but without the records needed to validate the signature, you have to
trust the resolver. Setting the DO bit does get you the DNSSEC records
in addition to the AD bit in the reply if validation succeeded. That
is also a reason setting the DO bit is a can of worms: it grows the
response sizes and not all equipment handles that properly.
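
To show where the two bits live, here is a rough sketch, not the asr
code. AD sits in the 16-bit flags word of the DNS header, DO sits in
the flags field of the EDNS OPT record. The RD/AD/CD mask names follow
the ones used in asr; the values are the standard header bit positions
and agree with the 0x0120 (RD plus AD) capture quoted in this thread.
The OPT_* option bits and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RD_MASK	0x0100	/* header: recursion desired */
#define AD_MASK	0x0020	/* header: authentic data */
#define CD_MASK	0x0010	/* header: checking disabled */
#define EDNS_DO	0x8000	/* OPT record flags: DNSSEC OK */

/* Hypothetical per-query options, stand-ins for the RES_* flags. */
#define OPT_RECURSE	0x01
#define OPT_USE_AD	0x02
#define OPT_USE_CD	0x04
#define OPT_USE_DNSSEC	0x08

/* Flags word that goes into the DNS header of the query. */
uint16_t
query_header_flags(unsigned int options)
{
	uint16_t flags = 0;

	if (options & OPT_RECURSE)
		flags |= RD_MASK;
	if (options & OPT_USE_CD)
		flags |= CD_MASK;
	if (options & OPT_USE_AD)
		flags |= AD_MASK;
	return flags;
}

/* Flags field of the EDNS OPT record; only DO is handled here. */
uint16_t
query_edns_flags(unsigned int options)
{
	return (options & OPT_USE_DNSSEC) ? EDNS_DO : 0;
}

void
flags_demo(void)
{
	/* RD + AD, as dig sends by default in the capture */
	assert(query_header_flags(OPT_RECURSE | OPT_USE_AD) == 0x0120);
}
```

Asking for AD changes only a header bit, so the query and the reply
keep their size; asking for DO needs the OPT record and makes the
server add the DNSSEC records to the answer.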

> 
> > You are forcing *all* clients resolving to use dnssec. Only a solution
> > that limits the scope to the smallest case (ssh doing a
> > getrrsetbyname() for DNS_RDATATYPE_SSHFP) is likely acceptable. Sadly
> > the context used by resolving is program-wide, so setting a flag in
> > _res is also not going to work.
> 
> Yeah. For the machine I'm on I actually want all DNS requests system
> wide to use DNSSEC. So personally that's working as intended. But I
> see your point.
> 
> Currently there's no way to get a signed response, right?
> 
> With my patch for a per-program level option I just successfully
> tested:
> 
>   RES_OPTIONS=dnssec ssh foo.example.com
> 
> But of course it doesn't limit to just getrrsetbyname().
> 
> Is the asr_ctx (where flags look like they live) program-wide, or just
> thread wide? I can basically hear you cringing already, so maybe the
> only real solution is to have getrrsetbyname_async_run() pass in flags
> to _res_query_async_ctx()->setup_query() to OR in the option?
> 
> And then maybe have getrrsetbyname() call that stack twice, once with
> and once without RES_USE_DNSSEC, in case DNSSEC is broken?

After sleeping on it, likely setting the DO bit for a single query can
be done like ntpd is doing for the CD bit, see probe_root_ns() in
ntp_dns.c

But I agree with Florian that setting the AD bit on the query is
better, although afaik, there is no way of doing that yet.

-Otto



Re: getrrsetbyname() doesn't set RES_USE_DNSSEC, preventing VerifyHostKeyDNS with OpenSSH

2021-11-17 Thread Otto Moerbeek
On Wed, Nov 17, 2021 at 12:26:00PM -0800, tho...@habets.se wrote:

> On Wed, 17 Nov 2021 20:00:21 +0000, Otto Moerbeek  said:
> > You seem to be confused about the meaning of the ad bit. It is a bit
> > that only has significance on replies, see
> > https://datatracker.ietf.org/doc/html/rfc4035#section-4.6
> 
> Ah, wireshark calls this bit the "AD bit" when sent by dig, so that's
> why I used that incorrect term:
> 
> Domain Name System (query)
> Transaction ID: 0x4bea
> Flags: 0x0120 Standard query
> 0...    = Response: Message is a query
> .000 0...   = Opcode: Standard query (0)
>  ..0.   = Truncated: Message is not truncated
>  ...1   = Recursion desired: Do query recursively
>   .0..  = Z: reserved (0)
>   ..1.  = AD bit: Set
>   ...0  = Non-authenticated data: Unacceptable
> 
> From what you linked it looks like a separate bug in dig.

Well, I should have been more clear as well, dig sets both the AD bit
(by default) and the DO bit (on +dnssec). More clients do this. This
is part of the can of worms.

> 
> > Signalling that you want a DNSSEC validated answer is normally done by
> > setting the DO bit in the EDNS options.
> 
> You're right. Looking more carefully though without RES_USE_DNSSEC
> (which is not possible to set) OpenBSD doesn't do that though:
> 
> Domain Name System (query)
> Transaction ID: 0x4dbe
> Flags: 0x0100 Standard query
> 0...    = Response: Message is a query
> .000 0...   = Opcode: Standard query (0)
>  ..0.   = Truncated: Message is not truncated
>  ...1   = Recursion desired: Do query recursively
>   .0..  = Z: reserved (0)
>   ...0  = Non-authenticated data: Unacceptable
> Questions: 1
> Answer RRs: 0
> Authority RRs: 0
> Additional RRs: 1
> Queries
> X: type SSHFP, class IN
> Name: XX
> [Name Length: 21]
> [Label Count: 3]
> Type: SSHFP (SSH Key Fingerprint) (44)
> Class: IN (0x0001)
> Additional records
> : type OPT
> Name: 
> Type: OPT (41)
> UDP payload size: 4096
> Higher bits in extended RCODE: 0x00
> EDNS0 version: 0
> Z: 0x
> 0...    = DO bit: Cannot handle DNSSEC security 
> RRs
> .000    = Reserved: 0x
> Data length: 0
> 
> DO is not set unless RES_USE_DNSSEC is set.
> 
> >> It seems the unwind DNS server *unconditionally* returns with 'ad' set, so
> > Nope, unwind sets the ad bit only on DNSSEC validated answers, and
> > other resolvers can be configured to do so.
> 
> Sorry, I was told this on #openbsd, but should have checked myself.
> 
> >> it works if (and only if?) unwind is the server queried. This seems like a
> >> bug, and it should probably work with all DNS servers (e.g. 8.8.8.8[3]).
> > Quad8 already sets the ad bit on DNSSEC validated answers, just as unwind.
> 
> If given edns0 DO, yes. But (per above) it's never provided by the
> getrrsetbyname() path.
> 
> >> I believe that the fix here should be:
> >>
> >>else if (!strcmp(tok[i], "dnssec"))
> >>ac->ac_options |= RES_USE_DNSSEC;
> > you are opening a can of worms.
> 
> How should RES_USE_DNSSEC be enabled for getrrsetbyname(), then?
> 
> Hard coding it in getrrsetbyname() presumably risks it not being able
> to retrieve the SSHFP records at all, which makes it not even possible
> to fall back to "ask" for VerifyHostKeyDNS?

You are forcing *all* clients resolving to use dnssec.  Only a solution
that limits the scope to the smallest case (ssh doing a
getrrsetbyname() for DNS_RDATATYPE_SSHFP) is likely acceptable. Sadly
the context used by resolving is program-wide, so setting a flag in
_res is also not going to work.

-Otto



Re: getrrsetbyname() doesn't set RES_USE_DNSSEC, preventing VerifyHostKeyDNS with OpenSSH

2021-11-17 Thread Otto Moerbeek
On Wed, Nov 17, 2021 at 10:45:45AM -0800, Thomas Habets wrote:

> OpenSSH calls getrrsetbyname() in dns.c:verify_host_key_dns().
> It then checks for RRSET_VALIDATED, which is only set if the DNS response
> has the 'ad' attribute set.
> 
> getrrsetbyname() in turn uses res_.* to do DNS requests, but doesn't set
> RES_USE_DNSSEC when doing so.
> Thus the DNS query that goes out does not have the 'ad' bit set, causing
> the response too to not have 'ad' set.

You seem to be confused about the meaning of the ad bit. It is a bit
that only has significance on replies, see
https://datatracker.ietf.org/doc/html/rfc4035#section-4.6

Signalling that you want a DNSSEC validated answer is normally done by
setting the DO bit in the EDNS options.
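
As a rough illustration of where that bit lives: in the EDNS0 OPT
pseudo-record the 32-bit TTL field is reused as extended RCODE (8
bits), version (8 bits) and a 16-bit flags word whose top bit is DO
(RFC 6891, RFC 3225). The sketch below only packs that field; the
function names are hypothetical and this is not how asr builds its
queries.

```c
#include <stdint.h>

#define EDNS_FLAG_DO 0x8000	/* "DNSSEC OK" bit in the OPT flags word */

/*
 * Pack the TTL field of an OPT record: extended RCODE in the top byte,
 * EDNS version in the next byte, flags in the low 16 bits.
 */
uint32_t
edns_opt_ttl(uint8_t ext_rcode, uint8_t version, int want_dnssec)
{
	uint32_t ttl;

	ttl = ((uint32_t)ext_rcode << 24) | ((uint32_t)version << 16);
	if (want_dnssec)
		ttl |= EDNS_FLAG_DO;
	return ttl;
}

/* Return 1 if the DO bit is set in a packed OPT TTL field. */
int
edns_do_set(uint32_t ttl)
{
	return (ttl & EDNS_FLAG_DO) != 0;
}
```

With version 0 and no extended RCODE, a query asking for DNSSEC
records carries 0x00008000 in that field, and one without DO carries
0, which matches the "Z" line in the capture quoted in this thread.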

> 
> From my looking at the call stack there's actually no way for OpenSSH, or
> the user via env or /etc/resolv.conf, to set RES_USE_DNSSEC.
> 
> It seems the unwind DNS server *unconditionally* returns with 'ad' set, so

Nope, unwind sets the ad bit only on DNSSEC validated answers, and
other resolvers can be configured to do so.

> it works if (and only if?) unwind is the server queried. This seems like a
> bug, and it should probably work with all DNS servers (e.g. 8.8.8.8[3]).

Quad8 already sets the ad bit on DNSSEC validated answers, just as unwind.


> I believe that the fix here should be:
> 
>   else if (!strcmp(tok[i], "dnssec"))
>   ac->ac_options |= RES_USE_DNSSEC;

you are opening a can of worms.

> 
>   else if (!strcmp(tok[i], "edns0"))
>   ac->ac_options |= RES_USE_EDNS0;

-Otto

> 
> 
> [1]
> https://cvsweb.openbsd.org/src/usr.bin/ssh/dns.c?rev=1.41&content-type=text/x-cvsweb-markup
> [2]
> https://cvsweb.openbsd.org/src/lib/libc/asr/asr.c?rev=1.66&content-type=text/x-cvsweb-markup
> [3] I realize that the path from the recursive resolver to the machine must
> be secure. I'm using 8.8.8.8 as an example.
> https://serverfault.com/questions/1063853/sshfp-not-working
> 
> -- 
> typedef struct me_s {
>  char name[]  = { "Thomas Habets" };
>  char email[] = { "tho...@habets.se " };
>  char kernel[]= { "Linux" };
>  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt"; };
>  char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
>  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
> } me_t;



Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-11-15 Thread Otto Moerbeek
On Mon, Nov 15, 2021 at 11:59:07AM -0700, Ted Bullock wrote:

> On 2021-11-14 7:57 p.m., Theo de Raadt wrote:
> > But I fear the "write" operation was never been tested by the Sun,
> > or maybe we are not doing something
> > 
> > but only on earlier generations
> I perused the change log of stand.h and revision 1.67 also might/will
> trigger the same bug on upgrading between versions for these machines.

So you built code with just the struct fs_ops change, while
keeping everything else the same?

Very puzzling...

-Otto



Re: httpd(8) crash on amd64 -current

2021-11-11 Thread Otto Moerbeek
On Thu, Nov 11, 2021 at 04:29:58PM +0100, Otto Moerbeek wrote:

> On Thu, Nov 11, 2021 at 04:13:37PM +0100, Florian Obser wrote:
> 
> > 
> > No idea how to reproduce this, I'm just running an httpd with debug
> > symbols and kern.nosuidcoredump=3
> > Pretty sure this is the crash various people mumbled about.
> > 
> > Smells like a use-after-fruit to me.
> 
> In server_http.c:351 desc->http_query is set to point in the middle of
> a string.  In the cases of goto fail below that it will not be

and goto abort

> strdupped.  A free of desc->http_query then later bombs.
> 
>   -Otto
> 
> 
> > 
> > Core was generated by `httpd'.
> > Program terminated with signal SIGABRT, Aborted.
> > #0  thrkill () at /tmp/-:3
> > 3   /tmp/-: No such file or directory.
> > (gdb) bt
> > #0  thrkill () at /tmp/-:3
> > #1  0x09d1979a211e in _libc_abort () at 
> > /usr/src/lib/libc/stdlib/abort.c:51
> > #2  0x09d19798a726 in wrterror (d=0x9d230d35980,
> > msg=0x9d19795b05d "modified chunk-pointer %p")
> > at /usr/src/lib/libc/stdlib/malloc.c:307
> > #3  0x09d19798e0cc in find_chunknum (d=0x0, info=, 
> > ptr=0x0,
> > check=-236688) at /usr/src/lib/libc/stdlib/malloc.c:1063
> > #4  0x09d19798ac89 in ofree (argpool=0x7f7c66b0, p=0x9d1884d6a07,
> > clear=0, check=, argsz=0)
> > at /usr/src/lib/libc/stdlib/malloc.c:1409
> > #5  0x09d19798a96b in free (ptr=0x9d1884d6a07)
> > at /usr/src/lib/libc/stdlib/malloc.c:1470
> > #6  0x09cf5d137288 in server_httpdesc_free (desc=0x9d1ff641600)
> > at /usr/src/usr.sbin/httpd/server_http.c:113
> > #7  0x09cf5d13c1a1 in server_close_http (clt=0x9d1ff645000)
> > at /usr/src/usr.sbin/httpd/server_http.c:1088
> > #8  0x09cf5d133afc in server_close (clt=0x9d1ff645000,
> > msg=0x9d1ff633380 "malformed (400 Bad Request)")
> > at /usr/src/usr.sbin/httpd/server.c:1306
> > #9  0x09cf5d13890d in server_abort_http (clt=0x9d1ff645000, code=400,
> > msg=0x9cf5d113dea "malformed")
> > at /usr/src/usr.sbin/httpd/server_http.c:1077
> > #10 0x09cf5d137c13 in server_read_http (bev=0x9d1ff61b800,
> > arg=0x9d1ff645000) at /usr/src/usr.sbin/httpd/server_http.c:366
> > --Type  for more, q to quit, c to continue without paging--
> > #11 0x09d1f3766f29 in bufferevent_readcb (fd=,
> > event=, arg=0x9d1ff61b800)
> > at /usr/src/lib/libevent/evbuffer.c:140
> > #12 0x09d1f3765b9f in event_process_active (base=0x9d1884c5c00)
> > at /usr/src/lib/libevent/event.c:333
> > #13 event_base_loop (base=0x9d1884c5c00, flags=0)
> > at /usr/src/lib/libevent/event.c:483
> > #14 0x09cf5d131a11 in proc_run (ps=0x9d1884cc800, p=0x9cf5d148a70 
> > ,
> > procs=0x9cf5d148b90 , nproc=2, run=0x9cf5d132100 ,
> > arg=0x0) at /usr/src/usr.sbin/httpd/proc.c:604
> > #15 0x09cf5d1320d2 in server (ps=0x9d1884cc800, p=0x9cf5d148a70 )
> > at /usr/src/usr.sbin/httpd/server.c:87
> > #16 0x09cf5d1303c5 in proc_init (ps=0x9d1884cc800,
> > procs=0x9cf5d148a70 , nproc=2, debug=0, argc=5,
> > argv=0x7f7d6de8, proc_id=PROC_SERVER)
> > at /usr/src/usr.sbin/httpd/proc.c:260
> > #17 0x09cf5d1276f1 in main (argc=0, argv=0x7f7d6de8)
> > at /usr/src/usr.sbin/httpd/httpd.c:220
> > 
> > -- 
> > I'm not entirely sure you are real.
> > 
> 



Re: httpd(8) crash on amd64 -current

2021-11-11 Thread Otto Moerbeek
On Thu, Nov 11, 2021 at 04:13:37PM +0100, Florian Obser wrote:

> 
> No idea how to reproduce this, I'm just running an httpd with debug
> symbols and kern.nosuidcoredump=3
> Pretty sure this is the crash various people mumbled about.
> 
> Smells like a use-after-fruit to me.

In server_http.c:351 desc->http_query is set to point into the middle
of a string.  In the cases of goto fail below that, it will not be
strdupped.  A later free of desc->http_query then bombs.
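
To illustrate the ownership rule involved (this is a sketch, not the
httpd code and not a proposed diff): if a field is later passed to
free(), it must only ever hold NULL or a pointer obtained from malloc.
The struct and function names below are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

struct req_desc {
	char	*path;	/* owned copy, never points into another buffer */
	char	*query;	/* owned copy, or NULL if there is no query */
};

/* Copy len bytes of s into a freshly allocated, NUL-terminated string. */
static char *
dup_range(const char *s, size_t len)
{
	char *p;

	if ((p = malloc(len + 1)) == NULL)
		return NULL;
	memcpy(p, s, len);
	p[len] = '\0';
	return p;
}

/* Split "path?query" into owned copies. Returns 0 on success, -1 on error. */
int
req_parse(struct req_desc *d, const char *target)
{
	const char *q = strchr(target, '?');
	size_t plen = (q != NULL) ? (size_t)(q - target) : strlen(target);

	d->path = NULL;
	d->query = NULL;
	if ((d->path = dup_range(target, plen)) == NULL)
		return -1;
	if (q != NULL) {
		/* Copy right away, so no error path can leave a borrowed pointer. */
		if ((d->query = dup_range(q + 1, strlen(q + 1))) == NULL) {
			free(d->path);
			d->path = NULL;
			return -1;
		}
	}
	return 0;
}

/* Safe in every state req_parse() can leave behind. */
void
req_free(struct req_desc *d)
{
	free(d->path);
	free(d->query);
	d->path = d->query = NULL;
}
```

The point of the design is that the field is never assigned a pointer
into the middle of another allocation, not even temporarily, so the
cleanup function needs no knowledge of which error path was taken.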

-Otto


> 
> Core was generated by `httpd'.
> Program terminated with signal SIGABRT, Aborted.
> #0  thrkill () at /tmp/-:3
> 3 /tmp/-: No such file or directory.
> (gdb) bt
> #0  thrkill () at /tmp/-:3
> #1  0x09d1979a211e in _libc_abort () at 
> /usr/src/lib/libc/stdlib/abort.c:51
> #2  0x09d19798a726 in wrterror (d=0x9d230d35980,
> msg=0x9d19795b05d "modified chunk-pointer %p")
> at /usr/src/lib/libc/stdlib/malloc.c:307
> #3  0x09d19798e0cc in find_chunknum (d=0x0, info=, ptr=0x0,
> check=-236688) at /usr/src/lib/libc/stdlib/malloc.c:1063
> #4  0x09d19798ac89 in ofree (argpool=0x7f7c66b0, p=0x9d1884d6a07,
> clear=0, check=, argsz=0)
> at /usr/src/lib/libc/stdlib/malloc.c:1409
> #5  0x09d19798a96b in free (ptr=0x9d1884d6a07)
> at /usr/src/lib/libc/stdlib/malloc.c:1470
> #6  0x09cf5d137288 in server_httpdesc_free (desc=0x9d1ff641600)
> at /usr/src/usr.sbin/httpd/server_http.c:113
> #7  0x09cf5d13c1a1 in server_close_http (clt=0x9d1ff645000)
> at /usr/src/usr.sbin/httpd/server_http.c:1088
> #8  0x09cf5d133afc in server_close (clt=0x9d1ff645000,
> msg=0x9d1ff633380 "malformed (400 Bad Request)")
> at /usr/src/usr.sbin/httpd/server.c:1306
> #9  0x09cf5d13890d in server_abort_http (clt=0x9d1ff645000, code=400,
> msg=0x9cf5d113dea "malformed")
> at /usr/src/usr.sbin/httpd/server_http.c:1077
> #10 0x09cf5d137c13 in server_read_http (bev=0x9d1ff61b800,
> arg=0x9d1ff645000) at /usr/src/usr.sbin/httpd/server_http.c:366
> --Type  for more, q to quit, c to continue without paging--
> #11 0x09d1f3766f29 in bufferevent_readcb (fd=,
> event=, arg=0x9d1ff61b800)
> at /usr/src/lib/libevent/evbuffer.c:140
> #12 0x09d1f3765b9f in event_process_active (base=0x9d1884c5c00)
> at /usr/src/lib/libevent/event.c:333
> #13 event_base_loop (base=0x9d1884c5c00, flags=0)
> at /usr/src/lib/libevent/event.c:483
> #14 0x09cf5d131a11 in proc_run (ps=0x9d1884cc800, p=0x9cf5d148a70 ,
> procs=0x9cf5d148b90 , nproc=2, run=0x9cf5d132100 ,
> arg=0x0) at /usr/src/usr.sbin/httpd/proc.c:604
> #15 0x09cf5d1320d2 in server (ps=0x9d1884cc800, p=0x9cf5d148a70 )
> at /usr/src/usr.sbin/httpd/server.c:87
> #16 0x09cf5d1303c5 in proc_init (ps=0x9d1884cc800,
> procs=0x9cf5d148a70 , nproc=2, debug=0, argc=5,
> argv=0x7f7d6de8, proc_id=PROC_SERVER)
> at /usr/src/usr.sbin/httpd/proc.c:260
> #17 0x09cf5d1276f1 in main (argc=0, argv=0x7f7d6de8)
> at /usr/src/usr.sbin/httpd/httpd.c:220
> 
> -- 
> I'm not entirely sure you are real.
> 



Re: dc strips leading 0's in 2o output, is this wanted?

2021-05-16 Thread Otto Moerbeek
On Sun, May 16, 2021 at 09:48:33AM +0200, Otto Moerbeek wrote:

> On Sat, May 15, 2021 at 08:46:51PM +0200, p...@delphinusdns.org wrote:
> 
> > >Synopsis:  dc strips leading 0's in 2o (base 2 output)
> > >Category:  user
> > >Environment:
> > System  : OpenBSD 6.9
> > Details : OpenBSD 6.9 (GENERIC.MP) #473: Mon Apr 19 10:40:28 MDT 
> > 2021
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > I'm proofing a program and I had a hard time with the following output:
> > 
> > 12CBEE93BA7494A4A3F576A844CA2539 <---
> > 00010010110010101110100100111011101001110100100101001010010010100011010101110110101011000100110010100010010100111001
> > pod$ dc
> > 16i
> > 2o
> > 12CBEE93BA7494A4A3F576A844CA2539
> > p
> > 1001011001010111010010011101110100111010010010100101001001010001\
> > 1010101110110101011000100110010100010010100111001
> > 12 p
> > 10010
> > 
> > It seems that (3) leading 0's are stripped changing the value of 0x12.
> > Coincidentally 12 decimal would fit in the first 5 bits.  I don't
> > know if there is a mode I forgot to check, but debug was at first
> > a little slow due to this 'bug'.  Am i using it wrong?
> > >How-To-Repeat:
> > I'm using this ghetto function to produce the binary in code:
> > 
> > void 
> > print_binary(u32 input)
> > {
> > int64_t i;
> > 
> > for (i = 31;  i >= 0; i--) {
> > if((input >> i) & 0x0001U) printf("1");
> > else printf("0");
> > }
> > 
> > }
> > 
> > This isn't the first incarnation of the print_binary() so I apologize
> > for its format.  I tried hacking it to something I can work with.
> > 
> > >Fix:
> > Not provided, sorry, I did look at the source code but this seems
> > beyond me at first glance, and I'm not even sure if I'm using dc
> > right.
> > 
> 
> How can stripping leading zeros lead to a change in value?  They are
> always implicit and always stripped by dc. Only 0 is printed as 0, not
> stripping the leading zero because that would lead to an empty string.
> 
> You are assuming some form of grouping will be done. That is not true.
> Note that 125 is not a multiple of 4.
> 
> to illustrate using bc:
> $ bc
> obase=16
> ibase=2
> 10010110010101110100100111011101001110100100101001010010010100011010101110110101011000100110010100010010100111001
> 12CBEE93BA7494A4A3F576A844CA2539

To phrase it differently, when converting from base 2 to base 16, you
need to start at the *end*, not the beginning of the string. The
grouping of 4 base 2 digits to 1 base 16 digit only works since 2^4 ==
16.
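
A small sketch of that rule, with a made-up function name: the input
is split into groups of four bits counted from the end, so only the
first group may be shorter than four digits.

```c
#include <stddef.h>
#include <string.h>

/*
 * Convert a string of '0'/'1' digits to uppercase hex, grouping four
 * bits at a time counted from the END of the input.
 * Returns 0 on success, -1 on bad input or a too-small output buffer.
 */
int
bin_to_hex(const char *bin, char *out, size_t outsz)
{
	static const char digits[] = "0123456789ABCDEF";
	size_t n = strlen(bin);
	size_t ndigits = (n + 3) / 4;
	size_t first = n % 4;	/* length of the short leading group, 0 if none */
	size_t i = 0, o = 0, k, take;
	unsigned int v;

	if (n == 0 || outsz < ndigits + 1)
		return -1;
	while (i < n) {
		take = (i == 0 && first != 0) ? first : 4;
		v = 0;
		for (k = 0; k < take; k++, i++) {
			if (bin[i] != '0' && bin[i] != '1')
				return -1;
			v = (v << 1) | (unsigned int)(bin[i] - '0');
		}
		out[o++] = digits[v];
	}
	out[o] = '\0';
	return 0;
}
```

Both "10010" and "00010010" convert to "12" here: the five-digit form
is read as a leading group "1" followed by "0010", not as "1001"
followed by "0".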

-Otto



Re: dc strips leading 0's in 2o output, is this wanted?

2021-05-16 Thread Otto Moerbeek
On Sat, May 15, 2021 at 08:46:51PM +0200, p...@delphinusdns.org wrote:

> >Synopsis:dc strips leading 0's in 2o (base 2 output)
> >Category:user
> >Environment:
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9 (GENERIC.MP) #473: Mon Apr 19 10:40:28 MDT 
> 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I'm proofing a program and I had a hard time with the following output:
> 
> 12CBEE93BA7494A4A3F576A844CA2539 <---
> 00010010110010101110100100111011101001110100100101001010010010100011010101110110101011000100110010100010010100111001
> pod$ dc
> 16i
> 2o
> 12CBEE93BA7494A4A3F576A844CA2539
> p
> 1001011001010111010010011101110100111010010010100101001001010001\
> 1010101110110101011000100110010100010010100111001
> 12 p
> 10010
> 
>   It seems that (3) leading 0's are stripped changing the value of 0x12.
>   Coincidentally 12 decimal would fit in the first 5 bits.  I don't
>   know if there is a mode I forgot to check, but debug was at first
>   a little slow due to this 'bug'.  Am i using it wrong?
> >How-To-Repeat:
>   I'm using this ghetto function to produce the binary in code:
> 
> void 
> print_binary(u32 input)
> {
> int64_t i;
> 
> for (i = 31;  i >= 0; i--) {
> if((input >> i) & 0x0001U) printf("1");
> else printf("0");
> }
> 
> }
> 
>   This isn't the first incarnation of the print_binary() so I apologize
>   for its format.  I tried hacking it to something I can work with.
> 
> >Fix:
>   Not provided, sorry, I did look at the source code but this seems
>   beyond me at first glance, and I'm not even sure if I'm using dc
>   right.
> 

How can stripping leading zeros lead to a change in value?  They are
always implicit and always stripped by dc. Only 0 is printed as 0, not
stripping the leading zero because that would lead to an empty string.

You are assuming some form of grouping will be done. That is not true.
Note that 125 is not a multiple of 4.

to illustrate using bc:
$ bc
obase=16
ibase=2
10010110010101110100100111011101001110100100101001010010010100011010101110110101011000100110010100010010100111001
12CBEE93BA7494A4A3F576A844CA2539


-Otto



Re: library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

2021-04-03 Thread Otto Moerbeek
On Fri, Apr 02, 2021 at 04:17:48PM +0200, Otto Moerbeek wrote:

> On Fri, Apr 02, 2021 at 01:57:07PM +0200, Otto Moerbeek wrote:
> 
> > On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:
> > 
> > > > Synopsis:  Basic Regular Expression (BRE) bug in \{m,n\} with \(\) 
> > > > and \n
> > > > Category:  library
> > > > Environment:
> > > System  : OpenBSD 6.7
> > > Details : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 
> > > 2021
> > > t...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > > 
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > > Description:
> > > Certain BRE expressions fail/misbehave unexpectedly.
> > > The failures are the same in both grep and sed (without -E).
> > > The failures only occur with certain combinations of use of:
> > > \{\}, \(\), \n (where n is digit) syntax, dropping any one
> > > of those then generally fails to trigger the bug.
> > > The bug/error can be seen most clearly in unexpected
> > > behavior of the \{m,n\} portion in the given context.
> > > If more of the (apparently dependent) context is removed,
> > > the bug doesn't show up.  E.g. some of the clearest cases
> > > involve replacing * with \{0,\} in the BRE, and getting
> > > quite unexpected results (one would expect the results
> > > to be the same).  These same BREs work under both
> > > Solaris 11 and GNU/Linux with their sed and grep.
> > > > How-To-Repeat:
> > > This example code can be used to illustrate the problem,
> > > and both show cases where the bug shows up, and also slightly
> > > differing contexts where the bug does not occur.
> > > In each of these cases, the output should be the STRING
> > > we set/echo into grep/sed where we use our BRE, but in the bug
> > > cases we get no output.
> > > It's also suggested test cases be added to the code to catch
> > > possible regression bugs, should issue recur.  :-)
> > > Example code to show where bug does (and doesn't) show up:
> > > (
> > > exec 2>&1
> > > set -- \
> > > 'YYxx' 'Y*\(x\)\1' \
> > > 'YYxx' 'Y\{0,\}\(x\)\1' \
> > > 'YYxx' 'Y\{2,\}\(x\)\1' \
> > > 'YYxx' 'Y\{0,\}\(x\)' \
> > > 'YYxx' 'Y\{2,\}x' \
> > > 'YYxx' 'Y\{2,\}x\{1,\}' \
> > > 'YYxx' 'Y\{2,\}x\{0,\}' \
> > > 'YYxxz' 'Y\{2,\}x\{0,\}z' \
> > > 'YYxxz' 'Y\{0,\}x\{0,\}z' \
> > > 'YYxyxy' 'Y\{2,\}\(xy\)\1' \
> > > 'YYxyxy' 'Y\{0,\}\(xy\)\1' \
> > > 'YYxyxy' 'Y*\(xy\)\1' \
> > > 'YYxyxy' 'Y\{0,\}\(xy\)xy'
> > > while [ "$#" -ge 2 ]
> > > do
> > > STRING="$1"; shift; BRE="$1"; shift
> > > set -x
> > > echo "$STRING" | grep -e "$BRE"
> > > echo "$STRING" | sed -ne "s/$BRE/&/p"
> > > set +x
> > > done
> > > )
> > > Example run of above code.  Bug is present where our
> > > STRING echoed into grep/sed fails to appear in the
> > > output:
> > > + echo YYxx
> > > + grep -e Y*\(x\)\1
> > > YYxx
> > > + echo YYxx
> > > + sed -ne s/Y*\(x\)\1/&/p
> > > YYxx
> > > + set +x
> > > + echo YYxx
> > > + grep -e Y\{0,\}\(x\)\1
> > > + echo YYxx
> > > + sed -ne s/Y\{0,\}\(x\)\1/&/p
> > > + set +x
> > > + echo YYxx

Re: library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

2021-04-02 Thread Otto Moerbeek
On Fri, Apr 02, 2021 at 01:57:07PM +0200, Otto Moerbeek wrote:

> On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:
> 
> > > Synopsis:  Basic Regular Expression (BRE) bug in \{m,n\} with \(\) 
> > > and \n
> > > Category:  library
> > > Environment:
> > System  : OpenBSD 6.7
> > Details : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 2021
> > t...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > > Description:
> > Certain BRE expressions fail/misbehave unexpectedly.
> > The failures are the same in both grep and sed (without -E).
> > The failures only occur with certain combinations of use of:
> > \{\}, \(\), \n (where n is digit) syntax, dropping any one
> > of those then generally fails to trigger the bug.
> > The bug/error can be seen most clearly in unexpected
> > behavior of the \{m,n\} portion in the given context.
> > If more of the (apparently dependent) context is removed,
> > the bug doesn't show up.  E.g. some of the clearest cases
> > involve replacing * with \{0,\} in the BRE, and getting
> > quite unexpected results (one would expect the results
> > to be the same).  These same BREs work under both
> > Solaris 11 and GNU/Linux with their sed and grep.
> > > How-To-Repeat:
> > This example code can be used to illustrate the problem,
> > and both show cases where the bug shows up, and also slightly
> > differing contexts where the bug does not occur.
> > In each of these cases, the output should be the STRING
> > we set/echo into grep/sed where we use our BRE, but in the bug
> > cases we get no output.
> > It's also suggested test cases be added to the code to catch
> > possible regression bugs, should issue recur.  :-)
> > Example code to show where bug does (and doesn't) show up:
> > (
> > exec 2>&1
> > set -- \
> > 'YYxx' 'Y*\(x\)\1' \
> > 'YYxx' 'Y\{0,\}\(x\)\1' \
> > 'YYxx' 'Y\{2,\}\(x\)\1' \
> > 'YYxx' 'Y\{0,\}\(x\)' \
> > 'YYxx' 'Y\{2,\}x' \
> > 'YYxx' 'Y\{2,\}x\{1,\}' \
> > 'YYxx' 'Y\{2,\}x\{0,\}' \
> > 'YYxxz' 'Y\{2,\}x\{0,\}z' \
> > 'YYxxz' 'Y\{0,\}x\{0,\}z' \
> > 'YYxyxy' 'Y\{2,\}\(xy\)\1' \
> > 'YYxyxy' 'Y\{0,\}\(xy\)\1' \
> > 'YYxyxy' 'Y*\(xy\)\1' \
> > 'YYxyxy' 'Y\{0,\}\(xy\)xy'
> > while [ "$#" -ge 2 ]
> > do
> > STRING="$1"; shift; BRE="$1"; shift
> > set -x
> > echo "$STRING" | grep -e "$BRE"
> > echo "$STRING" | sed -ne "s/$BRE/&/p"
> > set +x
> > done
> > )
> > Example run of above code.  Bug is present where our
> > STRING echoed into grep/sed fails to appear in the
> > output:
> > + echo YYxx
> > + grep -e Y*\(x\)\1
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y*\(x\)\1/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{0,\}\(x\)\1
> > + echo YYxx
> > + sed -ne s/Y\{0,\}\(x\)\1/&/p
> > + set +x
> > + echo YYxx
> > + grep -e Y\{2,\}\(x\)\1
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{2,\}\(x\)\1/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep -e Y\{0,\}\(x\)
> > YYxx
> > + echo YYxx
> > + sed -ne s/Y\{0,\}\(x\)/&/p
> > YYxx
> > + set +x
> > + echo YYxx
> > + grep

Re: library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

2021-04-02 Thread Otto Moerbeek
On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:

> > Synopsis:  Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and 
> > \n
> > Category:  library
> > Environment:
> System  : OpenBSD 6.7
> Details : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 2021
> t...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> > Description:
> Certain BRE expressions fail/misbehave unexpectedly.
> The failures are the same in both grep and sed (without -E).
> The failures only occur with certain combinations of use of:
> \{\}, \(\), \n (where n is digit) syntax, dropping any one
> of those then generally fails to trigger the bug.
> The bug/error can be seen most clearly in unexpected
> behavior of the \{m,n\} portion in the given context.
> If more of the (apparently dependent) context is removed,
> the bug doesn't show up.  E.g. some of the clearest cases
> involve replacing * with \{0,\} in the BRE, and getting
> quite unexpected results (one would expect the results
> to be the same).  These same BREs work under both
> Solaris 11 and GNU/Linux with their sed and grep.
> > How-To-Repeat:
> This example code can be used to illustrate the problem,
> and both show cases where the bug shows up, and also slightly
> differing contexts where the bug does not occur.
> In each of these cases, the output should be the STRING
> we set/echo into grep/sed where we use our BRE, but in the bug
> cases we get no output.
> It's also suggested test cases be added to the code to catch
> possible regression bugs, should issue recur.  :-)
> Example code to show where bug does (and doesn't) show up:
> (
> exec 2>&1
> set -- \
> 'YYxx' 'Y*\(x\)\1' \
> 'YYxx' 'Y\{0,\}\(x\)\1' \
> 'YYxx' 'Y\{2,\}\(x\)\1' \
> 'YYxx' 'Y\{0,\}\(x\)' \
> 'YYxx' 'Y\{2,\}x' \
> 'YYxx' 'Y\{2,\}x\{1,\}' \
> 'YYxx' 'Y\{2,\}x\{0,\}' \
> 'YYxxz' 'Y\{2,\}x\{0,\}z' \
> 'YYxxz' 'Y\{0,\}x\{0,\}z' \
> 'YYxyxy' 'Y\{2,\}\(xy\)\1' \
> 'YYxyxy' 'Y\{0,\}\(xy\)\1' \
> 'YYxyxy' 'Y*\(xy\)\1' \
> 'YYxyxy' 'Y\{0,\}\(xy\)xy'
> while [ "$#" -ge 2 ]
> do
> STRING="$1"; shift; BRE="$1"; shift
> set -x
> echo "$STRING" | grep -e "$BRE"
> echo "$STRING" | sed -ne "s/$BRE/&/p"
> set +x
> done
> )
> Example run of above code.  Bug is present where our
> STRING echoed into grep/sed fails to appear in the
> output:
> + echo YYxx
> + grep -e Y*\(x\)\1
> YYxx
> + echo YYxx
> + sed -ne s/Y*\(x\)\1/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{0,\}\(x\)\1
> + echo YYxx
> + sed -ne s/Y\{0,\}\(x\)\1/&/p
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}\(x\)\1
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}\(x\)\1/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{0,\}\(x\)
> YYxx
> + echo YYxx
> + sed -ne s/Y\{0,\}\(x\)/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}x
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}x/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}x\{1,\}
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}x\{1,\}/&/p
> YYxx
> + set +x
> + echo YYxx
> + grep -e Y\{2,\}x\{0,\}
> YYxx
> + echo YYxx
> + sed -ne s/Y\{2,\}x\{0,\}/&/p
> YYxx
> + set +x
> + echo YYxxz
> + grep -e Y\{2,\}x\{0,\}z
> YYxxz
> + echo YYxxz
> + sed -ne s/Y\{2,\}x\{0,\}z/&/p
> YYxxz
> + set +x
> + echo YYxxz
> + grep -e Y\{0,\}x\{0,\}z
> YYxxz
> + echo YYxxz
> + sed -ne s/Y\{0,\}x\{0,\}z/&/p
> YYxxz
> + set +x
> + echo YYxyxy
> + grep -e Y\{2,\}\(xy\)\1
> YYxyxy
> + echo YYxyxy
> + sed -ne s/Y\{2,\}\(xy\)\1/&/p
> YYxyxy
> + set +x
> + echo YYxyxy
> + grep -e Y\{0,\}\(xy\)\1
> + echo YYxyxy
> + sed -ne s/Y\{0,\}\(xy\)\1/&/p
> + set +x
> + echo YYxyxy
> + grep -e Y*\(xy\)\1
>  
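
Since the report says grep and sed fail in the same way, the cases
above can also be checked against the regex library directly. The
following is a minimal sketch using POSIX regcomp()/regexec() with no
flags, which selects basic regular expressions; the helper name is an
assumption made for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the basic regular expression matches the string,
 * 0 if it does not, and -1 if the expression fails to compile.
 */
int
bre_matches(const char *bre, const char *string)
{
	regex_t re;
	int rc;

	/* cflags 0: basic (not extended) regular expression syntax */
	if (regcomp(&re, bre, 0) != 0)
		return -1;
	rc = regexec(&re, string, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

A regression test could call bre_matches("Y\\{0,\\}\\(x\\)\\1", "YYxx")
and bre_matches("Y*\\(x\\)\\1", "YYxx") and require the same result
from both, since the report expects the two expressions to behave
identically.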

Re: LLVM error

2021-02-19 Thread Otto Moerbeek
On Fri, Feb 19, 2021 at 08:09:02PM -0500, david rosier wrote:

This is not a bug.  Likely you did not run sysmerge when upgrading;
the memory limits were raised recently.

https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/etc/etc.arm64/login.conf.diff?r1=1.7&r2=1.8&f=h
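
To see which limit a process actually runs with, the data size limit
can be read back with getrlimit(2). This sketch assumes that the
login.conf change referred to here concerns the datasize limits, which
is what RLIMIT_DATA reflects; the function name is made up.

```c
#include <sys/resource.h>
#include <stdio.h>

/* Print one limit value, spelling out "unlimited" where applicable. */
static void
print_limit(const char *label, rlim_t v)
{
	if (v == RLIM_INFINITY)
		printf("%s: unlimited\n", label);
	else
		printf("%s: %llu bytes\n", label, (unsigned long long)v);
}

/* Print the soft and hard data size limits. Returns 0 on success. */
int
print_data_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_DATA, &rl) == -1)
		return -1;
	print_limit("datasize-cur", rl.rlim_cur);
	print_limit("datasize-max", rl.rlim_max);
	return 0;
}
```

If the soft limit printed is still the old, lower value after an
upgrade, that suggests /etc/login.conf was not merged.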

-Otto

> LLVM ERROR: out of memory
> Stack dump:
> 0.  Program arguments: c++ -O2 -pipe -fno-ret-protector -mno-retpoline
> -std=c++14 -fvisibility-inlines-hidden -fno-exceptions -fno-rtti -Wall -W
> -Wno-unused-parameter -Wwrite-strings -Wcast-qual
> -Wno-missing-field-initializers -pedantic -Wno-long-long
> -Wdelete-non-virtual-dtor -Wno-comment -fno-pie -MD -MP
> -I/usr/src/gnu/usr.bin/clang/libclangAST/obj/../include/clang/AST
> -I/usr/src/gnu/usr.bin/clang/libclangAST/../../../llvm/clang/include
> -I/usr/src/gnu/usr.bin/clang/libclangAST/../../../llvm/llvm/include
> -I/usr/src/gnu/usr.bin/clang/libclangAST/../include
> -I/usr/src/gnu/usr.bin/clang/libclangAST/obj
> -I/usr/src/gnu/usr.bin/clang/libclangAST/obj/../include -DNDEBUG
> -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
> -DLLVM_PREFIX="/usr" -c -MD -MP -MT ASTImporter.o -MT ASTImporter.po -MT
> ASTImporter.so -MT ASTImporter.do
> /usr/src/gnu/usr.bin/clang/libclangAST/../../../llvm/clang/lib/AST/ASTImporter.cpp
> -o ASTImporter.o.o
> 1.   parser at end of file
> 2.  Per-module optimization passes
> 3.  Running pass 'Interprocedural Sparse Conditional Constant
> Propagation' on module
> '/usr/src/gnu/usr.bin/clang/libclangAST/../../../llvm/clang/lib/AST/ASTImporter.cpp'.
> c++: error: clang frontend command failed due to signal (use -v to see
> invocation)
> OpenBSD clang version 10.0.1
> Target: amd64-unknown-openbsd6.9
> Thread model: posix
> InstalledDir: /usr/bin
> c++: note: diagnostic msg: PLEASE submit a bug report to
> http://llvm.org/bugs/ and include the crash backtrace, preprocessed source,
> and associated run script.
> c++: note: diagnostic msg:
> 
> 
> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> Preprocessed source(s) and associated run script(s) are located at:
> c++: note: diagnostic msg: /tmp/ASTImporter-a0794c.cpp
> c++: note: diagnostic msg: /tmp/ASTImporter-a0794c.sh
> c++: note: diagnostic msg:
> 
> 
> *** Error 254 in gnu/usr.bin/clang/libclangAST (:67
> 'ASTImporter.o': @c++ -O2 -pipe  -fno-ret-protector -mno-retpoline
> -std=c++1...)
> *** Error 2 in gnu/usr.bin/clang (:48 'all': @for entry in
> include/llvm/Config libLLVMSupport libLLVMTableGen llvm-tblgen inc...)
> *** Error 2 in gnu/usr.bin (:48 'all': @for entry in cc
> clang cxxfilt cvs binutils binutils-2.17 perl texinfo; do  set -e; if...)
> *** Error 2 in gnu (:48 'all': @for entry in lib usr.bin
> usr.sbin; do  set -e; if test -d /usr/src/gnu/${entry}.amd64; then  ...)
> *** Error 2 in . (:48 'all': @for entry in lib include bin
> libexec sbin usr.bin usr.sbin share games gnu sys; do  set -e; if ...)
> *** Error 2 in . (Makefile:97 'do-build')
> *** Error 2 in /usr/src (Makefile:74 'build')
> -- 
> Unix is very simple, but it takes a genius to understand the simplicity.



Re: munmap sometimes does coredump on arm after mmap success

2021-01-20 Thread Otto Moerbeek
On Wed, Jan 20, 2021 at 07:04:14PM +0100, Christian Jullien wrote:

> Fortunately, OpenBSD has mquery specific call which does exactly what I need
> to find a valid start address.
> Porting a software is sometimes very easy, sometimes it needs some
> adaptations.
> I'm fine with mquery.

To state the obvious: your original code doing mmap(...MAP_FIXED...)
only works by accident on other systems. Due to address space size
and/or randomization differences your code just got lucky on those
systems.

Just don't use mmap(...MAP_FIXED...).

-Otto

> 
> Christian
> 
> -Original Message-
> From: Theo de Raadt [mailto:dera...@openbsd.org] 
> Sent: Wednesday, January 20, 2021 18:01
> To: jull...@eligis.com
> Cc: 'Otto Moerbeek'; bugs@openbsd.org
> Subject: Re: munmap sometimes does coredump on arm after mmap success
> 
> > Even if the behavior is still strange for me, I accept how it behaves on
> > OpenBSD.
> 
> mmap and munmap work the same on all systems.
> 
> You are just less likely to replace used memory, which is relied upon.
>  
> 



Re: munmap sometimes does coredump on arm after mmap success

2021-01-20 Thread Otto Moerbeek
On Wed, Jan 20, 2021 at 04:54:26PM +0100, Christian Jullien wrote:

> Hum! I'm not convinced as don't do anything with returned memory block
> between mmap and munmap.
> The offending code is similar to munmap(mmap(...)) => coredump!

Sigh. You *are* doing something with the returned memory: unmapping it.
Again: MAP_FIXED will replace existing mappings. Think about what
that means.

-Otto

> 
> 
> -Original Message-
> From: Mark Kettenis [mailto:mark.kette...@xs4all.nl] 
> Sent: Wednesday, January 20, 2021 15:53
> To: jull...@eligis.com
> Cc: dera...@openbsd.org; o...@drijf.net; bugs@openbsd.org
> Subject: Re: munmap sometimes does coredump on arm after mmap success
> 
> > Reply-To: 
> > From: "Christian Jullien" 
> > 
> > I will no longer use MAP_FIXED on OpenBSD and accept that save/restore
> fails
> > on this system. It is a rather minor feature.
> > 
> > Note for myself: I clearly accept that MAP_FIXED can fails to allocate at
> a
> > given address but, when succeeded I still don't understand why the address
> > range returned at this address does not entirely belong to my process and
> > nobody else even my own code can allocate something in that range.
> > IMHO munmap should never fail for an address returned by mmap.
> 
> The munmap doesn't fail.  It succeeds and unmaps bits of the address
> space that you're running on.  You're pulling out the carpet from
> under your own feet!  So you fall over and this results in a core dump.
> 



Re: munmap sometimes does coredump on arm after mmap success

2021-01-20 Thread Otto Moerbeek
On Wed, Jan 20, 2021 at 03:02:52PM +0100, Christian Jullien wrote:

> I will no longer use MAP_FIXED on OpenBSD and accept that save/restore fails
> on this system. It is a rather minor feature.
> 
> Note for myself: I clearly accept that MAP_FIXED can fails to allocate at a
> given address but, when succeeded I still don't understand why the address

There your reasoning is flawed. Read the man page of mmap: it
clearly states that MAP_FIXED replaces existing mappings.

> range returned at this address does not entirely belong to my process and
> nobody else even my own code can allocate something in that range.

the range does belong to your process, but it might contain malloc
data, library code or data, program code or data, who knows.

> IMHO munmap should never fail for an address returned by mmap.

munmap does not fail. Your program faults because it is accessing data
or code that just got unmapped.

-Otto

> 
> Theo, your time is precious, you're not obliged to reply and, in any case,
> you can close this ticket.
> 
> Thank you for your time.
> 
> Christian
> 
> -Original Message-
> From: Theo de Raadt [mailto:dera...@openbsd.org] 
> Sent: Wednesday, January 20, 2021 09:29
> To: jull...@eligis.com
> Cc: 'Otto Moerbeek'; bugs@openbsd.org
> Subject: Re: munmap sometimes does coredump on arm after mmap success
> 
> Christian Jullien  wrote:
> 
> > My allocator is much complex than that, it has start heuristics and then
> > makes different mmap/munmap until it finds a location having the right
> > (possibly reduced) size.
> > That's why I was surprised to see munmap failed after successful mmap.
> 
> There is no possible start heuristic.  Various allocators are too likely to
> use address space you don't know about.  Anything except for the NULL page,
> or per-architecture limitations, is up for grabs.
> 
> So how do you know if a page is currently in use, and that you should not
> use it?
> 
> You don't.
> 
> > I'll refine my strategy or I'll fall back to -novm.
> 
> Almost all operating systems have random allocators with the same
> characteristic of making MAP_FIXED a terrible idea.  And the result?
> Application will fail occasionally.  That's not very nice.
> 



Re: munmap sometimes does coredump on arm after mmap success

2021-01-19 Thread Otto Moerbeek
On Wed, Jan 20, 2021 at 07:04:08AM +0100, Christian Jullien wrote:

> Hi,
> 
> I'm running OpenBSD 6.8 on arm :
> 
> $ sysctl | grep hw
> hw.machine=armv7
> hw.model=ARM Cortex-A8 r3p2
> hw.ncpu=1
> hw.byteorder=1234
> hw.pagesize=4096
> hw.disknames=
> hw.diskcount=2
> hw.cpuspeed=0
> hw.product=TI AM335x BeagleBone Black
> hw.serialno==
> hw.physmem=477257728
> hw.usermem=477241344
> hw.ncpufound=1
> hw.allowpowerdown=1
> hw.ncpuonline=1
> 
> $ uname -a
> OpenBSD  6.8 GENERIC#353 armv7
> $ clang -v
> OpenBSD clang version 10.0.1
> Target: armv7-unknown-openbsd6.8-gnueabi
> Thread model: posix
> InstalledDir: /usr/bin
> 
> And I'm trying to port my OpenLisp ISLISP compiler on it (which is quite
> different than OpenLISP, I have anteriority of the name). It needs a portion
> of memory reserved at a fixed address to allow to expand memory on demand
> and to save/restore images.
> This code works on all systems I know (> 160 GNU triplets) and it works for
> years on OpenBSD x86/x64.
> 
> On arm, the portion of code below, sometimes does a coredump after mmap
> successfully returned the desired memory mapped region.
> Changing the start address or memory size (down to 0x20 == 2Mb) does not
> change the issue.
> I can understand that mmap can fail for some given values but not why munmap
> fails after mmap success.

This is not a bug.

mmap with MAP_FIXED replaces any existing mapping. The unmap then zaps that
mapping, which could contain code or data used by the process. Due to
address space randomization of various things, whether it crashes
depends on luck.

The proper way is to not depend on a fixed address. If your image dump
contains addresses, make them relative. For OpenLisp that might be a
huge task though. But depending on fixed addresses is not going to
work in the end.

An approach that does not do damage but won't succeed all the time:

Try without MAP_FIXED. The hint will be used and if it overlaps with
an existing mapping, mmap will return a different address or MAP_FAILED
if it could not. Then check the address and if it isn't the expected
address, bail out. It is better to fail than to cause memory corruption.

-Otto
> 
> Here is the code snippet:
> 
> $ more foo.c
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> void
> olsystrybase(void *start, size_t size) {
>   int fd = -1;
>   void *mem;
> 
>   printf("Trying mmap start=%p, %lx, fd=%d\n", start, size, fd);
>   mem = mmap(start,
>  size,
>  PROT_READ | PROT_WRITE,
>  (MAP_ANONYMOUS | MAP_FIXED | MAP_PRIVATE),
>  fd,
>  0);
>   printf("mmap returns: start=%p, mem=%p, size=%lx fd=%d\n", start, mem,
> size, fd);
> 
>   if (mem == (void *)MAP_FAILED) {
> return;
>   }
> 
>   printf("Going to munmap: %p, %lx\n", mem, size);
>   (void)munmap(mem, size);
>   printf("munmap done: %lx\n", size);
> }
> 
> int
> main() {
>   olsystrybase((void*)0x5000, 0x88);
> }
> 
> To reproduce (it may succeed from 1 to ~10 before hanging on the system I
> use):
> 
> $ clang -o foo foo.c; while (./foo); do done
> Trying mmap start=0x5000, 88, fd=-1
> mmap returns: start=0x5000, mem=0x5000, size=88 fd=-1
> Going to munmap: 0x5000, 88
> munmap done: 88
> Trying mmap start=0x5000, 88, fd=-1
> mmap returns: start=0x5000, mem=0x5000, size=88 fd=-1
> Going to munmap: 0x5000, 88
> munmap done: 88
> Trying mmap start=0x5000, 88, fd=-1
> mmap returns: start=0x5000, mem=0x5000, size=88 fd=-1
> Going to munmap: 0x5000, 88
> munmap done: 88
> Trying mmap start=0x5000, 88, fd=-1
> mmap returns: start=0x5000, mem=0x5000, size=88 fd=-1
> Going to munmap: 0x5000, 88
> munmap done: 88
> Trying mmap start=0x5000, 88, fd=-1
> Segmentation fault (core dumped)
> 



Re: rge(4) interrupt storm

2021-01-06 Thread Otto Moerbeek
On Wed, Jan 06, 2021 at 02:34:14PM -0800, xSAPPYx wrote:

> On Sun, Nov 22, 2020 at 6:19 AM Mark Kettenis 
> wrote:
> 
> > > Date: Sat, 21 Nov 2020 14:14:21 +0100
> > > From: Otto Moerbeek 
> > >
> > > On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:
> > >
> > > > On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> > > >
> > > > > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > > > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > > > >
> > > > > > > It's a relatively new driver.  It uses MSI which pretty much
> > rules out
> > > > > > > an issue with shared interrupts.  So I suspect this is an issue
> > with
> > > > > > > the rge(4) driver.  In the past we have fun with packet counter
> > > > > > > overflow interrupts.  Is the storm present immediately after you
> > bring
> > > > > > > up the interface?  Or even before?
> > > > > >
> > > > > > No storm if not configured and no cable plugged in.
> > > > > > No storm if not configured and cable plugged in
> > > > > > No storm if configured and no cable
> > > > > >
> > > > > > Storm start when I plug the cable in.
> > > > >
> > > > > Sounds like an unexpected interrupt source that should probably be
> > masked.
> > > > >
> > > > > I would look at rge_intr() and what status you get and compare it to
> > the
> > > > > RGE_ISR defines. This may help to figure out what is going on.
> > > > >
> > > > > --
> > > > > :wq Claudio
> > > >
> > > > The value of status after the RGE_READ_4 call is 0x10 all the time:
> > > > RGE_ISR_RX_DESC_UNAVAIL
> > > >
> > > > -Otto
> > > >
> > >
> > > If I apply the diff below the device starts to work without interrupt
> > storm.
> > > This is pure blind coding, I have little idea what I'm doing...
> > >
> > >   -Otto
> > >
> > > Index: dev/pci/if_rgereg.h
> > > ===
> > > RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> > > retrieving revision 1.4
> > > diff -u -p -r1.4 if_rgereg.h
> > > --- dev/pci/if_rgereg.h   31 Oct 2020 07:50:41 -  1.4
> > > +++ dev/pci/if_rgereg.h   21 Nov 2020 13:06:39 -
> > > @@ -88,7 +88,7 @@
> > >
> > >  #define RGE_INTRS\
> > >   (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |   \
> > > - RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |\
> > > + RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |  \
> > >   RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
> > >
> > >  #define RGE_INTRS_TIMER  \
> >
> > That makes some sense.  The description of that bit suggests this is
> > the interrupt you get when a packet arrives but there is no room in
> > the rx ring for it.
> >
> > That bit isn't actually all that useful.  It could be used to account
> > dropped packets, but it isn't.  It also could provide a trigger to
> > refill the ring if for some reason we end up with an empty rx ring.
> > In practice that doesn't work so well though, since a steady stream of
> > packets will mean the interrupt will keep on firing and potentially
> > keep the kernel from doing what it needs to free up mbufs such that
> > they can be put back on the ring.  It is better to use a timeout to
> > refill the ring if the minimum number of mbufs on the ring can't be
> > maintained.
> >
> > The question remains why the interrupt keeps firing in a scenario
> > where the ring should have enough packets on it.  But the answer may
> > turn out to be irrelevant.
> >
> >
> 
> This patch really helped on an Odroid H2+ as well.
> https://www.hardkernel.com/shop/odroid-h2plus/
> 
> # Before
> fw1$ vmstat -i
> interrupt   total rate
> irq0/clock   70183801  399
> irq0/ipi246950
> irq144/inteldrm0 11610
> irq176/azalia0  50
> irq101/nvme0  10804856
> irq114/rge0   38290462803   217809
> irq115/rge1   

Re: Memory leak with getaddrinfo()

2020-12-30 Thread Otto Moerbeek
On Mon, Dec 28, 2020 at 09:21:04PM +0100, Otto Moerbeek wrote:

> On Mon, Dec 28, 2020 at 08:54:46PM +0100, Theo Buehler wrote:
> 
> > On Sun, Dec 27, 2020 at 08:44:52PM +0100, Otto Moerbeek wrote:
> > > Hi,
> > > 
> > > So here the diff that just fixes the mem leak on thread exit. It does
> > > not contains the TLS init part, I'd like to do that differently than
> > > what I did in the version I posted earlier.
> > > 
> > > As stated before, this makes the getaddrino(3) test program run in
> > > constant memory.
> > > 
> > > ok? 
> > 
> > Please wrap this line at ,void:
> > 
> > > -_thread_tag_storage(void **tag, void *storage, size_t sz, void *err)
> > > +_thread_tag_storage(void **tag, void *storage, size_t sz, void 
> > > (*dt)(void *),void *err)
> > 
> > This diff made firefox 84.0.1 crash frequently on my main laptop (about
> > once every 10-15 minutes) while iridium was still stable and I could not
> > observe any other fallout.
> > 
> > The backtrace always looked like this:
> 
> OK, the two problems cannot be solved independently. I'll come up with
> a new diff 

This diff is now on tech@

> 
>   -Otto
> 
> > 
> > #0  thrkill () at /tmp/-:3
> > #1  0x00ae9eab90ee in nsProfileLock::FatalSignalHandler (signo=11, 
> > info=0xae12f05420, context=0xae12f05330)
> > at 
> > /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/toolkit/profile/nsProfileLock.cpp:168
> > #2  0x00ae9f5aada7 in WasmTrapHandler (signum=11, info=0xae12f05420, 
> > context=0xae12f05330) at 
> > /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/js/src/wasm/WasmSignalHandlers.cpp:980
> > #3  
> > #4  _asr_ctx_unref (ac=0xdfdfdfdfdfdfdfdf) at 
> > /usr/src/lib/libc/asr/asr.c:401
> > #5  _asr_resolver_done_tp (arg=0xadc8b2eec0) at 
> > /usr/src/lib/libc/asr/asr.c:141
> > #6  0x00ae5f6d0025 in _rthread_tls_destructors (thread=0xae81077440) at 
> > /usr/src/lib/libc/thread/rthread_tls.c:182
> > #7  0x00ae5f65ec13 in _libc_pthread_exit (retval=) at 
> > /usr/src/lib/libc/thread/rthread.c:150
> > #8  0x00ae39f038d9 in _rthread_start (v=) at 
> > /usr/src/lib/librthread/rthread.c:97
> > #9  0x00ae5f6e1c8a in __tfork_thread () at 
> > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> > (gdb)
> > 
> 



Re: Memory leak with getaddrinfo()

2020-12-28 Thread Otto Moerbeek
On Mon, Dec 28, 2020 at 08:54:46PM +0100, Theo Buehler wrote:

> On Sun, Dec 27, 2020 at 08:44:52PM +0100, Otto Moerbeek wrote:
> > Hi,
> > 
> > So here the diff that just fixes the mem leak on thread exit. It does
> > not contains the TLS init part, I'd like to do that differently than
> > what I did in the version I posted earlier.
> > 
> > As stated before, this makes the getaddrino(3) test program run in
> > constant memory.
> > 
> > ok? 
> 
> Please wrap this line at ,void:
> 
> > -_thread_tag_storage(void **tag, void *storage, size_t sz, void *err)
> > +_thread_tag_storage(void **tag, void *storage, size_t sz, void (*dt)(void 
> > *),void *err)
> 
> This diff made firefox 84.0.1 crash frequently on my main laptop (about
> once every 10-15 minutes) while iridium was still stable and I could not
> observe any other fallout.
> 
> The backtrace always looked like this:

OK, the two problems cannot be solved independently. I'll come up with
a new diff 

-Otto

> 
> #0  thrkill () at /tmp/-:3
> #1  0x00ae9eab90ee in nsProfileLock::FatalSignalHandler (signo=11, 
> info=0xae12f05420, context=0xae12f05330)
> at 
> /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/toolkit/profile/nsProfileLock.cpp:168
> #2  0x00ae9f5aada7 in WasmTrapHandler (signum=11, info=0xae12f05420, 
> context=0xae12f05330) at 
> /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/js/src/wasm/WasmSignalHandlers.cpp:980
> #3  <signal handler called>
> #4  _asr_ctx_unref (ac=0xdfdfdfdfdfdfdfdf) at /usr/src/lib/libc/asr/asr.c:401
> #5  _asr_resolver_done_tp (arg=0xadc8b2eec0) at 
> /usr/src/lib/libc/asr/asr.c:141
> #6  0x00ae5f6d0025 in _rthread_tls_destructors (thread=0xae81077440) at 
> /usr/src/lib/libc/thread/rthread_tls.c:182
> #7  0x00ae5f65ec13 in _libc_pthread_exit (retval=) at 
> /usr/src/lib/libc/thread/rthread.c:150
> #8  0x00ae39f038d9 in _rthread_start (v=) at 
> /usr/src/lib/librthread/rthread.c:97
> #9  0x00ae5f6e1c8a in __tfork_thread () at 
> /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> (gdb)
> 



Re: Memory leak with getaddrinfo()

2020-12-27 Thread Otto Moerbeek
Hi,

So here is the diff that just fixes the mem leak on thread exit. It does
not contain the TLS init part; I'd like to do that differently than
what I did in the version I posted earlier.

As stated before, this makes the getaddrinfo(3) test program run in
constant memory.

ok? 

-Otto
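
For readers who do not know the libc internals: the idea of attaching a
destructor to per-thread storage can be illustrated with the public
pthread API. The sketch below is not the libc code and does not use
_THREAD_PRIVATE_DT; struct resolver, resolver_get(), run_workers() and
the counter are hypothetical names used only for this example. Each
thread allocates its state lazily, and the destructor registered with
pthread_key_create() frees it when the thread exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64

struct resolver { int placeholder; };	/* stand-in for per-thread state */

static pthread_key_t resolver_key;
static pthread_once_t resolver_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int destroyed;	/* destructor runs so far, for the demo only */

/* Called at thread exit for each thread holding a non-NULL value. */
static void
resolver_destroy(void *arg)
{
	free(arg);
	pthread_mutex_lock(&count_lock);
	destroyed++;
	pthread_mutex_unlock(&count_lock);
}

static void
resolver_key_init(void)
{
	if (pthread_key_create(&resolver_key, resolver_destroy) != 0)
		abort();
}

/* Return this thread's state, allocating it on first use. */
static struct resolver *
resolver_get(void)
{
	struct resolver *r;

	pthread_once(&resolver_once, resolver_key_init);
	r = pthread_getspecific(resolver_key);
	if (r == NULL) {
		r = calloc(1, sizeof(*r));
		if (r != NULL && pthread_setspecific(resolver_key, r) != 0) {
			free(r);
			r = NULL;
		}
	}
	return r;
}

static void *
worker(void *arg)
{
	(void)arg;
	(void)resolver_get();
	return NULL;
}

/* Start n threads, wait for them, report how many destructors ran. */
static int
run_workers(int n)
{
	pthread_t tids[MAX_THREADS];
	int i, started = 0, count;

	assert(n > 0 && n <= MAX_THREADS);
	for (i = 0; i < n; i++)
		if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_lock(&count_lock);
	count = destroyed;
	pthread_mutex_unlock(&count_lock);
	return count;
}
```

If every thread starts and every allocation succeeds, the number
returned by run_workers() should match the number of threads, which is
the property the leak reproducer checks indirectly by watching memory
use stay flat.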

Index: asr/asr.c
===
RCS file: /cvs/src/lib/libc/asr/asr.c,v
retrieving revision 1.64
diff -u -p -r1.64 asr.c
--- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
+++ asr/asr.c   27 Dec 2020 19:44:01 -
@@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
_asr_ctx_unref(ac);
return;
} else {
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
if (*priv == NULL)
return;
asr = *priv;
@@ -128,6 +128,21 @@ _asr_resolver_done(void *arg)
free(asr);
 }
 
+static void
+_asr_resolver_done_tp(void *arg)
+{
+   struct asr **priv = arg;
+   struct asr *asr;
+
+   if (*priv == NULL)
+   return;
+   asr = *priv;
+
+   _asr_ctx_unref(asr->a_ctx);
+   free(asr);
+   free(priv);
+}
+
 void *
 asr_resolver_from_string(const char *str)
 {
@@ -349,7 +364,8 @@ _asr_use_resolver(void *arg)
}
else {
DPRINT("using thread-local resolver\n");
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, _asr_resolver_done_tp,
+   &_asr);
if (*priv == NULL) {
DPRINT("setting up thread-local resolver\n");
*priv = _asr_resolver();
Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.35
diff -u -p -r1.35 thread_private.h
--- include/thread_private.h13 Feb 2019 13:22:14 -  1.35
+++ include/thread_private.h27 Dec 2020 19:44:01 -
@@ -98,7 +98,8 @@ struct thread_callbacks {
void(*tc_mutex_destroy)(void **);
void(*tc_tag_lock)(void **);
void(*tc_tag_unlock)(void **);
-   void*(*tc_tag_storage)(void **, void *, size_t, void *);
+   void*(*tc_tag_storage)(void **, void *, size_t, void (*)(void *),
+   void *);
__pid_t (*tc_fork)(void);
__pid_t (*tc_vfork)(void);
void(*tc_thread_release)(struct pthread *);
@@ -142,6 +143,7 @@ __END_HIDDEN_DECLS
 #define _THREAD_PRIVATE_MUTEX_LOCK(name)   do {} while (0)
 #define _THREAD_PRIVATE_MUTEX_UNLOCK(name) do {} while (0)
 #define _THREAD_PRIVATE(keyname, storage, error)   &(storage)
+#define _THREAD_PRIVATE_DT(keyname, storage, dt, error)&(storage)
 #define _MUTEX_LOCK(mutex) do {} while (0)
 #define _MUTEX_UNLOCK(mutex)   do {} while (0)
 #define _MUTEX_DESTROY(mutex)  do {} while (0)
@@ -168,7 +170,12 @@ __END_HIDDEN_DECLS
 #define _THREAD_PRIVATE(keyname, storage, error)   \
(_thread_cb.tc_tag_storage == NULL ? &(storage) :   \
_thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),\
-   &(storage), sizeof(storage), error))
+   &(storage), sizeof(storage), NULL, (error)))
+
+#define _THREAD_PRIVATE_DT(keyname, storage, dt, error)	\
+   (_thread_cb.tc_tag_storage == NULL ? &(storage) :   \
+   _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),\
+   &(storage), sizeof(storage), (dt), (error)))
 
 /*
  * Macros used in libc to access mutexes.
Index: thread/rthread_cb.h
===
RCS file: /cvs/src/lib/libc/thread/rthread_cb.h,v
retrieving revision 1.2
diff -u -p -r1.2 rthread_cb.h
--- thread/rthread_cb.h 5 Sep 2017 02:40:54 -   1.2
+++ thread/rthread_cb.h 27 Dec 2020 19:44:01 -
@@ -35,5 +35,5 @@ void  _thread_mutex_unlock(void **);
 void   _thread_mutex_destroy(void **);
 void   _thread_tag_lock(void **);
 void   _thread_tag_unlock(void **);
-void   *_thread_tag_storage(void **, void *, size_t, void *);
+void   *_thread_tag_storage(void **, void *, size_t, void (*)(void*), void *);
 __END_HIDDEN_DECLS
Index: thread/rthread_libc.c
===
RCS file: /cvs/src/lib/libc/thread/rthread_libc.c,v
retrieving revision 1.3
diff -u -p -r1.3 rthread_libc.c
--- thread/rthread_libc.c   10 Jan 2019 18:45:33 -  1.3
+++ thread/rthread_libc.c   27 Dec 2020 19:44:01 -
@@ -31,7 +31,7 @@ static pthread_mutex_t_thread_tag_mutex
  * This function will never return NULL.
  */
 static void
-_thread_tag_init(void **tag)
+_thread_tag_init(void **tag, void (*dt)(v

Re: Memory leak with getaddrinfo()

2020-12-26 Thread Otto Moerbeek
On Sat, Dec 26, 2020 at 11:31:32AM +0100, Otto Moerbeek wrote:

> On Sat, Dec 26, 2020 at 11:07:00AM +0100, Otto Moerbeek wrote:
> 
> > On Fri, Dec 25, 2020 at 02:04:03PM +0100, Otto Moerbeek wrote:
> > 
> > > On Fri, Dec 25, 2020 at 12:59:10PM +0100, Otto Moerbeek wrote:
> > > 
> > > > On Fri, Dec 25, 2020 at 12:35:57PM +0100, Mark Kettenis wrote:
> > > > 
> > > > > > Date: Fri, 25 Dec 2020 11:34:47 +0100
> > > > > > From: Otto Moerbeek 
> > > > > > 
> > > > > > On Thu, Dec 24, 2020 at 01:29:28PM +0100, Uli Schlachter wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > due to that other thread, it occurred to me that getaddrinfo() 
> > > > > > > also has
> > > > > > > another bug: It leaks memory. _asr_use_resolver() allocates memory
> > > > > > > per-thread (via _asr_resolver()) and saves it via 
> > > > > > > _THREAD_PRIVATE() in
> > > > > > > _asr, but nothing frees that memory. A reproducer follows bellow. 
> > > > > > > On
> > > > > > > Debian, no memory leak is observed (= RES in top stays constant 
> > > > > > > over time).
> > > > > > > 
> > > > > > > I have no good suggestion for how to fix this leak, but I feel 
> > > > > > > like this
> > > > > > > might also be helpful in fixing the thread unsafety from "that 
> > > > > > > other
> > > > > > > thread". Both bugs originate from storing a pointer to an 
> > > > > > > allocation via
> > > > > > > _THREAD_PRIVATE(), which is something that does not really work 
> > > > > > > with
> > > > > > > that API.
> > > > > > > 
> > > > > > > IMHO this internal API needs to change. At this point, one can 
> > > > > > > also fix
> > > > > > > the other problem by having an API that guarantees that each 
> > > > > > > thread gets
> > > > > > > zeroed per-thread data instead of memcpy()ing from a global 
> > > > > > > default.
> > > > > > > 
> > > > > > > Other users of _THREAD_PRIVATE() instead seem to only store 
> > > > > > > buffers,
> > > > > > > e.g. strerror_l() or localtime(). These buffers do not need extra 
> > > > > > > cleanup.
> > > > > > > 
> > > > > > > Reproducer (sorry for the line wrapping; this is basically just 
> > > > > > > the
> > > > > > > previous example, but without calling getaddrinfo() on the main 
> > > > > > > thread:
> > > > > > > lots of threads are started and each thread calls getaddrinfo() 
> > > > > > > once):
> > > > > > > 
> > > > > > > #include 
> > > > > > > #include 
> > > > > > > #include 
> > > > > > > #include 
> > > > > > > #include 
> > > > > > > #include 
> > > > > > > 
> > > > > > > #define NUM_THREADS 50
> > > > > > > 
> > > > > > > static void do_lookup(const char *host)
> > > > > > > {
> > > > > > >   int s;
> > > > > > >   struct addrinfo hints;
> > > > > > >   struct addrinfo *result;
> > > > > > > 
> > > > > > >   memset(&hints, 0, sizeof(hints));
> > > > > > >   hints.ai_family = AF_UNSPEC;
> > > > > > >   hints.ai_socktype = SOCK_STREAM;
> > > > > > >   hints.ai_flags = AI_ADDRCONFIG;
> > > > > > >   hints.ai_protocol = IPPROTO_TCP;
> > > > > > > 
> > > > > > >   s = getaddrinfo(host, NULL, &hints, &result);
> > > > > > >   if (s != 0) {
> > > > > > >   fprintf(stderr, "Lookup error for %s: %s\n", host, 
> > > > > > > gai_strerror(s));
> > > > > > >   } else {
> > > > > > >   freeaddrinfo(result);
> > > > > > >   }
> > > > > > > }
> > > >

Re: Memory leak with getaddrinfo()

2020-12-26 Thread Otto Moerbeek
On Sat, Dec 26, 2020 at 11:07:00AM +0100, Otto Moerbeek wrote:

> On Fri, Dec 25, 2020 at 02:04:03PM +0100, Otto Moerbeek wrote:
> 
> > On Fri, Dec 25, 2020 at 12:59:10PM +0100, Otto Moerbeek wrote:
> > 
> > > On Fri, Dec 25, 2020 at 12:35:57PM +0100, Mark Kettenis wrote:
> > > 
> > > > > Date: Fri, 25 Dec 2020 11:34:47 +0100
> > > > > From: Otto Moerbeek 
> > > > > 
> > > > > On Thu, Dec 24, 2020 at 01:29:28PM +0100, Uli Schlachter wrote:
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > due to that other thread, it occurred to me that getaddrinfo() also 
> > > > > > has
> > > > > > another bug: It leaks memory. _asr_use_resolver() allocates memory
> > > > > > per-thread (via _asr_resolver()) and saves it via _THREAD_PRIVATE() 
> > > > > > in
> > > > > > _asr, but nothing frees that memory. A reproducer follows bellow. On
> > > > > > Debian, no memory leak is observed (= RES in top stays constant 
> > > > > > over time).
> > > > > > 
> > > > > > I have no good suggestion for how to fix this leak, but I feel like 
> > > > > > this
> > > > > > might also be helpful in fixing the thread unsafety from "that other
> > > > > > thread". Both bugs originate from storing a pointer to an 
> > > > > > allocation via
> > > > > > _THREAD_PRIVATE(), which is something that does not really work with
> > > > > > that API.
> > > > > > 
> > > > > > IMHO this internal API needs to change. At this point, one can also 
> > > > > > fix
> > > > > > the other problem by having an API that guarantees that each thread 
> > > > > > gets
> > > > > > zeroed per-thread data instead of memcpy()ing from a global default.
> > > > > > 
> > > > > > Other users of _THREAD_PRIVATE() instead seem to only store buffers,
> > > > > > e.g. strerror_l() or localtime(). These buffers do not need extra 
> > > > > > cleanup.
> > > > > > 
> > > > > > Reproducer (sorry for the line wrapping; this is basically just the
> > > > > > previous example, but without calling getaddrinfo() on the main 
> > > > > > thread:
> > > > > > lots of threads are started and each thread calls getaddrinfo() 
> > > > > > once):
> > > > > > 
> > > > > > #include 
> > > > > > #include 
> > > > > > #include 
> > > > > > #include 
> > > > > > #include 
> > > > > > #include 
> > > > > > 
> > > > > > #define NUM_THREADS 50
> > > > > > 
> > > > > > static void do_lookup(const char *host)
> > > > > > {
> > > > > > int s;
> > > > > > struct addrinfo hints;
> > > > > > struct addrinfo *result;
> > > > > > 
> > > > > > memset(&hints, 0, sizeof(hints));
> > > > > > hints.ai_family = AF_UNSPEC;
> > > > > > hints.ai_socktype = SOCK_STREAM;
> > > > > > hints.ai_flags = AI_ADDRCONFIG;
> > > > > > hints.ai_protocol = IPPROTO_TCP;
> > > > > > 
> > > > > > s = getaddrinfo(host, NULL, &hints, &result);
> > > > > > if (s != 0) {
> > > > > > fprintf(stderr, "Lookup error for %s: %s\n", host, 
> > > > > > gai_strerror(s));
> > > > > > } else {
> > > > > > freeaddrinfo(result);
> > > > > > }
> > > > > > }
> > > > > > 
> > > > > > static void *
> > > > > > do_things(void *arg)
> > > > > > {
> > > > > > (void) arg;
> > > > > > do_lookup("ipv4.google.com");
> > > > > > return NULL;
> > > > > > }
> > > > > > 
> > > > > > int main()
> > > > > > {
> > > > > > pthread_t threads[NUM_THREADS];
> > > > > > int i;
> > > > > > int s;
> > > > > > 
> > > >

Re: Memory leak with getaddrinfo()

2020-12-26 Thread Otto Moerbeek
On Fri, Dec 25, 2020 at 02:04:03PM +0100, Otto Moerbeek wrote:

> On Fri, Dec 25, 2020 at 12:59:10PM +0100, Otto Moerbeek wrote:
> 
> > On Fri, Dec 25, 2020 at 12:35:57PM +0100, Mark Kettenis wrote:
> > 
> > > > Date: Fri, 25 Dec 2020 11:34:47 +0100
> > > > From: Otto Moerbeek 
> > > > 
> > > > On Thu, Dec 24, 2020 at 01:29:28PM +0100, Uli Schlachter wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > due to that other thread, it occurred to me that getaddrinfo() also 
> > > > > has
> > > > > another bug: It leaks memory. _asr_use_resolver() allocates memory
> > > > > per-thread (via _asr_resolver()) and saves it via _THREAD_PRIVATE() in
> > > > > _asr, but nothing frees that memory. A reproducer follows bellow. On
> > > > > Debian, no memory leak is observed (= RES in top stays constant over 
> > > > > time).
> > > > > 
> > > > > I have no good suggestion for how to fix this leak, but I feel like 
> > > > > this
> > > > > might also be helpful in fixing the thread unsafety from "that other
> > > > > thread". Both bugs originate from storing a pointer to an allocation 
> > > > > via
> > > > > _THREAD_PRIVATE(), which is something that does not really work with
> > > > > that API.
> > > > > 
> > > > > IMHO this internal API needs to change. At this point, one can also 
> > > > > fix
> > > > > the other problem by having an API that guarantees that each thread 
> > > > > gets
> > > > > zeroed per-thread data instead of memcpy()ing from a global default.
> > > > > 
> > > > > Other users of _THREAD_PRIVATE() instead seem to only store buffers,
> > > > > e.g. strerror_l() or localtime(). These buffers do not need extra 
> > > > > cleanup.
> > > > > 
> > > > > Reproducer (sorry for the line wrapping; this is basically just the
> > > > > previous example, but without calling getaddrinfo() on the main 
> > > > > thread:
> > > > > lots of threads are started and each thread calls getaddrinfo() once):
> > > > > 
> > > > > #include 
> > > > > #include 
> > > > > #include 
> > > > > #include 
> > > > > #include 
> > > > > #include 
> > > > > 
> > > > > #define NUM_THREADS 50
> > > > > 
> > > > > static void do_lookup(const char *host)
> > > > > {
> > > > >   int s;
> > > > >   struct addrinfo hints;
> > > > >   struct addrinfo *result;
> > > > > 
> > > > >   memset(&hints, 0, sizeof(hints));
> > > > >   hints.ai_family = AF_UNSPEC;
> > > > >   hints.ai_socktype = SOCK_STREAM;
> > > > >   hints.ai_flags = AI_ADDRCONFIG;
> > > > >   hints.ai_protocol = IPPROTO_TCP;
> > > > > 
> > > > >   s = getaddrinfo(host, NULL, &hints, &result);
> > > > >   if (s != 0) {
> > > > >   fprintf(stderr, "Lookup error for %s: %s\n", host, 
> > > > > gai_strerror(s));
> > > > >   } else {
> > > > >   freeaddrinfo(result);
> > > > >   }
> > > > > }
> > > > > 
> > > > > static void *
> > > > > do_things(void *arg)
> > > > > {
> > > > >   (void) arg;
> > > > >   do_lookup("ipv4.google.com");
> > > > >   return NULL;
> > > > > }
> > > > > 
> > > > > int main()
> > > > > {
> > > > >   pthread_t threads[NUM_THREADS];
> > > > >   int i;
> > > > >   int s;
> > > > > 
> > > > >   for (;;) {
> > > > >   for (i = 0; i < NUM_THREADS; i++) {
> > > > >   s = pthread_create(&threads[i], NULL, 
> > > > > do_things, NULL);
> > > > >   if (s != 0)
> > > > >   fprintf(stderr, "Error creating 
> > > > > thread");
> > > > >   }
> > > > > >   for (i = 0; i <

Re: Memory leak with getaddrinfo()

2020-12-25 Thread Otto Moerbeek
On Fri, Dec 25, 2020 at 12:59:10PM +0100, Otto Moerbeek wrote:

> On Fri, Dec 25, 2020 at 12:35:57PM +0100, Mark Kettenis wrote:
> 
> > > Date: Fri, 25 Dec 2020 11:34:47 +0100
> > > From: Otto Moerbeek 
> > > 
> > > On Thu, Dec 24, 2020 at 01:29:28PM +0100, Uli Schlachter wrote:
> > > 
> > > > Hi,
> > > > 
> > > > due to that other thread, it occurred to me that getaddrinfo() also has
> > > > another bug: It leaks memory. _asr_use_resolver() allocates memory
> > > > per-thread (via _asr_resolver()) and saves it via _THREAD_PRIVATE() in
> > > > _asr, but nothing frees that memory. A reproducer follows bellow. On
> > > > Debian, no memory leak is observed (= RES in top stays constant over 
> > > > time).
> > > > 
> > > > I have no good suggestion for how to fix this leak, but I feel like this
> > > > might also be helpful in fixing the thread unsafety from "that other
> > > > thread". Both bugs originate from storing a pointer to an allocation via
> > > > _THREAD_PRIVATE(), which is something that does not really work with
> > > > that API.
> > > > 
> > > > IMHO this internal API needs to change. At this point, one can also fix
> > > > the other problem by having an API that guarantees that each thread gets
> > > > zeroed per-thread data instead of memcpy()ing from a global default.
> > > > 
> > > > Other users of _THREAD_PRIVATE() instead seem to only store buffers,
> > > > e.g. strerror_l() or localtime(). These buffers do not need extra 
> > > > cleanup.
> > > > 
> > > > Reproducer (sorry for the line wrapping; this is basically just the
> > > > previous example, but without calling getaddrinfo() on the main thread:
> > > > lots of threads are started and each thread calls getaddrinfo() once):
> > > > 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > 
> > > > #define NUM_THREADS 50
> > > > 
> > > > static void do_lookup(const char *host)
> > > > {
> > > > int s;
> > > > struct addrinfo hints;
> > > > struct addrinfo *result;
> > > > 
> > > > memset(&hints, 0, sizeof(hints));
> > > > hints.ai_family = AF_UNSPEC;
> > > > hints.ai_socktype = SOCK_STREAM;
> > > > hints.ai_flags = AI_ADDRCONFIG;
> > > > hints.ai_protocol = IPPROTO_TCP;
> > > > 
> > > > s = getaddrinfo(host, NULL, &hints, &result);
> > > > if (s != 0) {
> > > > fprintf(stderr, "Lookup error for %s: %s\n", host, 
> > > > gai_strerror(s));
> > > > } else {
> > > > freeaddrinfo(result);
> > > > }
> > > > }
> > > > 
> > > > static void *
> > > > do_things(void *arg)
> > > > {
> > > > (void) arg;
> > > > do_lookup("ipv4.google.com");
> > > > return NULL;
> > > > }
> > > > 
> > > > int main()
> > > > {
> > > > pthread_t threads[NUM_THREADS];
> > > > int i;
> > > > int s;
> > > > 
> > > > for (;;) {
> > > > for (i = 0; i < NUM_THREADS; i++) {
> > > > s = pthread_create(&threads[i], NULL, 
> > > > do_things, NULL);
> > > > if (s != 0)
> > > > fprintf(stderr, "Error creating 
> > > > thread");
> > > > }
> > > > for (i = 0; i < NUM_THREADS; i++) {
> > > > pthread_join(threads[i], NULL);
> > > > }
> > > > }
> > > > return 0;
> > > > }
> > > > 
> > > > Cheers,
> > > > Uli
> > > > -- 
> > > > This can be a, a little complicated. Listen, my advice is... ask
> > > > somebody else for advice, at least someone who's... got more experience
> > > > at...  giving advice.
> > > > 
> > > 
> > > Hoi,
>

Re: Memory leak with getaddrinfo()

2020-12-25 Thread Otto Moerbeek
On Fri, Dec 25, 2020 at 12:35:57PM +0100, Mark Kettenis wrote:

> > Date: Fri, 25 Dec 2020 11:34:47 +0100
> > From: Otto Moerbeek 
> > 
> > On Thu, Dec 24, 2020 at 01:29:28PM +0100, Uli Schlachter wrote:
> > 
> > > Hi,
> > > 
> > > due to that other thread, it occurred to me that getaddrinfo() also has
> > > another bug: It leaks memory. _asr_use_resolver() allocates memory
> > > per-thread (via _asr_resolver()) and saves it via _THREAD_PRIVATE() in
> > > _asr, but nothing frees that memory. A reproducer follows below. On
> > > Debian, no memory leak is observed (= RES in top stays constant over 
> > > time).
> > > 
> > > I have no good suggestion for how to fix this leak, but I feel like this
> > > might also be helpful in fixing the thread unsafety from "that other
> > > thread". Both bugs originate from storing a pointer to an allocation via
> > > _THREAD_PRIVATE(), which is something that does not really work with
> > > that API.
> > > 
> > > IMHO this internal API needs to change. At this point, one can also fix
> > > the other problem by having an API that guarantees that each thread gets
> > > zeroed per-thread data instead of memcpy()ing from a global default.
> > > 
> > > Other users of _THREAD_PRIVATE() instead seem to only store buffers,
> > > e.g. strerror_l() or localtime(). These buffers do not need extra cleanup.
> > > 
> > > Reproducer (sorry for the line wrapping; this is basically just the
> > > previous example, but without calling getaddrinfo() on the main thread:
> > > lots of threads are started and each thread calls getaddrinfo() once):
> > > 
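> > > #include <sys/types.h>
> > > #include <sys/socket.h>
> > > #include <netinet/in.h>
> > > #include <netdb.h>
> > > #include <pthread.h>
> > > #include <stdio.h>
> > > #include <string.h>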
> > > 
> > > #define NUM_THREADS 50
> > > 
> > > static void do_lookup(const char *host)
> > > {
> > >   int s;
> > >   struct addrinfo hints;
> > >   struct addrinfo *result;
> > > 
> > >   memset(&hints, 0, sizeof(hints));
> > >   hints.ai_family = AF_UNSPEC;
> > >   hints.ai_socktype = SOCK_STREAM;
> > >   hints.ai_flags = AI_ADDRCONFIG;
> > >   hints.ai_protocol = IPPROTO_TCP;
> > > 
> > >   s = getaddrinfo(host, NULL, &hints, &result);
> > >   if (s != 0) {
> > >   fprintf(stderr, "Lookup error for %s: %s\n", host, 
> > > gai_strerror(s));
> > >   } else {
> > >   freeaddrinfo(result);
> > >   }
> > > }
> > > 
> > > static void *
> > > do_things(void *arg)
> > > {
> > >   (void) arg;
> > >   do_lookup("ipv4.google.com");
> > >   return NULL;
> > > }
> > > 
> > > int main()
> > > {
> > >   pthread_t threads[NUM_THREADS];
> > >   int i;
> > >   int s;
> > > 
> > >   for (;;) {
> > >   for (i = 0; i < NUM_THREADS; i++) {
> > >   s = pthread_create(&threads[i], NULL, do_things, NULL);
> > >   if (s != 0)
> > >   fprintf(stderr, "Error creating thread");
> > >   }
> > >   for (i = 0; i < NUM_THREADS; i++) {
> > >   pthread_join(threads[i], NULL);
> > >   }
> > >   }
> > >   return 0;
> > > }
> > > 
> > > Cheers,
> > > Uli
> > > -- 
> > > This can be a, a little complicated. Listen, my advice is... ask
> > > somebody else for advice, at least someone who's... got more experience
> > > at...  giving advice.
> > > 
> > 
> > Hoi,
> > 
> > the diff (which is certainly wip) below fixes the asr related leaks.
> > There's still a leak on creating/destroying threads, in particular the
> > stacks do not seem to be unmapped.
> 
> That is (somewhat) expected.  Stacks are recycled if they are
> allocated using "default" parameters; see _rthread_free_stack().

Yes, agreed, they do seem to be reused. I was looking with valgrind,
and that seems to introduce a bug where stack reuse is effectively
disabled. So this observation was a fluke.

I am still seeing growing memory usage when I create and destroy threads,
though. The strange thing is that while top shows increasing SIZE and RES,
ktrace shows no syscalls that do allocations, just repeating

 67230/276969  a.out    RET   __tfork 445287/0x6
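
For comparison, the portable POSIX way to tie cleanup to per-thread data
is a key destructor registered with pthread_key_create(). The sketch
below is not the libc-internal _THREAD_PRIVATE machinery and not the diff
from this thread; struct resolver, thread_resolver() and leaked_after()
are hypothetical names, used only to illustrate the idea of freeing a
per-thread allocation when its thread exits.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread resolver such as struct asr. */
struct resolver {
	int dummy;
};

static pthread_key_t res_key;
static pthread_once_t res_once = PTHREAD_ONCE_INIT;
static int res_key_ok;
static atomic_int live_resolvers;	/* allocations not yet freed */

/* Key destructor: runs in the exiting thread for its non-NULL value. */
static void
resolver_free(void *arg)
{
	free(arg);
	atomic_fetch_sub(&live_resolvers, 1);
}

static void
res_key_init(void)
{
	res_key_ok = (pthread_key_create(&res_key, resolver_free) == 0);
}

/* Return the calling thread's resolver, allocating it on first use. */
static struct resolver *
thread_resolver(void)
{
	struct resolver *r;

	pthread_once(&res_once, res_key_init);
	if (!res_key_ok)
		return NULL;
	r = pthread_getspecific(res_key);
	if (r == NULL) {
		r = calloc(1, sizeof(*r));
		if (r == NULL)
			return NULL;
		if (pthread_setspecific(res_key, r) != 0) {
			free(r);
			return NULL;
		}
		atomic_fetch_add(&live_resolvers, 1);
	}
	return r;
}

static void *
worker(void *arg)
{
	(void)arg;
	(void)thread_resolver();
	return NULL;
}

/*
 * Create and join n threads one after another; return how many
 * per-thread resolvers are still allocated afterwards, or -1 on error.
 */
int
leaked_after(int n)
{
	pthread_t t;
	int i;

	for (i = 0; i < n; i++) {
		if (pthread_create(&t, NULL, worker, NULL) != 0)
			return -1;
		if (pthread_join(t, NULL) != 0)
			return -1;
	}
	return atomic_load(&live_resolvers);
}
```

Without the destructor argument to pthread_key_create(), every worker
would leave its allocation behind, which is the shape of the leak
reported here.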

Re: Memory leak with getaddrinfo()

2020-12-25 Thread Otto Moerbeek
On Thu, Dec 24, 2020 at 01:29:28PM +0100, Uli Schlachter wrote:

> Hi,
> 
> due to that other thread, it occurred to me that getaddrinfo() also has
> another bug: It leaks memory. _asr_use_resolver() allocates memory
> per-thread (via _asr_resolver()) and saves it via _THREAD_PRIVATE() in
> _asr, but nothing frees that memory. A reproducer follows below. On
> Debian, no memory leak is observed (= RES in top stays constant over time).
> 
> I have no good suggestion for how to fix this leak, but I feel like this
> might also be helpful in fixing the thread unsafety from "that other
> thread". Both bugs originate from storing a pointer to an allocation via
> _THREAD_PRIVATE(), which is something that does not really work with
> that API.
> 
> IMHO this internal API needs to change. At this point, one can also fix
> the other problem by having an API that guarantees that each thread gets
> zeroed per-thread data instead of memcpy()ing from a global default.
> 
> Other users of _THREAD_PRIVATE() instead seem to only store buffers,
> e.g. strerror_l() or localtime(). These buffers do not need extra cleanup.
> 
> Reproducer (sorry for the line wrapping; this is basically just the
> previous example, but without calling getaddrinfo() on the main thread:
> lots of threads are started and each thread calls getaddrinfo() once):
> 
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <netdb.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <string.h>
> 
> #define NUM_THREADS 50
> 
> static void do_lookup(const char *host)
> {
>   int s;
>   struct addrinfo hints;
>   struct addrinfo *result;
> 
>   memset(&hints, 0, sizeof(hints));
>   hints.ai_family = AF_UNSPEC;
>   hints.ai_socktype = SOCK_STREAM;
>   hints.ai_flags = AI_ADDRCONFIG;
>   hints.ai_protocol = IPPROTO_TCP;
> 
>   s = getaddrinfo(host, NULL, &hints, &result);
>   if (s != 0) {
>   fprintf(stderr, "Lookup error for %s: %s\n", host, 
> gai_strerror(s));
>   } else {
>   freeaddrinfo(result);
>   }
> }
> 
> static void *
> do_things(void *arg)
> {
>   (void) arg;
>   do_lookup("ipv4.google.com");
>   return NULL;
> }
> 
> int main()
> {
>   pthread_t threads[NUM_THREADS];
>   int i;
>   int s;
> 
>   for (;;) {
>   for (i = 0; i < NUM_THREADS; i++) {
>   s = pthread_create(&threads[i], NULL, do_things, NULL);
>   if (s != 0)
>   fprintf(stderr, "Error creating thread");
>   }
>   for (i = 0; i < NUM_THREADS; i++) {
>   pthread_join(threads[i], NULL);
>   }
>   }
>   return 0;
> }
> 
> Cheers,
> Uli
> -- 
> This can be a, a little complicated. Listen, my advice is... ask
> somebody else for advice, at least someone who's... got more experience
> at...  giving advice.
> 

Hoi,

The diff below (which is certainly WIP) fixes the asr-related leaks.
There is still a leak when creating and destroying threads; in particular,
the stacks do not seem to be unmapped.

-Otto

Index: asr/asr.c
===
RCS file: /cvs/src/lib/libc/asr/asr.c,v
retrieving revision 1.64
diff -u -p -r1.64 asr.c
--- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
+++ asr/asr.c   25 Dec 2020 09:09:26 -
@@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
_asr_ctx_unref(ac);
return;
} else {
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
if (*priv == NULL)
return;
asr = *priv;
@@ -128,6 +128,21 @@ _asr_resolver_done(void *arg)
free(asr);
 }
 
+static void
+_asr_resolver_done_tp(void *arg)
+{
+   struct asr **priv = arg;
+   struct asr *asr;
+
+   if (*priv == NULL)
+   return;
+   asr = *priv;
+
+   _asr_ctx_unref(asr->a_ctx);
+   free(asr);
+   free(priv);
+}
+
 void *
 asr_resolver_from_string(const char *str)
 {
@@ -349,7 +364,10 @@ _asr_use_resolver(void *arg)
}
else {
DPRINT("using thread-local resolver\n");
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, _asr_resolver_done_tp,
+   &_asr);
+   if (priv != &_asr && *priv == _asr) 
+   *priv = NULL;
if (*priv == NULL) {
DPRINT("setting up thread-local resolver\n");
*priv = _asr_resolver();
Index: asr/asr_private.h
===
RCS file: /cvs/src/lib/libc/asr/asr_private.h,v
retrieving revision 1.47
diff -u -p -r1.47 asr_private.h
--- asr/asr_private.h   28 Apr 2018 15:16:49 -  1.47
+++ asr/asr_private.h   25 Dec 2020 09:09:26 -
@@ -319,6 +319,8 @@ struct asr_query *_gethostbyaddr_async_c
 
 int _asr

Re: getaddrinfo() is not thread-safe in 6.8

2020-12-24 Thread Otto Moerbeek
On Thu, Dec 24, 2020 at 02:28:25PM +0100, Otto Moerbeek wrote:

> On Thu, Dec 24, 2020 at 01:59:34PM +0100, Mark Kettenis wrote:
> 
> > > Date: Thu, 24 Dec 2020 12:27:01 +0100
> > > From: Otto Moerbeek 
> > > 
> > > On Thu, Dec 24, 2020 at 10:36:37AM +0100, Otto Moerbeek wrote:
> > > 
> > > > On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> > > > 
> > > > > Hi everyone,
> > > > > 
> > > > > Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> > > > > [...]
> > > > > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > > > > concurrently. This didn't happen in 6.7.
> > > > > > It looks like asr_ctx which is supposed to be thread-local 
> > > > > > according to
> > > > > > _asr_use_resolver(), is actually static / shared between threads.
> > > > > >> How-To-Repeat:
> > > > > [...]
> > > > > > The reproducing code written by psychon, CCed, while debugging the 
> > > > > > crash
> > > > > > of a more complicated software
> > > > > [...]
> > > > > here is a bit more information about what is going on:
> > > > > 
> > > > > When run under egdb, you can "watch _asr". This reports:
> > > > > 
> > > > > Old value = (struct asr *) 0x0
> > > > > New value = (struct asr *) 0x79905dce2a0
> > > > > _asr_use_resolver (arg=) at 
> > > > > /usr/src/lib/libc/asr/asr.c:360
> > > > > 
> > > > > (gdb) bt
> > > > > #0  _asr_use_resolver (arg=) at
> > > > > /usr/src/lib/libc/asr/asr.c:360
> > > > > #1  0x07990558862e in _libc_res_init () at
> > > > > /usr/src/lib/libc/asr/res_init.c:44
> > > > > #2  0x07990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> > > > > "ipv4.google.com", servname=0x0, hints=0x7f7ce120,
> > > > > res=0x7f7ce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> > > > > #3  0x0796629c4d2e in do_lookup ()
> > > > > #4  0x0796629c4da3 in do_things ()
> > > > > #5  0x0796629c4ded in main ()
> > > > > 
> > > > > At this time, the program is still single-threaded (as seen in the
> > > > > backtrace). This line in _asr_use_resolver() is the culprit:
> > > > > 
> > > > > priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > > > > 
> > > > > the macro is defined as:
> > > > > 
> > > > > > #define _THREAD_PRIVATE(keyname, storage, error) \
> > > > > >   (_thread_cb.tc_tag_storage == NULL ? &(storage) : \
> > > > > >   _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)), \
> > > > > >   &(storage), sizeof(storage), error))
> > > > > 
> > > > > gdb has to say about this:
> > > > > 
> > > > > (gdb) print _thread_cb.tc_tag_storage
> > > > > $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> > > > > 
> > > > > Thus, because we are still single-threaded, the program uses the 
> > > > > static
> > > > > variable _asr directly to store its "per thread" resolver (priv = 
> > > > > &_asr
> > > > > in the macro invocation above and thus *priv = _asr_resolver() ends up
> > > > > storing the resolver in a global variable).
> > > > > 
> > > > > Later, threads are started. The next time that this code runs,
> > > > > _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> > > > > lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> > > > > initialise the per-thread variant of the variable with the contents of
> > > > > the global variant. Thus, all threads now end up using the resolver
> > > > > instance that the main thread created.
> > > > > 
> > > > > I did not investigate what exactly goes wrong later, but since 
> > > > > multiple
> > > > > threads are now sharing a resolver, nothing good can happen.
> > > > > 
> > > > > The actual crash is a use-after-free (and it is always the same one):
> > > > > iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> > > > > AS_FAMILY(as). This

Re: getaddrinfo() is not thread-safe in 6.8

2020-12-24 Thread Otto Moerbeek
On Thu, Dec 24, 2020 at 01:59:34PM +0100, Mark Kettenis wrote:

> > Date: Thu, 24 Dec 2020 12:27:01 +0100
> > From: Otto Moerbeek 
> > 
> > On Thu, Dec 24, 2020 at 10:36:37AM +0100, Otto Moerbeek wrote:
> > 
> > > On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> > > > [...]
> > > > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > > > concurrently. This didn't happen in 6.7.
> > > > > It looks like asr_ctx which is supposed to be thread-local according 
> > > > > to
> > > > > _asr_use_resolver(), is actually static / shared between threads.
> > > > >> How-To-Repeat:
> > > > [...]
> > > > > The reproducing code written by psychon, CCed, while debugging the 
> > > > > crash
> > > > > of a more complicated software
> > > > [...]
> > > > here is a bit more information about what is going on:
> > > > 
> > > > When run under egdb, you can "watch _asr". This reports:
> > > > 
> > > > Old value = (struct asr *) 0x0
> > > > New value = (struct asr *) 0x79905dce2a0
> > > > _asr_use_resolver (arg=) at 
> > > > /usr/src/lib/libc/asr/asr.c:360
> > > > 
> > > > (gdb) bt
> > > > #0  _asr_use_resolver (arg=) at
> > > > /usr/src/lib/libc/asr/asr.c:360
> > > > #1  0x07990558862e in _libc_res_init () at
> > > > /usr/src/lib/libc/asr/res_init.c:44
> > > > #2  0x07990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> > > > "ipv4.google.com", servname=0x0, hints=0x7f7ce120,
> > > > res=0x7f7ce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> > > > #3  0x0796629c4d2e in do_lookup ()
> > > > #4  0x0796629c4da3 in do_things ()
> > > > #5  0x0796629c4ded in main ()
> > > > 
> > > > At this time, the program is still single-threaded (as seen in the
> > > > backtrace). This line in _asr_use_resolver() is the culprit:
> > > > 
> > > > priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > > > 
> > > > the macro is defined as:
> > > > 
> > > > > #define _THREAD_PRIVATE(keyname, storage, error) \
> > > > > (_thread_cb.tc_tag_storage == NULL ? &(storage) : \
> > > > > _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)), \
> > > > > &(storage), sizeof(storage), error))
> > > > 
> > > > gdb has to say about this:
> > > > 
> > > > (gdb) print _thread_cb.tc_tag_storage
> > > > $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> > > > 
> > > > Thus, because we are still single-threaded, the program uses the static
> > > > variable _asr directly to store its "per thread" resolver (priv = &_asr
> > > > in the macro invocation above and thus *priv = _asr_resolver() ends up
> > > > storing the resolver in a global variable).
> > > > 
> > > > Later, threads are started. The next time that this code runs,
> > > > _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> > > > lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> > > > initialise the per-thread variant of the variable with the contents of
> > > > the global variant. Thus, all threads now end up using the resolver
> > > > instance that the main thread created.
> > > > 
> > > > I did not investigate what exactly goes wrong later, but since multiple
> > > > threads are now sharing a resolver, nothing good can happen.
> > > > 
> > > > The actual crash is a use-after-free (and it is always the same one):
> > > > iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> > > > AS_FAMILY(as). This is:
> > > > 
> > > >  #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
> > > > 
> > > > (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> > > > Cannot access memory at address 0x28b3653e000
> > > > 
> > > > The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
> > > > member as an example:
> > > > 
> > > > (gdb) print as->as_ctx[0]->ac_domain

Re: getaddrinfo() is not thread-safe in 6.8

2020-12-24 Thread Otto Moerbeek
On Thu, Dec 24, 2020 at 10:36:37AM +0100, Otto Moerbeek wrote:

> On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> 
> > Hi everyone,
> > 
> > Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> > [...]
> > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > concurrently. This didn't happen in 6.7.
> > > It looks like asr_ctx which is supposed to be thread-local according to
> > > _asr_use_resolver(), is actually static / shared between threads.
> > >> How-To-Repeat:
> > [...]
> > > The reproducing code written by psychon, CCed, while debugging the crash
> > > of a more complicated software
> > [...]
> > here is a bit more information about what is going on:
> > 
> > When run under egdb, you can "watch _asr". This reports:
> > 
> > Old value = (struct asr *) 0x0
> > New value = (struct asr *) 0x79905dce2a0
> > _asr_use_resolver (arg=<optimized out>) at /usr/src/lib/libc/asr/asr.c:360
> > 
> > (gdb) bt
> > #0  _asr_use_resolver (arg=<optimized out>) at
> > /usr/src/lib/libc/asr/asr.c:360
> > #1  0x07990558862e in _libc_res_init () at
> > /usr/src/lib/libc/asr/res_init.c:44
> > #2  0x07990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> > "ipv4.google.com", servname=0x0, hints=0x7f7ce120,
> > res=0x7f7ce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> > #3  0x0796629c4d2e in do_lookup ()
> > #4  0x0796629c4da3 in do_things ()
> > #5  0x0796629c4ded in main ()
> > 
> > At this time, the program is still single-threaded (as seen in the
> > backtrace). This line in _asr_use_resolver() is the culprit:
> > 
> > priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > 
> > the macro is defined as:
> > 
> > #define _THREAD_PRIVATE(keyname, storage, error) \
> > (_thread_cb.tc_tag_storage == NULL ? &(storage) : \
> > _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)), \
> > &(storage), sizeof(storage), error))
> > 
> > gdb has to say about this:
> > 
> > (gdb) print _thread_cb.tc_tag_storage
> > $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> > 
> > Thus, because we are still single-threaded, the program uses the static
> > variable _asr directly to store its "per thread" resolver (priv = &_asr
> > in the macro invocation above and thus *priv = _asr_resolver() ends up
> > storing the resolver in a global variable).
> > 
> > Later, threads are started. The next time that this code runs,
> > _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> > lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> > initialise the per-thread variant of the variable with the contents of
> > the global variant. Thus, all threads now end up using the resolver
> > instance that the main thread created.
> > 
> > I did not investigate what exactly goes wrong later, but since multiple
> > threads are now sharing a resolver, nothing good can happen.
> > 
> > The actual crash is a use-after-free (and it is always the same one):
> > iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> > AS_FAMILY(as). This is:
> > 
> >  #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
> > 
> > (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> > Cannot access memory at address 0x28b3653e000
> > 
> > The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
> > member as an example:
> > 
> > (gdb) print as->as_ctx[0]->ac_domain
> > $8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address
> > 0xdfdfdfdfdfdfdfdf>
> > 
> > At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
> > this helps / surprises, since the crash happens in the main thread...
> > I guess one of the threads freed the main thread's resolver in
> > _asr_resolver_done()..?
> > 
> > I guess that _asr_resolver_done() is supposed to set _asr back to a NULL
> > pointer. However, a breakpoint on pthread_create() says that it is not
> > doing so. In fact, I cannot find any callers of _asr_resolver_done()...?
> > Perhaps some code was changed to directly call _asr_ctx_unref() instead?
> > At least it looks like asr_run() calls _asr_async_free() at the end,
> > which just calls _asr_ctx_unref() directly, even though the context
> > was originally acquired by calling getaddrinfo_async() with NULL as the
> > last parameter, which is passed on to _asr_use_resolver(NULL), thus
> > acquiring the thread-local context.
>

Re: getaddrinfo() is not thread-safe in 6.8

2020-12-24 Thread Otto Moerbeek
On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:

> Hi everyone,
> 
> Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> [...]
> > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > concurrently. This didn't happen in 6.7.
> > It looks like asr_ctx which is supposed to be thread-local according to
> > _asr_use_resolver(), is actually static / shared between threads.
> >> How-To-Repeat:
> [...]
> > The reproducing code written by psychon, CCed, while debugging the crash
> > of a more complicated software
> [...]
> here is a bit more information about what is going on:
> 
> When run under egdb, you can "watch _asr". This reports:
> 
> Old value = (struct asr *) 0x0
> New value = (struct asr *) 0x79905dce2a0
> _asr_use_resolver (arg=<optimized out>) at /usr/src/lib/libc/asr/asr.c:360
> 
> (gdb) bt
> #0  _asr_use_resolver (arg=<optimized out>) at
> /usr/src/lib/libc/asr/asr.c:360
> #1  0x07990558862e in _libc_res_init () at
> /usr/src/lib/libc/asr/res_init.c:44
> #2  0x07990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> "ipv4.google.com", servname=0x0, hints=0x7f7ce120,
> res=0x7f7ce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> #3  0x0796629c4d2e in do_lookup ()
> #4  0x0796629c4da3 in do_things ()
> #5  0x0796629c4ded in main ()
> 
> At this time, the program is still single-threaded (as seen in the
> backtrace). This line in _asr_use_resolver() is the culprit:
> 
> priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> 
> the macro is defined as:
> 
> #define _THREAD_PRIVATE(keyname, storage, error) \
>   (_thread_cb.tc_tag_storage == NULL ? &(storage) : \
>   _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)), \
>   &(storage), sizeof(storage), error))
> 
> gdb has to say about this:
> 
> (gdb) print _thread_cb.tc_tag_storage
> $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> 
> Thus, because we are still single-threaded, the program uses the static
> variable _asr directly to store its "per thread" resolver (priv = &_asr
> in the macro invocation above and thus *priv = _asr_resolver() ends up
> storing the resolver in a global variable).
> 
> Later, threads are started. The next time that this code runs,
> _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> initialise the per-thread variant of the variable with the contents of
> the global variant. Thus, all threads now end up using the resolver
> instance that the main thread created.
> 
> I did not investigate what exactly goes wrong later, but since multiple
> threads are now sharing a resolver, nothing good can happen.
> 
> The actual crash is a use-after-free (and it is always the same one):
> iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> AS_FAMILY(as). This is:
> 
>  #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
> 
> (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> Cannot access memory at address 0x28b3653e000
> 
> The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
> member as an example:
> 
> (gdb) print as->as_ctx[0]->ac_domain
> $8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address 0xdfdfdfdfdfdfdfdf>
> 
> At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
> this helps / surprises, since the crash happens in the main thread...
> I guess one of the threads freed the main thread's resolver in
> _asr_resolver_done()..?
> 
> I guess that _asr_resolver_done() is supposed to set _asr back to a NULL
> pointer. However, a breakpoint on pthread_create() says that it is not
> doing so. In fact, I cannot find any callers of _asr_resolver_done()...?
> Perhaps some code was changed to directly call _asr_ctx_unref() instead?
> At least it looks like asr_run() calls _asr_async_free() at the end,
> which just calls _asr_ctx_unref() directly, even though the context
> was originally acquired by calling getaddrinfo_async() with NULL as the
> last parameter, which is passed on to _asr_use_resolver(NULL), thus
> acquiring the thread-local context.
> 
> Cheers,
> Uli
> -- 
> This can be a, a little complicated. Listen, my advice is... ask
> somebody else for advice, at least someone who's... got more experience
> at...  giving advice.
> 

Thanks for this analysis; it is consistent with the observations I posted
earlier.

-Otto
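
The inheritance problem analysed above can be reproduced in a small
stand-alone model. This is only a sketch under stated assumptions: it
does not use libc's _THREAD_PRIVATE or _thread_tag_storage(), and
global_slot, thread_slot() and thread_inherits_global() are hypothetical
names. It models a per-thread slot that is seeded with memcpy() from a
global default, and the zeroed alternative suggested in the report.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the resolver object. */
struct resolver {
	int refcnt;
};

/* Models the static _asr pointer used while still single-threaded. */
static struct resolver *global_slot;

static pthread_key_t slot_key;
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;
static int slot_key_ok;

static void
slot_key_init(void)
{
	/* free() releases the per-thread slot itself at thread exit. */
	slot_key_ok = (pthread_key_create(&slot_key, free) == 0);
}

/*
 * Return the calling thread's private slot. On first use it is either
 * zeroed or, like the model being criticised, copied from the global.
 */
static struct resolver **
thread_slot(int zeroed)
{
	struct resolver **p;

	pthread_once(&slot_once, slot_key_init);
	if (!slot_key_ok)
		return NULL;
	p = pthread_getspecific(slot_key);
	if (p == NULL) {
		p = malloc(sizeof(*p));
		if (p == NULL)
			return NULL;
		if (zeroed)
			*p = NULL;
		else
			memcpy(p, &global_slot, sizeof(*p));
		if (pthread_setspecific(slot_key, p) != 0) {
			free(p);
			return NULL;
		}
	}
	return p;
}

struct probe {
	int zeroed;
	struct resolver *seen;
};

static void *
worker(void *arg)
{
	struct probe *pr = arg;
	struct resolver **p = thread_slot(pr->zeroed);

	pr->seen = (p != NULL) ? *p : NULL;
	return NULL;
}

/*
 * Returns 1 if a freshly created thread starts out with the main
 * thread's resolver pointer, 0 if it does not, -1 on error.
 */
int
thread_inherits_global(int zeroed)
{
	static struct resolver main_resolver;
	struct probe pr = { zeroed, NULL };
	pthread_t t;

	/* The main thread stored its resolver in the global earlier. */
	global_slot = &main_resolver;
	if (pthread_create(&t, NULL, worker, &pr) != 0)
		return -1;
	if (pthread_join(t, NULL) != 0)
		return -1;
	return pr.seen == &main_resolver;
}
```

With the memcpy() seeding, every new thread ends up pointing at the same
object as the main thread, so whichever thread frees it first leaves the
others with a dangling pointer. With zeroed seeding each thread sees NULL
and sets up its own resolver.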



Re: getaddrinfo() is not thread-safe in 6.8

2020-12-24 Thread Otto Moerbeek
On Thu, Dec 24, 2020 at 08:28:17AM +0100, Otto Moerbeek wrote:

> On Wed, Dec 23, 2020 at 11:58:30PM -0700, Theo de Raadt wrote:
> 
> > Brad Smith  wrote:
> > 
> > > On 12/23/2020 10:35 PM, Alexey Sokolov wrote:
> > > >> Synopsis:  getaddrinfo() is not thread-safe in 6.8
> > > >> Category:  system
> > > >> Environment:
> > > > System  : OpenBSD 6.8
> > > > Details : OpenBSD 6.8 (GENERIC.MP) #1: Tue Nov  3 09:06:04 
> > > > MST 2020
> > > > 
> > > > r...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > >> Description:
> > > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > > concurrently. This didn't happen in 6.7.
> > > > It looks like asr_ctx which is supposed to be thread-local according to
> > > > _asr_use_resolver(), is actually static / shared between threads.
> > > >> How-To-Repeat:
> > > > Compile this code (gcc a.c -pthread), and run. It will segfault in
> > > > several seconds. Happens with both gcc (4.2.1) and egcc (8.4.0).
> > > 
> > > Before going any further. Use the system compiler. That is Clang (cc).
> > 
> > I disagree.   As the test appears fairly idiomatic threading code, I
> > find it unlikely a compiler change should matter.
> > 
> 
> I can reproduce on arm64 (with clang). If I change do_things() to run
> forever and move the main thread's call to do_things() down to after the
> creation loop, the crash does not seem to happen. So I suppose the bug
> is related to the creation and destruction of threads and their
> local data.
> 
>   -Otto
> 

Two observations, starting again with the original test code.

1. Removing the do_things() call in the main thread makes the crash go away.

2. Making sure the thread runtime environment is initialized before the call
to do_things(), by calling pthread_attr_init() first (see the modified test
program below), also makes the crash go away.

So there seems to be something fishy going on if the main thread starts using
thread-local data before the thread runtime is properly initialized.

Cc:ing guenther@ to see if he has something to say.

-Otto


#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_THREADS 5

/* Resolve one host name and discard the result. */
static void
do_lookup(const char *host)
{
	int s;
	struct addrinfo hints;
	struct addrinfo *result;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_ADDRCONFIG;
	hints.ai_protocol = IPPROTO_TCP;

	s = getaddrinfo(host, NULL, &hints, &result);
	if (s != 0) {
		fprintf(stderr, "Lookup error for %s: %s\n", host,
		    gai_strerror(s));
	} else {
		freeaddrinfo(result);
	}
}

static void *
do_things(void *arg)
{
	(void) arg;
	do_lookup("ipv4.google.com");
	do_lookup("ipv6.google.com");
	do_lookup("google.com");
	do_lookup("heise.de");
	return NULL;
}

int
main(void)
{
	pthread_attr_t a;
	pthread_t threads[NUM_THREADS];
	int i;
	int s;

	pthread_attr_init(&a); // remove this and the crash reappears
	for (;;) {
		/* The main thread does lookups before any thread exists. */
		do_things(NULL);
		for (i = 0; i < NUM_THREADS; i++) {
			s = pthread_create(&threads[i], NULL, do_things, NULL);
			if (s != 0)
				fprintf(stderr, "Error creating thread");
		}
		for (i = 0; i < NUM_THREADS; i++) {
			s = pthread_join(threads[i], NULL);
			if (s != 0)
				fprintf(stderr, "Error joining thread");
		}
	}
	return 0;
}



Re: getaddrinfo() is not thread-safe in 6.8

2020-12-23 Thread Otto Moerbeek
On Wed, Dec 23, 2020 at 11:58:30PM -0700, Theo de Raadt wrote:

> Brad Smith  wrote:
> 
> > On 12/23/2020 10:35 PM, Alexey Sokolov wrote:
> > >> Synopsis:getaddrinfo() is not thread-safe in 6.8
> > >> Category:system
> > >> Environment:
> > >   System  : OpenBSD 6.8
> > >   Details : OpenBSD 6.8 (GENERIC.MP) #1: Tue Nov  3 09:06:04 MST 2020
> > >   
> > > r...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > >> Description:
> > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > concurrently. This didn't happen in 6.7.
> > > It looks like asr_ctx which is supposed to be thread-local according to
> > > _asr_use_resolver(), is actually static / shared between threads.
> > >> How-To-Repeat:
> > > Compile this code (gcc a.c -pthread), and run. It will segfault in
> > > several seconds. Happens with both gcc (4.2.1) and egcc (8.4.0).
> > 
> > Before going any further. Use the system compiler. That is Clang (cc).
> 
> I disagree.   As the test appears fairly idiomatic threading code, I
> find it unlikely a compiler change should matter.
> 

I can reproduce on arm64 (with clang). If I change do_things() to run
forever and move the main thread's call to do_things() down to after the
creation loop, the crash does not seem to happen. So I suppose the bug
is related to the creation and destruction of threads and their
local data.

-Otto



Re: pthread: segfault with user stack

2020-12-03 Thread Otto Moerbeek
On Wed, Dec 02, 2020 at 10:58:14AM +0100, Sebastien Marie wrote:

> On Wed, Dec 02, 2020 at 08:29:15AM +0100, Otto Moerbeek wrote:
> > 
> > Anyway, here's a man page diff
> 
> another man page diff.
> 
> 
> Zig upstream said that OpenBSD man page for pthread_join() is a bit
> misleading (but they will look to follow the posix way).
> 
> pthread_join man page citation:
> > When a pthread_join() returns successfully, the target thread has been 
> > terminated.
> 
> I agree with them that it isn't perfectly accurate, even if POSIX
> is saying the same.
> 
> When pthread_join() returns successfully, it means that the
> synchronization point between pthread_exit() and pthread_join() has
> been passed, but the target thread may not have called __threxit()
> yet at this point.
> 
> Is a more nuanced statement desirable? Or should a CAVEATS section be
> added to spell this out?
> 
> 
> diff 671d91b2e74d462e4a819e4e39e69d9570a7a2b6 /usr/src
> blob - feb059b39d2fb0a0261feaefb821277e612f0c5d
> file + lib/libpthread/man/pthread_join.3
> --- lib/libpthread/man/pthread_join.3
> +++ lib/libpthread/man/pthread_join.3
> @@ -61,7 +61,7 @@ by the terminating thread is stored in the location re
>  .Fa value_ptr .
>  When a
>  .Fn pthread_join
> -returns successfully, the target thread has been terminated.
> +returns successfully, the target thread is about to terminate.
>  The results of multiple simultaneous calls to
>  .Fn pthread_join
>  specifying the same target thread are undefined.

It's hard to come up with the proper formulation. The terminating
thread has passed the "point of no return" in a way. Maybe

..., the target thread has called pthread_exit(), but note that its
resources might not have been cleaned up yet.

-Otto





Re: pthread: segfault with user stack

2020-12-01 Thread Otto Moerbeek
On Wed, Dec 02, 2020 at 07:48:07AM +0100, Otto Moerbeek wrote:

> On Tue, Dec 01, 2020 at 01:14:22PM -0800, guent...@openbsd.org wrote:
> 
> > On Tue, 1 Dec 2020, Otto Moerbeek wrote:
> > > On Tue, Dec 01, 2020 at 08:00:18PM +0100, Otto Moerbeek wrote:
> > > > On Tue, Dec 01, 2020 at 10:13:29AM -0800, guent...@openbsd.org wrote:
> > ...
> > > > The man page is lacking or even wrong in this respect. It explicitly
> > > > talks about how to do deallocation.
> > 
> > Yeah, that's a bug in the manpage.
> > 
> > 
> > > And curiously, if I use 4*PTHREAD_STACK_MIN for both the mmap size arg
> > > and the pthread_attr_setstack size arg, the crash does not appear.
> > 
> > Uh, that's like noting that whether a use-after-free crashes depends on 
> > the size of the allocation: it's the UAF that's wrong, the size is 
> > irrelevant.
> 
> Of course.  I was just curious why it does not happen with a different size.
> 
> > 
> > pthread_join() returning merely tells you that the target thread has 
> > gotten far enough into pthread_exit() as to pass its return value to the 
> > joining thread.  It still has more cleanup to do before finally entering 
> > the kernel to vanish and there's no standard API to detect when that's 
> > happened.
> > 
> > I suppose a masochist could use kvm_getprocs() to examine the caller's 
> > own threads, but the real answer is that pthread_attr_setstack() is not 
> > appropriate for threads that will come and go in a long-lived process 
> > where cleanup of the stacks is necessary; for those, if you need to set a 
> > different stack size, use pthread_attr_setstacksize() and let the 
> > implementation handle the allocation and deallocation.
> > 
> > 
> > Philip
> > 
> 

Quick test was *too* quick.

Anyway, here's a man page diff

-Otto

Index: pthread_attr_setstack.3
===
RCS file: /cvs/src/lib/libpthread/man/pthread_attr_setstack.3,v
retrieving revision 1.5
diff -u -p -r1.5 pthread_attr_setstack.3
--- pthread_attr_setstack.3 12 Apr 2018 17:13:34 -  1.5
+++ pthread_attr_setstack.3 2 Dec 2020 07:28:29 -
@@ -46,11 +46,12 @@ the provided stack must be page-aligned.
 It will be replaced (meaning zeroed) with a new
 .Ar MAP_ANON | Ar MAP_STACK
 mapping.
-It is recommended that the initial mapping be allocated using
-an allocator which has a matching deallocator that discards whole
-pages, to clear the
-.Ar MAP_STACK
-attribute afterwards.
+The passed memory object should not be deallocated or reused,
+even when the thread using it has terminated.
+If there is no need for a specific memory object as stack,
+the
+.Xr pthread_attr_setstacksize 3
+function should be used.
 .Sh RETURN VALUES
 Upon successful completion,
 .Fn pthread_attr_setstack



Re: pthread: segfault with user stack

2020-12-01 Thread Otto Moerbeek
On Tue, Dec 01, 2020 at 01:14:22PM -0800, guent...@openbsd.org wrote:

> On Tue, 1 Dec 2020, Otto Moerbeek wrote:
> > On Tue, Dec 01, 2020 at 08:00:18PM +0100, Otto Moerbeek wrote:
> > > On Tue, Dec 01, 2020 at 10:13:29AM -0800, guent...@openbsd.org wrote:
> ...
> > > The man page is lacking or even wrong in this respect. It explicitly
> > > talks about how to do deallocation.
> 
> Yeah, that's a bug in the manpage.
> 
> 
> > And curiously, if I use 4*PTHREAD_STACK_MIN for both the mmap size arg
> > and the pthread_attr_setstack size arg, the crash does not appear.
> 
> Uh, that's like noting that whether a use-after-free crashes depends on 
> the size of the allocation: it's the UAF that's wrong, the size is 
> irrelevant.

Of course.  I was just curious why it does not happen with a different size.

> 
> pthread_join() returning merely tells you that the target thread has 
> gotten far enough into pthread_exit() as to pass its return value to the 
> joining thread.  It still has more cleanup to do before finally entering 
> the kernel to vanish and there's no standard API to detect when that's 
> happened.
> 
> I suppose a masochist could use kvm_getprocs() to examine the caller's 
> own threads, but the real answer is that pthread_attr_setstack() is not 
> appropriate for threads that will come and go in a long-lived process 
> where cleanup of the stacks is necessary; for those, if you need to set a 
> different stack size, use pthread_attr_setstacksize() and let the 
> implementation handle the allocation and deallocation.
> 
> 
> Philip
> 



Re: pthread: segfault with user stack

2020-12-01 Thread Otto Moerbeek
On Tue, Dec 01, 2020 at 08:00:18PM +0100, Otto Moerbeek wrote:

> On Tue, Dec 01, 2020 at 10:13:29AM -0800, guent...@openbsd.org wrote:
> 
> > On Tue, 1 Dec 2020, Sebastien Marie wrote:
> > > I have a random segfault while using threads with custom stack.
> > > 
> > > In short, I am doing:
> > > - allocate a stack space for the thread
> > > - create a thread (with custom stack using pthread_attr_setstack())
> > > - join the thread
> > > - free the allocated stack
> > > - create a new thread, etc...
> > > 
> > > > If I skip freeing the allocated stack once done, it seems
> > > > fine (but I hit ENOMEM after a while).
> > > > 
> > > > I suspect some state corruption or stack reuse, but I can't find
> > > > where for now. Or am I doing something wrong, and I shouldn't
> > > > deallocate the stack at this point?
> > 
> > You should not deallocate the stack.  To quote POSIX, XSI 2.9.8:
> > 
> > 
> > 2.9.8  Use of Application-Managed Thread Stacks
> > 
> > >    An "application-managed thread stack" is a region of memory
> > >    allocated by the application--for example, memory returned by the
> > >    malloc() or mmap() functions--and designated as a stack through
> > >    the act of passing the address and size of the stack,
> > >    respectively, as the stackaddr and stacksize arguments to
> > >    pthread_attr_setstack(). Application-managed stacks allow the
> > >    application to precisely control the placement and size of a
> > >    stack.
> > > 
> > >    The application grants to the implementation permanent ownership
> > >    of and control over the application-managed stack when the
> > >    attributes object in which the stack or stackaddr attribute has
> > >    been set is used, either by presenting that attribute's object as
> > >    the attr argument in a call to pthread_create() that completes
> > >    successfully, or by storing a pointer to the attributes object in
> > >    the sigev_notify_attributes member of a struct sigevent and
> > >    passing that struct sigevent to a function accepting such
> > >    argument that completes successfully. The application may
> > >    thereafter utilize the memory within the stack only within the
> > >    normal context of stack usage within or properly synchronized
> > >    with a thread that has been scheduled by the implementation with
> > >    stack pointer value(s) that are within the range of that stack.
> > >    In particular, the region of memory cannot be freed, nor can it
> > >    be later specified as the stack for another thread.
> > 
> > 
> > Note that last sentence.
> > 
> > 
> > Philip Guenther
> > 
> 
> The man page is lacking or even wrong in this respect. It explicitly
> talks about how to do deallocation.

And curiously, if I use 4*PTHREAD_STACK_MIN for both the mmap size arg
and the pthread_attr_setstack size arg, the crash does not appear.

-Otto



Re: pthread: segfault with user stack

2020-12-01 Thread Otto Moerbeek
On Tue, Dec 01, 2020 at 10:13:29AM -0800, guent...@openbsd.org wrote:

> On Tue, 1 Dec 2020, Sebastien Marie wrote:
> > I have a random segfault while using threads with custom stack.
> > 
> > In short, I am doing:
> > - allocate a stack space for the thread
> > - create a thread (with custom stack using pthread_attr_setstack())
> > - join the thread
> > - free the allocated stack
> > - create a new thread, etc...
> > 
> > If I skip freeing the allocated stack once done, it seems
> > fine (but I hit ENOMEM after a while).
> > 
> > I suspect some state corruption or stack reuse, but I can't find
> > where for now. Or am I doing something wrong, and I shouldn't
> > deallocate the stack at this point?
> 
> You should not deallocate the stack.  To quote POSIX, XSI 2.9.8:
> 
> 
> 2.9.8  Use of Application-Managed Thread Stacks
> 
>    An "application-managed thread stack" is a region of memory
>    allocated by the application--for example, memory returned by the
>    malloc() or mmap() functions--and designated as a stack through
>    the act of passing the address and size of the stack,
>    respectively, as the stackaddr and stacksize arguments to
>    pthread_attr_setstack(). Application-managed stacks allow the
>    application to precisely control the placement and size of a
>    stack.
> 
>    The application grants to the implementation permanent ownership
>    of and control over the application-managed stack when the
>    attributes object in which the stack or stackaddr attribute has
>    been set is used, either by presenting that attribute's object as
>    the attr argument in a call to pthread_create() that completes
>    successfully, or by storing a pointer to the attributes object in
>    the sigev_notify_attributes member of a struct sigevent and
>    passing that struct sigevent to a function accepting such
>    argument that completes successfully. The application may
>    thereafter utilize the memory within the stack only within the
>    normal context of stack usage within or properly synchronized
>    with a thread that has been scheduled by the implementation with
>    stack pointer value(s) that are within the range of that stack.
>    In particular, the region of memory cannot be freed, nor can it
>    be later specified as the stack for another thread.
> 
> 
> Note that last sentence.
> 
> 
> Philip Guenther
> 

The man page is lacking or even wrong in this respect. It explicitly
talks about how to do deallocation.

-Otto
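
A minimal sketch of the ownership rule quoted above, for the case where
a specific memory object really is needed as a stack. It assumes mmap()
with MAP_ANON and MAP_STACK is available; the names on_custom_stack()
and run_on_own_stack() are hypothetical. The mapping is released only
if pthread_create() fails, because ownership passes to the
implementation once it succeeds.

```c
#define _DEFAULT_SOURCE	/* glibc: expose MAP_ANON and MAP_STACK */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Thread body: records that it ran on the supplied stack. */
static void *
on_custom_stack(void *arg)
{
	int *flag = arg;

	*flag = 1;
	return NULL;
}

/*
 * Start one thread on an application-managed stack of `size` bytes
 * (expected to be a multiple of the page size).  Returns 1 if the
 * thread ran, -1 on error.
 */
static int
run_on_own_stack(size_t size)
{
	pthread_attr_t attr;
	pthread_t t;
	void *stack;
	int ran = 0, rc;

	assert(size > 0);
	stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;
	if (pthread_attr_init(&attr) != 0) {
		munmap(stack, size);	/* never handed over, still ours */
		return -1;
	}
	rc = pthread_attr_setstack(&attr, stack, size);
	if (rc == 0)
		rc = pthread_create(&t, &attr, on_custom_stack, &ran);
	pthread_attr_destroy(&attr);
	if (rc != 0) {
		munmap(stack, size);	/* no successful pthread_create() */
		return -1;
	}
	rc = pthread_join(t, NULL);
	/*
	 * Deliberately no munmap() and no reuse of `stack` here: after a
	 * successful pthread_create() the region belongs to the
	 * implementation, even once the thread has been joined.
	 */
	return (rc == 0 && ran) ? 1 : -1;
}
```

The cost is that every call permanently consumes one mapping, which is
why this pattern does not suit threads that come and go in a long-lived
process.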



Re: apu4 fatal protection fault in supervisor mode [Was: apu4 kernel panic]

2020-11-29 Thread Otto Moerbeek
On Sun, Nov 29, 2020 at 06:38:15PM -, Christian Weisgerber wrote:

> On 2020-11-29, Theo Buehler  wrote:
> 
> > Thanks for digging into this. Your APU seems much worse off than mine,
> > which takes a few weeks before crashing these days, so it's not much use
> > for bisecting.
> 
> The APU2 that serves as my home gateway has been running just fine.
> 
> OpenBSD 6.8-current (GENERIC.MP) #1: Thu Oct 29 00:14:31 CET 2020
> na...@ariolc.mips.inka.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> ...
> bios0: vendor coreboot version "v4.9.0.6" date 06/08/2019
> 
> # uptime
>  7:37PM  up 31 days, 17:38, 1 user, load averages: 0.00, 0.00, 0.00
> 
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 

Same here for my two apu2's.

-Otto



Re: OpenBSD crashing after being setup as a router hooked up to my modem

2020-11-21 Thread Otto Moerbeek
On Sat, Nov 21, 2020 at 05:20:37PM -0500, sam wrote:

> >Environment:
>     System  : OpenBSD 6.8
>     Details : OpenBSD 6.8 (GENERIC.MP) #1: Tue Nov  3 09:06:04 MST 2020
>  r...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>     Architecture: OpenBSD.amd64
>     Machine : amd64
> >Description:
>     I recently bought an APU4d4 and set it up as an OpenBSD router for
>     my house. I connect the first interface of the device to my modem
>     and the second port to my switch. I then enable IP forwarding from
>     the first NIC to the second NIC, and enable dhcpd on the NIC that's
>     connected to the switch.
> 
>     It's fully operational, but after a time (it varies) OpenBSD
>     crashes. I've tried running syspatch and updating all packages, to
>     no avail. I've tried updating the device's BIOS and I've
>     reinstalled OpenBSD multiple times.
> 
>     When I first had this device the latest version of OpenBSD was 6.7,
>     so I figured maybe it was exclusive to that version, but after
>     testing it on 6.8 today the exact same issue occurs. This only
>     happens when I have the device set up to handle all data from the
>     LAN to the internet.
> 
>     I've set the router up to create a small LAN and observed no
>     issues. This leads me to believe there may be some sort of issue
>     with there being too much traffic for the device to handle.
>     However, I'm not able to find anyone else who's having this issue,
>     so I'm not too sure.
> 
>     That's all the information I've been able to gather. I've provided
>     screenshots of the crash below, along with my pf.conf file in case
>     that is of any interest. Some of the IP addresses have been
>     replaced with X's and such.
> 
> >How-To-Repeat:
>     1. Set up OpenBSD to IP forward between 2 NICs and enable PF.
>     2. Connect one NIC to the modem, and one to a switch.
>     3. Set up DHCP on the NIC that's connected to the switch.
>     4. After a seemingly random amount of time, the system will crash.
> >Fix:
>     N/A

You have a serial console, so please capture logs through that;
plain-text logs are the preferred format. Also post at least the output
of ifconfig -A.

You could try to start with a more basic pf config and then add pieces
step by step to see if you can find a part that correlates to your
crashes.

-Otto


> 
> dmesg:
> OpenBSD 6.8 (GENERIC.MP) #1: Tue Nov  3 09:06:04 MST 2020
> r...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 1996484608 (1903MB)
> avail mem = 1920987136 (1831MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7ee8b020 (13 entries)
> bios0: vendor coreboot version "v4.12.0.3" date 07/30/2020
> bios0: PC Engines apu4
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
> acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3)
> UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf800, bus 0-64
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD GX-412TC SOC, 998.27 MHz, 16-30-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3,ITSC,BMI1,XSAVEOPT
> cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line
> 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3,ITSC,BMI1,XSAVEOPT
> cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line
> 16-way L2 cache
> cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD GX-412TC SOC, 998.13 MHz, 16-30-01
> cpu2: 
> FPU,VME,DE,

Re: rge(4) interrupt storm

2020-11-21 Thread Otto Moerbeek
On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:

> On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> 
> > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > 
> > > > It's a relatively new driver.  It uses MSI which pretty much rules out
> > > > an issue with shared interrupts.  So I suspect this is an issue with
> > > > > the rge(4) driver.  In the past we have had fun with packet counter
> > > > overflow interrupts.  Is the storm present immediately after you bring
> > > > up the interface?  Or even before?
> > > 
> > > No storm if not configured and no cable plugged in.
> > > No storm if not configured and cable plugged in
> > > No storm if configured and no cable
> > > 
> > > > Storm starts when I plug the cable in.
> > 
> > Sounds like an unexpected interrupt source that should probably be masked.
> > 
> > I would look at rge_intr() and what status you get and compare it to the
> > RGE_ISR defines. This may help to figure out what is going on.
> > 
> > -- 
> > :wq Claudio
> 
> The value of status after the RGE_READ_4 call is 0x10 all the time:
> RGE_ISR_RX_DESC_UNAVAIL
> 
>   -Otto
> 

If I apply the diff below the device starts to work without interrupt storm.
This is pure blind coding, I have little idea what I'm doing...

-Otto

Index: dev/pci/if_rgereg.h
===
RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
retrieving revision 1.4
diff -u -p -r1.4 if_rgereg.h
--- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -  1.4
+++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -
@@ -88,7 +88,7 @@
 
 #define RGE_INTRS  \
 	(RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |	\
-	RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |	\
+	RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |	\
 	RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
 
 #define RGE_INTRS_TIMER\
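
A minimal sketch of the bit arithmetic behind the diff above, not driver
code. Only the value 0x10 for RGE_ISR_RX_DESC_UNAVAIL is confirmed in
this thread; the value used for RGE_ISR_LINKCHG is a placeholder, and
INTRS_BEFORE, INTRS_AFTER and intr_enabled() are hypothetical names that
model just two of the bits in RGE_INTRS.

```c
#include <assert.h>
#include <stdint.h>

/* Confirmed in the thread: a status of 0x10 is RGE_ISR_RX_DESC_UNAVAIL. */
#define RGE_ISR_RX_DESC_UNAVAIL	0x00000010
/* Placeholder value for illustration only, not copied from if_rgereg.h. */
#define RGE_ISR_LINKCHG		0x00000020

/* Two-bit stand-ins for RGE_INTRS before and after the diff. */
#define INTRS_BEFORE	(RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG)
#define INTRS_AFTER	(RGE_ISR_LINKCHG)

/* Returns nonzero if `status` contains any source enabled in `mask`. */
static int
intr_enabled(uint32_t status, uint32_t mask)
{
	return (status & mask) != 0;
}
```

With the old mask, a status of 0x10 matches an enabled source; with the
new mask it no longer does, which is consistent with the storm
disappearing once that bit is dropped.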



Re: rge(4) interrupt storm

2020-11-20 Thread Otto Moerbeek
On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:

> On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > 
> > > It's a relatively new driver.  It uses MSI which pretty much rules out
> > > an issue with shared interrupts.  So I suspect this is an issue with
> > > the rge(4) driver.  In the past we have had fun with packet counter
> > > overflow interrupts.  Is the storm present immediately after you bring
> > > up the interface?  Or even before?
> > 
> > No storm if not configured and no cable plugged in.
> > No storm if not configured and cable plugged in
> > No storm if configured and no cable
> > 
> > Storm starts when I plug the cable in.
> 
> Sounds like an unexpected interrupt source that should probably be masked.
> 
> I would look at rge_intr() and what status you get and compare it to the
> RGE_ISR defines. This may help to figure out what is going on.
> 
> -- 
> :wq Claudio

The value of status after the RGE_READ_4 call is 0x10 all the time:
RGE_ISR_RX_DESC_UNAVAIL

-Otto



Re: rge(4) interrupt storm

2020-11-20 Thread Otto Moerbeek
On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:

> It's a relatively new driver.  It uses MSI which pretty much rules out
> an issue with shared interrupts.  So I suspect this is an issue with
> the rge(4) driver.  In the past we have had fun with packet counter
> overflow interrupts.  Is the storm present immediately after you bring
> up the interface?  Or even before?

No storm if not configured and no cable plugged in.
No storm if not configured and cable plugged in
No storm if configured and no cable

Storm starts when I plug the cable in.

-Otto


