Re: bounty for virtio 1.0 (now with instructions!)
On 03.11.2020 23:20, co...@sdf.org wrote: > On Tue, Nov 03, 2020 at 10:42:27PM +0100, Martin Husemann wrote: >> On Tue, Nov 03, 2020 at 10:23:30PM +0100, Reinoud Zandijk wrote: >>> To be clear, do we want to (keep) supporting legacy devices? Its not >>> required >>> in 1.0 and could clean up the code a lot! >> >> Yes, we need that still, as not all hosts offer the newer ones. > > The QEMU people mentioned that even if "legacy virtio" IDs are used, > there's a bit to show that it's compatible with newer virtio. > Things that claim old virtio probably do both old & new. > "Transitional virtio" supports both, 1.0 and 0.9. I had a draft work with the upgrade of virtio, but incomplete.. signature.asc Description: OpenPGP digital signature
Re: make COMPAT_LINUX match SYSV binaries
On 21.10.2020 14:14, co...@sdf.org wrote: > On Tue, Oct 20, 2020 at 07:11:05PM +, co...@sdf.org wrote: >> hello, >> >> As a background, some Linux binaries don't claim to be targeting the >> Linux OS, but instead are "SYSV". >> >> We have used some heuristics to still identify those binaries as being >> Linux binaries, like looking into the symbols defined by the binary. >> >> it looks like we no longer have other forms of compat expected to use >> SYSV ELF binaries. Perhaps we should drop this elaborate detection logic >> in favour of detecting SYSV == Linux? >> >> As an added bonus, it allows detecting binaries built with a musl >> toolchain as being Linux binaries. >> > > I feel compelled to explain further: > any OS that doesn't rely on this tag is prone to spitting out binaries > with the wrong tag. For example, Go spits out Solaris binaries with SYSV > as well. > > Our current solution to it is the kernel reading through the binary, > checking if it contains certain known symbols that are common on Linux. > > We support the following forms of compat: > > ultrixnot ELF > sunos not ELF (we support only oold stuff) > freebsd always correctly tagged, because the native OS > checks this, like we do. > linux ELF, not always correctly tagged > > > So, currently, we only support one OS that has this problem, which is > linux. I am proposing we take advantage of it. > > In the event someone adds support for another OS with this problem (say, > modern Solaris), I don't expect this compat to be enabled by default, > for security reasons. So the problem will only occur if a user enables > both forms of compat at the same time. > > Users already have to opt in to have Linux compat support. I think it is > a lot to ask to have them tag every binary. > I couldn't run musl binaries without either patching the kernel or ELF files, so I'm for making this easier. In my case, I had to add manually build-id tag to musl binaries. For some reason someone in the kernel assumed that they are always present, which is just a special case in some distros. signature.asc Description: OpenPGP digital signature
Re: [PATCH v2] Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
The same bug was fixed in FreeBSD here: "amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE." https://github.com/freebsd/freebsd/commit/62a9018a8878533432500e5cb89f9bd07fd9ef14 Kamil Rytarowski CTO, Moritz Systems www.moritz.systems pt., 16 paź 2020 o 15:26 Michał Górny napisał(a): > > When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 > use the 64-suffixed variant in order to include the complete FIP/FDP > registers in the x87 area. > > The difference between the two variants is that the FXSAVE64 (new) > variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), > while the legacy FXSAVE variant uses split fields: 32-bit offset, > 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). > The latter implies that the actual addresses are truncated to 32 bits > which is insufficient in modern programs. > > The change is applied only to 64-bit programs on amd64. Plain i386 > and compat32 continue using plain FXSAVE. Similarly, NVMM is not > changed as I am not familiar with that code. > > This is a potentially breaking change. However, I don't think it likely > to actually break anything because the data provided by the old variant > were not meaningful (because of the truncated pointer). > --- > sys/arch/x86/include/cpufunc.h | 76 ++ > sys/arch/x86/include/fpu.h | 4 +- > sys/arch/x86/x86/fpu.c | 30 ++ > sys/dev/nvmm/x86/nvmm_x86_svm.c| 6 +- > sys/dev/nvmm/x86/nvmm_x86_vmx.c| 6 +- > tests/lib/libc/sys/t_ptrace_x86_wait.h | 2 - > 6 files changed, 105 insertions(+), 19 deletions(-) > > diff --git a/sys/arch/x86/include/cpufunc.h b/sys/arch/x86/include/cpufunc.h > index 38534c305544..dd8b0c7dc375 100644 > --- a/sys/arch/x86/include/cpufunc.h > +++ b/sys/arch/x86/include/cpufunc.h > @@ -485,6 +485,82 @@ xrstor(const void *addr, uint64_t mask) > ); > } > > +#ifdef __x86_64__ > +static inline void > +fxsave64(void *addr) > +{ > + uint8_t *area = addr; > + > + __asm volatile ( > + "fxsave64 %[area]" > + : [area] "=m" (*area) > + : > + : "memory" > + ); > +} > + > +static inline void > +fxrstor64(const void *addr) > +{ > + const uint8_t *area = addr; > + > + __asm volatile ( > + "fxrstor64 %[area]" > + : > + : [area] "m" (*area) > + : "memory" > + ); > +} > + > +static inline void > +xsave64(void *addr, uint64_t mask) > +{ > + uint8_t *area = addr; > + uint32_t low, high; > + > + low = mask; > + high = mask >> 32; > + __asm volatile ( > + "xsave64%[area]" > + : [area] "=m" (*area) > + : "a" (low), "d" (high) > + : "memory" > + ); > +} > + > +static inline void > +xsaveopt64(void *addr, uint64_t mask) > +{ > + uint8_t *area = addr; > + uint32_t low, high; > + > + low = mask; > + high = mask >> 32; > + __asm volatile ( > + "xsaveopt64 %[area]" > + : [area] "=m" (*area) > + : "a" (low), "d" (high) > + : "memory" > + ); > +} > + > +static inline void > +xrstor64(const void *addr, uint64_t mask) > +{ > + const uint8_t *area = addr; > + uint32_t low, high; > + > + low = mask; > + high = mask >> 32; > + __asm volatile ( > + "xrstor64 %[area]" > + : > + : [area] "m" (*area), "a" (low), "d" (high) > + : "memory" > + ); > +} > +#endif > + > /* > -- */ > > #ifdef XENPV > diff --git a/sys/arch/x86/include/fpu.h b/sys/arch/x86/include/fpu.h > index 3255a8ca39e0..bdd86abfdd39 100644 > --- a/sys/arch/x86/include/fpu.h > +++ b/sys/arch/x86/include/fpu.h > @@ -14,8 +14,8 @@ struct trapframe; > void fpuinit(struct cpu_info *); > void fpuinit_mxcsr_mask(void); > > -void fpu_area_save(void *, uint64_t); > -void fpu_area_restore(const void *, uint64_t); > +void fpu_area_save(void *, uint64_t, bool); > +void fpu_area_restore(const void *, uint64_t, bool); > > void fpu_save(void); > > diff --git a/sys/arch/x86/x86/fpu.c b/sys/arch/x86/x86/
Re: CVS commit: src/external/gpl3/gcc/dist/gcc/config/aarch64
On 15.10.2020 17:14, Rin Okuyama wrote: > On 2020/10/15 16:12, matthew green wrote: >> Martin Husemann writes: >>> On Thu, Oct 15, 2020 at 05:28:12PM +1100, matthew green wrote: you could try reverting most of our changes to this file and making sure you run with /proc mounted -o linux. ryo@ recently added additional /proc/cpuinfo support that should make this just work with the upstream version, but i haven't had chance to update and see if this is the case. > > I've confirmed that -mtune=native works fine at least for A53, > even if all the local changes to driver-aarch64.c is reverted. > I will commit it soon. > >>> If we go this route, we should make the relevant procfs nodes >>> independent >>> of -o linux. >> >> that would be fine by me. > > Nowadays, -o linux is turned on by default (unless nolinux is > specified explicitly). Still, native apps probably should not > depend on it. > > This needs MI changes to procfs, not MD to aarch64. Should we > enable /proc/cpuinfo unconditionally? I'm against the policy of restoring the /proc dependency for this corner case in one application. We need to upstream the NetBSD specific patches to mainline GCC. signature.asc Description: OpenPGP digital signature
Re: [PATCH 0/2] Delete CIRCLEQ
This removal is a part of a larger synchronization with other BSDs as we lack various features in sys/queue.h (like LIST_PREV()). CIRCLEQ was already deleted from the documentation and disabled in the kernel in NetBSD-7. If there are still any unaware users, they are certainly long broken. What's the benefit of keeping it around and having no users and documented deprecation plus being prone to miscompilation? The removal does not break libc or kernel ABIs. Most 3rd party users of these macros deliver a homegrown copy of sys/queue.h anyway. Kamil Rytarowski CTO, Moritz Systems www.moritz.systems pon., 12 paź 2020 o 13:15 Mouse napisał(a): > > >>> Remove the CIRCLEQ API completely from the system headers and > >>> document this fact in the QUEUE(3) man-page. > > >> why? queue.h may be used by more than src. > > >> i don't see any benefit except forcing working code to be changed, > >> possibly introducing bugs. > > >> please leave it alone. > > > It's been deprecated since -7, we can remove it for -10. > > So? I still agree with mrg: I too see no benefit to removing it (or > for that matter to deprecating it). What am I missing? What benefit > do you see? > > /~\ The ASCII Mouse > \ / Ribbon Campaign > X Against HTMLmo...@rodents-montreal.org > / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: [PATCH 0/2] Delete CIRCLEQ
Everything relatively modern that uses sys/queue.h directly was already switched a long time ago to TAILQ. Christos Zoulas changed most users of CIRCLEQ in the src tree 6 years ago. The last leftover is handled in this patchset. I was able to find some 3rd projects using CIRCLEQ, but probably all of them are both: 1. old 2. ship with a local copy of sys/queue.h. There is not much working code using CIRCLEQ left and NetBSD is pretty much the last one to deliver it. It's also broken for modern compilers and we need to launder pointers to workaround the pointer aliasing design flaw. Kamil Rytarowski CTO, Moritz Systems www.moritz.systems pon., 12 paź 2020 o 04:23 matthew green napisał(a): > > Kamil Rytarowski writes: > > Switch the last user (ypserv) from CIRCLEQ to TAILQ. > > This is inspired by analogous refactoring from OpenBSD: > > https://github.com/openbsd/src/commit/d53c0cf4d32fdbd8b42debfba068f1b0efa423dc#diff-8d0a4fbb89658213ebf314ff188581d7 > > > > Remove the CIRCLEQ API completely from the system headers and document > > this fact in the QUEUE(3) man-page. > > why? queue.h may be used by more than src. > > i don't see any benefit except forcing working code to > be changed, possibly introducing bugs. > > please leave it alone. > > > .mrg.
[PATCH 2/2] Remove the CIRCLEQ API
It was marked deprecated in NetBSD 7 and already removed from FreeBSD in 2000 and OpenBSD in 2015. --- share/man/man3/queue.3 | 10 +++ sys/sys/queue.h| 196 - 2 files changed, 10 insertions(+), 196 deletions(-) diff --git a/share/man/man3/queue.3 b/share/man/man3/queue.3 index 4293090986d4..359c772e5439 100644 --- a/share/man/man3/queue.3 +++ b/share/man/man3/queue.3 @@ -1091,3 +1091,13 @@ and .Nm STAILQ functions first appeared in .Fx 2.1.5 . +.Pp +The +.Nm CIRCLEQ +functions first appeared in +.Bx 4.4 +and were deprecated in +.Nx 7 +and removed in +.Nx 10 +due to the pointer aliasing violations. diff --git a/sys/sys/queue.h b/sys/sys/queue.h index 5764c7a98a53..ec686364126b 100644 --- a/sys/sys/queue.h +++ b/sys/sys/queue.h @@ -69,14 +69,6 @@ * after an existing element, at the head of the list, or at the end of * the list. A tail queue may be traversed in either direction. * - * A circle queue is headed by a pair of pointers, one to the head of the - * list and the other to the tail of the list. The elements are doubly - * linked so that an arbitrary element can be removed without a need to - * traverse the list. New elements can be added to the list before or after - * an existing element, at the head of the list, or at the end of the list. - * A circle queue may be traversed in either direction, but has a more - * complex end of list detection. - * * For details on the use of these macros, see the queue(3) manual page. */ @@ -663,192 +655,4 @@ struct { \ ((struct type *)(void *)\ ((char *)((head)->stqh_last) - offsetof(struct type, field - -#ifndef _KERNEL -/* - * Circular queue definitions. Do not use. We still keep the macros - * for compatibility but because of pointer aliasing issues their use - * is discouraged! - */ - -/* - * __launder_type(): We use this ugly hack to work around the compiler - * noticing that two types may not alias each other and elide tests in code. - * We hit this in the CIRCLEQ macros when comparing 'struct name *' and - * 'struct type *' (see CIRCLEQ_HEAD()). Modern compilers (such as GCC - * 4.8) declare these comparisons as always false, causing the code to - * not run as designed. - * - * This hack is only to be used for comparisons and thus can be fully const. - * Do not use for assignment. - * - * If we ever choose to change the ABI of the CIRCLEQ macros, we could fix - * this by changing the head/tail sentinal values, but see the note above - * this one. - */ -static __inline const void * __launder_type(const void *); -static __inline const void * -__launder_type(const void *__x) -{ - __asm __volatile("" : "+r" (__x)); - return __x; -} - -#if defined(QUEUEDEBUG) -#define QUEUEDEBUG_CIRCLEQ_HEAD(head, field) \ - if ((head)->cqh_first != CIRCLEQ_ENDC(head) && \ - (head)->cqh_first->field.cqe_prev != CIRCLEQ_ENDC(head))\ - QUEUEDEBUG_ABORT("CIRCLEQ head forw %p %s:%d", (head), \ - __FILE__, __LINE__); \ - if ((head)->cqh_last != CIRCLEQ_ENDC(head) && \ - (head)->cqh_last->field.cqe_next != CIRCLEQ_ENDC(head)) \ - QUEUEDEBUG_ABORT("CIRCLEQ head back %p %s:%d", (head), \ - __FILE__, __LINE__); -#define QUEUEDEBUG_CIRCLEQ_ELM(head, elm, field) \ - if ((elm)->field.cqe_next == CIRCLEQ_ENDC(head)) { \ - if ((head)->cqh_last != (elm)) \ - QUEUEDEBUG_ABORT("CIRCLEQ elm last %p %s:%d", \ - (elm), __FILE__, __LINE__); \ - } else {\ - if ((elm)->field.cqe_next->field.cqe_prev != (elm)) \ - QUEUEDEBUG_ABORT("CIRCLEQ elm forw %p %s:%d", \ - (elm), __FILE__, __LINE__); \ - } \ - if ((elm)->field.cqe_prev == CIRCLEQ_ENDC(head)) { \ - if ((head)->cqh_first != (elm)) \ - QUEUEDEBUG_ABORT("CIRCLEQ elm first %p %s:%d", \ - (elm), __FILE__, __LINE__); \ - } else {\ - if ((elm)->field.cqe_prev->field.cqe_next != (elm)) \ - QUEUEDEBUG_ABORT("CIRCLEQ elm prev %p %s:%d", \ - (elm), __FILE__, __LINE__); \ - } -#define QUEUEDEBUG_CIRCLEQ_POSTREMOVE(elm, field) \ - (elm)->field.cqe_next = (void *)1L; \ - (elm)->f
[PATCH 1/2] Convert the CIRCLEQ (from sys/queue.h) usage to TAILQ
The CIRCLEQ API from sys/queue.h is deprecated since NetBSD 7 and it will be removed soon. The CIRCLEQ API implementation is prone to a miscompilation due to the pointer aliasing and is discouraged. --- usr.sbin/ypserv/ypserv/ypserv_db.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/usr.sbin/ypserv/ypserv/ypserv_db.c b/usr.sbin/ypserv/ypserv/ypserv_db.c index d6f9eda39eaa..9edfabc4f675 100644 --- a/usr.sbin/ypserv/ypserv/ypserv_db.c +++ b/usr.sbin/ypserv/ypserv/ypserv_db.c @@ -65,7 +65,7 @@ __RCSID("$NetBSD: ypserv_db.c,v 1.22 2011/02/01 21:00:25 chuck Exp $"); LIST_HEAD(domainlist, opt_domain); /* LIST of domains */ LIST_HEAD(maplist, opt_map); /* LIST of maps (in a domain) */ -CIRCLEQ_HEAD(mapq, opt_map); /* CIRCLEQ of maps (LRU) */ +TAILQ_HEAD(mapq, opt_map); /* TAILQ of maps (LRU) */ struct opt_map { char*map; /* map name (malloc'd) */ @@ -76,7 +76,7 @@ struct opt_map { dev_t dbdev; /* device db is on */ ino_t dbino; /* inode of db */ time_t dbmtime;/* time of last db modification */ - CIRCLEQ_ENTRY(opt_map) mapsq; /* map queue pointers */ + TAILQ_ENTRY(opt_map) mapsq; /* map queue pointers */ LIST_ENTRY(opt_map) mapsl; /* map list pointers */ }; @@ -106,7 +106,7 @@ ypdb_init(void) { LIST_INIT(&doms); - CIRCLEQ_INIT(&maps); + TAILQ_INIT(&maps); } /* @@ -161,7 +161,7 @@ yp_private(datum key, int ypprivate) void ypdb_close_map(struct opt_map *map) { - CIRCLEQ_REMOVE(&maps, map, mapsq); /* remove from LRU circleq */ + TAILQ_REMOVE(&maps, map, mapsq);/* remove from LRU tailq */ LIST_REMOVE(map, mapsl);/* remove from domain list */ #ifdef DEBUG @@ -326,8 +326,8 @@ ypdb_open_db(const char *domain, const char *map, u_int *status, */ if (finfo.st_dev == m->dbdev && finfo.st_ino == m->dbino && finfo.st_mtime == m->dbmtime) { - CIRCLEQ_REMOVE(&maps, m, mapsq); /* adjust LRU queue */ - CIRCLEQ_INSERT_HEAD(&maps, m, mapsq); + TAILQ_REMOVE(&maps, m, mapsq); /* adjust LRU queue */ + TAILQ_INSERT_HEAD(&maps, m, mapsq); if (map_info) *map_info = m; return (m->db); @@ -423,7 +423,7 @@ retryopen: m->dbdev = finfo.st_dev; m->dbino = finfo.st_ino; m->dbmtime = finfo.st_mtime; - CIRCLEQ_INSERT_HEAD(&maps, m, mapsq); + TAILQ_INSERT_HEAD(&maps, m, mapsq); LIST_INSERT_HEAD(&d->dmaps, m, mapsl); if (strcmp(map, YP_HOSTNAME) == 0 || strcmp(map, YP_HOSTADDR) == 0) { if (!usedns) { -- 2.28.0
[PATCH 0/2] Delete CIRCLEQ
Switch the last user (ypserv) from CIRCLEQ to TAILQ. This is inspired by analogous refactoring from OpenBSD: https://github.com/openbsd/src/commit/d53c0cf4d32fdbd8b42debfba068f1b0efa423dc#diff-8d0a4fbb89658213ebf314ff188581d7 Remove the CIRCLEQ API completely from the system headers and document this fact in the QUEUE(3) man-page. Kamil Rytarowski (2): Convert the CIRCLEQ (from sys/queue.h) usage to TAILQ Remove the CIRCLEQ API share/man/man3/queue.3 | 10 ++ sys/sys/queue.h| 196 - usr.sbin/ypserv/ypserv/ypserv_db.c | 14 +-- 3 files changed, 17 insertions(+), 203 deletions(-) -- 2.28.0
Re: SIGCHLD and sigaction()
On 19.08.2020 20:02, Roy Marples wrote: > On 18/08/2020 20:52, Mouse wrote: Perhaps it would need a new flavour of file descriptor, [...] >>> Linux has apparently done this: pidfd (file descriptors representing >>> a process). The idea is that you can pass them to various system >>> call variants that otherwise take pids, without the risk that the >>> process has exited in the mean time and the pid re-used. >> >> I've been thinking about something like that myself, starting with >> AF_PID sockets, then deciding they wouldn't/couldn't work (as think I >> mentioned in this thread, the socket infrastructure really wants the >> contents of a socket to be independent of who's accessing it). >> Personally, I've wanted it as a way to provide an out-of-band channel >> to userland programs (like a control command channel for various >> daemons), but...hmm. >> >> Feels strange to find an idea I like coming from Linux. > > You can relax. > FreeBSD did it first for Capsicum. > > https://www.freebsd.org/cgi/man.cgi?query=pdfork&sektion=2 > > Roy Hi, I have got a draft work on this. It's composed of: 1. New id_t type in P_PIDFD. 2. sigsendset(2) + sigsend(3), picked from SVID (Solaris, etc) It's a generalized version of kill(2). It allows specifying P_PIDFD. 3. waitid supporting new id type: P_PIDFD. Linux is close to the above, except reinventing the wheel instead of picking sigsendset(2) + integration with /proc. FreeBSD went with a distinct direction and invented new semantics of process file descriptors, that differs to references over pid_t. For example whenever you close(2) a file descriptor, it kills the process. You also must wait for process events in FreeBSD using kevent(2). This introduced incompatibilities with the established UNIX semantics. Linux went a different path and whenever a process dies, we get ESRCH, which is sensible as it avoids e.g. killing a random process with recycled pid_t. We want process file descriptors to reference process handles through PID namespaces, for hopefully upcoming support of containers. signature.asc Description: OpenPGP digital signature
Proposal to enable WAPBL by default for 10.0
I propose to enable WAPBL ("log" in fstab(5)) by default for 10.0 and newer. WAPBL - Write Ahead Physical Block Logging file system journaling More info on WAPBL in wapbl(4). https://netbsd.gw.com/cgi-bin/man-cgi?wapbl++NetBSD-current Rationale: the default filesystem (FFS) without WAPBL is more prone to data loss. A potential blocker: http://gnats.netbsd.org/54504 panic: kernel diagnostic assertion "(wapbl_transaction_len(wl) <= (wl->wl_circ_size - wl->wl_reserved_bytes))" failed: file ".../src/sys/kern/vfs_wapbl.c", line 1271 wapbl_end: current transaction too big to flush signature.asc Description: OpenPGP digital signature
Re: kernel stack usage
On 04.07.2020 15:51, Jaromír Doleček wrote: > Le sam. 4 juil. 2020 à 15:30, Kamil Rytarowski a écrit : >>> Kamil - what's the difference in gcc between -Wframe-larger-than= and >>> -Wstack-size= ? >>> >> >> -Wstack-size doesn't exist? > > Sorry, meant -Wstack-usage= > It looks like WStack-usage is GCC specific (absent in Clang) and it includes alloca + vla. Both are not welcome in the kernel anyway so both should give in most cases the same result. >>> I see according to gcc documentation -Wframe-larger-than doesn't count >>> size for alloca() and variable-length arrays, which makes it much less >>> useful than -Wstack-usage. >>> >> >> It's not a problem. >> >> Whenever alloca or VLA is in use, it's already breaking the stack >> protector. There are a few exceptions registered in sys/conf/ssp.mk. >> >> We could add -Walloca -Wvla for USE_SSP=yes builds to catch quickly >> inappropriate usage (frequently violated by code optimizer). > > It's already not used in our kernel code, I checked and I found it > used only in some arch-specific stand/ code. So -Walloca should be > safe to turn on unconditionally, regardless of SSP. Unless the > compiler emits alloca() calls itself. > Sounds good. I'm for adding -Walloca and ideally -Wvla wherever possible in the kernel. > Jaromir > signature.asc Description: OpenPGP digital signature
Re: pg_jobc going negative?
On 10.07.2020 15:41, Robert Elz wrote: > Unfortunately > I have no idea what "qualifying pgrp for job control" is supposed to mean. The Design and implementation of 4.4book phrases it as: number of processes with parent controlling terminal. Unfortunately the book does not explain whether pg_jobc can go bellow 0. signature.asc Description: OpenPGP digital signature
Re: kernel stack usage
On 04.07.2020 14:00, Jaromír Doleček wrote: > Can anybody using clang please confirm kernel build with > -Wframe-larger-than=3584? > NetBSD-current from today, amd64 GENERIC builds for me. > Kamil - what's the difference in gcc between -Wframe-larger-than= and > -Wstack-size= ? > -Wstack-size doesn't exist? > I see according to gcc documentation -Wframe-larger-than doesn't count > size for alloca() and variable-length arrays, which makes it much less > useful than -Wstack-usage. > It's not a problem. Whenever alloca or VLA is in use, it's already breaking the stack protector. There are a few exceptions registered in sys/conf/ssp.mk. We could add -Walloca -Wvla for USE_SSP=yes builds to catch quickly inappropriate usage (frequently violated by code optimizer). On 04.07.2020 14:10, Martin Husemann wrote: > On Sat, Jul 04, 2020 at 02:00:04PM +0200, Jaromír Dole?ek wrote: >> Can anybody using clang please confirm kernel build with >> -Wframe-larger-than=3584? > > Side note: 3584 is inacceptably large, we need to trim it down to ~1k. > Setting it to 3584 is a good start, it can be lowered later to 1024. signature.asc Description: OpenPGP digital signature
Re: pg_jobc going negative?
On 09.06.2020 20:11, Robert Elz wrote: > Date:Tue, 9 Jun 2020 17:04:54 +0200 > From: Kamil Rytarowski > Message-ID: > > > | Yes... syzkaller had like 12 different ways to reproduce it. > > OK, thanks. > > | There is still a race and we randomly go to negative pg_jobc. > > I am not at all surprised... > > I will look at it over the next couple of days. No guarantees... > Ping? This kernel crash is blocking GDB/etc and it is an instant crash. > kre >
Re: makesyscalls (moving forward)
On 15.06.2020 15:21, Johnny Billquist wrote: > > Anyway. Who here does not modify their path at login anyway. The path has to be readily available for pkgsrc users with unprepared environment. However if we install the utility into /usr/sys (similar to /usr/games), we can use a full path to the program and it will be good enough (for me). Are there other programs that would be moved to this directory? I have got a feeling that too many programs already rely on specific kernel internals so making a distinction would only confuse people and impose unclear conditions what belongs where. fsdb(8) or crash(8) are definitely not going to be very usable with mixed kernel and userland versions. Something we possibly agree upon is that makesyscalls(1) would not be a tool for administer a computer/server, so /usr/sbin /sbin is not a good place. signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 15.06.2020 14:35, Reinoud Zandijk wrote: > On Mon, Jun 15, 2020 at 02:06:00PM +0200, Kamil Rytarowski wrote: >> On 15.06.2020 13:44, Reinoud Zandijk wrote: >>> No need to install it in base. The resulting files can then be committed >>> as `regen' just like the pcidevs variants. >> >> I disagree as we don't want to pull ${BSDSRCDIR} dependency for users, for >> building an application. > > Lets try to make it clear then: who are the users? > > 1) Kernel syscall and compat (module) code; only when updating calls > > 2) ktrace (and friends) system calls decode. That would greatly increase > readability ! Esp. if passed arguments could be automatically dumped too. > The above are good for TOOLDIR. Below: > 3) (llvm) fuzzers for testing; this is intree too so no big deal > LLVM is an external project and only in a special case part of the basesystem. While there, there is the same issue with GCC sanitizers. We definitely don't want to request regular LLVM or GCC users building the toolchain to depend on TOOLDIR / BSDSRCDIR. > 4) some syscall bashing tools for testing etc. They are tailored anyway so > using a $BSDSRCDIR specfic program that is not installed is not that relevant. > I don't know what is syscall bashing tool for testing. > But what else? There are IMHO no other valid users. > As of today GDB, but other similar programs can/shall follow. syscall tracers (I wrote picotrace, truss - both distributed in pkgsrc; there is strace) Language runtime, basically everything that is not using libc could use it (go, rust, D, etc). kernel fuzzers (syzkaller) We work on rumpkernel syscall fuzzers during the ongoing GSoC. >> This utility shall receive ATF testing and thus shall be part of $PATH. > > ATF testing sounds like a good idea; but does an utility have to be installed > to be able to test code? > Yes. That could be worked around for ATF, but generally it needs testing. > Also the generated files need to be updated in the kernel source tree and are > tightly coupled to the kernel code templates. > Yes. >> Putting it to /kern would be bad as we will gain another kernel ABI >> dependency and this program won't be usable in TOOLDIR neither when working >> with different target NetBSD release than the developer's computer. >> >> I personally think that the definition file shall be embedded directly into >> the program to avoid any issues with incompatible script version vs >> makesyscalls(1) program. > > You got a point there, and embedding it would make sense yes; but i still > wouldn't install the program or its definition files as its kernel source > version dependent and when building tools etc. $BSDSRCDIR is obviously > available anyway. > For a developer it is fine to request BSDSRCDIR to be available, but for users (of e.g. GDB) this is certainly an overkill. makesyscalls(1) will be maybe up to 1MB. Just $BSDSRCDIR/sys takes around 450MB. If we want to depend on BSDSRCDIR for programs in pkgsrc, this is IMO a blocker. Prebuilt picotrace takes less than 100kb. Adding a hard dependency on BSDSRCDIR would be severe overkill. The intention of this tool is too export its functionality to regular programs that can be built in pkgsrc and there are plenty of them. > Reinoud > I'm definitely going to be a user of this program (in default PATH, without BSDSRCDIR) and whenever possible, I will wire pkgsrc packages to depend on it as soon as possible. None of the pkgsrc programs will need BSDSRCDIR. signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 15.06.2020 14:16, Johnny Billquist wrote: > On 2020-06-15 14:12, Kamil Rytarowski wrote: >> On 15.06.2020 14:11, Johnny Billquist wrote: >>> >>> We should not clutter the directories that are in the normal users path >>> with things that a normal user would never care about. >> >> I never used 90% of the programs from /usr/bin /usr/sbin /bin /sbin. but >> I definitely would use makesyscall(1). If you have other argument that >> "I don't use it" please speak up. > > I'm not convinced you are particularly representative of "users". > NetBSD is a my daily driver so I'm a user! > But it would be interesting to hear how and when you are planning to use > makesyscalls. > I work with the syscall layer almost continuously in various projects (debuggers, fuzzers, syscall tracers, sanitizers, non-libc language runtimes etc). Reiterating over the same list 10 times just increases the frustration and perception of lost time of repeating the same process in an incompatible way for another program. The tool shall centralize the whole knowledge about passed arguments, structs and export it to users through a flexible code generation. We already distribute to users /usr/include/sys/syscalls.h (and it is used e.g. by GDB to parse the syscalls, as parsing syscalls.master in that case was harder). makesyscalls(1) is intended to be a more specialized and generic version of the same functionality as distributed by this header. With some sort of fanciness, we could generate these lists on the fly in some projects (for e.g. GDB) and we would want the utility to be available in place. If it is restricted to build-only phase of various programs (that definitely shall be free from BSDSRCDIR dependency) it will be good enough. I'm for adding this program in PATH and I would be a user on a regular basis. I basically need it for pretty everything (2 GSoC ongoing projects are about covering the same syscalls in 2 different ways). Asking me for a use-case is odd to me as it is an elementary program that belongs to /usr/bin. > Johnny > signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 15.06.2020 14:11, Johnny Billquist wrote: > > We should not clutter the directories that are in the normal users path > with things that a normal user would never care about. I never used 90% of the programs from /usr/bin /usr/sbin /bin /sbin. but I definitely would use makesyscall(1). If you have other argument that "I don't use it" please speak up. signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 15.06.2020 13:44, Reinoud Zandijk wrote: > No need to install it in base. The resulting > files can then be committed as `regen' just like the pcidevs variants. I disagree as we don't want to pull ${BSDSRCDIR} dependency for users, for building an application. This utility shall receive ATF testing and thus shall be part of $PATH. On 15.06.2020 13:44, Reinoud Zandijk wrote: > I wouldn't install the file in the FS; kernel and userland are often out of > sync, maybe even versions apart with say NetBSD-8 userland but NetBSD-current > kernel. If anywhere, request the data from the kernel by exposing it in /kern. > Exposing it one way or another might be an attack vector too ... Putting it to /kern would be bad as we will gain another kernel ABI dependency and this program won't be usable in TOOLDIR neither when working with different target NetBSD release than the developer's computer. I personally think that the definition file shall be embedded directly into the program to avoid any issues with incompatible script version vs makesyscalls(1) program. signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 15.06.2020 00:57, Johnny Billquist wrote: > On 2020-06-15 00:52, Kamil Rytarowski wrote: >> On 15.06.2020 00:26, Johnny Billquist wrote: >>> But that's just me. I'll leave the deciding to you guys... >>> >> >> This is only me, but /sbin and /usr/sbin are for users with root >> privileges, while /bin and /usr/bin for everybody. makesyscalls(1) >> intends to be an end-user program that aids building software and this >> is just another specialized program similar to flex(1) or yacc(1), just >> a more domain specific code generator. > > Is ping only for people with root privileges??? > ping needs setuid so yes. > Johnny > signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 15.06.2020 00:26, Johnny Billquist wrote: > But that's just me. I'll leave the deciding to you guys... > This is only me, but /sbin and /usr/sbin are for users with root privileges, while /bin and /usr/bin for everybody. makesyscalls(1) intends to be an end-user program that aids building software and this is just another specialized program similar to flex(1) or yacc(1), just a more domain specific code generator. I don't see any reason why to restrict makesyscalls(1) to root-only. /usr/bin is a settled path for native programs and chaing it is not worth it (and I personally see no reason). If I want to plug makesyscalls(1) into LLVM or GDB or some fuzzers, it would be certainly cumbersome to pass full path to some /sys/bin or similar. signature.asc Description: OpenPGP digital signature
Re: makesyscalls (moving forward)
On 14.06.2020 23:59, Johnny Billquist wrote: > On 2020-06-14 23:21, Paul Goyette wrote: >> On Sun, 14 Jun 2020, David Holland wrote: >> >> >> >>> This raises two points that need to be bikeshedded: >>> >>> (1) What's the new tool called, and where does it live in the tree? >>> "usr.bin/makesyscalls" is fine with me but ymmv. >> >> "usr.bin/makesyscalls" sounds good to me. > > Uh? usr.bin is where stuff for /usr/bin is located, right? Anything > there should be pretty normal tools that any user might be interested > in. Don't seem to me as makesyscalls would be a tool like that? > > Possibly some sbin thing, but in all honestly, wouldn't this make more > sense to have somewhere under sys? Don't we have some other tools and > bits which are specific for kernel and library building? > /usr/bin is appropriate and there are already similar tools (like ioctlprint(1)). It's already in PATH and definitely in interest of some end-users (like me) and I do want to have it. makesyscalls(1) sounds like a good name. /usr/share/sys/syscalls.de should be an internal detail for makesyscalls(1). I actually think that syscalls.def should be builtin into the program and we should avoid an external file dependency as it is expected to be operational for only one kernel ABI release + makesyscalls(1) version. There are expected no external consumers of this .def file and all we need and want is to pass rules how to generate syscall definitions. makesyscalls(1) will likely quickly turn into a ./build.sh tool and reducing management of an external file is especially a good idea. There is already a prior art as ioctlprint(1) has a builtin database for the ioctl codes and it works well. > Johnny > signature.asc Description: OpenPGP digital signature
[PATCH] Add support for RUMP_USE_LIBC_ALLOCATORS
I propose to add a new option for building rumpkernel: RUMP_USE_LIBC_ALLOCATORS. This option wires the kernel allocators directly to the libc functions. This is useful for sanitizers with their fine-grained checks of allocated chunks. http://netbsd.org/~kamil/patch-00268-RUMP_USE_LIBC_ALLOCATORS.2.txt Does this code and approach look good? Is possible to improve this approach? There is a fallout with ATF test t_vm::uvmwait as it no longer passes as the memory limit is no longer respected. With this patch rumpkernel is now more sensitive to real memory access bugs. The immediate detected problem is with kthread_join() that attempts to join a thread that was freed. http://netbsd.org/~kamil/rump/heap-use-after-free.txt We use this patch during the ongoing GSoC so quick merge into src/ is wanted. signature.asc Description: OpenPGP digital signature
Re: makesyscalls
On 10.06.2020 01:13, David Holland wrote: > The question is: do we want the Python version in the tree now For this, I would say "NO", at least as long Python is out of base and IMO it shall not be there. But it is fine to put into othersrc/. On 10.06.2020 01:13, David Holland wrote: > Rewriting in C is a possible future step. The code generator I have in > mind going forward should not be done in Python. But again, more on > that later. I would like to have mksyscalls (and at some point makeioctls) much more flexible and as a tool scriptable. I had to iterate a dozen of times over all our syscalls in various fuzzers, sanitizers, debuggers, tracers etc. Something that is very needed is knowing the full serialized struct passed as a pointer to each syscall. It's a lot of work to teach the tool about it, but it could be finally centralized and time saved of repeatedly teaching all the other programs about this property of syscalls. signature.asc Description: OpenPGP digital signature
Re: pg_jobc going negative?
On 09.06.2020 16:35, Robert Elz wrote: > Date:Tue, 9 Jun 2020 14:13:56 +0200 > From: Kamil Rytarowski > Message-ID: <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org> > > | Here is the simplest reproducer crashing the kernel on negative pg_jobc: > > I have not looked at this closely yet, but this is likely because > ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc > more or less expect to be a constant. > > Is there any known reproducer of this problem which does not involve ptrace() > ? > Yes... syzkaller had like 12 different ways to reproduce it. As far as I can tell, in the syzkaller case they all are about races. One of them is here: https://syzkaller.appspot.com/text?tag=ReproC&x=128060f610 After adding the asserts, all look similar to me: forking + setpgid(0, 0). > At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I > haven't investigated enough to be able to spot a definite problem yet > (possible ptrace() generated issue aside - and yes, those need to work as > well). > > I doubt very much that adding a new mutex will make a difference, all the > manipulations are done with proc_lock held, which is kind of the "big lock" > for process manipulation - adding finer grained locking might improve > performance, by improving concurrency, but is unlikely (at this stage, > nothing is impossible) to be a fix for this problem. > There is still a race and we randomly go to negative pg_jobc. > kre >
Re: pg_jobc going negative?
On 09.06.2020 08:38, Maxime Villard wrote: >> Should we allow pg_jobc going negative? > > I don't think so, the code is not designed to expect negative values. Here is the simplest reproducer crashing the kernel on negative pg_jobc: http://netbsd.org/~kamil/ptrace/pg_jobc-crash.c On 09.06.2020 08:38, Maxime Villard wrote: > In short, (1) my understanding of it is that the code is not designed to > expect negative values, and (2) since I added the KASSERTs 10/11 of the > random bugs didn't trigger. Big signs the bug is indeed related to > refcounting. > > It would be nice if someone with better understanding than me of the lwp > group stuff could have a look, though. Generally, pg_jobc looks like a ref counting mechanism. FreeBSD reworked the code and added a dedicated pgrp mutex. I don't know which path is the best for us, especially regarding the SMP properties. + ad@
Re: pg_jobc going negative?
On 09.06.2020 10:23, Michael van Elst wrote: > m...@m00nbsd.net (Maxime Villard) writes: > >> You can see they are all different, but all have to do with reading the >> group pointer, which was either freed, overwritten, not initialized, >> unmapped, or contained pure garbage. This is typical of refcounting bugs >> where a resource disappears under your feet. > > pg_jobc is not a reference counter. The assertion probably stopped > a bug in a different place by coincidence. > As the first step, is it fine to replace all pg_jobc == 0/ != 0 checks with pg_jobc > 0 / <= 0? This should not make anything worse than it is now. The remaining code assumes that pg_jobc never goes below 0. And then, follow up with the removal of the assert. We will check with syzkaller whether the races/crashes are gone. signature.asc Description: OpenPGP digital signature
pg_jobc going negative?
pg_jobc is a process group struct member counting the number of processes with a parent capable of doing job control. Once reaching 0, the process group is orphaned. With the addition of asserts checking for pg_jobc > 0 (by maxv@), we caught issues that pg_jobc can go negative. I have landed new ATF tests that trigger this reliably with ptrace(2) (src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was originally triggered with GDB. There are also other sources of these asserts due to races The ptrace(2) ATF tests are reliable triggering a negative pg_jobc, however there are racy tests doing the same as triggered by syzkaller (mentioned at least in [1]). Should we allow pg_jobc going negative? (Other BSDs allow this.) Is going negative in the first place a real bug? Are the races triggered by syzkaller real bugs? [1] http://mail-index.netbsd.org/current-users/2020/05/04/msg038511.html On 20.04.2020 18:32, Maxime Villard wrote: > Module Name: src > Committed By: maxv > Date: Mon Apr 20 16:32:03 UTC 2020 > > Modified Files: > src/sys/kern: kern_proc.c > > Log Message: > Add three KASSERTs, to detect refcount bugs. > > This narrows down an unknown bug in some place near, that has manifested > itself in various forms (use-after-frees, uninit accesses, page faults, > segmentation faults), all pointed out by syzbot. > > The first KASSERT in fixjobc() fires when the bug is encountered. > > > To generate a diff of this commit: > cvs rdiff -u -r1.244 -r1.245 src/sys/kern/kern_proc.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > > > Modified files: > > Index: src/sys/kern/kern_proc.c > diff -u src/sys/kern/kern_proc.c:1.244 src/sys/kern/kern_proc.c:1.245 > --- src/sys/kern/kern_proc.c:1.244Sun Apr 19 20:31:59 2020 > +++ src/sys/kern/kern_proc.c Mon Apr 20 16:32:03 2020 > @@ -1,4 +1,4 @@ > -/* $NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $ */ > +/* $NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $ */ > > /*- > * Copyright (c) 1999, 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc. > @@ -62,7 +62,7 @@ > */ > > #include > -__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej > Exp $"); > +__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp > $"); > > #ifdef _KERNEL_OPT > #include "opt_kstack.h" > @@ -554,6 +554,7 @@ proc_sessrele(struct session *ss) > { > > KASSERT(mutex_owned(proc_lock)); > + KASSERT(ss->s_count > 0); > /* >* We keep the pgrp with the same id as the session in order to >* stop a process being given the same pid. Since the pgrp holds > @@ -1181,8 +1182,11 @@ fixjobc(struct proc *p, struct pgrp *pgr > if (entering) { > pgrp->pg_jobc++; > p->p_lflag &= ~PL_ORPHANPG; > - } else if (--pgrp->pg_jobc == 0) > - orphanpg(pgrp); > + } else { > + KASSERT(pgrp->pg_jobc > 0); > + if (--pgrp->pg_jobc == 0) > + orphanpg(pgrp); > + } > } > > /* > @@ -1197,8 +1201,11 @@ fixjobc(struct proc *p, struct pgrp *pgr > if (entering) { > child->p_lflag &= ~PL_ORPHANPG; > hispgrp->pg_jobc++; > - } else if (--hispgrp->pg_jobc == 0) > - orphanpg(hispgrp); > + } else { > + KASSERT(hispgrp->pg_jobc > 0); > + if (--hispgrp->pg_jobc == 0) > + orphanpg(hispgrp); > + } > } > } > } > signature.asc Description: OpenPGP digital signature
Re: UBSan: Undefined Behavior in lf_advlock
On 06.06.2020 00:25, Jaromír Doleček wrote: > Le ven. 5 juin 2020 à 21:49, syzbot > a écrit : >> [ 44.1699615] panic: UBSan: Undefined Behavior in >> /syzkaller/managers/netbsd-kubsan/kernel/sys/kern/vfs_lockf.c:843:16, signed >> integer overflow: 131072 + 9223372036854771712 cannot be represented in type >> 'long int' >> >> [ 44.1931600] cpu0: Begin traceback... >> [ 44.1999503] vpanic() at netbsd:vpanic+0x287 sys/kern/subr_prf.c:290 >> [ 44.2299515] isAlreadyReported() at netbsd:isAlreadyReported >> [ 44.2599494] HandleOverflow() at netbsd:HandleOverflow+0x1c9 >> sys/../common/lib/libc/misc/ubsan.c:375 >> [ 44.2899499] lf_advlock() at netbsd:lf_advlock+0x2124 >> sys/kern/vfs_lockf.c:843 > > This happens in: > if (fl->l_len > 0) > end = start + fl->l_len - 1; > else { > > when call to fcntl() is arranged so that 'start' ends up 0x2 and > fl->l_len 0x7000, overflowing the off_t. > > I wonder, Is it important to fix cases like that? > > Jaromir > Generally we want to mute noise so signal is high on serious issues. Overflowing off_t shall not happen as it is a signed integer. A compiler is free to impose assumptions that the result is not overflowed and miscompile. In this case it is probably safe, but better to silent it. signature.asc Description: OpenPGP digital signature
Re: kernel stack usage
Can we adopt -Wframe-larger-than=1024 and mark it fatal? This option is supported by GCC and Clang. On 31.05.2020 15:55, Jaromír Doleček wrote: > I think it would make sense to add -Wstack-usage=X to kernel builds. > > Either 2KB or 1KB seems to be good limit, not too many offenders > between 1KB and 2KB so maybe worth it to use 1KB and change them. > > I'm sure there would be something similar for LLVM too. > > Jaromir > > Le dim. 31 mai 2020 à 15:39, Simon Burge a écrit : >> >> matthew green wrote: >> >>> glad to see this effort and the clean up already! >>> >>> ideally, we can break the kernel build if large stack consumers >>> are added to the kernel. i'd be OK with it being default on, >>> with of course a way to skip it, and if necessary it can have >>> a whitelist of "OK large users." >> >> I started to look at -fstack-usage which gives a nice MI way of getting >> the data instead of parsing MD objdump output, but didn't get any >> further than the below patch. The find(1) command referenced in the >> patch gives the following >> >> 1008 nfs_serv.c:2181:1:nfsrv_link >> 1056 procfs_linux.c:422:1:procfs_do_pid_stat >> 1056 vfs_subr.c:1521:1:vfs_buf_print >> 1072 dl_print.c:80:1:sdl_print >> 1104 core_elf32.c:300:1:coredump_getseghdrs_elf32 >> 1104 core_elf32.c:300:1:coredump_getseghdrs_elf64 >> 1104 sys_ptrace_common.c:1595:1:proc_regio >> 1120 subr_bufq.c:131:1:bufq_alloc >> 1136 db_lwp.c:64:1:db_lwp_whatis >> 1152 kern_ktrace.c:1269:1:ktrwrite >> 1168 uvm_swap.c:768:1:uvm_swap_stats.part.1 >> 1312 nfs_serv.c:1905:1:nfsrv_rename >> 2112 tty_60.c:68:1:compat_60_ptmget_ioctl >> 2144 memmem.c:83:14:twoway_memmem >> >> Anyone else want to run with adding a bit more to this to do some build >> failure as mrg@ suggests and documenting for options(4)? >> >> Cheers, >> Simon. >> -- >> Index: Makefile.kern.inc >> === >> RCS file: /cvsroot/src/sys/conf/Makefile.kern.inc,v >> retrieving revision 1.270 >> diff -d -p -u -r1.270 Makefile.kern.inc >> --- Makefile.kern.inc 21 May 2020 18:44:19 - 1.270 >> +++ Makefile.kern.inc 31 May 2020 13:34:13 - >> @@ -104,6 +104,11 @@ CFLAGS+= -ffreestanding -fno-zero-initia >> CFLAGS+= ${${ACTIVE_CC} == "gcc":? -fno-delete-null-pointer-checks :} >> CFLAGS+= ${DEBUG} ${COPTS} >> AFLAGS+= -D_LOCORE -Wa,--fatal-warnings >> +.if defined(KERN_STACK_USAGE) >> +# example usage to find largest stack users in kernel compile directory: >> +#find . -name \*.su | xargs awk '{ printf "%6d %s\n", $2, $1 }' | sort >> +1n >> +CFLAGS+= -fstack-usage >> +.endif >> >> # XXX >> .if defined(HAVE_GCC) || defined(HAVE_LLVM) >> @@ -338,8 +343,8 @@ ${_s:T:R}.o: ${_s} >> .if !target(__CLEANKERNEL) >> __CLEANKERNEL: .USE >> ${_MKMSG} "${.TARGET}ing the kernel objects" >> - rm -f ${KERNELS} *.map eddep tags *.[io] *.ko *.ln [a-z]*.s vers.c \ >> - [Ee]rrs linterrs makelinks assym.h.tmp assym.h \ >> + rm -f ${KERNELS} *.map *.[io] *.ko *.ln [a-z]*.s *.su vers.c \ >> + eddep tags [Ee]rrs linterrs makelinks assym.h.tmp assym.h \ >> ${EXTRA_KERNELS} ${EXTRA_CLEAN} >> .endif >> signature.asc Description: OpenPGP digital signature
Re: sys/idtype.h unused enumeration values
On 19.05.2020 15:06, Robert Elz wrote: > Date:Tue, 19 May 2020 14:12:31 +0200 > From: Kamil Rytarowski > Message-ID: <6874bb63-5146-797f-98b7-b9c497677...@gmx.com> > > | Rationale for pointless? > > There is no point. What more can I say? > > | My points were: > | > | - Clobbering OS that claims the goals of clean design and clean code > | with mutant alien bodies without counterparts in the native kernel, > | without request from any relevant standard body. > > Having the extra entries is harmless, there is no point deleting them. > I've abandoned the intention of changing these values (by adding comments, renaming etc). Once I will have spare time I might look into implementing the missing ID types, but I don't promise to do it soon. P_PSETID is possibly the easiest one. Thank you for the feedback. signature.asc Description: OpenPGP digital signature
Re: sys/idtype.h unused enumeration values
On 19.05.2020 08:38, Robert Elz wrote: > Date:Mon, 18 May 2020 21:11:36 +0200 > From: Kamil Rytarowski > Message-ID: <05255347-1c55-2762-aaf6-fec3caf48...@gmx.com> > > | Next, I can add my value at the end of list (and before _P_MAXIDTYPE). > > Other than this, everything that you propose is pointless. This one > you can do (assuming of course that there is a good reason for your new > entry, but there's no reason to doubt that now.) Rationale for pointless? My points were: - Clobbering OS that claims the goals of clean design and clean code with mutant alien bodies without counterparts in the native kernel, without request from any relevant standard body. - Collecting garbage in public headers that is unused, misleading and can at best be dummy. - Not know any public users. Extremely unlikely to have any private ones. But as there is insist on keeping that mutant clobber around as it formally leaked into ABI (even if very unlikely to have any single user ever), it's better to implement project identifiers and the other kernel components so this can be meaningful. > > kre > signature.asc Description: OpenPGP digital signature
Re: sys/idtype.h unused enumeration values
On 18.05.2020 22:18, Christos Zoulas wrote: > > >> On May 18, 2020, at 3:40 PM, Kamil Rytarowski wrote: >> >> If I delete P_TASKID ... P_P_CPUID ones, P_SETID will be reordered (but >> we can force the number anyway). If I delete P_CID there is an inelegant >> hole. Naturally P_SETID -> P_CID can fill the gap. >> >> This is in theory ABI change, but no users could use in a useful >> approach previously. >> >> My intention isto g/c unused values and keep this clean and elegant (as >> this is still possible). >> > > Why don't you leave them alone for the same reason FreeBSD did (source > compatibility) > and append the ones you want? If you look they #define P_ZONEID P_JAILID when > they made the change... > > christos > My point is that there is no source code (at least in base) that we gain compatibility with, no ABI compatibility layer and these concepts do not match the current NetBSD kernel features. If there is anything in 3rd party pretending to use these values, it would not work anyway. If we want to these compat defines, they already live in: external/cddl/osnet/dist/uts/common/sys/procset.h https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170346 Regarding FreeBSD, I don't see rationale for inclusion of these values. It looks like it was copy-pasted (there were also Solaris-specific preprocessor guards in the initial version). But if there is intention to keep these values around (as it might be in theory too late as they leaked somewhere), it's fine. Thanks. signature.asc Description: OpenPGP digital signature
Re: sys/idtype.h unused enumeration values
On 18.05.2020 21:31, Taylor R Campbell wrote: >> Date: Mon, 18 May 2020 21:11:36 +0200 >> From: Kamil Rytarowski >> >> On 18.05.2020 20:24, Robert Elz wrote: >>> Date:Mon, 18 May 2020 19:45:55 +0200 >>> From:Kamil Rytarowski >>> Message-ID: >>> >>> | I have got a local use-case for another P_type (premature to discuss it >>> | in this thread) and I would rather recycle an unused value. >>> >>> Don't do that, it is just a number, use one that hasn't been used >>> for this purpose before. >>> >>> | Do we plan to get Solaris feature-parity with all the types? I assume >>> | that the answer is NO. If so, can we delete the P_ values that are not >>> | applicable for NetBSD? >>> >>> I have no problem with that - just don't reuse the values for some >>> different purpose, keep them reserved (assign them meaningless reserved >>> names) just in case there's ever a need to implement one of those things >>> (this is very very cheap insurance). >> >> I propose to delete P_CID and recycle its place for P_PSETID. > > There are 2,147,483,630 different numbers you can choose from, if I > counted right -- that's over two billion. It's not that important to > be economical with reuse when there's a scary comment above it > exhorting you not to mess with the existing numbers... Also, P_PSETID > already appears to be assigned the number 15, so why would you want to > change that? > If I delete P_TASKID ... P_P_CPUID ones, P_SETID will be reordered (but we can force the number anyway). If I delete P_CID there is an inelegant hole. Naturally P_SETID -> P_CID can fill the gap. This is in theory ABI change, but no users could use in a useful approach previously. My intention isto g/c unused values and keep this clean and elegant (as this is still possible). signature.asc Description: OpenPGP digital signature
Re: sys/idtype.h unused enumeration values
On 18.05.2020 20:24, Robert Elz wrote: > Date:Mon, 18 May 2020 19:45:55 +0200 > From: Kamil Rytarowski > Message-ID: > > | I have got a local use-case for another P_type (premature to discuss it > | in this thread) and I would rather recycle an unused value. > > Don't do that, it is just a number, use one that hasn't been used > for this purpose before. > > | Do we plan to get Solaris feature-parity with all the types? I assume > | that the answer is NO. If so, can we delete the P_ values that are not > | applicable for NetBSD? > > I have no problem with that - just don't reuse the values for some > different purpose, keep them reserved (assign them meaningless reserved > names) just in case there's ever a need to implement one of those things > (this is very very cheap insurance). > > kre > Solaris ABI compat is something that has small chances of being useful in future. Solaris is also not a reference for NetBSD here. Linux uses the POSIX types: P_ALL, P_PID, P_GID + extension P_PIDFD. https://github.com/torvalds/linux/blob/master/include/uapi/linux/wait.h#L16 FreeBSD already recycled the P_ZONEID value for P_JAILID. Keeping these unused Solaris values in tree has no added value. If we want it for the sake of defining it, push this to src/external/. On 18.05.2020 20:19, Christos Zoulas wrote: > I copied these from FreeBSD who in turn copied them from solaris and changed > P_ZONEID to P_JAILID. The FreeBSD comment is: > http://bxr.su/FreeBSD/sys/sys/wait.h#100 > I decided to keep all the names too. > I propose to delete: P_TASKID, P_PROJID, P_POOLID, P_ZONEID, P_CTID as they have no equivalent objects in NetBSD. (If they could have, these objects could be designed differently). P_CID could be in theory used, but has no users and could be deleted. P_PSETID shall be used in pset_bind(3)/pset_unbind(3), but it looks like it was not implemented (and thus, has no users today). http://netbsd.org/~kamil/patch-00256-cleanup-waitids.txt I propose to delete P_CID and recycle its place for P_PSETID. If we will ever have a need for P_CTID or similar, we could readd them to the enumeration. Next, I can add my value at the end of list (and before _P_MAXIDTYPE). signature.asc Description: OpenPGP digital signature
sys/idtype.h unused enumeration values
POSIX notes: "The type idtype_t shall be defined as an enumeration type whose possible values shall include at least the following: P_ALL P_PGID P_PID" https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_wait.h.html For some reason we copied as-is solaris types into our public headers with fields that will possibly never apply to NetBSD, such as P_CTID, P_POOLID etc. I have got a local use-case for another P_type (premature to discuss it in this thread) and I would rather recycle an unused value. Do we plan to get Solaris feature-parity with all the types? I assume that the answer is NO. If so, can we delete the P_ values that are not applicable for NetBSD? If something is used in some compat shim for sunos software (dtrace, zfs) we could easily add a compat wrapper in the 3rd party code. Defining the types does not make the features to work. signature.asc Description: OpenPGP digital signature
Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP
Is it possible to avoid negation in the name? KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION On 17.05.2020 00:51, Paul Goyette wrote: > I'm not sure I like the name! > > Can you call it KAUTH_SYSTEM_DISABLE_SWAPENCRYPT ? That more > closely describes the action which is being controlled. > > > On Sat, 16 May 2020, Alexander Nasonov wrote: > >> Attached patch adds KAUTH_SYSTEM_UNENCRYPTED_SWAP and >> it forbids changing vm.swap_encrypt from 1 to 0 when >> securelevel > 0. >> >> If there are no objections, I'm going to commit it tomorrow. >> >> -- >> Alex >> >> >> !DSPAM:5ec06ddb24841398742664! >> > > ++--+---+ > | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | > | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | > | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | > ++--+---+ signature.asc Description: OpenPGP digital signature
Re: vmspace refcnt refactor patch
On 16.05.2020 18:52, Nick Hudson wrote: > > On 16/05/2020 12:45, Kamil Rytarowski wrote: >> On 16.05.2020 07:48, Nick Hudson wrote: >>> On 15/05/2020 17:35, Kamil Rytarowski wrote: >>>> I propose the following patch: >>>> >>>> http://netbsd.org/~kamil/patch-00253-refactor-vmspace-refcnt.txt >>>> >>>> Does it look good? >>>> >>>> Nick, does it fix the hppa locking problem? >>>> >>> Thanks for working on this. >>> >>> Unfortunately, it doesn't fix all the problems... >>> >> Please check this: >> >> http://netbsd.org/~kamil/patch-00254-remove-lwp_lock_in_sstep.txt > > I can run the ptrace tests now without triggering LOCKDEBUG asserts. > > > Thanks, > > Nick > OK. I'm waiting now for the feedback from Andew before committing. signature.asc Description: OpenPGP digital signature
Re: vmspace refcnt refactor patch
On 16.05.2020 07:48, Nick Hudson wrote: > On 15/05/2020 17:35, Kamil Rytarowski wrote: >> I propose the following patch: >> >> http://netbsd.org/~kamil/patch-00253-refactor-vmspace-refcnt.txt >> >> Does it look good? >> >> Nick, does it fix the hppa locking problem? >> > > Thanks for working on this. > > Unfortunately, it doesn't fix all the problems... > Please check this: http://netbsd.org/~kamil/patch-00254-remove-lwp_lock_in_sstep.txt In general I don't like the locking model in ptrace(2), but this change is not making anything worse. > > setstep1: [ 209.5497837] Mutex error: assert_sleepable,78: spin > lock held > > [ 209.5497837] lock address : 0x3ffccf00 type : spin > [ 209.5497837] initialized : 0x007d6f78 > [ 209.5497837] shared holds : 0 exclusive: 1 > [ 209.5497837] shares wanted: 0 exclusive: 0 > [ 209.5497837] relevant cpu : 0 last held: 0 > [ 209.5497837] relevant lwp : 0x3fc36d00 last held: > 0x3fc36d00 > [ 209.5497837] last locked* : 0x0084a760 unlocked : > 0x007d5134 > [ 209.5497837] owner field : 0xff10 wait/spin: 0/1 > > [ 209.5497837] panic: LOCKDEBUG: Mutex error: assert_sleepable,78: spin > lock held > [ 209.5497837] cpu0: Begin traceback... > [ 209.5497837] vpanic() at netbsd:vpanic+0x1b8 > [ 209.5497837] panic() at netbsd:panic+0x38 > [ 209.5497837] lockdebug_abort1() at netbsd:lockdebug_abort1+0x170 > [ 209.5497837] lockdebug_barrier() at netbsd:lockdebug_barrier+0x114 > [ 209.5497837] assert_sleepable() at netbsd:assert_sleepable+0xa0 > [ 209.5497837] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x254 > [ 209.5497837] uvm_mapent_alloc() at netbsd:uvm_mapent_alloc+0x150 > [ 209.5497837] uvm_map_enter() at netbsd:uvm_map_enter+0x848 > [ 209.5497837] uvm_map() at netbsd:uvm_map+0xd4 > [ 209.5497837] uvm_map_reserve() at netbsd:uvm_map_reserve+0x240 > [ 209.5497837] uvm_map_extract() at netbsd:uvm_map_extract+0x6fc > [ 209.5497837] uvm_io() at netbsd:uvm_io+0xf8 > [ 209.5497837] process_domem() at netbsd:process_domem+0x94 > [ 209.5497837] ss_get_value() at netbsd:ss_get_value+0x74 > [ 209.5497837] process_sstep() at netbsd:process_sstep+0x74 > [ 209.5497837] do_ptrace() at netbsd:do_ptrace+0xe00 > [ 209.5497837] sys_ptrace() at netbsd:sys_ptrace+0x4c > [ 209.5497837] syscall() at netbsd:syscall+0x480 > [ 209.5497837] -- syscall #26(7, 141a, 1, 0, ...) (0xd3131000) > [ 209.5497837] cpu0: End traceback... > > nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$ > hppa--netbsd-addr2line -e netbsd.gdb -f 0x007d6f78 > sched_cpuattach > /netbsd/hppa/src/sys/kern/kern_runq.c:147 > nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$ > hppa--netbsd-addr2line -e netbsd.gdb -f 0x0084a760 > lwp_lock > /netbsd/hppa/src/sys/sys/lwp.h:412 > nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$ > hppa--netbsd-addr2line -e netbsd.gdb -f 0x007d5134 > calcru > /netbsd/hppa/src/sys/kern/kern_resource.c:506 (discriminator 2) > nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$ > > Nick signature.asc Description: OpenPGP digital signature
vmspace refcnt refactor patch
I propose the following patch: http://netbsd.org/~kamil/patch-00253-refactor-vmspace-refcnt.txt Does it look good? Nick, does it fix the hppa locking problem? signature.asc Description: OpenPGP digital signature
fstat(1) and sysctl(3)
Is there any good reason that fstat(1) needs kvm(3)? Is it viable to offer its functionality with sysctl(3), in particular in struct kinfo_file? I'm have got a use-case (in GDB (*)) where I would make use of more fields in struct kinfo_file, at least file path and socket information. Maybe it would be viable to switch fstat(1) to sysctl(3)? (*) FreeBSD uses: fbsd_info_proc_files_entry (kf->kf_type, kf->kf_fd, kf->kf_flags, kf->kf_offset, kf->kf_vnode_type, kf->kf_sock_domain, kf->kf_sock_type, kf->kf_sock_protocol, &kf->kf_sa_local, &kf->kf_sa_peer, kf->kf_path); -- gdb/fbsd-nat.c
Re: New tools proposal: ioctlname and ioctldecode
On 02.04.2020 04:06, Mouse wrote: >> We should maintain a contract that all new ioctl operations are >> system wide unique. > > That is, unfortunately, unenforceable in the presence of a user base > that writes and shares code. If I #define IOCNEWTHING _IO('?',7) and > someone else #defines IOCOTHERTHING _IO('?',7), there really isn't any > way to prevent that, nor to prevent them from conflicting when - and > eventually it will happen - someone wants to run a system with both new > thing and other thing. > This is partially enforceable. As once we generate catchall switch like: case FOO_OP: ... case BAR_OP: ... a compiler will report error whenever FOO_OP = BAR_OP.
Re: New tools proposal: ioctlname and ioctldecode
On 02.04.2020 03:33, Mouse wrote: $ ioctlname 2148554498 WSKBDIO_COMPLEXBELL >>> Where would you get the mapping between the ioctl value and the >>> name? [...] >> Everything is already done in kdump and reused in other tools like >> ktruss. > > So, the big switch() method. > Yes. We should maintain a contract that all new ioctl operations are system wide unique. >>> What about collisions, two ioctls having the same numeric value? >> There are some collisions, but not that many of them. In these cases >> we try to pick the more interesting device. > > For kdump, that makes some sense. For this tool, I think it would make > the most sense to report them all. (Arguably, kdump should too...) > > Of course, that would mean changing things at least slightly from the > current kdump approach. I'm not sure that would necessarily be a bad > thing, especially if we could somehow (major handwave here) tell which > ioctls go together, in which case kdump could do heuristics along the > lines of "this fd accepted FOO_IOC_A, so we're going to decode this one > as FOO_IOC_B rather than BAR_IOC_C". > This would be ideal... however it's not that simple and I would recommend to go with the path of removing the conflicts entirely for the general benefit. For the time being we just live with conflicts and ignore a subset of operations. From a quick look there are conflicts e.g. due to chio(1) a SCSI charger added in NetBSD 1.3 for and.com. Are there still any users?
Re: New tools proposal: ioctlname and ioctldecode
On 02.04.2020 02:14, Christos Zoulas wrote: > In article <0fd513f7-6f0c-6ed1-2119-6ce5313d4...@gmx.com>, > Kamil Rytarowski wrote: >> I propose to add two new tools: >> >> - ioctlname >> - ioctldecode > > I would call it ioctlprint and have: > > ioctlprint [-f ] || arg > > for the input arg can be: > name = TIOCFOO > expr = _IO?('x', y, z) > value = 0xfoobee > > The format can be contain %e %n %v which expand to what you > think and defaults to: > > "%n %e %v\n" > > christos > I've implemented: ioctlprint [-f format] [-e emul] arg... $ ./ioctlprint 2148554498 2148554498 WSKBDIO_COMPLEXBELL _IOW('W',0x2,0x10) 0x80105702 WSKBDIO_COMPLEXBELL _IOW('W',0x2,0x10) 0x80105702 $ ./ioctlprint -f "%o %d %d %i %x %e %n\n" 2148554498 020004053402 2148554498 2148554498 2148554498 0x80105702 _IOW('W',0x2,0x10) WSKBDIO_COMPLEXBELL %n - name %e - expression %x - print HEX number %o - print OCT number %d %i - print DEC number http://netbsd.org/~kamil/patch-00245-kdump-ioctlname.3.txt
Re: New tools proposal: ioctlname and ioctldecode
On 02.04.2020 02:56, Mouse wrote: >> $ ioctldecode 2148554498 >> _IOW('W',0x2,0x10) > >> $ ioctlname 2148554498 >> WSKBDIO_COMPLEXBELL > > Where would you get the mapping between the ioctl value and the name? > Would this be just a huge switch statement (or compiled-in table), or > would it search /usr/include/sys at runtime, or what? > > In particular, would any special action need to be taken upon adding a > new ioctl for it to be recognized? > Everything is already done in kdump and reused in other tools like ktruss. Please check: src/usr.bin/kdump/Makefile.ioctl-c. > What about collisions, two ioctls having the same numeric value? (I'm > not aware of any offhand, but I'd be astonished if it never happened.) > There are some collisions, but not that many of them. In these cases we try to pick the more interesting device. In sanitizers there are around 2000 individual ioctls and 51 collisions. Some of them are gone as we remove old unmaintained device drivers.
New tools proposal: ioctlname and ioctldecode
I propose to add two new tools: - ioctlname - ioctldecode Both of them are a special mode embedded into kdump(1). >From time to time there is need to decode IOCTL codes and there is no (as far as I am aware) easy tool to do so. ioctlname is already invented and it calls the internal function ioctldecode(). commit f551b480cd03a35cec2a4927270c00cfaa508a27 Author: christos Date: Mon Apr 13 14:39:23 2009 + Allow kdump to be used as an ioctl decoder if invoked as ioctlname. Today the internal/hidden program ioctlname produces something like: "_IOW('W',0x2,0x10)". I propose to rename this program to ioctldecode and change ioctlname to print a descriptive operation name as received by the ioctlname() function. $ ioctldecode 2148554498 _IOW('W',0x2,0x10) $ ioctlname 2148554498 WSKBDIO_COMPLEXBELL I propose to install both programs and add appropriate man-pages. A functional patch is here: http://netbsd.org/~kamil/patch-00245-kdump-ioctlname.2.txt
Re: Avoid UB in pslist.h (NULL + 0)
On 24.03.2020 14:30, Kamil Rytarowski wrote: > (3) Patch Clang to start optimizing on NULL + in C so we can return to > points (1) and (2). > I have received a feedback that the particular NULL + 0 issue is intended to be reported to the C committee as a defect. I appreciate this approach. If there is an intention to tune the common interpretation of the C code, it's better to collaborate with the C committee directly. signature.asc Description: OpenPGP digital signature
Re: Avoid UB in pslist.h (NULL + 0)
On 24.03.2020 07:43, Taylor R Campbell wrote: >> Date: Sun, 22 Mar 2020 03:30:56 +0100 >> From: Kamil Rytarowski >> >> On 22.03.2020 01:50, Taylor R Campbell wrote: >>> So far, after several weeks of discussion, nobody has presented a case >>> that there is a credible thread of a compiler actually misbehaving in >>> this scenario. >> >> There are no public declarations (that was requested on a local ML) but >> according to my IRC talks, there were plans to start optimizing on NULL >> + 0 operations, but is was/is considered as a chicken-egg issue. There >> is a time span now to alert users and > > Perhaps it is not a chicken-egg issue, but pointless abuse of leeway > in the standard. So far the public declarations are that Clang > developers realized they _could_ abuse the leeway this way but they > _chose not to_; that essentially everyone involved sees such > `optimization' as abuse; that there are no public reasons stated for > why the C standard diverges from the C++ standard on this note. > > If you have contrary information, please cite it specifically. > I have nothing to add. I'm personally agnostic to both sides. >> Both languages can be synced here as the incompatibility was revealed >> and discussed. Personally I wouldn't be surprised to see adoption of the >> C behavior as nullptr + 0 is invalid in C++ (and it will be likely >> invalid in future C). > > The fragment nullptr + 0 is invalid for an unrelated reason: nullptr > is a pointer to an incomplete type. We've gone over this already. > > It's not helpful to keep bringing it up: at best it's a distraction, > and at worst it gives the impression you might not understand basic > aspects of C and C++ -- an impression which, if true, would make it > all the more important that you _not_ go around tweaking things to > appease sanitizers before discussing and understanding them. > >> So far in reply I get 'just noise from the tool' so maybe my messages >> were not clear enough and reaching authorities (from the clang community >> and C committee) not credible enough. > > You successfully presented a case that it is technically undefined > behaviour. This is not in dispute -- a month ago I cited the specific > place where the C standard neglects to define it[*] -- so I don't > understand why you continue to argue that case. > > [*] https://mail-index.NetBSD.org/tech-kern/2020/02/25/msg026105.html > I reported that these issues are triggered in userland rump (+ in the pslist.h ATF tests). Rump has userland assumptions with compiler flags. I already declarared that using assembly code is not planned to be touched. >>> (b) Change how we invoke ubsan and the compiler by passing >>> -fno-delete-null-pointer-checks to clang. joerg objected to this >>> but I don't recall the details off the top of my head; joerg, can >>> you expand on your argument against this, and which alternative >>> you would prefer? >> >> I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I >> was asked to revert... after first getting acknowledge from you that it >> is a good idea... > > This is an inaccurate representation of what happened. What I said is > (https://mail-index.NetBSD.org/tech-kern/2020/03/08/msg026125.html): > > `Adding -fno-delete-null-pointer-checks for clang too sounds >sensible to me in general, but please check with joerg first. It >remains unclear to me that it's necessary here.' > > I asked you to revert because: > (a) the discussion was still ongoing without a conclusion, > (b) it remained unclear whether the change was necessary, I noted that it is necessary at least for memcpy(NULL, NULL, X)-like usage in rump. Even if we ignore NULL + 0. The kernel code is already prebuilt with -fno-delete-null-pointer-checks, but not userland rump. This causes slight incompatibilities. I don't see what's unclear. I k > (c) you failed to check with joerg like I asked, and I checked earlier that joerg disagrees with GCC and he has its interpretation of the standard that memcpy(NULL, NULL, 0)-like optimization is wrong. > (d) joerg objected. > And GCC developers do not support this (me neither). > I appreciate that sometimes it takes longer than you or I might like > to get a clear explanation out of joerg, but it is also fatiguing to > have to correct misrepresentations of positions in long meandering > threads in order to ensure changes you are itching to make -- or have > already made despite ongoing discussion -- are justified and correct. > >>> (c) Change how we invoke ubsan, bu
Re: Avoid UB in pslist.h (NULL + 0)
On 22.03.2020 01:50, Taylor R Campbell wrote: >> Date: Sun, 22 Mar 2020 00:03:57 +0100 >> From: Kamil Rytarowski >> >> I propose to change the fun(pointer + 0) logic with fun(pointer, 0). > > I don't think this is a good approach -- it requires modifying code > further and further away from the relevant part. > > > But let's step back a moment. > > So far, after several weeks of discussion, nobody has presented a case > that there is a credible thread of a compiler actually misbehaving in > this scenario. > There are no public declarations (that was requested on a local ML) but according to my IRC talks, there were plans to start optimizing on NULL + 0 operations, but is was/is considered as a chicken-egg issue. There is a time span now to alert users and It was also confirmed by two people that clang or a C compiler in general is allowed to optimize the code. > Yes, it is technically undefined behaviour -- nobody disputes that -- > but so is lots of other stuff that NetBSD relies on such as inline > asm. In C++, it is explicitly _not_ undefined behaviour; Clang > specifically stopped short of exploiting it as undefined behaviour in > C; nobody presented reasons why it should be undefined in C but > defined in C++. > Our kernel/rump code is mostly in C or at most a common subset of C and C++. As the languages are not fully compatible in details, it's safe to assume the C behavior that is valid for every C++ program, rather than C++ one that is not valid for C. Both languages can be synced here as the incompatibility was revealed and discussed. Personally I wouldn't be surprised to see adoption of the C behavior as nullptr + 0 is invalid in C++ (and it will be likely invalid in future C). > So this all serves to work around false alarms from ubsan -- not > actual bugs, just noise from the tool. Presumably the reason we use > ubsan at all is that it helps find actual bugs -- true alarms. > There's a few ways we might approach the false alarms: > Out of 3 cases that I was requested, 3 of them were confirmed to be real bugs (alignment, memcpy(NULL,NULL, 0), ptr + 0). Two of the cases can generate crashing code now. One of them was researched by the clang developers and C committee and confirmed that a compiler can impose assumptions that would break misdesigned code. I presented examples for the two UB issues that they are harmful now. So far in reply I get 'just noise from the tool' so maybe my messages were not clear enough and reaching authorities (from the clang community and C committee) not credible enough. > (a) Ignore them. It's what we've been doing so far. > This is not true. We already run our kernel with GCC with 0 UBSan reports and perform syzbot fuzzing. We catch real problems there. It already happened. We are reaching to the point of running all ATF reports under UBSan. If we cannot suppress noise from a tool (any kind of it), then the tool is not much usable for signaling real problems. > (b) Change how we invoke ubsan and the compiler by passing > -fno-delete-null-pointer-checks to clang. joerg objected to this > but I don't recall the details off the top of my head; joerg, can > you expand on your argument against this, and which alternative > you would prefer? > I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I was asked to revert... after first getting acknowledge from you that it is a good idea... > (c) Change how we invoke ubsan, but just ubsan -- not the compiler. > There's a cost to this: if we diverge from how we invoke the > compiler, we might be disabling true alarms from ubsan that > reflect how the compiler will actually behave. > UBSan is an integral part of a compiler. > (d) Patch ubsan to disable the false alarms. This incurs a > maintenance burden, but maybe leaves less of a sharp edge to trip > on than setting compiler flags in the makefile -- updates that > change the behaviour are more likely to lead to merge conflicts. > I'm fine to patch the tool to disable false alarms or rather report them upstream. There are some indeed false positives sometimes when we depend on assumptions that are imposed after compilation, during linking phase. In these cases we disable sanitizing as there is no way to teach the tool. > (e) Patch our own code to suppress the false alarms. The cost to this > churn is that it can introduce bugs of its own, and make the code > harder to understand, and the complexity may become obsolete in > the next version of the tool but will remain a Chesterton's fence > (`why is there a dummy argument in all these functions? can we get > rid of it?'). > Every single commit can introduce bugs on their own.
Avoid UB in pslist.h (NULL + 0)
I propose to change the fun(pointer + 0) logic with fun(pointer, 0). We still maintain the sanity checks and we can optimize the code to generate not worse code than before. We can pass the "0" argument only with DIAGNOSTIC or DEBUG kernel. http://netbsd.org/~kamil/patch-00242-pslist-avoid-null-pointer-arithmetic.txt I confirm that with this change there are no longer any reports UBSan reports. I'm open to tune this patch to expected coding style. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 09.03.2020 07:05, Martin Husemann wrote: > Also note that the getuid()/geteuid() example here is IMHO unrelated to the > original issue that caused this discussion, so I am not even convinced this > is NOT a ubsan bug. We instruct a C compiler that pointer used in the pserialize macros is never NULL, as the side effect of adding to it 0. As the pointer can be NULL, this at least confuses the compiler and can result in a miscompilation. We workaround it today with -fno-delete-null-pointer-checks in RUMP. In regular userland we shall avoid NULL pointer arithmetic. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 08.03.2020 19:42, Mouse wrote: >>> The correct fix is not to disable the null-pointer-check option but >>> to remove the broken automatic non-null arguments in GCC. > >> The C standard and current usage (GNU) disagrees here. > > GNU is not some kind of arbiter of "current usage" (whatever _that_ > means). > >> memcpy (along with a number of other functions) must not accept NULL >> arguments and compiler can optimize the code based on these >> assumptions. > > Then such functions - or the language in which they are embedded - is > not suitable for writing kernels. > Theoretical C is not suitable for writing kernels and we need extensions for the freestanding environment. We require at least assembly stubs. > But, is it "must not accept null arguments" or is it "may do anything > they like when presented with null arguments"? Actually both. > But the answer is to fix the problem, not to > twist the code into a pretzel to work around the compiler's refusal to > be suitable for the job. > We use -fno-delete-null-pointer-checks to disable these optimizations. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 08.03.2020 19:30, Taylor R Campbell wrote: >> Date: Sun, 8 Mar 2020 20:52:29 +0300 >> From: Roman Lebedev >> >> so we are allowed to lower that in clang front-end as: >> >> int >> statu(char *a) >> { >> __builtin_assume(a != NULL); >> a += getuid() - geteuid(); >> __builtin_assume(a != NULL); > > Allowed, yes. > > What I'm wondering is whether this is something Clang will actually do > -- and whether it is for any serious reason other than to capriciously > screw people who write software for real machines instead of for the > pure C abstract machine to the letter of the standard (and if so, > whether -fno-delete-null-pointer-checks is enough to disable it). > > Evidently making that assumption _not_ allowed in C++, so presumably > it's not important for performance purposes. It's also not important > for expressive purposes, because I could just as well have written > assert(a != NULL). > >>> I was told by Roman that it was checked during a C committee meeting and >>> confirmed to be an intentional UB. >> Correction: Aaron Ballman asked about this in non-public WG14 reflector >> mailing list, it wasn't a committee meeting, but the point still stands. > > What does `intentional' UB mean, exactly? What is the intent behind > having p + 0 for null p be undefined in C, while the C++ committee saw > fit to define it? > > Is it because there technically once existed C implementations on > bizarre architectures with wacky arithmetic on pointers like Zeta-C, > or is it because there are actual useful consequences to inferring > that p must be nonnull if we evaluate p + 0? > > I ask because in principle a conformant implementation could compile > the NetBSD kernel into a useless blob that does nothing -- we rely on > all sorts of behaviour relative to a real physical machine that is not > defined by the letter of the standard, like inline asm, or converting > integers from the VM system's virtual address allocator into pointers > to objects. But such an implementation would not be useful. > Aaron (as he was mentioned by name), is a voting member in the C++ committee and actively working on harmonizing C and C++ standards. There is a good chance that C and C++ will be synced here (they definitely should). signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 08.03.2020 18:12, Joerg Sonnenberger wrote: > On Sun, Mar 08, 2020 at 03:33:57PM +0100, Kamil Rytarowski wrote: >> There was also a request to make a proof that memcpy(NULL,NULL,0) is UB >> and can be miscompiled. >> >> Here is a reproducer: >> >> http://netbsd.org/~kamil/memcpy-ub.c >> >> 131 kamil@rugged /tmp $ gcc -O0 memcpy.c >> >> 132 kamil@rugged /tmp $ ./a.out >> >> 1 >> >> 133 kamil@rugged /tmp $ gcc -O2 memcpy.c >> 134 kamil@rugged /tmp $ ./a.out >> 0 >> >> A fallback for freestanding environment is to use >> -fno-delete-null-pointer-check. > > The correct fix is not to disable the null-pointer-check option but to > remove the broken automatic non-null arguments in GCC. > > Joerg > The C standard and current usage (GNU) disagrees here. memcpy (along with a number of other functions) must not accept NULL arguments and compiler can optimize the code based on these assumptions. -fno-delete-null-pointer-check is a fallback disabling this optimizations. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 08.03.2020 18:11, Joerg Sonnenberger wrote: > On Sun, Mar 08, 2020 at 03:30:02PM +0100, Kamil Rytarowski wrote: >> NULL+x is now miscompiled by Clang/LLVM after this commit: >> >> https://reviews.llvm.org/rL369789 >> >> This broke various programs like: >> >> "Performing base + offset pointer arithmetic is only allowed when base >> itself is not nullptr. In other words, the compiler is assumed to allow >> that base + offset is always non-null, which an upcoming compiler >> release will do in this case. The result is that CommandStream.cpp, >> which calls this in a loop until the result is nullptr, will never >> terminate (until it runs junk data and crashes)." > > As you said, using a non-zero offset. Noone here argued that using > non-zero offsets is or should be valid since that would obviously create > a pointer outside the zero-sized object. > > Joerg > We catch NULL + x at least here: Undefined Behavior in t_subr_prf.c:179:9, pointer expression with base 0 overflowed to 0x14 Undefined Behavior in t_subr_prf.c:179:9, pointer expression with base 0 overflowed to 0xa signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 08.03.2020 18:00, Taylor R Campbell wrote: > Thanks for doing the research. > >> Date: Sun, 8 Mar 2020 15:30:02 +0100 >> From: Kamil Rytarowski >> >> NULL+0 was added to UBSan proactively as it is as of today not >> miscompiled, but it is planned to so in not so distant time. It is a >> chicken-egg problem. > > If you say it is planned, can you please cite the plan? > Adding Roman Levedev, we discussed about enabling NULL+0 optimization. Teaching LLVM about this will allow more aggressive optimization, so code like this: http://netbsd.org/~kamil/null_plus_0-ub.c will report different results depending on optimization levels. > In C++, adding zero to a null pointer is explicitly allowed and > guaranteed to return a null pointer. See, for example, C++11 5.7 > `Additive operators', clause 7, p. 117: `If the value 0 is added to or > subtracted from a pointer value, the result compares equal to the > original pointer value.' C++17 clarifies in 8.7 `Additive operators', > clause 7, p. 132: `If the value 0 is added to or subtracted from a > null pointer value, the result is a null pointer.' > > So it would be a little surprising to me if compilers -- which tend to > focus their efforts on C++ more than C these days -- went out of their > way to act on the inference that evaluating p + 0 implies p is nonnull > in C. > I underlined the C language in my message. More elaborated answer here: https://reviews.llvm.org/D67122#inline-612172 I was told by Roman that it was checked during a C committee meeting and confirmed to be an intentional UB. >> There is however a fallback. If we want to use NULL+0, we must use >> -fno-delete-null-pointer-checks that avoids miscompilation and raising >> UBSan reports. If we want to allow NULL+x we must enable -fwrapv. > > Adding -fno-delete-null-pointer-checks for clang too sounds sensible > to me in general, but please check with joerg first. It remains > unclear to me that it's necessary here. > Today it will calm down LLVM UBSan. In future potentially avoid miscompilation. There are also memcpy(NULL,NULL,0)-like cases that need research why they happen in the first place. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 08.03.2020 15:30, Kamil Rytarowski wrote: > On 22.02.2020 17:25, Kamil Rytarowski wrote: >> When running the ATF tests under MKLIBCSANITIZER [1], there are many >> NULL pointer arithmetic issues . >> >> http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt >> >> These issues are in macros like: >> - IN_ADDRHASH_READER_FOREACH() >> - IN_ADDRLIST_WRITER_INSERT_TAIL() >> - IFADDR_READER_FOREACH() >> - etc >> >> These macros wrap internally pserialize-safe linked lists. >> >> What's the proper approach to address this issue? >> >> These reports are responsible for around half of all kinds of the >> remaining Undefined Behavior unique issues when executing ATF tests. >> >> >> [1] ./build.sh -N0 -U -V MAKECONF=/dev/null -V HAVE_LLVM=yes -V MKGCC=no >> -V MKLLVM=yes -V MKLIBCSANITIZER=yes -j8 -u -O /public/netbsd-llvm >> distribution >> >> >> > > NULL + 0 and NULL + x are both Undefined Behavior (as confirmed by a C > committee member, it was discussed and confirmed to be intentional). > > NULL+x is now miscompiled by Clang/LLVM after this commit: > > https://reviews.llvm.org/rL369789 > > This broke various programs like: > > "Performing base + offset pointer arithmetic is only allowed when base > itself is not nullptr. In other words, the compiler is assumed to allow > that base + offset is always non-null, which an upcoming compiler > release will do in this case. The result is that CommandStream.cpp, > which calls this in a loop until the result is nullptr, will never > terminate (until it runs junk data and crashes)." > > https://github.com/google/filament/pull/1566 > > LLVM middle-end uses those guarantees for transformations. > > > This pushed the LLVM devs to implement this NULL+x and NULL+0 check in > UBSan: > > https://reviews.llvm.org/D67122 > > NULL+0 was added to UBSan proactively as it is as of today not > miscompiled, but it is planned to so in not so distant time. It is a > chicken-egg problem. > > There is however a fallback. If we want to use NULL+0, we must use > -fno-delete-null-pointer-checks that avoids miscompilation and raising > UBSan reports. If we want to allow NULL+x we must enable -fwrapv. > > This means that we can avoid pserialize reports by enabling these CFLAGS > for clang as well. > > 22 # We are compiling the kernel code with > no-delete-null-pointer-checks, > 23 # and compiling without it, causes issues at least on sh3 by adding > 24 # aborts after kern_assert on NULL pointer checks. > 25 CFLAGS+=${${ACTIVE_CC} == "gcc":? > -fno-delete-null-pointer-checks :} > 26 > > https://nxr.netbsd.org/xref/src/sys/rump/Makefile.rump#25 > > I'm going to test it and switch -fno-delete-null-pointer-check on > Clang/LLVM. > > > For the historical note, a reproducer miscompiling NULL + x: > > http://netbsd.org/~kamil/null_plus_x-ub.c > > 136 kamil@chieftec /tmp $ /usr/local/bin/clang -O0 null_plus_x-ub.c > 137 kamil@chieftec /tmp $ ./a.out > 1 > 138 kamil@chieftec /tmp $ /usr/local/bin/clang -O2 null_plus_x-ub.c > 139 kamil@chieftec /tmp $ ./a.out > 0 > 151 kamil@chieftec /tmp $ /usr/local/bin/clang -O3 -fwrapv null_plus_x-ub.c > 152 kamil@chieftec /tmp $ ./a.out > > 1 > > NULL+0 is planned to follow up. > There was also a request to make a proof that memcpy(NULL,NULL,0) is UB and can be miscompiled. Here is a reproducer: http://netbsd.org/~kamil/memcpy-ub.c 131 kamil@rugged /tmp $ gcc -O0 memcpy.c 132 kamil@rugged /tmp $ ./a.out 1 133 kamil@rugged /tmp $ gcc -O2 memcpy.c 134 kamil@rugged /tmp $ ./a.out 0 A fallback for freestanding environment is to use -fno-delete-null-pointer-check. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 22.02.2020 17:25, Kamil Rytarowski wrote: > When running the ATF tests under MKLIBCSANITIZER [1], there are many > NULL pointer arithmetic issues . > > http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt > > These issues are in macros like: > - IN_ADDRHASH_READER_FOREACH() > - IN_ADDRLIST_WRITER_INSERT_TAIL() > - IFADDR_READER_FOREACH() > - etc > > These macros wrap internally pserialize-safe linked lists. > > What's the proper approach to address this issue? > > These reports are responsible for around half of all kinds of the > remaining Undefined Behavior unique issues when executing ATF tests. > > > [1] ./build.sh -N0 -U -V MAKECONF=/dev/null -V HAVE_LLVM=yes -V MKGCC=no > -V MKLLVM=yes -V MKLIBCSANITIZER=yes -j8 -u -O /public/netbsd-llvm > distribution > > > NULL + 0 and NULL + x are both Undefined Behavior (as confirmed by a C committee member, it was discussed and confirmed to be intentional). NULL+x is now miscompiled by Clang/LLVM after this commit: https://reviews.llvm.org/rL369789 This broke various programs like: "Performing base + offset pointer arithmetic is only allowed when base itself is not nullptr. In other words, the compiler is assumed to allow that base + offset is always non-null, which an upcoming compiler release will do in this case. The result is that CommandStream.cpp, which calls this in a loop until the result is nullptr, will never terminate (until it runs junk data and crashes)." https://github.com/google/filament/pull/1566 LLVM middle-end uses those guarantees for transformations. This pushed the LLVM devs to implement this NULL+x and NULL+0 check in UBSan: https://reviews.llvm.org/D67122 NULL+0 was added to UBSan proactively as it is as of today not miscompiled, but it is planned to so in not so distant time. It is a chicken-egg problem. There is however a fallback. If we want to use NULL+0, we must use -fno-delete-null-pointer-checks that avoids miscompilation and raising UBSan reports. If we want to allow NULL+x we must enable -fwrapv. This means that we can avoid pserialize reports by enabling these CFLAGS for clang as well. 22 # We are compiling the kernel code with no-delete-null-pointer-checks, 23 # and compiling without it, causes issues at least on sh3 by adding 24 # aborts after kern_assert on NULL pointer checks. 25 CFLAGS+=${${ACTIVE_CC} == "gcc":? -fno-delete-null-pointer-checks :} 26 https://nxr.netbsd.org/xref/src/sys/rump/Makefile.rump#25 I'm going to test it and switch -fno-delete-null-pointer-check on Clang/LLVM. For the historical note, a reproducer miscompiling NULL + x: http://netbsd.org/~kamil/null_plus_x-ub.c 136 kamil@chieftec /tmp $ /usr/local/bin/clang -O0 null_plus_x-ub.c 137 kamil@chieftec /tmp $ ./a.out 1 138 kamil@chieftec /tmp $ /usr/local/bin/clang -O2 null_plus_x-ub.c 139 kamil@chieftec /tmp $ ./a.out 0 151 kamil@chieftec /tmp $ /usr/local/bin/clang -O3 -fwrapv null_plus_x-ub.c 152 kamil@chieftec /tmp $ ./a.out 1 NULL+0 is planned to follow up. signature.asc Description: OpenPGP digital signature
Re: Current status of "support jails-like features"?
On 08.03.2020 03:55, メーリングリストNetBSD wrote: > Hello, > > I'm interested in "support jails-like features" [1] among The NetBSD > projects. > I investigated several current container systems [2], but couldn't find > one providing kernel-level virtualization. > I would like to know the current state of this project. Are there any > NetBSD developers working on this project? > There is initial work in this domain. Replied in a private mail. > [1] http://wiki.netbsd.org/projects/project/kernel_components/ > [2] https://github.com/NetBSDfr/sailor > https://github.com/jmmv/sandboxctl > > --- > wataru kasai signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 23:35, Mouse wrote: >> Unless I remember wrong, older C standards explicitly say that the >> integer 0 can be converted to a pointer, and that will be the NULL >> pointer, and a NULL pointer cast as an integer shall give the value >> 0. > > The only one I have anything close to a copy of is C99, for which I > have a very late draft. > > Based on that: > > You are not quite correct. Any integer may be converted to a pointer, > and any pointer may be converted to an integer - but the mapping is > entirely implementation-dependent, except in the integer->pointer > direction when the integer is a "null pointer constant", defined as > "[a]n integer constant expression with the value 0" (or such an > expression cast to void *, though not if we're talking specifically > about integers), in which case "the resulting pointer, called a null > pointer, is guaranteed to compare unequal to a pointer to any object or > function". You could have meant that, but what you wrote could also be > taken as applying to the _run-time_ integer value 0, which C99's > promise does not apply to. (Quotes are from 6.3.2.3.) > > I don't think there is any promise that converting a null pointer of > any type back to an integer will necessarily produce a zero integer. > > /~\ The ASCII Mouse > \ / Ribbon Campaign > X Against HTML mo...@rodents-montreal.org > / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B > $ cat test.cpp #include int main(int argc, char **argv) { if (((char *)0)[argc]) return 1; else return 0; } $ g++ test.cpp $ ./a.out Memory fault (core dumped) And some variations: $ g++ test.cpp test.cpp: In function ‘int main(int, char**)’: test.cpp:6:15: warning: converting NULL to non-pointer type [-Wconversion-null] if (NULL[argc]) ^ test.cpp:6:15: error: invalid types ‘long int[int]’ for array subscript $ g++ test.cpp test.cpp: In function ‘int main(int, char**)’: test.cpp:6:18: error: invalid types ‘std::nullptr_t[int]’ for array subscript if (nullptr[argc]) ^ NULL in C is expected to be harmonized with nullptr from C++. We still can store NULL/nullptr in variables as before and there is no change in the produced code. The only change is on the syntax level as we can catch more bugs earlier. Whenever a compiler will be smart enough to deduce that the code is nullptr[0] it will raise an error. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 21:18, Mouse wrote: >>> If we use 0x0, it can be a valid pointer. > >>> If we use NULL, it's not expected to work and will eventually >>> generate a syntax erro. > > Then someone has severely broken compatability with older versions of > C. 0x0 and (when one of the suitable #includes has been done) NULL > have both historically been perfectly good null pointer constants. > > Also...syntax error? Really? _Syntax_ error?? I'd really like to see > what they've done to the grammar to lead to that; I'm having trouble > imagining how that would be done. > The process of evaluation of the NULL semantics is not a recent thing. Not so long time, still in the NetBSD times, it was a general practice to allow dereferencing the NULL pointer and expect zeroed bytes over there. We still maintain compatibility with this behavior (originated as a hack in PDP11) in older NetBSD releases (NetBSD-0.9 Franz Lisp binaries depend on this). > /~\ The ASCII Mouse > \ / Ribbon Campaign > X Against HTML mo...@rodents-montreal.org > / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B > signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 15:35, Don Lee wrote: > >> On Feb 24, 2020, at 8:05 AM, Mouse wrote: >> > RUST is better defined that C and is indeed used in OS development > these days ...so? I don't see how this is related to the rest of the discussion. >>> As C is considered as not suitable for OS development, >> >> Once again, there is no such language as C. There is a family of >> closely related languages collectively called C. >> >> But it's actually the compiler, not the language. >> >>> there is an escape plan, already with a successful story in this >>> domain. >> >> There's another one, and one that doesn't require the complete rewrite >> a switch as drastic as C->rust would: various compilers (including >> older versions of the gcc family) that don't think it reasonable to >> take clear code and language-lawyer it into broken executables. >> > We need to be mindful of the gargantuan body of code written in “C”, > expecting the “old” behavior, much of it no longer having any sort of support. > > Software lives almost as long as government programs. > > -dgl- > While there, CHERI CPU can catch invalid intermediates (invalid pointer, before dereferencing). This is something that breaks a lot of old C code. tcpdump (that still preserves ifdefs for MSDOS) received rewrite to remove these types of bugs. https://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 15:04, Jason Thorpe wrote: > >> On Feb 24, 2020, at 4:22 AM, Kamil Rytarowski wrote: >> >> A compiler once being smart enough can introduce ILL/SEGV traps into >> code that performs operations on NULL pointers. This already bitten us >> when we were registering a handler at address 0x0 for the kernel code, >> GCC changed the operation into a cpu trap. (IIRC it was in the sparc code.) > > Nonsense, I think it's fair to classify that as a bug. That sort of stuff is > *not* supposed to happen if -ffreestanding is passed to the compiler. > > -- thorpej > If we use 0x0, it can be a valid pointer. If we use NULL, it's not expected to work and will eventually generate a syntax erro. UBSan as a runtime tool tries to indirectly catch the latter with the former and is prone to some rare false positives (so far not reported). If a compiler is too smart for 0x0 pointers, transforming them to abort traps, it is a compiler bug. I noted that this already happens. On 24.02.2020 15:05, Mouse wrote: > (3) If you have reason to think the C committee would be interested in > having me as a member, let me know whom to talk to. I might or might > not actually end up interested in joining, but I'd like more info. http://www.open-std.org/jtc1/sc22/wg14/ signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 14:03, Mouse wrote: It is now in C++ mainstream and already in C2x draft. >>> Then those are not suitable languages for OS implementations. >> This battle is lost for C > > C is not a language. C is a family of closely related languages. > If we tread C as gnu89 gnu99 gnu11 k&r etc this is true. > Some of them are suitable for OS implementation. It appears some of > the more recent ones are not, but this does not mean the older ones > also aren't. > From my perception the trend is inversed. Things that were undefined or unspecified in older revisions of C, are more clearly defined now. > Undefined behaviour as a way of describing differences between > implementations, things that it limits portability to depend on, is > useful. Undefined behaviour as a license-by-fiat for compilers to > unnecessarily transform code in unexpected ways is not. Software > languages and their compilers exist to serve their users, not the other > way around; it is not a compiler's place to take the position of "ha > ha, the code you wrote is clear but I can find a way to lawyer it into > formally undefined behaviour, so I'm going to transform it into > something I know damn well you didn't expect". > Please join the C committee as a voting member or at least submit papers with language changes. Complaining here won't change anything. (Out of people in the discussion, I am involved in wg14 discussions and submit papers.) >> RUST is better defined that C and is indeed used in OS development >> these days > > ...so? I don't see how this is related to the rest of the discussion. > As C is considered as not suitable for OS development, there is an escape plan, already with a successful story in this domain. > /~\ The ASCII Mouse > \ / Ribbon Campaign > X Against HTML mo...@rodents-montreal.org > / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B > signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 13:41, Joerg Sonnenberger wrote: > On Mon, Feb 24, 2020 at 11:42:01AM +0100, Kamil Rytarowski wrote: >> Forbidding NULL pointer arithmetic is not just for C purists trolls. It >> is now in C++ mainstream and already in C2x draft. > > This is not true. NULL pointer arithmetic and nullptr arithmetic are > *very* different things. Do not conflate them. > > Joerg > As noted, they are allowed to be practically the same in C++. The C proposal (n2394) NULL is marked as deprecated and NULL should be set to nullptr. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 12:14, Mouse wrote: >> Forbidding NULL pointer arithmetic is not just for C purists trolls. >> It is now in C++ mainstream and already in C2x draft. > > Then those are not suitable languages for OS implementations. > > I'm with campbell and mrg on this one. It is not appropriate to twist > NetBSD's code into a pretzel to work around "bugs" created by language > committees deciding to give compilers new latitutde to "optimize" > meaningful code into trash. > This battle is lost for C and not be fought on a downstream user of a C compiler (Matt Thomas insisted at some point to get the kernel buildable with C++ and patched it for this..). A compiler once being smart enough can introduce ILL/SEGV traps into code that performs operations on NULL pointers. This already bitten us when we were registering a handler at address 0x0 for the kernel code, GCC changed the operation into a cpu trap. (IIRC it was in the sparc code.) Looking at it from the proper perspective, the only rumpkernel reported NULL->0 arithmetic is performed by the pserialize macros. Once we will patch them, the problem can go away. So claim about twisting the kernel code or churn is exaggeration. RUST is better defined that C and is indeed used in OS development these days (there are startups doing OS development in RUST, e.g. https://github.com/oxidecomputer). signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 24.02.2020 05:03, Taylor R Campbell wrote: >> Date: Sun, 23 Feb 2020 22:51:08 +0100 >> From: Kamil Rytarowski >> >> On 23.02.2020 20:08, Taylor R Campbell wrote: >> Date: Sun, 23 Feb 2020 22:51:08 +0100 >> From: Kamil Rytarowski >> >> On 23.02.2020 20:08, Taylor R Campbell wrote: >>>> Date: Sat, 22 Feb 2020 17:25:42 +0100 >>>> From: Kamil Rytarowski >>>> >>>> What's the proper approach to address this issue? >>> >>> What do these reports mean? >>> >>> UBSan: Undefined Behavior in >>> /usr/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:2351:2, pointer >>> expression with base 0 overflowed to 0 >> >> We added 0 to a NULL pointer. >> >> They can be triggered by code like: >> >> char *p = NULL; >> p += 0; > > It seems to me the proper approach is to teach the tool to accept > this, and to avoid cluttering the tree with churn to work around the > tool's deficiency, unless there's actually a serious compelling > argument -- beyond a language-lawyering troll -- that (char *)NULL + 0 > is meaningfully undefined. > > We already assume, for example, that memset(...,0,...) is the same as > initialization to null pointers where the object in question is a > pointer or has pointers as subobjects. > Forbidding NULL pointer arithmetic is not just for C purists trolls. It is now in C++ mainstream and already in C2x draft. The newer C standard will most likely (already accepted by the committee) adopt nullptr on par with nullptr from C++. In C++ we can "#define NULL nullptr" and possibly the same will be possible in C. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2394.pdf This will change all arithmetic code operating on NULL into syntax error. > I think we should treat memcpy(NULL,NULL,0) similarly and tell the > tool `no, on NetBSD that really is defined and we're not interested in > hearing about theoretical nasal demons from armchair language > lawyers'. > memcpy(3) and other string functions are different. It is undefined if we just run it with memcpy(rand(), rand(), 0) and the first two arguments point to invalid memory. memcpy(0, 0, x) have another issue with overlapping memory that makes it undefined. In theory memcpy(x,y,z) where x or y are 0 is valid, whenever we map 0x0 in the address space, but that is so rare that GCC defines these arguments as nonnull. signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 23.02.2020 20:08, Taylor R Campbell wrote: >> Date: Sat, 22 Feb 2020 17:25:42 +0100 >> From: Kamil Rytarowski >> >> When running the ATF tests under MKLIBCSANITIZER [1], there are many >> NULL pointer arithmetic issues . >> >> http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt >> >> These issues are in macros like: >> - IN_ADDRHASH_READER_FOREACH() >> - IN_ADDRLIST_WRITER_INSERT_TAIL() >> - IFADDR_READER_FOREACH() >> - etc >> >> These macros wrap internally pserialize-safe linked lists. >> >> What's the proper approach to address this issue? > > What do these reports mean? > > UBSan: Undefined Behavior in > /usr/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:2351:2, pointer > expression with base 0 overflowed to 0 > We added 0 to a NULL pointer. They can be triggered by code like: char *p = NULL; p += 0; or char *p = NULL; if (p[0] == NULL) {} signature.asc Description: OpenPGP digital signature
Re: NULL pointer arithmetic issues
On 22.02.2020 19:39, Joerg Sonnenberger wrote: > On Sat, Feb 22, 2020 at 05:25:42PM +0100, Kamil Rytarowski wrote: >> When running the ATF tests under MKLIBCSANITIZER [1], there are many >> NULL pointer arithmetic issues . > > Which flags are the sanitizers using? Because I wouldn't be surprised if > they just hit _PSLIST_VALIDATE_PTRS and friends. > > Joerg > This patch did not help. I double checked that this branch is really taken. Index: sys/sys/pslist.h === RCS file: /cvsroot/src/sys/sys/pslist.h,v retrieving revision 1.7 diff -u -r1.7 pslist.h --- sys/sys/pslist.h1 Dec 2019 15:28:19 - 1.7 +++ sys/sys/pslist.h22 Feb 2020 20:51:42 - @@ -32,6 +32,7 @@ #ifndef_SYS_PSLIST_H #define_SYS_PSLIST_H +#include #include #include @@ -288,7 +289,9 @@ * Type-safe macros for convenience. */ -#if defined(__COVERITY__) || defined(__LGTM_BOT__) +#if defined(__COVERITY__) || defined(__LGTM_BOT__) || \ + __has_feature(undefined_behavior_sanitizer) || \ + defined(__SANITIZE_UNDEFINED__) #define_PSLIST_VALIDATE_PTRS(P, Q) 0 #define_PSLIST_VALIDATE_CONTAINER(P, T, F) 0 #else signature.asc Description: OpenPGP digital signature
NULL pointer arithmetic issues
When running the ATF tests under MKLIBCSANITIZER [1], there are many NULL pointer arithmetic issues . http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt These issues are in macros like: - IN_ADDRHASH_READER_FOREACH() - IN_ADDRLIST_WRITER_INSERT_TAIL() - IFADDR_READER_FOREACH() - etc These macros wrap internally pserialize-safe linked lists. What's the proper approach to address this issue? These reports are responsible for around half of all kinds of the remaining Undefined Behavior unique issues when executing ATF tests. [1] ./build.sh -N0 -U -V MAKECONF=/dev/null -V HAVE_LLVM=yes -V MKGCC=no -V MKLLVM=yes -V MKLIBCSANITIZER=yes -j8 -u -O /public/netbsd-llvm distribution signature.asc Description: OpenPGP digital signature
Re: fault(4)
On 08.02.2020 11:47, Maxime Villard wrote: > > Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already > gives > an instant crash: > > kernel diagnostic assertion "radix_tree_empty_tree_p(&pmap->pm_pvtree)" > failed: file ".../sys/arch/x86/x86/pmap.c" > There is a number of similar reports on syzbot. > Looks like radixtree.c doesn't handle allocation failures very well > somewhere. > > fault(4) seems like the kind of feature that would be useful for > stress-testing > and fuzzing. As you can see in the diff, its code is extremely simple. > > Maxime > > [1] https://m00nbsd.net/garbage/fault/fault.diff This tool is a must have but I defer review to others.
SIGCHLD set to SIG_DFL on exec(3)
On 06.02.2020 20:51, Robert Elz wrote: > Module Name: src > Committed By: kre > Date: Thu Feb 6 19:51:59 UTC 2020 > > Modified Files: > src/bin/sh: main.c > > Log Message: > If we are invoked with SIGCHLD ignored, we fail badly, as we assume > that we can always wait(2) for our children, and an ignored SIGCHLD > prevents that. Recent versions of bash can be convinced (due to a > bug most likely) to invoke us that way. Always return SIGCHLD to > SIG_DFL during init - we already prevent scripts from fiddling it. > > All ash derived shells apparently have this problem (observed by > Martijn Dekker, and notified on the bash-bug list). Actual issue > diagnosed by Harald van Dijk (same list). > > + (void)signal(SIGCHLD, SIG_DFL); We are allowed to fix this in the kernel for everybody: "If the SIGCHLD signal is set to be ignored by the calling process image, it is unspecified whether the SIGCHLD signal is set to be ignored or to the default action in the new process image." "This volume of POSIX.1-2017 specifies that signals set to SIG_IGN remain set to SIG_IGN, and that the new process image inherits the signal mask of the thread that called exec in the old process image. This is consistent with historical implementations, and it permits some useful functionality, such as the nohup command. However, it should be noted that many existing applications wrongly assume that they start with certain signals set to the default action and/or unblocked. In particular, applications written with a simpler signal model that does not include blocking of signals, such as the one in the ISO C standard, may not behave properly if invoked with some signals blocked. Therefore, it is best not to block or ignore signals across execs without explicit reason to do so, and especially not to block signals across execs of arbitrary (not closely cooperating) programs." https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html signature.asc Description: OpenPGP digital signature
Kernel (9.99.44) responsiveness issues
I keep observing responsiveness issues on NetBSD-current. This happened in last 2 months. Whenever I start building something with -j${CORES}, I have significant delays of responsiveness in other applications. load averages: 2.69, 5.56, 6.22; up 0+01:32:42 12:12:34 71 processes: 69 sleeping, 2 on CPU CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.1% interrupt, 99.8% idle Memory: 19G Act, 9639M Inact, 416K Wired, 34M Exec, 19G File, 43M Free Swap: 64G Total, 64G Free PID USERNAME PRI NICE SIZE RES STATE TIME WCPUCPU COMMAND 0 root 00 0K 87M CPU/7 0:40 0.00% 0.49% [system] 15823 root 85016M 2508K poll/1 0:01 0.00% 0.00% nbmake 25446 kamil 43028M 2452K CPU/0 0:00 0.00% 0.00% top 14117 root 114027M 3356K tstile/0 0:00 0.00% 0.00% ld 29088 root 114027M 3280K tstile/3 0:00 0.00% 0.00% ld 20839 root 114027M 3208K tstile/1 0:00 0.00% 0.00% ld 19550 root 114026M 3184K tstile/6 0:00 0.00% 0.00% ld 13716 root 114026M 3104K tstile/2 0:00 0.00% 0.00% ld 8758 root 114026M 3048K tstile/7 0:00 0.00% 0.00% ld 240 root 114026M 2580K tstile/0 0:00 0.00% 0.00% ld I can see in top(1) that processes are locked in turnstiles and load goes down. $ uname -a NetBSD chieftec 9.99.44 NetBSD 9.99.44 (GENERIC) #0: Fri Jan 31 19:26:07 CET 2020 root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64 135 kamil@chieftec /home/kamil $ cpuctl list Num HwId Unbound LWPs Interrupts Last change #Intr -- - 00online intr Sun Feb 2 10:40:26 2020 13 12online intr Sun Feb 2 10:40:26 2020 0 24online intr Sun Feb 2 10:40:26 2020 0 36online intr Sun Feb 2 10:40:26 2020 0 41online intr Sun Feb 2 10:40:26 2020 0 53online intr Sun Feb 2 10:40:26 2020 0 65online intr Sun Feb 2 10:40:26 2020 0 77online intr Sun Feb 2 10:40:26 2020 0 136 kamil@chieftec /home/kamil $ cpuctl identify 0 Cannot bind to target CPU. Output may not accurately describe the target. Run as root to allow binding. cpu0: highest basic info 000d cpu0: highest extended info 8008 cpu0: "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz" cpu0: Intel Xeon E3-1200v2 and 3rd gen core, Ivy Bridge (686-class), 3392.48 MHz cpu0: family 0x6 model 0x3a stepping 0x9 (id 0x306a9) cpu0: features 0xbfebfbff cpu0: features 0xbfebfbff cpu0: features 0xbfebfbff cpu0: features1 0x7fbae3ff cpu0: features1 0x7fbae3ff cpu0: features1 0x7fbae3ff cpu0: features2 0x28100800 cpu0: features3 0x1 cpu0: features5 0x281 cpu0: xsave features 0x7 cpu0: xsave instructions 0x1 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled cpu0: enabled xsave 0x7 cpu0: I-cache 32KB 64B/line 8-way, D-cache 32KB 64B/line 8-way cpu0: L2 cache 256KB 64B/line 8-way cpu0: L3 cache 8MB 64B/line 16-way cpu0: 64B prefetching cpu0: ITLB 64 4KB entries 4-way, 2M/4M: 8 entries cpu0: DTLB 64 4KB entries 4-way, 2M/4M: 32 entries (L0) cpu0: L2 STLB 512 4KB entries 4-way cpu0: Initial APIC ID 1 cpu0: Cluster/Package ID 0 cpu0: Core ID 0 cpu0: SMT ID 1 cpu0: MONITOR/MWAIT extensions 0x3 cpu0: monitor-line size 64 cpu0: C1 substates 2 cpu0: C2 substates 1 cpu0: C3 substates 1 cpu0: DSPM-eax 0x77 cpu0: DSPM-ecx 0x9 cpu0: SEF highest subleaf cpu0: Perfmon-eax 0x7300403 cpu0: Perfmon-eax 0x7300403 cpu0: Perfmon-edx 0x603 cpu0: microcode version 0x15, platform ID 1 signature.asc Description: OpenPGP digital signature
Re: x86 bootstrap features
On 20.01.2020 17:42, Emile `iMil' Heitor wrote: > > Hi Kamil, Emmanuel & all, > > On Tue, 24 Sep 2019, Kamil Rytarowski wrote: > >> On 24.09.2019 14:26, Emmanuel Dreyfus wrote: >>> On Tue, Sep 24, 2019 at 01:31:51PM +0200, Kamil Rytarowski wrote: >>>> My use-case is "qemu-system-x86_64 -kernel ./netbsd". Last I tried >>>> (with >>>> multiboot2 patches merged) it still did not work. >>> >>> I did not commit anything in multiboot support in the NetBSD kernel, >>> I only worked on bootstraps for now, hence the steady failure you >>> experience should come at no suprise. >>> >>> For now our kernel has support code for multiboot 1 for i386 only. >> >> qemu-system-i386 works, but -x86_64 not. >> >> Are there plans to add it to the amd64 kernel? > > Is there any news on this front? Being able to boot an amd64 kernel > directly > from kvm would give NetBSD the ability to be started by AWS > Firecracker[1] out > of the box which would be amazing. > > [1]: https://github.com/firecracker-microvm/firecracker > There was some work on multiboot but it was reverted. I consider qemu -kernel NetBSD/amd64 booting as a high priority. > > Emile `iMil' Heitor * > _ > | http://imil.net | ASCII ribbon campaign ( ) > | http://www.NetBSD.org | - against HTML email X > | http://gcu.info | & vCards / \ > > > !DSPAM:5e25d89018011320695049! > signature.asc Description: OpenPGP digital signature
Re: GSoC Proposal - Defragmentation for FFS
On 20.01.2020 04:53, Chen,Xizi wrote: > Hi NetBSD Community, > Welcome@ > I am currently a first year Master's Degree student studying Computer > Science at Northwest Missouri State University. I'm a professional C/C++ > developer with about 4 years of work experience as a storage systems > engineer. My work mostly revolved around storage stack and I/O path in > general. > > One of the issues that hits very close to home for me is file system > aging. I'm interested in learning in detail how to tackle fragmentation. > I believe "Defragmentation for FFS" project provides me an excellent > opportunity to learn more. > > I have experience using and writing software in Linux, including > experience in writing and debugging kernel modules. I am interested in > learning more about the project "Defragmentation for FFS" albeit my work > mostly involved working with XFS, I am more than willing to learn and > contribute to it. > > Looking forward to learning more about the project and working with > NetBSD community. > > Cheers, > /Xizi Chen,/ > /NWMSU/ Please elaborate your involvement in XFS. There is an ongoing (but not formally GSoC) project by Maciej to port XFS testsuite for the NetBSD filesystems. We probably could use some help here. signature.asc Description: OpenPGP digital signature
Re: sys_ptrace_lwpstatus.c (Was: CVS commit: src/sys)
On 26.12.2019 15:11, Valery Ushakov wrote: > On Thu, Dec 26, 2019 at 08:52:39 +0000, Kamil Rytarowski wrote: > >> Module Name: src >> Committed By:kamil >> Date:Thu Dec 26 08:52:39 UTC 2019 >> >> Modified Files: >> src/sys/kern: files.kern sys_ptrace_common.c >> src/sys/sys: ptrace.h >> Added Files: >> src/sys/kern: sys_ptrace_lwpstatus.c >> >> Log Message: >> Put ptrace_read_lwpstatus() and process_read_lwpstatus() to a new file >> >> Fixes "no PTRACE" kernel build, in particular zaurus kernel=INSTALL_C700. > > This is counterintuitive when a sys_ptrace* file with ptrace_* > functions does not depend on options ptrace. That seems to be a > strong indication the functions and the file are misnamed. > > filekern/sys_ptrace.c ptrace > filekern/sys_ptrace_common.cptrace > filekern/sys_ptrace_lwpstatus.c kern > > -uwe > I will refactor this. signature.asc Description: OpenPGP digital signature
Introducing PT_LWPSTATUS + PT_LWPNEXT, obsoleting PT_LWPINFO
PT_LWPINFO is a legacy ptrace(2) operation that was originally intended to retrieve the thread (LWP) information inside a traced process. At the end of the day, this call has been designed to work as an iterator over threads and retrieve the LWP id + event information. The event information is received in a raw format (PL_EVENT_NONE, PL_EVENT_SIGNAL, PL_EVENT_SUSPENDED). Problems: 1. PT_LWPINFO shares the operation name with PT_LWPINFO from FreeBSD that works differently and is used for different purposes: - On FreeBSD PT_LWPINFO returns pieces of information for the suspended thread, not the next thread in iteration. - FreeBSD uses a custom interface for iterating over threads (actually retrieving the threads is done with PT_GETNUMLWPS + PT_GETLWPLIST). - There is almost no overlapping correct usage of PT_LPWINFO on NetBSD and PL_LWPINFO FreeBSD and this causes confusion and misuse of the interfaces (recently I fixed such misuse in the DTrace code). 2. pl_event can merely return whether a signal was emitted to all threads or a single one. There is no information whether this is per-LWP signal or per-PROC signal, no siginfo_t information attached etc. 3. Syncing our behavior with FreeBSD would mean complete breakage of our PT_LWPINFO users and it is actually not needed, as we receive full siginfo_t through Linux-like PT_GET_SIGINFO instead of reimplementing siginfo_t inside ptrace_lwpinfo in a FreeBSD-style. (Actually FreeBSD wanted to follow up after NetBSD and adopt some of our APIs in ptrace(2) and signals.). 4. Our PT_LWPINFO is reduced in usability to list LWP ids in a traced process. 5. The PT_LWPINFO semantics cannot be used in core files as-is (as our PT_LPWINFO returns next LWP, not the prompted one) and pl_event is at least redundant with netbsd_elfcore_procinfo.cpi_siglwp... and still less powerful (as it cannot distinguish per-LWP and per-PROC signal in a single-threaded application). 6. PT_LWPINFO is already documented in the BUGS section of ptrace(2)... as it contains more flaws. This is basically the only weak part of our ptrace(2) API. Proposed solution: 1. Remove PT_LWPINFO from the public ptrace(2) API, keep it only as a hidden namespaced symbol for legacy purposes. 2. Introduce PT_LWPSTATUS that is prompts the kernel about exact thread and retrieves useful information about LWP. 3. Introduce PT_LWPNEXT with the iteration semantics from PT_LWPINFO, namely return the next LWP. 4. Ship with per-LWP information in core(5) files as "PT_LWPSTATUS@nnn". 5. Fix flattening the signal context in netbsd_elfcore_procinfo in core(5) files and move per-LWP signal information to per-LWP structure "PT_LWPSTATUS@nnn". 6. Do not bother with FreeBSD like PT_GETNUMLWPS + PT_GETLWPLIST calls, as this is a micro-optimization. We intend to retrieve the list of threads once on attach/exec and later trace them through the LWP events (PTRACE_LWP_CREATE, PTRACE_LWP_EXIT). It's more valuable to keep more compat with current usage of PT_LWPINFO. 7. Keep the existing ATF tests for PT_LWPINFO to avoid rot. PT_LWPSTATUS and PT_LWPNEXT operate over newly introduced "struct ptrace_lwpstatus". This structure is inspired by: - SmartOS lwpstatus_t, - struct ptrace_lwpinfo from NetBSD, - struct ptrace_lwpinfo from FreeBSD and their usage in real existing world-wide open-source software. #define PL_LNAMELEN 20 /* extra 4 for alignment */ struct ptrace_lwpstatus { lwpid_t pl_lwpid; /* LWP described */ sigset_tpl_sigpend; /* LWP signals pending */ sigset_tpl_sigmask; /* LWP signal mask */ charpl_name[PL_LNAMELEN]; /* LWP name, may be empty */ void*pl_private;/* LWP private data */ /* Add fields at the end */ }; - pt_lwpid is picked from PT_LWPINFO. - pl_event is removed entirely as useless, misleading and harmful. - pl_sigpend and pl_sigmask are mainly intended to untangle the cpi_sig* fields from "struct ptrace_lwpstatus" (fix "XXX" in the kernel code). - pl_name is a quick to use API to retrieve the LWP name, replacing sysctl() prompting (previous algorithm: retrieving the number of LWPs, retrieving all LWPs, iterating over them, finding matching id, copying LWP name); pl_name will also ship with the missing LWP name information in core(5) files - pl_private implements currently missing interface to read the TLS base value. In the end I have decided to avoid a write-mode version of PT_LWPSTATUS that rewrites signals, name or private pointer. These options are practically unused in real existing open-source software. There are 2 exceptions that I am familiar with but both are specific to kludges overusing ptrace(2). Once these operations will be really needed, they can be implemented without write-mode version of PT_LWPSTATUS, patching guest's code. Diff fixing the build against the in-sources GDB is as follows: diff --git a/
Re: [filemon] CVS commit: htdocs/support/security
On 17.12.2019 15:44, Andrew Doran wrote: Typically with a character device, the kmod can get unloaded while an ioctl is being executed on it. > > That's solvable. Conceptually I think the main stumbling block is that > there are two layers at play which need to reconciled: specfs and devsw. It > could also be an opportunity to lessen the distinction between block and > character devices, there's no real need for cached access from userspace, > that would simplify things too. > When it comes to syscalls, I haven't looked closely, but the issue is likely the same. > > It's atomic and side effect free if done correctly. We have pleasing > examples of this. This is hard to get right though, so naturally mistakes > are made. > It would be nice to have at least an example of doing it right. > Andrew >
Re: [filemon] CVS commit: htdocs/support/security
On 17.12.2019 14:19, Maxime Villard wrote: > Le 17/12/2019 à 12:34, Kamil Rytarowski a écrit : >> On 17.12.2019 09:16, Maxime Villard wrote: >>>> Module Name: htdocs >>>> Committed By: christos >>>> Date: Tue Dec 17 01:03:49 UTC 2019 >>>> >>>> Modified Files: >>>> htdocs/support/security: advisory.html index.html >>>> >>>> Log Message: >>>> new advisory >>>> >>>> >>>> To generate a diff of this commit: >>>> cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html >>>> cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html >>>> >>>> Please note that diffs are not public domain; they are subject to the >>>> copyright notices on the relevant files. >>> >>> There is something I don't understand here. Why keep this totally >>> useless >>> misfeature, when we already have many tracing facilities that can do >>> just >>> about the same job without having security issues? >>> >>> The recommendation in the advisory is literally to remove the kernel >>> module from the fs. I don't see what could possibly be the use of such a >>> misfeature as filemon; I would remove it completely from the kernel >>> source tree. >>> >>> Maxime >> >> From: >> http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc >> >> >> "Additionally the way filemon does filesystem interception is racy >> and can lead to random crashes if the system calls are in use >> while the module is unloaded." >> >> Is this issue fixable? Not speaking for filemon in particular, I find >> this ability to rewrite the syscall table on the fly as a feature. >> Keeping a functional module with this property (even if disabled by >> default) seems useful to me. > > As far as I can tell, there are many races caused by autoloading. > > Typically with a character device, the kmod can get unloaded while an ioctl > is being executed on it. When it comes to syscalls, I haven't looked > closely, but the issue is likely the same. > > You can use tricks to "narrow down" the races; eg in NVMM, I use a global > 'nmachines' variable, which prevents unloading in ~most cases. But I see > no way to fix these races, except using atomic refcounts and mutexes on > the ioctls and syscalls; obviously, this won't scale. > > I find this autoloading stuff to be a misfeature, too. If we want something > on by default, then we should put it in GENERIC; if we want it disabled but > accessible, then we should put it in a kmod that requires a manual modload > from root. > > Putting stuff in kmods AND having the kernel load them automatically serves > no purpose; it just adds bugs, and sometimes creates the wrong feeling that > a driver is disabled while it isn't (like filemon). > > Other than that, I don't really understand your point; you can still do > syscall interception with a custom kmod if you want, regardless of filemon. > Out of filemon, I am only interested in syscall_hijack (filemon_wrapper_install() + filemon_wrapper_deinstall()). If that can be reliable, I would keep it in src/sys/modules/example. https://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon_wrapper.c#90 I have no personal interest on the rest of filemon. Switching this make(1) feature to at least ptrace(2) should be straightforward. > Maxime
Re: [filemon] CVS commit: htdocs/support/security
On 17.12.2019 09:16, Maxime Villard wrote: >> Module Name: htdocs >> Committed By: christos >> Date: Tue Dec 17 01:03:49 UTC 2019 >> >> Modified Files: >> htdocs/support/security: advisory.html index.html >> >> Log Message: >> new advisory >> >> >> To generate a diff of this commit: >> cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html >> cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html >> >> Please note that diffs are not public domain; they are subject to the >> copyright notices on the relevant files. > > There is something I don't understand here. Why keep this totally useless > misfeature, when we already have many tracing facilities that can do just > about the same job without having security issues? > > The recommendation in the advisory is literally to remove the kernel > module from the fs. I don't see what could possibly be the use of such a > misfeature as filemon; I would remove it completely from the kernel > source tree. > > Maxime From: http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc "Additionally the way filemon does filesystem interception is racy and can lead to random crashes if the system calls are in use while the module is unloaded." Is this issue fixable? Not speaking for filemon in particular, I find this ability to rewrite the syscall table on the fly as a feature. Keeping a functional module with this property (even if disabled by default) seems useful to me.
Re: [patch] PT_GET_LWP_PRIVATE and PT_SET_LWP_PRIVATE
On 06.12.2019 06:17, Kamil Rytarowski wrote: > I have implemented l_private accessor in ptrace(2). > > By default this uses l_private from the lwp struct. > > PT_SET_LWP_PRIVATE uses lwp_setprivate() directly and this call > abstracts internally MI and MD code. > > The PT_SET_LWP_PRIVATE operation uses by default l_private from the > struct lwp, however whenever appropriate, picks MD logic that grabs this > value from a MD filed (register or pcb). > > MD code is implemented for: > - sparc, sparc64 > - alpha > - hppa > - powerpc > - sh3 > > There is included compat32 support and ATF tests. > > http://netbsd.org/~kamil/patch-00202-PT_GET_LWP_PRIVATE.txt > > This ptrace operation is inspired by PTRACE_GET_THREAD_AREA / > PTRACE_SET_THREAD_AREA from linux. > > Please review. > > After merging this, I will add an entry in man-page ptrace(2) and > implement core(5) generation of PT_GET_LWP_PRIVATE for each LWP within a > process. > Actually, we can go better here and in one go remove one of the issues in our current ptrace(2) API regarding legacy operation PT_LWPINFO (it is obsoleted by PT_GET_SIGINFO and confusing with a different PT_LWPINFO found in FreeBSD). I will come up with another and better patch. signature.asc Description: OpenPGP digital signature
[patch] PT_GET_LWP_PRIVATE and PT_SET_LWP_PRIVATE
I have implemented l_private accessor in ptrace(2). By default this uses l_private from the lwp struct. PT_SET_LWP_PRIVATE uses lwp_setprivate() directly and this call abstracts internally MI and MD code. The PT_SET_LWP_PRIVATE operation uses by default l_private from the struct lwp, however whenever appropriate, picks MD logic that grabs this value from a MD filed (register or pcb). MD code is implemented for: - sparc, sparc64 - alpha - hppa - powerpc - sh3 There is included compat32 support and ATF tests. http://netbsd.org/~kamil/patch-00202-PT_GET_LWP_PRIVATE.txt This ptrace operation is inspired by PTRACE_GET_THREAD_AREA / PTRACE_SET_THREAD_AREA from linux. Please review. After merging this, I will add an entry in man-page ptrace(2) and implement core(5) generation of PT_GET_LWP_PRIVATE for each LWP within a process. signature.asc Description: OpenPGP digital signature
Re: ptrace(2) interface for TLSBASE
On 04.12.2019 19:11, Jason Thorpe wrote: > > >> On Dec 4, 2019, at 8:47 AM, Kamil Rytarowski wrote: >> >> Today it's missing.. do we need it in core files? > > Seems like you would absolutely need it in core files, otherwise the debugger > won't know what it is when performing a post-mortem. > > If we add a ptrace accessor a'la Linux, then it fits nicely into the existing > model we have for ELF core files: > > * We also use ptrace(2) request numbers (the ones that exist in > * machine-dependent space) to identify register info notes. The > * info in such notes is in the same format that ptrace(2) would > * export that information. > > -- thorpej > OK, I will add a core(5) file addition with TLS base ("PT_GET_THREAD_AREA@nnn"). While I will also include a thread name: PT_[SG]ET_THREAD_NAME. We currently use sysctl to retrieve it from a living process, but miss in core files. Today using sysctl is more complex than it needs to be [1] and there is no direct translation to core(5) files. [1] https://github.com/llvm/llvm-project/blob/master/lldb/source/Plugins/Process/NetBSD/NativeThreadNetBSD.cpp#L158 signature.asc Description: OpenPGP digital signature
Re: ptrace(2) interface for TLSBASE
Today it's missing.. do we need it in core files? On 04.12.2019 16:56, Andrew Cagney wrote: > Where is it in a core file? > > On Tue, 3 Dec 2019 at 11:18, Kamil Rytarowski wrote: >> >> TLSBASE is stored on a selection of ports in a dedicated mcontext entry, >> out of gpregs. >> >> In the amd64 case mcontext looks like this: >> >> typedef struct { >> __gregset_t __gregs; >> __greg_t_mc_tlsbase; >> __fpregset_t__fpregs; >> } mcontext_t; >> >> The same situation is for: i386, arm, m68k and mips. >> >> This patch implements the accessors for amd64: >> >> http://netbsd.org/~kamil/patch-00199-TLSBASE.txt >> >> Does it look correct? If so I will add the same code for i386, >> i386-on-amd64, arm, m68k and mips. >> >> I'm not completely sure as l_private might be desynced with TLSBASE that >> was updated without letting the kernel know. >> >> I intend to get this change merged into -9, before 9.0. >> >> This interface pending to be introduced in GDB/NetBSD. >> >> NB. FS/GS in x86 __gpregs is confusing as it is not TLS base. >> signature.asc Description: OpenPGP digital signature
Re: ptrace(2) interface for TLSBASE
On 04.12.2019 04:10, Kamil Rytarowski wrote: > On 03.12.2019 17:55, Joerg Sonnenberger wrote: >> On Tue, Dec 03, 2019 at 05:11:49PM +0100, Kamil Rytarowski wrote: >>> TLSBASE is stored on a selection of ports in a dedicated mcontext entry, >>> out of gpregs. >> >> That's an implementation detail and IMO something we shouldn't leak >> outside the arch. Just provide an accessor for l_private. There are then >> two possible options: >> (1) An MD register is used directly for the TLS base. In that case, the >> register should be used and l_private is normally irrelevant. >> (2) No MD register is used (directly). In that case l_private is >> correct. >> >> Joerg >> > > As long as I sympathize with the idea of making it uniform and hide the > internals I think this is not the best approach here. We exactly want to > reconstruct more or less mcontext (known as user-data) on per-CPU basis. > This means that we want to leak the idea of an OS of the registers/state > on certain CPU into an application through the ptrace(2) accessors. > > The accessor is planned to be used in CPU-specific code in debuggers. We > already preserved an LLDB field for TLS in amd64 and i386 (on your > suggestion), reflecting the format of x86 mcontext. > > I don't have users pending where such uniform API would be used. Today > the only short term planned user is GDB (next, sooner or later LLDB will > gain it). There are no other recognized open-source projects that I am > aware that make use of it (except dosemu for DPMI/VM86 and recognition > of strace&rr for tracking syscalls) > > I think it can be interesting to provide PT_[SG]ET_TLS() that manages > TLS register inside mcontext/gpregs, but I don't see/have any users for > it today, so I want to avoid it, at least until the day it can have 1 > real user. > > A question is what to do with vax. As far as I can see, reading/writing > l_private is still valid for it. On the other hand there are ideas to > spare a register for TLS and break ABI with other changes, this is why I > prefer to wait with it. > > Our advantage of picking the PT_[GS]ETTLSBASE API over what we get > inside FreeBSD/Linux is that it is uniformly named regardless of > arch/compat. Linux and FreeBSD must detect arch and compat mode before > prompting either FS or GS (on x86) [1]. > > [1] > https://sources.debian.org/src/gdb/8.3.1-1/gdb/gdbserver/linux-x86-low.c/#L197 > Hm... I was too quick. I noted that there is already an interface in a newer Linux kernel: PTRACE_GET_THREAD_AREA and PTRACE_SET_THREAD_AREA. (However it was implemented only for few CPUs and not uniformly as typically in the Linux ptrace()). It has the same purpose as a generic API to access l_private and there is a user in GDB for it with IPA [2]. [2] https://suchakra.wordpress.com/2016/06/29/fast-tracing-with-gdb/ PTRACE_GET_THREAD_AREA is used in few other smaller projects (edb, custom debuggers). PTRACE_SET_THREAD_AREA is used in usermode Linux. Both interfaces (get and set) find uses. I will go for an API: PT_SET_THREAD_AREA and PT_GET_THREAD_AREA and try to target all CPUs. signature.asc Description: OpenPGP digital signature
Re: ptrace(2) interface for TLSBASE
On 03.12.2019 17:55, Joerg Sonnenberger wrote: > On Tue, Dec 03, 2019 at 05:11:49PM +0100, Kamil Rytarowski wrote: >> TLSBASE is stored on a selection of ports in a dedicated mcontext entry, >> out of gpregs. > > That's an implementation detail and IMO something we shouldn't leak > outside the arch. Just provide an accessor for l_private. There are then > two possible options: > (1) An MD register is used directly for the TLS base. In that case, the > register should be used and l_private is normally irrelevant. > (2) No MD register is used (directly). In that case l_private is > correct. > > Joerg > As long as I sympathize with the idea of making it uniform and hide the internals I think this is not the best approach here. We exactly want to reconstruct more or less mcontext (known as user-data) on per-CPU basis. This means that we want to leak the idea of an OS of the registers/state on certain CPU into an application through the ptrace(2) accessors. The accessor is planned to be used in CPU-specific code in debuggers. We already preserved an LLDB field for TLS in amd64 and i386 (on your suggestion), reflecting the format of x86 mcontext. I don't have users pending where such uniform API would be used. Today the only short term planned user is GDB (next, sooner or later LLDB will gain it). There are no other recognized open-source projects that I am aware that make use of it (except dosemu for DPMI/VM86 and recognition of strace&rr for tracking syscalls) I think it can be interesting to provide PT_[SG]ET_TLS() that manages TLS register inside mcontext/gpregs, but I don't see/have any users for it today, so I want to avoid it, at least until the day it can have 1 real user. A question is what to do with vax. As far as I can see, reading/writing l_private is still valid for it. On the other hand there are ideas to spare a register for TLS and break ABI with other changes, this is why I prefer to wait with it. Our advantage of picking the PT_[GS]ETTLSBASE API over what we get inside FreeBSD/Linux is that it is uniformly named regardless of arch/compat. Linux and FreeBSD must detect arch and compat mode before prompting either FS or GS (on x86) [1]. [1] https://sources.debian.org/src/gdb/8.3.1-1/gdb/gdbserver/linux-x86-low.c/#L197 signature.asc Description: OpenPGP digital signature
ptrace(2) interface for TLSBASE
TLSBASE is stored on a selection of ports in a dedicated mcontext entry, out of gpregs. In the amd64 case mcontext looks like this: typedef struct { __gregset_t __gregs; __greg_t_mc_tlsbase; __fpregset_t__fpregs; } mcontext_t; The same situation is for: i386, arm, m68k and mips. This patch implements the accessors for amd64: http://netbsd.org/~kamil/patch-00199-TLSBASE.txt Does it look correct? If so I will add the same code for i386, i386-on-amd64, arm, m68k and mips. I'm not completely sure as l_private might be desynced with TLSBASE that was updated without letting the kernel know. I intend to get this change merged into -9, before 9.0. This interface pending to be introduced in GDB/NetBSD. NB. FS/GS in x86 __gpregs is confusing as it is not TLS base. signature.asc Description: OpenPGP digital signature
Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h
On 23.11.2019 17:29, Christos Zoulas wrote: > In article , > Christos Zoulas wrote: >> In article <468095c0-b973-7f23-1cfa-3dc19e3b8...@gmail.com>, >> Rin Okuyama wrote: >>> On 2019/11/22 10:56, Christos Zoulas wrote: In article <679493cf-3e85-f56d-85e4-dfaf7958a...@gmail.com>, Rin Okuyama wrote: >>> ... > This was my misunderstanding. These codes are used when tracer is 64-bit > and traced is 32-bit. Don't know whether this is useful though... Yes, and someone broke them because all the ptrace 64->32 calls for registers seem to return 0 now. Was that code changed recently? >>> ... >>> >>> I fixed it: >>> http://mail-index.netbsd.org/source-changes/2019/11/22/msg111068.html >>> >>> Now, gdb/amd64 seems to work for i386 binaries to some extent, at least. >> >> Thanks! I think that the gdb on head should working for i386 binaries >> on amd64. I am installing a new kernel and userland and I will test >> shortly. > > And it works fine for both static and dynamic binaries. We could also add > kernel debugging support for i386 kernels, but that needs changes in > the i386 header files: > > 1. Everywhere s/#include I think this is a good change anyway for all MD headers because they >expect to load their own arch stuff. MI stuff needs to stay " 2. change __vaddr_t in segments.h ld_descriptor to uint32_t >That should have no impact and it is a good change because it is a >packed structure and needs to be fixed size. > 3. Add some fixed field sizes in struct pcb instead of pointers with >#ifdef __x86_64__ (or copy struct pcb into gdb like kamil did). This >is just ugly, but I think it is better than copying the struct. I >could add a comment explaining why. > > Is it worth it? > I consider fixing this as required. > chritod > signature.asc Description: OpenPGP digital signature
Re: __{read,write}_once
On 22.11.2019 02:42, Robert Elz wrote: > Date:Fri, 22 Nov 2019 01:04:56 +0100 > From: Kamil Rytarowski > Message-ID: <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com> > > > | I think that picking C11 terminology is the way forward. > > Use a name like that iff the intent is to also exactly match the > semantics implied, otherwise it will only create more confusion. > > kre > I think this matches. We want to make operation in 'single operation' (or looking like so) and 'atomic' is a settled name in English (there are options like: 'integral', 'intrinsic', 'elemental', 'essential', 'fundamental' or 'single operation'). If there are stronger feelings against it, I would go for '__write_singleop()', '__read_singleop()'. signature.asc Description: OpenPGP digital signature
Re: __{read,write}_once
On 22.11.2019 00:53, Robert Elz wrote: > Date:Thu, 21 Nov 2019 19:19:51 +0100 > From:Maxime Villard > Message-ID: > > | So in the end which name do we use? Are people really unhappy with > _racy()? > | At least it has a general meaning, and does not imply atomicity or > ordering. > > I dislike naming discussions, as in general, while there are often a > bunch of obviously incorrect names (here for example, read_page_data() ...) > there is very rarely one obviously right answer, and it all really just > becomes > a matter of choice for whoever is doing it. > > Nevertheless, perhaps something that says more what is actually happening, > rather than mentions what doesn't matter (races here), so perhaps something > like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something > along those lines ? > > kre > I think that picking C11 terminology is the way forward. It's settled probably forever now and it will be simpler to find corresponding documentation and specification of it. signature.asc Description: OpenPGP digital signature
Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h
On 21.11.2019 10:49, Rin Okuyama wrote: > On 2019/11/20 20:12, Rin Okuyama wrote: >> On 2019/11/19 22:59, Kamil Rytarowski wrote: > ... >>> >>> We still miss compat32 support for PT_GETXMMREGS and PT_SETXMMREGS, at >>> some point of time we shall add it for completeness. >> >> Thank you! >> >> With amd64/netbsd32_machdep.c rev 1.130, all tests in t_ptrace* pass in >> COMPAT_NETBSD32 on amd64, except for that involved with XMM registers. >> I will examine how to implement PT32_[GS]ETXMMREGS. > > I wrote a draft version of patch which adds PT32_[GS]ETXMMREGS support: > > http://www.netbsd.org/~rin/amd64-PT32_GSETXMMREGS-20191121.patch > > With this patch, XMM-related tests pass for COMPAT_NETBSD32 on amd64. > > Some remarks: > > (1) PT_[GS]ETXMMREGS ptrace(2) commands are added to . > These are only used for COMPAT_NETBSD32, and not exposed to userland. > This is correct. We don't want to export XMMREGS to amd64 users. Pleae remove /* */ from this part: +/* + "PT_GETXMMREGS", \ + "PT_SETXMMREGS" + */ This will allow ktruss and related software to have meaningful form for compat32 ptracing. > (2) COMPAT_NETBSD32 codes are called from process_machdep.c via > module_hook framework. This may be too much though... > I have no opinion here. > Comments? > I will leave this to be reviewed by mgorny@. Adding him to CC. >> Also, it seems that some COMPAT_NETBSD32 related codes for amd64 need to >> be cleaned up. For example, there remain COMPAT_NETBSD32 codes in >> amd64/process_machdep.c, that are no longer used: >> >> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/process_machdep.c#129 >> >> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/process_machdep.c#183 >> >> ... >> >> I will examine them too. > > This was my misunderstanding. These codes are used when tracer is 64-bit > and traced is 32-bit. Don't know whether this is useful though... > > Thanks, > rin Maxime added it. If you want to change it, please consult with maxv@. signature.asc Description: OpenPGP digital signature
Re: Proposal: validate FFS root inode during the mount.
On 20.11.2019 18:14, Michael van Elst wrote: > n...@gmx.com (Kamil Rytarowski) writes: > >> =46rom a high level point of view, we want to reject early corrupted FS o= >> n >> a mount. Today we panic the kernel needlessly. > > > Rejecting won't help much, there are so many other parts that may be corrupt > that you cannot validate on mount. > For start we want to stop the kernel from crashing on mount. > The goal should be to gracefully handle corrupted data structures by returning > errors instead of crashing the kernel. > mbouyer@ wants to panic always, after a successful mount. I have no strong opinion except handling the corrupted data either with a panic or some error returned from the kernel. signature.asc Description: OpenPGP digital signature
Re: Proposal: validate FFS root inode during the mount.
On 20.11.2019 16:11, Mouse wrote: >> During the fuzzing of FFS filesystem, we had a couple of issues >> caused by corrupted inode fields. [...] > >> To make sure that corrupted mount won't cause harm to the user, I >> want to add function to validate root inode on mount step (after >> superblock validation) > > Don't you have more or less the same issue with every other non-free > inode in the filesystem? The only thing I can see that's special about > the root inode in this regard is that it is the only inode that is used > immediately upon mount. > From a high level point of view, we want to reject early corrupted FS on a mount. Today we panic the kernel needlessly. signature.asc Description: OpenPGP digital signature
Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h
On 18.11.2019 13:08, Rin Okuyama wrote: > On 2019/11/18 20:15, Kamil Rytarowski wrote: >> I was thinking about something along these lines: >> >> http://netbsd.org/~kamil/patch-00196-siginfo_netbsd32_compat_uint64.txt >> >> In future some compat of i386 could use 8-byte alignment for 64-bit >> types. >> >> Not build tested. >> > > Thank you for providing the patch. It looks fine for me. I committed > it with changes below: > > - Take into account of __arm__ && __ARM_EABI__ for compat with arm-oabi > - Undefine NETBSD32_SIGINFO_UINT64_ALIGN for clarity > > Thanks! > rin Great work! Please try to run in compat32: cd /usr/tests/lib/libc/sys atf-run t_ptrace* | atf-report We still miss compat32 support for PT_GETXMMREGS and PT_SETXMMREGS, at some point of time we shall add it for completeness. signature.asc Description: OpenPGP digital signature
Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h
On 18.11.2019 11:40, Rin Okuyama wrote: > On 2019/11/18 16:52, Kamil Rytarowski wrote: >> On 18.11.2019 05:38, Rin Okuyama wrote: >>> Thank you for your comment! >>> >>> On 2019/11/17 22:42, Kamil Rytarowski wrote: >>>> Please check it also with picotrace/truss: >>>> >>>> http://pkgsrc.se/devel/picotrace >>> >>> netbsd32_signal.c needed to catch up with kern_sig.c so that syscall >>> information is provided with SIGTRAP TRAP_SCE/TRAP_SCX. I committed >>> the fix and picotrace/truss works fine now on COMPAT_NETBSD32. >>> >> >> Thanks! I have submitted a mail how to further improve it. > > Thank you for your advice! I committed it! > >>> On 2019/11/17 22:42, Kamil Rytarowski wrote: >>>> On 17.11.2019 04:34, Rin Okuyama wrote: >>>>> Hi, >>>>> >>>>> In order to distangle circular dependency between >>>>> v.s. , >>>>> I propose >>>>> >>>>> (1) Move NETBSD32_INT64_ALIGN from to >>>>> >>>>> >>>>> (2) Move netbsd32_{,u}int64 from to >>>>> >>>>> >>>>> See attached patch for example on amd64. >>>>> >>>> >>>> What do you think about duplicating the defines of netbsd32_uint64 >>>> inside sys/compat/sys/siginfo.h + adding a comment about keeping it in >>>> sync with netbsd32.h? >>>> >>>> I think that avoiding spaghetti dependencies is a benefit. >>>> >>>> We already duplicated there _ptrace_state, removing circular >>>> dependencies between sys/ptrace.h and sys/siginfo.h. >>> >>> I don't think this is a good idea in this case. If we want to have >>> duplicate define of netbsd32_uint64, and to avoid an "#ifdef __x86_64__" >>> mess in , we need to move NETBSD32_INT64_ALIGN to >>> other than . If so, >>> why not ? >>> >> >> No need to move anything out of machine/netbsd32_machdep.h. It's >> sufficient to define netbsd32_int64 with a custom non-conflicting name >> or protect it with #ifdef before typedefing/defining twice. >> >>> Also, in my proposal, spaghetti dependencies are avoided in the end; >>> everyone depends on , and not on each other. >>> >> >> However I have no strong opinions here. I would personally avoid >> compat32 definitions in sys/types.h. >> >> Compat code tends to need hacks, so it is sensible imho to restrict them >> to compat headers (I am aware that it's not always followed). > > I agree with you on that compat codes should not be put into sys headers > as far as possible. However, I still do not understand what you mean. > > (1) NETBSD32_INT64_ALIGN == __attribute__((__aligned__(4))) is in > , and > > (2) needs > > Therefore, we cannot use NETBSD32_INT64_ALIGN in . > Of course, we can typedef _netbsd32_uint64 in like: > > typedef uint64_t _netbsd32_uint64 > #ifdef __x86_64__ > __attribute__((__aligned__(4))) > #endif > ; > > Do you think it is OK? I guess that I miss some elementary things... > Sorry in advance ;-). > > Thanks, > rin I was thinking about something along these lines: http://netbsd.org/~kamil/patch-00196-siginfo_netbsd32_compat_uint64.txt In future some compat of i386 could use 8-byte alignment for 64-bit types. Not build tested. signature.asc Description: OpenPGP digital signature
Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h
On 18.11.2019 05:38, Rin Okuyama wrote: > Thank you for your comment! > > On 2019/11/17 22:42, Kamil Rytarowski wrote: >> Please check it also with picotrace/truss: >> >> http://pkgsrc.se/devel/picotrace > > netbsd32_signal.c needed to catch up with kern_sig.c so that syscall > information is provided with SIGTRAP TRAP_SCE/TRAP_SCX. I committed > the fix and picotrace/truss works fine now on COMPAT_NETBSD32. > Thanks! I have submitted a mail how to further improve it. > On 2019/11/17 22:42, Kamil Rytarowski wrote: >> On 17.11.2019 04:34, Rin Okuyama wrote: >>> Hi, >>> >>> In order to distangle circular dependency between >>> v.s. , >>> I propose >>> >>> (1) Move NETBSD32_INT64_ALIGN from to >>> >>> >>> (2) Move netbsd32_{,u}int64 from to >>> >>> >>> See attached patch for example on amd64. >>> >> >> What do you think about duplicating the defines of netbsd32_uint64 >> inside sys/compat/sys/siginfo.h + adding a comment about keeping it in >> sync with netbsd32.h? >> >> I think that avoiding spaghetti dependencies is a benefit. >> >> We already duplicated there _ptrace_state, removing circular >> dependencies between sys/ptrace.h and sys/siginfo.h. > > I don't think this is a good idea in this case. If we want to have > duplicate define of netbsd32_uint64, and to avoid an "#ifdef __x86_64__" > mess in , we need to move NETBSD32_INT64_ALIGN to > other than . If so, > why not ? > No need to move anything out of machine/netbsd32_machdep.h. It's sufficient to define netbsd32_int64 with a custom non-conflicting name or protect it with #ifdef before typedefing/defining twice. > Also, in my proposal, spaghetti dependencies are avoided in the end; > everyone depends on , and not on each other. > However I have no strong opinions here. I would personally avoid compat32 definitions in sys/types.h. Compat code tends to need hacks, so it is sensible imho to restrict them to compat headers (I am aware that it's not always followed). > Thanks, > rin signature.asc Description: OpenPGP digital signature
Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h
On 17.11.2019 04:34, Rin Okuyama wrote: > Hi, > > In order to distangle circular dependency between > v.s. , > I propose > > (1) Move NETBSD32_INT64_ALIGN from to > > > (2) Move netbsd32_{,u}int64 from to > > > See attached patch for example on amd64. > What do you think about duplicating the defines of netbsd32_uint64 inside sys/compat/sys/siginfo.h + adding a comment about keeping it in sync with netbsd32.h? I think that avoiding spaghetti dependencies is a benefit. We already duplicated there _ptrace_state, removing circular dependencies between sys/ptrace.h and sys/siginfo.h. > Background is: > > Now, gdb for i386 does not work again on amd64 (both on -current and > netbsd-9) with: > > ptrace: Invalid arguments. > > This is because sizeof(struct netbsd32_ptrace_siginfo) is 128+4+4=136 > on amd64 whereas sizeof(struct ptrace_siginfo) is 128+4=132 on i386; > netbsd32_ptrace_siginfo has uint64_t members via siginfo32_t via > __ksiginfo32 since compat/sys/siginfo.h rev 1.5: > > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/compat/sys/siginfo.h#rev1.5 > > As a result, netbsd32_ptrace_siginfo requires 4-byte tail padding on > amd64. However, tail padding does not appear on i386, since 64-bit > objects only need 4-byte alignment on i386. > > Actually, gdb/i386 becomes sane with this hack: > Please check it also with picotrace/truss: http://pkgsrc.se/devel/picotrace > > Index: sys/compat/sys/siginfo.h > === > RCS file: /home/netbsd/src/sys/compat/sys/siginfo.h,v > retrieving revision 1.8 > diff -p -u -r1.8 siginfo.h > --- sys/compat/sys/siginfo.h 30 Sep 2019 21:13:33 - 1.8 > +++ sys/compat/sys/siginfo.h 12 Nov 2019 11:04:58 - > @@ -34,6 +34,12 @@ > > #ifdef _KERNEL > > +typedef uint64_t _args_t > +#ifdef __x86_64__ > + __attribute__((__aligned__(4))) > +#endif > + ; > + > typedef union sigval32 { > int sival_int; > uint32_t sival_ptr; > @@ -73,7 +79,7 @@ struct __ksiginfo32 { > int _sysnum; > int _retval[2]; > int _error; > - uint64_t _args[8]; /* SYS_MAXSYSARGS */ > + _args_t _args[8]; /* SYS_MAXSYSARGS */ > } _syscall; > > struct { > > > We provide netbsd32_uint64 for this purpose: > > typedef uint64_t netbsd32_uint64 NETBSD32_INT64_ALIGN; > > and NETBSD32_INT64_ALIGN is __attribute__((__aligned__(4))) on amd64. > However, unfortunately, NETBSD32_INT64_ALIGN is defined in > , and requires > . > > Thoughts? Any comments or objections? > > Thanks, > rin signature.asc Description: OpenPGP digital signature
Re: __{read,write}_once
On 16.11.2019 15:31, Mindaugas Rasiukevicius wrote: > Maxime Villard wrote: >> Alright so let's add the macros with volatile (my initial patch). Which >> name do we use? I actually like __{read,write}_racy(). > > I suggest __atomic_load_relaxed()/__atomic_store_relaxed(). If these > are to be provided as macros, then I also suggest to ensure that they > provide compiler-level barrier. > I'm in favor of this naming. signature.asc Description: OpenPGP digital signature
Re: __{read,write}_once
On 08.11.2019 13:40, Mindaugas Rasiukevicius wrote: >>> There is more code in the NetBSD kernel which needs fixing. I would say >>> pretty much all lock-free code should be audited. >> I believe KCSAN can greatly help with that, since it automatically reports >> concurrent accesses. Up to us then to switch to atomic, or other kinds of >> markers like READ_ONCE. > Is there a CSAN i.e. such sanitizer for userspace applications? No.. there is TSan. If it would be useful to get CSan in userspace it probably shouldn't be too difficult to write it. At least full TSan is heavy and requires 64bit CPU. signature.asc Description: OpenPGP digital signature