Re: Proposal: validate FFS root inode during the mount.
Kirk has added a bunch of checksum and other integrity checks to FreeBSD. If you are looking for extra sanity, that might be a good place to snag code from. They would be more comprehensive than just checking the root node... Warner On Wed, Nov 20, 2019, 8:39 AM Mouse wrote: > >>> To make sure that corrupted mount won't cause harm to the user, I > >>> want to add function to validate root inode on mount [...] > >> Don't you have more or less the same issue with every other non-free > >> inode in the filesystem? > > I think the point is, when the root inode is corrupted, you can't > > unmount the filesystem. > > If that were the problem, I'd expect the fix to be support for forcibly > unmounting filesystems even when they're in bizarre states like that. > Arguably that is something that should go in anyway. I long ago added > a flag to umount(8) > > -R Take the special | node argument as a path to be passed directly > to unmount(2), bypassing all attempts to be smart about mechanically > determining the correct path from the argument. This option is > incompatible with any option that potentially unmounts more than one > filesystem, such as -a, but it can be used with -f and/or -v. This is > the only way to unmount something that does not appear as a directory > (such as a nullfs mount of a plain file); there are probably other > cases where it is necessary. > > Could that be suitable for dealing with the "can't unmount" aspect, or > is there kernel work needed too? The initial post indicates that there > is crasher behaviour involved, though it's not clear to what extent > it's directly related to the "can't unmount" syndrome - the post says > it can't be unmounted, but blames umount, not unmount, so it's not > clear to me whether that's userland's fault or not. > > /~\ The ASCII Mouse > \ / Ribbon Campaign > X Against HTML mo...@rodents-montreal.org > / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B >
Re: Adding an ioctl to check for disklabel existence
On Thu, Oct 3, 2019 at 9:19 AM Robert Elz wrote: > Now it makes no sense at all. > FreeBSD made the explicit decision when disks were sneaking up on 2TB to move to GPT labels. Why invent a new scheme that interoperates poorly with other things? GPT, for better or worse, won. disklabel64 would add no value over GPT, require a lot of extra code and be an ongoing source of confusion and difficulty for our users. This is why UFS2 didn't bring in a 64-bit disklabel format... NetBSD is, of course, free to do what it likes. My semi-outsider's view suggests, though, that the FreeBSD experience is relevant and timely. Warner
Re: re-enabling debugging of 32 bit processes with 64 bit debugger
On Sat, Jun 29, 2019 at 2:04 PM Christos Zoulas wrote: > In article zccfacpsbmoz-4xguf54n...@mail.gmail.com>, > Andrew Cagney wrote: > > > >Having 32-bit and 64-bit debuggers isn't sufficient. Specifically, it > >can't handle an exec() call where the new executable has a different > >ISA; and this imnsho is a must have. > > It is really hard to make a 32 bit debugger work on a 64 bit system > because of the tricks we play with the location of the shared > libraries in rtld and the debugger needs to be aware of them. > In retrospect it would have been simpler (and uglier) to have > /32 and /64 in all the shared library paths so that they would > not occupy the filesystem space, but even then things could break > for raw dlopen() calls, or opening other data files that are not > size neutral. HP/UX with Context Dependent Files and IBM/AIX with > Hidden Directories were attempts to a solution, but they created > a new dimension of problems. > I came to a similar conclusion when I hacked FreeBSD rtld to grok the difference between hard and soft float on the same system at the same time. Warner
Re: Removing PF
On Sun, Mar 31, 2019, 8:13 AM Sevan Janiyan wrote: > > > On 30/03/2019 23:27, Mouse wrote: > > (Rule of thumb: anyone who calls something "secure" or > > "insecure" without giving any indication of the threat model in > > question either doesn't understand security or hopes you don't; neither > > alternative is good. It's not universally applicable - here, for > > example, I suspect you were just being a bit over-brief - but it's been > > remarkably useful to me.) > > > Deeming it insecure on that basis of all the bug fixes upstream have > which haven't been merged in our tree since our last sync including > published patches from around this point onwards: > https://www.openbsd.org/errata42.html both of which need to be evaluated > to see if applicable. > Also on the basis of nobody doing this for years, I'd say this is prime evidence for there being no effective maintainer for years. Warner >
Re: Removing PF
On Sat, Mar 30, 2019, 2:29 PM Maxime Villard wrote: > On 30/03/2019 at 20:26, Michael van Elst wrote: > > On Sat, Mar 30, 2019 at 08:10:21PM +0100, Maxime Villard wrote: > > > >> ... sure, meanwhile you didn't really answer to the core of the issue, > which > >> I think was stated clearly by Sevan ... > > > > The issue is that we need to work on npf before we can drop other code. > > ... the questions raised were: why would someone use an insecure firewall? > ... > and isn't it irresponsible to provide an insecure firewall? ... you still > fail to answer ... I see fewer and fewer reasons to keep talking to you, > given your clear inability to answer in good faith ... > Also, this is a plan to deprecate, not remove from the tree tomorrow. Declaring for all to see that it is a rotting, festering carcass is a good thing. Maybe it will spur someone to fix that. Extremely unlikely, but possible. It does let the users know with enough time to migrate and/or enhance npf to meet their needs. It starts to break the logjam that has led to three undersupported firewalls in the tree. Warner >
Re: Regarding the ULTRIX and OSF1 compats
Picking a random message in this thread to respond to. FreeBSD has struggled with deprecation as well (which is what this is). I'm working on a doc to help there, but the basic criteria are: 1. What is the cost to keep it? Include the API change tax here. 2. What benefit does the project get from it? How many people use THING and how much "good" do we get out of it? 3. Is THING working for anything non-trivial? 4. Is there someone actively looking after THING? It's basically nothing more than a cost-benefit analysis. In the case of COMPAT_ULTRIX (which is not going away) you'd get: 1. Cost is low, though not zero. It's a thin veneer over stuff the system would have anyway. 2. Some people are still running Ultrix binaries. 3. As far as has been reported, it's useful for non-trivial binaries. 4. Nobody is really looking after it, but there's enough use to generate bug fixes. So on the whole, there's some benefit at a modest cost to keeping a feature that's basically working. Keeping it is a decent decision. In the case of COMPAT_OSF (which some would like to see removed): 1. Cost is relatively high, as there are parts we'd not have in a normal system (MACH features missing, must make API changes blind, no way to test). 2. Nobody has reported OSF binaries in recent memory, though some used it years ago (it was quite important in the 90s for alpha bring-up). 3. It's basically broken. Non-trivial binaries are impossible because of the missing bits. 4. No one is looking after it. Which is all negative: there's no benefit for something that's not known to be working, and even if it were working it's incomplete, for a user base of zero, with no maintainer. Add to that that, since there's no good way to test, the work to keep it compiling is make-work: it's a box to tick that provides no benefit other than ticking the box. Seems like a clear and compelling case to me, but my involvement with NetBSD is too tangential for me to strongly advocate for that.
Anyway, if there's this much contention over a removal, I'd suggest coming up with a set of reasonable criteria people can agree on, to help focus the discussion on cost/benefit rather than on some of the more esoteric philosophical arguments I've seen in the thread, which feel good but put a lot of work on others to generate that good feeling. Warner
Re: nandemulator
On Sun, Feb 24, 2019, 11:33 AM David Holland wrote: > On Sat, Feb 23, 2019 at 02:05:39PM -0700, Warner Losh wrote: > > On Sat, Feb 23, 2019 at 12:40 PM David Holland < > dholland-t...@netbsd.org> > > wrote: > > > > > Do we have docs for the object nandemulator is supposed to be > > > emulating? Some questions have arisen about how complete it is and > > > nobody I've talked to seems to really have answers. > > > > So looking at the code... > >[...] > > I know these aren't definitive answers as I didn't write the code and am > > basing this on briefly studying the code + the knowledge I picked up > about > > NAND while working with planar SLC and MLC NAND in the 34nm to 19nm > > technology nodes for Intel, Micron and Toshiba. So in the absence of > other > > answers, mine may be OK. However, I'd be happy to defer to someone who > > wrote the code and/or did a comparison of commands vs datasheets from > that > > era. > > I think you underestimate how much the rest of us don't know :-) > > Many thanks -- that is definitely enough information to sort things > out, and I'd had no idea even where to begin looking. > I'm happy to fill in more details. I worked at FusionIO for their third and fourth generations of cards, doing tweaks to thresholds to optimize read performance and reliability... I forget what the baseline for most people is :) I had thought about saying "just a lot of old stuff from the early 2000s," but that seemed to be too vague. But seriously, I'm happy to help in any way I can. Warner >
Re: nandemulator
On Sat, Feb 23, 2019 at 12:40 PM David Holland wrote: > Do we have docs for the object nandemulator is supposed to be > emulating? Some questions have arisen about how complete it is and > nobody I've talked to seems to really have answers. > So looking at the code... It's an ONFI emulator. That means Intel/Micron parts (as opposed to the so-called 'Toggle' parts from Toshiba/Samsung). It's from 2011, so it can't be emulating anything newer than 30ish nm processes. The limited command set suggests that it's emulating just SLC parts. It uses bogus manufacturer data and a placeholder name (NANDEMULATOR made by NetBSD), so I doubt there's a specific model used here. It hardcodes a 32MB device with 2k pages, which suggests an even older device (45nm SLC generation maybe). It emulates things at a much lower level than FreeBSD's nandsim, it would appear, but I've not studied either more than briefly for this email. This is in keeping with the other nand_*.c files in that directory. They are for parts like the Micron MT29F2G08AAC and such. These date from 2005 to 2008 if I can believe the quick sample of data sheets that I found. The list of supported commands is approximately that of the emulator for the Micron part. No mention is made of MLC or TLC, which on the older parts usually indicates they are SLC. MLC and TLC parts sometimes have additional features / commands required (or sometimes just desired) for coping with partial page programming, etc. It doesn't look super complete to my eye. But it's been 6 years since I was building NAND-based PCIe storage devices for a living. I know these aren't definitive answers as I didn't write the code and am basing this on briefly studying the code + the knowledge I picked up about NAND while working with planar SLC and MLC NAND in the 34nm to 19nm technology nodes for Intel, Micron and Toshiba. So in the absence of other answers, mine may be OK.
However, I'd be happy to defer to someone who wrote the code and/or did a comparison of commands vs datasheets from that era. Warner
Re: scsipi: physio split the request
On Fri, Dec 28, 2018, 11:04 AM Warner Losh wrote: > > On Fri, Dec 28, 2018, 1:25 AM matthew green wrote: >> > Of course larger transfers would also mitigate the overhead for each I/O >> > operation, but we already do several Gigabyte/s with 64k transfers and >> > filesystem I/O tends to be even smaller. >> >> yes - the benefits will be in the 0-10% range for most things. it >> will help, but only a fairly small amount, most of us won't notice. >> >> i've seen peaks of 1.4GB/s with an nvme(4) device with ffs on top. >> > > > I've seen 3.3GB/s of 128k-512k transfers on FreeBSD off of nvme, but > that's mostly video. It seems to be limited there not so much by transfer > size, but by the ability to queue transactions. We see <1% by raising > MAXPHYS to 1MB over the default 128k there. > Also, we are limited by what the device itself can do, which varies a lot by drive. From a low of 1GB/s to a high of just under 3.4GB/s. Warner >
Re: scsipi: physio split the request
On Fri, Dec 28, 2018, 1:25 AM matthew green wrote: > > Of course larger transfers would also mitigate the overhead for each I/O > > operation, but we already do several Gigabyte/s with 64k transfers and > > filesystem I/O tends to be even smaller. > > yes - the benefits will be in the 0-10% range for most things. it > will help, but only a fairly small amount, most of us won't notice. > > i've seen peaks of 1.4GB/s with an nvme(4) device with ffs on top. > I've seen 3.3GB/s of 128k-512k transfers on FreeBSD off of nvme, but that's mostly video. It seems to be limited there not so much by transfer size, but by the ability to queue transactions. We see <1% improvement by raising MAXPHYS to 1MB over the default 128k there. Warner >
Re: svr4, again
On Thu, Dec 20, 2018, 6:17 PM Maxime Villard wrote: > On 20/12/2018 at 18:11, Kamil Rytarowski wrote: > > https://github.com/krytarowski/franz-lisp-netbsd-0.9-i386 > > > > On the other hand unless we need it for bootloaders, drivers or > > something needed to run NetBSD, I'm for removal of srv3, sunos etc > compat. > > Yes. > > So, first things first, and to come back to my email about ibcs2: what are > the reasons for keeping it? As I said previously, this is not for x86 but > for Vax. As was also said, FreeBSD removed it just a few days ago. > It had been disconnected from the build for a while too... Warner > I'm bringing up compat_ibcs2 because I did start a thread on port-vax@ about > it last year (as quoted earlier), and back then it seemed that no one knew > what was the use case on Vax. >
Re: svr4, again
On Wed, Dec 19, 2018 at 4:38 PM wrote: > On Wed, Dec 19, 2018 at 11:01:27AM -0700, Warner Losh wrote: > > FreeBSD ditched SYSV maybe 2 years ago, but > > we still have IBCS in the tree because people are still using it (last we > > checked) and bug fixes / reports are still trickling in... > > > > Which is a long way of saying 'be careful' :) > > > > Warner > > That statement lasted all of a few hours. > > https://v4.freshbsd.org/commit/freebsd/src/342242 I had no idea this was going to happen so quickly... Warner
Re: Support for tv_sec=-1 (one second before the epoch) timestamps?
On Sat, Dec 15, 2018, 1:17 PM Mouse wrote: > > Might I suggest that the obvious solution to this, and probably a > > host of other issues, is to make time_t an always negative number > > (negint/neglong?) and redefine the epoch as 03:14:09 UTC on Tuesday, > > 19 January 2038, > > While it's academic as far as this thread is concerned, you can get > much the same effect by making time_t a (positive) unsigned value and > redefining the epoch to be 1901-12-13 20:45:54 UTC. > No, it is not. That breaks the naive seconds-since-1970 to/from broken-down time conversion that is specified in the standard (which is a fatal flaw, since it assigns no unique value to leap seconds, pretending that they don't exist). Warner > But, if you're going to redefine the epoch, there are a whole lot of > options available. > > /~\ The ASCII Mouse > \ / Ribbon Campaign > X Against HTML mo...@rodents-montreal.org > / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B >
Re: Support for tv_sec=-1 (one second before the epoch) timestamps?
On Thu, Dec 13, 2018, 9:41 AM wrote: > > > On Dec 13, 2018, at 6:06 AM, Martin Husemann wrote: > > > > > > [EXTERNAL EMAIL] > > > > On Thu, Dec 13, 2018 at 03:29:03AM +, David Holland wrote: > >> On Wed, Dec 12, 2018 at 10:27:04PM +0100, Joerg Sonnenberger wrote: > >>> On Wed, Dec 12, 2018 at 08:46:33PM +0100, Michał Górny wrote: > While researching libc++ test failures, I've discovered that NetBSD > suffers from the same issue as FreeBSD -- that is, both the userspace > tooling and the kernel have problems with (time_t)-1 timestamp, > i.e. one second before the epoch. > >>> > >>> I see no reason why that should be valid or more general, why any > >>> negative value of time_t is required to be valid. > >> > >> Are you Dan Pop? :-) > > > > Not sure about that, but I agree that we should not extend the range of > > time_t (aka "seconds since the epoch") to negative values. It is a > pandora > box, keep it closed. > > > > Martin > > You could certainly make that restriction. On the other hand, the TZ > project maintains timezone offset rules for times long before the epoch. > Those time stamps, and the rules for processing them, are well defined. At > least until you get far enough back that Gregorian vs. Julian calendar > becomes a consideration. > Doesn't matter. The meaning of negative time_t values is implementation-defined. Some implementations have it well defined, but there are problems due to things like the return value of time(). Is it convenient for values before the epoch to be well defined? Sure. But it's not required by the standards. I would posit that makes the specific test that kicked off this thread invalid. Warner >
Re: Support for tv_sec=-1 (one second before the epoch) timestamps?
On Wed, Dec 12, 2018 at 2:34 PM Joerg Sonnenberger wrote: > On Wed, Dec 12, 2018 at 08:46:33PM +0100, Michał Górny wrote: > > While researching libc++ test failures, I've discovered that NetBSD > > suffers from the same issue as FreeBSD -- that is, both the userspace > > tooling and the kernel have problems with (time_t)-1 timestamp, > > i.e. one second before the epoch. > > I see no reason why that should be valid or more general, why any > negative value of time_t is required to be valid. > Does (time_t)-1 mean a date in 1969 (int)? Or does it mean a date in 2106 (uint32_t)? Or is it a date in the year 584942419325 (uint64_t)? I don't think that POSIX actually says anything about the right answer. All I could find about what time_t is, is the vague statement "time_t Used for time in seconds." and the following from the time() call: 'Upon successful completion, *time*() shall return the value of time. Otherwise, (time_t)-1 shall be returned.' This strongly implies, to my mind, that -1 is not a valid time_t. Warner
Re: Missing compat_43 stuff for netbsd32?
On Tue, Sep 11, 2018, 5:48 PM Brad Spencer wrote: > Eduardo Horvath writes: > > > On Tue, 11 Sep 2018, Paul Goyette wrote: > > > >> While working on the compat code, I noticed that there are a few old > >> syscalls which are defined in syc/compat/netbsd323/syscalls.master > >> with a type of COMPAT_43, yet there does not exist any compat_netbsd32 > >> implementation as far as I can see... > >> > >> #64 ogetpagesize > >> #84 owait > >> #89 ogetdtablesize > >> #108osigvec > >> #142ogethostid (interestingly, there _is_ an implementation > >> for osethostid!) > >> #149oquota > >> > >> Does any of this really matter? Should we attempt to implement them? > > > > I believe COMPAT_43 is not NetBSD 4.3 it's BSD 4.3. Anybody have any > old > > BSD 4.3 80386 binaries they still run? Did BSD 4.3 run on an 80386? > Did > > the 80386 even exist when Berkeley published BSD 4.3? > > > > It's probably only useful for running ancient SunOS 4.x binaries, maybe > > Ultrix, Irix or OSF-1 depending on how closely they followed BSD 4.3. > > > > Eduardo > > > It has been a very long time since I did this, and I may not remember > correctly, but I believe that COMPAT_43 is needed on NetBSD/i386 to run > BSDI binaries. I remember using the BSDI Netscape 3.x binary back in > the day and I think it was required. > FreeBSD does too... net2 was closer to 4.3 system calls for many things than 4.4. Warner >
Re: Missing compat_43 stuff for netbsd32?
On Tue, Sep 11, 2018, 4:38 PM Thor Lancelot Simon wrote: > There can be a lot of value to being able to run really old executables, > but you need the right customer in the right state of utter desperation... > I'm writing a COMPAT_V7 right now to celebrate Unix 50 next year. To be difficult, this is really COMPAT_VENIX for an old 8088 v7 port that I have some history with. There is a wait call there mentioned elsewhere in the thread. I want it mostly so I can run the compiler on something fast... Maybe not desperation, but at least a little crazy .. it's not clear even a kernel module is the right path since qemu userland might be easier... Warner >
Re: new errno ?
On Sat, Jul 7, 2018, 11:43 AM Jason Thorpe wrote: > > > On Jul 6, 2018, at 2:49 PM, Eitan Adler wrote: > > For those interested in some of the history: > https://lists.freebsd.org/pipermail/freebsd-hackers/2003-May/000791.html > > > ...and the subsequent thread went just as I expected it might. Sigh. > > Anyway... in what situations is this absurd error code used in the 802.11 > code? > ENOTTY is best for how 802.11 uses it. Warner > EFAULT seems wrong because it means something very specific. Actually, > that brings me to a bigger point... rather than having a generic error code > for "lulz I could have panic'd here, heh", why not simply return an error > code appropriate for the situation that would have otherwise resulted in > calling panic()? There are many to choose from :-) > > -- thorpej > >
Re: new errno ?
On Fri, Jul 6, 2018, 2:10 PM Greg Troxel wrote: > > Phil Nelson writes: > > > Hello, > > > > In working on the 802.11 refresh, I ran into a new errno code from > FreeBSD: > > > > #define EDOOFUS 88 /* Programming error */ > > > > Shall we add this one? (Most likely with a different number since > 88 is taken > > in the NetBSD errno.h.) > > > > I could use EPROTO instead, but > > My immediate reaction is not to add it. It's pretty clearly not in > posix, unlikely to be added, and sounds unprofessional. Poul-Henning added it to differentiate between arguments that are potentially valid, just not in this combination (EINVAL or EFAULT), and arguments that are clearly programming errors (EDOOFUS), in code that couldn't just panic. > It seems like it would be used in cases where there is a KASSERT in the > non-DIAGNOSTIC case. I might just map it to EFAULT or EINVAL. > Not a terrible choice. Warner >
Re: QEMU/NetBSD status wiki page
On Sun, May 27, 2018 at 11:57 AM, Kamil Rytarowski <n...@gmx.com> wrote: > On 27.05.2018 16:53, Warner Losh wrote: > > > > > > On Sun, May 27, 2018 at 4:05 AM, Kamil Rytarowski <n...@gmx.com > > <mailto:n...@gmx.com>> wrote: > > > > As requested, I've prepared a QEMU/NetBSD status page: > > > > http://wiki.netbsd.org/users/kamil/qemu/ > > <http://wiki.netbsd.org/users/kamil/qemu/> > > > > I've attempted to be rather conservative with claims that something > > works, without detailed verification. > > > > > > FreeBSD has a complete QEMU user-mode implementation in a branch right > > now. It's sufficiently advanced we build all our arm, arm64 and mips > > packages using it. What's in upstream QEMU is totally, totally broken. > > The work breaks things down so the common BSD could be shared. Starting > > from that base would be a huge leg up to getting things working. > > > > Thank you for the feedback. > > I would like to stress that In my point of view - whether bluetooth or > vde is 100% functional - doesn't really matter in the context of: > > - user mode emulation > - hardware assisted virtualization > - virtio > - vhost > - device passthrough > > Once that will work well, getting this or that library for compression > of images of GUI is a matter packaging in tools. > > We can consider whether to collect the native kernel implementation of > nbd from Bitrig, as it was required for at least a single ARM evaluation > board in a bootstrap/booting process. > > > The HQEMU project can be very useful for releng, as we can boost > emulation of e.g. ARM by a factor of 3-20x on a amd64 host (exact boost > times depend on the type of executed code), run the tests more quickly > and save precious time and CPU cycles. Haven't investigated that. > > > I'm in the process of getting it upstream. FreeBSD's branch is a royal > > mess that has all the usual problem with a git branch that has lots of > > merges applied: it had become almost impossible to rebase. 
I've sorted > > most of that out, and am now sorting out collapsing down all the bug > > fixes and/or qemu API changes that happened over the years so each > > change in my branch is buildable. That should land this summer, maybe in > > time for 3.0, but maybe not. > > > > How close is this code to linux-user? I think that maintaining a concept > of bsd-user in 2018 is obsolete, new code in one BSD can be closer to > Linux or Solaris than other BSDs. > I'm not sure I follow this logic at all. The BSDs share a base that's quite similar, even if new bits aren't similar. Have you looked at the code I'm upstreaming? See the bsd-user branch in https://github.com/seanbruno/qemu-bsd-user for details. It actually works today, so it's not obsolete. It might be better not shared, but since that doesn't exist today, I can't judge those efforts. > Ideally we should go for [unix-]user shared between Linux and BSDs, add > OS specific differences in dedicated {linux,freebsd,netbsd}-user, > splitting NetBSD and FreeBSD. > I used to think that, but no longer. There's a lot of code to deal with threading and vm differences that insinuates itself into a lot of code. I'm not so sure that sharing between Linux and anything else is really all that sane, though there's some commonality. Without substantial changes in upstream behavior, it will also result in lots of breakage, as the code velocity there is fast and often the changes made are no good for BSD. > For now please ignore NetBSD code in this upstreaming process. > I'm upstreaming exactly what we have, which moves the current netbsd/openbsd code to its own subdirectory of bsd-user. The code in upstream is currently totally broken, and this won't break it any more. My efforts are to push up the code we have today that works really really well and nothing further. Any cross-bsd or pan-unix efforts will post-date my upstreaming, since those do not exist today. Warner
Re: QEMU/NetBSD status wiki page
On Sun, May 27, 2018 at 4:05 AM, Kamil Rytarowski wrote: > As requested, I've prepared a QEMU/NetBSD status page: > > http://wiki.netbsd.org/users/kamil/qemu/ > > I've attempted to be rather conservative with claims that something > works, without detailed verification. > FreeBSD has a complete QEMU user-mode implementation in a branch right now. It's sufficiently advanced that we build all our arm, arm64 and mips packages using it. What's in upstream QEMU is totally, totally broken. The work breaks things down so the common BSD code can be shared. Starting from that base would be a huge leg up to getting things working. I'm in the process of getting it upstream. FreeBSD's branch is a royal mess that has all the usual problems of a git branch with lots of merges applied: it had become almost impossible to rebase. I've sorted most of that out, and am now sorting out collapsing down all the bug fixes and/or qemu API changes that happened over the years so each change in my branch is buildable. That should land this summer, maybe in time for 3.0, but maybe not. Warner
Re: Kernel module framework status?
On Sat, May 5, 2018, 4:17 AM, wrote: > If someone wants to do this route of metadata, please consider the > addition of a metadata property "should this be auto loaded". > > Currently we have ad-hoc logic for some modules that might be auto > loaded (compat_...) and it'd probably be cleaner to do this. > In FreeBSD, I generally converted the ad hoc logic to tables and made sure the metadata mini-language was expressive enough to cope. Warner >
Re: Kernel module framework status?
On Fri, May 4, 2018 at 12:32 AM, John Nemeth wrote: > On May 3, 10:54pm, Mouse wrote: > } > } > There is also the idea of having a module specify the device(s) > } > it handles by vendor:product > } > } Isn't that rather restrictive in what buses it permits supporting? > > I suppose that other types of identifiers could be used. > > } Indeed, PCI (and close relatives, like PCIe) and USB are the only > } things I can name offhand that even _have_ vendor:product. (Of course, > } I'm sure there are lots of buses out there I've never heard of, or > } don't know enough about.) > > Only buses where the devices are identified would work. For > buses like ISA where you have to probe the devices, it would not > be workable. > Don't forget that ISA buses have ISAPNP as an option, so it's more of a mixed bus. But yes, this can only work on self-enumerating, self-identifying buses. > }-- End of excerpt from Mouse > Warner
Re: Kernel module framework status?
On Thu, May 3, 2018 at 8:54 PM, Mouse wrote: > > There is also the idea of having a module specify the device(s) > > it handles by vendor:product > > Isn't that rather restrictive in what buses it permits supporting? > > Indeed, PCI (and close relatives, like PCIe) and USB are the only > things I can name offhand that even _have_ vendor:product. (Of course, > I'm sure there are lots of buses out there I've never heard of, or > don't know enough about.) > FreeBSD's modules have metadata. Some of this metadata can describe "plug and play" tables the drivers use to match devices. FreeBSD's newbus has a method to get the textual representation of this "plug and play" data. Combining the two, I wrote devmatch to sort through the unattached devices, matching their plug and play data to modules to get a list of modules to load. I'll be presenting a talk on this at BSDCan next month... Warner
Re: Spectre
On Thu, Jan 18, 2018 at 7:58 AM, wrote: > > > > On Jan 18, 2018, at 9:48 AM, Mouse wrote: > > > >> Since this involves a speculative load that is legal from the > >> hardware definition point of view (the load is done by kernel code), > >> this isn't a hardware bug the way Meltdown is. > > > > Well, I'd say it's the same fundamental hardware bug as meltdown, but > > not compounded by an additional hardware property (which I'm not sure I > > would call a bug) which is made much worse by the actual bug. > > > > To my mind, the bug here is that annulling spec ex doesn't annul _all_ > > its effects. That, fundamentally, is what's behind both spectre and > > meltdown. In meltdown it's exacerbated by spec ex's failure to check > > permissions fully - but if the side effects were annulled correctly, > > even that failure wouldn't cause trouble. > > That's true. But the problem is that cache fill is only the most > obvious and easiest to exploit side channel. There are others, such > as timing due to execution units being busy, that are harder to exploit > but also harder to cure. It seems to me that blocking all observable > side effects of speculative execution can probably only be done by > disabling speculative execution outright. That clearly isn't a good > thing. The Spectre fixes all amount to a speculative barrier, which > will do the job just as well (though it requires code change). The > Meltdown fix is more obvious: don't omit mode-dependent access checks > before launching a speculative load, as most CPU designers already did. > One difficulty with caches: you'd have to re-cache what you eject, otherwise there's an observable effect. That's the whole point of this family of attacks: the microarchitecture does something that you can observe that you'd normally not be able to observe. It's really, really hard to not leak any side-channel data at all. Side channel has become the new buffer overflow. Warner
Re: virtual to physical memory address translation
On Mon, Jan 15, 2018 at 8:09 AM, John Nemeth wrote: > On Jan 15, 2:04pm, Michael van Elst wrote: > } m...@netbsd.org (Emmanuel Dreyfus) writes: > } > } >Sorry if that has been covered ad nauseam, but I cannot find relevant > } >information about that: on NetBSD, how can I get the physical memory > } >address given a virtual memory address? This is to port the Linux > } >Meltdown PoC so that we have something to test our systems against. > } > } pmap_extract() returns the physical address of a virtual address. > } pmap_kernel() gives you the kernel map. > > I suspect that he wants to do this from userland. > You have to walk the page tables, or trick some driver into leaking this information somehow. There's no standard interface to get it. FreeBSD has no standard interface either, but that hasn't stopped people from getting Meltdown and Spectre working, though I don't think they have shared that PoC code. Warner
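For readers following along, here is a toy, single-level sketch of the arithmetic behind the translation pmap_extract() performs. The real lookup walks multi-level, hardware-defined page tables inside the kernel; the mapping, page size, and addresses below are purely illustrative:

```python
# Toy virtual-to-physical translation: split the VA into a virtual page
# number and an offset, look the page number up in a (here: single-level,
# dict-based) page table, and glue the physical frame onto the offset.

PAGE_SHIFT = 12                      # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def v2p(page_table, vaddr):
    vpn = vaddr >> PAGE_SHIFT        # virtual page number
    off = vaddr & PAGE_MASK          # byte offset within the page
    pfn = page_table.get(vpn)        # physical frame number, if mapped
    if pfn is None:
        return None                  # no mapping; pmap_extract would fail
    return (pfn << PAGE_SHIFT) | off

page_table = {0x12345: 0x00abc}      # one invented mapping
print(hex(v2p(page_table, 0x12345678)))   # -> 0xabc678
```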
Re: Reading a DDS tape with 2M blocks
On Jan 9, 2018 3:59 PM, "Greg Troxel" wrote: Edgar Fuß writes: > I have a DDS tape (written on an IRIX machine) with 2M blocks. > Any way to read this on a NetBSD machine? > My memories of SCSI ILI handling on DDS are fuzzy. I remember you can operate > these tapes in fixed or variable block size mode, where some values in the CDB > either mean blocks or bytes. I thought in variable mode, you could read block > sizes other than the (virtual) physical block size of the tape. Did you try dd if=/dev/rsd0d of=FILE bs=2m or similar? I believe that dd does reads of the given bs and these reads are passed to the tape device driver, which then does reads of that size from the hardware, and that this then works fine. Might need MAXPHYS of 2m too... Warner
Re: Proposal to obsolete SYS_pipe
On Dec 24, 2017 11:10 PM, "Robert Elz" wrote: Date: Sun, 24 Dec 2017 18:42:19 -0800 From: John Nemeth Message-ID: <201712250242.vbp2gjjm017...@server.cornerstoneservice.ca> | HISTORY | A pipe() function call appeared in Version 6 AT&T UNIX. That I think would be a man page bug - pipe() was certainly in 5th edition, but that is as far back as I go, so I am not sure when it did appear - the syscall number suggests it was not in the very early versions though (not 1st or 2nd edition probably.) It is in the 3rd edition man pages, but is documented with only one return code. The 4th edition manual looks very similar, but does have both values documented. The source is fragmentary, so it's hard to track down. The 2nd edition has no manuals, but there is no pipe in libc. I just went through FreeBSD's system call man pages and corrected a number of details like this... Warner
Re: ext2fs superblock updates
On Thu, Nov 16, 2017 at 12:12 PM, Mouse wrote: > >> They are generated by _newfs_ and left untouched thereafter. > > Interesting, thanks. What's so useful about the superblock at newfs > > time? > > It contains enough information for fsck to find other critical things > (like cylinder groups and their inode tables). If the primary > superblock has been destroyed but the rest of the filesystem is intact, > fsck is supposed to be able to put the filesystem back together with the > help of a backup superblock. Yes. For UFS filesystems on BSD-labeled disks, there are additional hints to fsck about the size of different parts of the filesystem that allow it to guess fairly well at the location of these alternate superblocks. Warner
Re: FUA and TCQ
On Mon, Sep 26, 2016 at 8:27 AM, Michael van Elst <mlel...@serpens.de> wrote: > i...@bsdimp.com (Warner Losh) writes: > >>NVMe is even worse. There's one drive that w/o queueing I can barely >>get 1GB/s out of. With queueing and multiple requests I can get the >>spec sheet rated 3.6GB/s. Here queueing is critical for Netflix to get to >>90-93Gbps that our 100Gbps boxes can do (though it is but one of >>many things). > > Luckily the Samsung 950pro isn't of that type. Can you tell what > NVMe devices (in particular in M.2 form factor) have that problem? I've not used any M.2 devices. These tests were raw dd's of 128k I/Os with one thread of execution, so no effective queueing at all. As queueing gets involved, the performance increases dramatically as the drive idle time drops substantially. I'd imagine most drives are like this for the workload I was testing, since you had to make a full round-trip from the kernel to userland after the completion to get the next I/O rather than having it already in the hardware... Unless NetBSD's context switching is substantially faster than FreeBSD's, I'd expect to see similar results there as well. Some cards do a little better, but not by much... All cards do significantly better when multiple transactions are scheduled simultaneously. Just ran a couple of tests and found dd of 4k blocks gave me 160MB/s, 128k blocks gave me 600MB/s, and 1M blocks gave me 636MB/s. Random read/write with 64 jobs and an I/O depth of 128 with 128k random reads with fio gave me 3.5GB/s. This particular drive is rated at 3.6GB/s. This is for an HGST Ultrastar SN100. All numbers are from FreeBSD. In production, for unencrypted traffic, we see a similar number to the deep-queue fio test. While I've not tried on NetBSD, I'd be surprised if you got significantly more than these numbers due to the round-trip to userland vs having the next request already present in the drive... Warner
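The queue-depth effect Warner describes falls out of a simple Little's-law model. All the numbers below are illustrative, not measurements from any particular drive:

```python
# Back-of-the-envelope model of why queue depth matters so much for NVMe.
# With qd outstanding I/Os the drive can overlap the host turnaround time;
# with qd=1 every completion pays a full kernel/userland round-trip before
# the next I/O is issued, so the drive sits idle between commands.

def throughput_mbps(qd, io_kb, device_us, host_us, max_mbps):
    per_io_s = (device_us + host_us) / 1e6   # one full round-trip
    mbps = qd * (io_kb / 1024.0) / per_io_s  # Little's law, capped below
    return min(mbps, max_mbps)

# qd=1, 128 KiB I/Os, 100us device service, 100us host turnaround:
print(round(throughput_mbps(1, 128, 100, 100, 3600)))    # -> 625
# A deep queue hides the turnaround and hits the spec-sheet ceiling:
print(round(throughput_mbps(128, 128, 100, 100, 3600)))  # -> 3600
```

The invented 100us figures are only there to show the shape of the curve: once the queue is deep enough, the drive, not the host round-trip, becomes the bottleneck.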
Re: Plan: journalling fixes for WAPBL
On Sat, Sep 24, 2016 at 2:01 AM, David Holland wrote: > On Fri, Sep 23, 2016 at 07:51:32PM +0200, Manuel Bouyer wrote: > > > > *if you have the write cache disabled* > > > > > > *Running with the write cache enabled is a bad idea* > > > > On ATA devices, you can't permanently disable the write cache. You have > > to do it on every power cycle. > > There are also drives that ignore attempts to turn off write caching. These drives lie to the host and say that caching is off, when it really is still on, right? Warner
Re: Plan: journalling fixes for WAPBL
On Fri, Sep 23, 2016 at 11:54 AM, Warner Losh <i...@bsdimp.com> wrote: > On Fri, Sep 23, 2016 at 11:20 AM, Thor Lancelot Simon <t...@panix.com> wrote: >> On Fri, Sep 23, 2016 at 05:15:16PM +, Eric Haszlakiewicz wrote: >>> On September 23, 2016 10:51:30 AM EDT, Warner Losh <i...@bsdimp.com> wrote: >>> >All NCQ gives you is the ability to schedule multiple requests and >>> >to get notification of their completion (perhaps out of order). There's >>> >no coherency features at all in NCQ. >>> >>> This seems like the key thing needed to avoid FUA: to implement fsync() you >>> just wait for notifications of completion to be received, and once you have >>> those for all requests pending when fsync was called, or started as part of >>> the fsync, then you're done. >> >> The other key point is that -- unless SATA NCQ is radically different from >> SCSI tagged queuing in a particularly stupid way -- the rules require all >> "simple" tags to be completed before any "ordered" tag is completed. That is, >> ordered tags are barriers against all simple tags. > > SATA NCQ doesn't have ordered tags. There's just 32 slots to send > requests into. Don't allow the word 'tag' to confuse you into thinking > it is anything at all like SCSI tags. You get ordering by not > scheduling anything until after the queue has drained when you send > your "ordered" command. It is that stupid. And it can be even worse, since if the 'ordered' item must complete after all those before it, you have to drain the queue before you can even send it to the drive. It depends on what ordering guarantees you want... Warner
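A tiny timeline model of the drain behaviour described above may help. The per-command time and the command streams are invented; the point is only to show why a barrier in the middle of an NCQ stream is expensive when the only ordering tool is "drain the queue first":

```python
# Toy cost model: normal commands overlap up to 'slots' deep (NCQ's 32),
# but a barrier ('B') can only run once the queue has drained, and nothing
# new can be issued until the barrier itself completes.
import math

def total_time_us(stream, per_cmd_us=100, slots=32):
    t = 0
    segment = 0                    # overlapped commands issued so far
    for c in stream:
        if c == 'B':
            # drain everything in flight, then run the barrier alone
            t += math.ceil(segment / slots) * per_cmd_us
            t += per_cmd_us
            segment = 0
        else:
            segment += 1
    if segment:
        t += math.ceil(segment / slots) * per_cmd_us
    return t

# 64 fully overlapped writes: two batches of 32 -> 200us.
print(total_time_us('W' * 64))                    # -> 200
# The same 64 writes with a barrier in the middle: drain, barrier, refill.
print(total_time_us('W' * 32 + 'B' + 'W' * 32))   # -> 300
```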
Re: Plan: journalling fixes for WAPBL
On Fri, Sep 23, 2016 at 11:20 AM, Thor Lancelot Simon <t...@panix.com> wrote: > On Fri, Sep 23, 2016 at 05:15:16PM +, Eric Haszlakiewicz wrote: >> On September 23, 2016 10:51:30 AM EDT, Warner Losh <i...@bsdimp.com> wrote: >> >All NCQ gives you is the ability to schedule multiple requests and >> >to get notification of their completion (perhaps out of order). There's >> >no coherency features at all in NCQ. >> >> This seems like the key thing needed to avoid FUA: to implement fsync() you >> just wait for notifications of completion to be received, and once you have >> those for all requests pending when fsync was called, or started as part of >> the fsync, then you're done. > > The other key point is that -- unless SATA NCQ is radically different from > SCSI tagged queuing in a particularly stupid way -- the rules require all > "simple" tags to be completed before any "ordered" tag is completed. That is, > ordered tags are barriers against all simple tags. SATA NCQ doesn't have ordered tags. There's just 32 slots to send requests into. Don't allow the word 'tag' to confuse you into thinking it is anything at all like SCSI tags. You get ordering by not scheduling anything until after the queue has drained when you send your "ordered" command. It is that stupid. Warner
Re: FUA and TCQ
On Fri, Sep 23, 2016 at 8:05 AM, Thor Lancelot Simon wrote: > Our storage stack's inability to use tags with SATA targets is a huge > gating factor for performance with real workloads (the residual use of > the kernel lock at and below the bufq layer is another). FreeBSD's storage stack does support NCQ. When that's artificially turned off, performance drops on a certain brand of SSDs from about 500-550MB/s for large reads down to 200-300MB/s, depending on too many factors to go into here. It helps a lot for workloads and is critical for Netflix to get a 36-38Gbps rate from our 40Gbps systems. > Starting de > novo with NVMe, where it's perverse and structurally difficult to not > support multiple commands in flight simultaneously, will help some, but > SATA SSDs are going to be around for a long time still and it'd be > great if this limitation went away. NVMe is even worse. There's one drive that w/o queueing I can barely get 1GB/s out of. With queueing and multiple requests I can get the spec sheet rated 3.6GB/s. Here queueing is critical for Netflix to get to 90-93Gbps that our 100Gbps boxes can do (though it is but one of many things). > That said, I am not going to fix it myself so all I can do is sit here > and pontificate -- which is worth about what you paid for it, and no > more. Yeah, I'm just a FreeBSD guy lurking here. Warner
Re: Plan: journalling fixes for WAPBL
On Fri, Sep 23, 2016 at 7:38 AM, Thor Lancelot Simon wrote: > On Fri, Sep 23, 2016 at 11:47:24AM +0200, Manuel Bouyer wrote: >> On Thu, Sep 22, 2016 at 09:33:18PM -0400, Thor Lancelot Simon wrote: >> > > AFAIK ordered tags only guarantees that the write will happen in order, >> > > but not that the writes are actually done to stable storage. >> > >> > The target's not allowed to report the command complete unless the data >> > are on stable storage, except if you have write cache enable set in the >> > relevant mode page. >> > >> > If you run SCSI drives like that, you're playing with fire. Expect to get >> > burned. The whole point of tagged queueing is to let you *not* set that >> > bit in the mode pages and still get good performance. >> >> Now I remember that I did indeed disable disk write cache when I had >> scsi disks in production. It's been a while though. >> >> But anyway, from what I remember you still need the disk cache flush >> operation for SATA, even with NCQ. It's not equivalent to the SCSI tags. All NCQ gives you is the ability to schedule multiple requests and to get notification of their completion (perhaps out of order). There are no coherency features at all in NCQ. > I think that's true only if you're running with write cache enabled; but > the difference is that most ATA disks ship with it turned on by default. > > With an aggressive implementation of tag management on the host side, > there should be no performance benefit from unconditionally enabling > the write cache -- all the available cache should be used to stage > writes for pending tags. Sometimes it works. You don't need to flush all the writes, but you do need to take special care if you need more coherent semantics, which often is a small minority of the writes, so I would agree the effect can be mostly mitigated. Not completely, since any coherency point has to drain the queue completely. The cache drain ops are non-NCQ, and to send non-NCQ requests no NCQ requests can be pending. 
TRIM[*] commands are the same way. Warner [*] There is an NCQ version of TRIM, but it requires the AUX register to be sent, and very few SATA host controllers support that (though AHCI does, many of the LSI controllers don't in any performant way).
Re: Proposal for kernel clock changes
On Apr 1, 2014, at 1:50 PM, David Laight da...@l8s.co.uk wrote: This may mean that you can (effectively) count the ticks on all your clocks since 'boot' and then scale the frequency of each to give the same 'time since boot' - even though that will slightly change the relationship between old timestamps taken on different clocks. Possibly you do need a small offset for each clock to avoid discrepancies in the 'current time' when you recalculate the clock's frequency. If the underlying clock moves in frequency, you need to have both a scale on the frequency, and a time to count adjustment as well. Otherwise on long-running systems you accumulate a fair amount of error. It doesn’t take much more than 1ppm of error to accumulate a second of error in 10 days if you don’t have ‘on time’ marks that integrate all of time up to that point. Then the error in phase will be related to the time since the last phase sync, rather than since the time of boot. Warner
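The scale-plus-offset correction Warner describes can be sketched in a few lines. The nominal tick rate is invented, but the arithmetic reproduces the "about 1ppm gives a second of error in 10 days" figure from the message:

```python
# Sketch of the two-part correction: a frequency scale applied to the raw
# tick count, plus a phase offset taken at each "on time" mark. Values
# are illustrative, not from any real clock.

NOMINAL_HZ = 1_000_000            # what the clock claims to run at

def time_at(ticks, scale, offset):
    # 'scale' corrects the frequency estimate; 'offset' anchors phase at
    # the last synchronization point instead of at boot.
    return ticks * scale / NOMINAL_HZ + offset

# With only the nominal frequency (scale=1, no marks), a 1 ppm fast clock
# accumulates error for 10 days:
true_hz = NOMINAL_HZ * (1 + 1e-6)
ticks_10_days = int(10 * 86400 * true_hz)
err = abs(time_at(ticks_10_days, 1.0, 0.0) - 10 * 86400)
print(round(err, 3))              # -> 0.864 (seconds of drift)

# Applying the measured scale removes the accumulated error:
scale = NOMINAL_HZ / true_hz
print(round(abs(time_at(ticks_10_days, scale, 0.0) - 10 * 86400), 3))  # -> 0.0
```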
Re: asymmetric smp
On Apr 1, 2014, at 5:49 AM, Johnny Billquist b...@softjar.se wrote: Good points. Is this the right time to ask why booting NetBSD on a VAX (a 3500) now takes more than 15 minutes? What is the system doing all that time??? FreeBSD used to take forever to boot on certain low-end ARM CPUs with /etc/rc.d after it was imported from NetBSD. This was due to crappy root-device performance (100kB/s is enough for anybody, right?) and crappy, at the time, pmap code that caused excess page traffic in the /etc/rc.d environment. Perhaps those areas would be fruitful to profile? Also, there were some inefficiencies, either the result of a botched port or basic to the system, that got fixed. Between fixing all these things, the boot time went from 10 minutes down to ~20s. Warner
Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]
On Nov 11, 2013, at 4:31 PM, Justin Cormack wrote: On Mon, Nov 11, 2013 at 10:56 PM, Michael van Elst mlel...@serpens.de wrote: m...@3am-software.com (Matt Thomas) writes: Exactly. With hf, floating point values are passed in floating point registers. That cannot be hidden via a library (this works on x86 since the stack has all the arguments). It could be hidden by emulating the floating point hardware. That's not sane. The slowdown would be enormous. You are emulating registers as well as operations. Is there a complete write-up of the conventions here? Warner
Re: MACHINE_ARCH on NetBSD/evbearmv6hf-el current
On Oct 26, 2013, at 12:24 PM, Alistair Crooks wrote: On Sat, Oct 26, 2013 at 11:10:52AM -0700, Matt Thomas wrote: On Oct 26, 2013, at 10:54 AM, Izumi Tsutsui tsut...@ceres.dti.ne.jp wrote: By static MACHINE_ARCH, or dynamic sysctl(3)? If dynamic sysctl(3) is preferred, which node? hw.machine_arch, which has been defined for a long long time. Yes, defined before the sf vs hf issue arose, and you have changed the definition (i.e. made it dynamic) without public discussion. That's the problem. It was already dynamic (it changes for compat_netbsd32). Whether or when it's dynamic or not, it would be great if you could fix it so that binary packages can be used. And Tsutsui-san is right - public discussion needs to take place, and consumers made aware, before these kinds of changes are made. I don't see any further emails on this thread. Was there ever a resolution, or just crickets? Warner
Re: pulse-per-second API status
On Nov 1, 2013, at 12:19 PM, paul_kon...@dell.com wrote: On Nov 1, 2013, at 2:04 PM, Mouse mo...@rodents-montreal.org wrote: ... But it still may not work in the sense of living up to the expectations people have come to have for PPS on serial ports. My worry is not that it's not the best time available in some circumstances. My worry is that putting it into the tree will lead to its getting used as if it were as good as PPS on anything else, leading both to timeservers that claim stratum 1 but give bad chime and to people blaming NetBSD for its crappy PPS support when the real problem is that they don't understand the USB issues and it _looks_ like any other PPS support until you test the resulting time carefully. Not just PPS on serial ports, but PPS on other hardware. I don't know this API. But my first reaction when I saw the designation PPS is to think of GPS timekeeping boxes and other precision frequency sources that have a PPS output. On those devices, the PPS output is divided down from the main oscillator frequency, i.e., you can expect accuracies of 10^-9 for modest-price crystal oscillators, 10^-10 to 10^-12 for higher end stuff -- and jitter in the nanosecond range or better. It seems rather confusing to have another interface that goes by the same name but has specs 6 or more orders of magnitude worse. How about a different name that avoids this confusion? Just because the signal has an Allan variance of 10^-10 doesn't mean that you'll be able to measure each pulse with that precision, or that the tau of that figure is 1s... Most common time counter hardware in SoCs and the like is good to anywhere from hundreds of microseconds to tens of nanoseconds. Hundreds of microseconds isn't much worse than the millisecondish USB accuracy. The PPS API even allows for an estimate of the accuracy of the measurements, IIRC, but that may be a higher-level facility of NTP (it has been a few years since I've done this stuff professionally). 
I don't think there will be any confusion at all, especially if the measured accuracy and variance of this facility is documented. 1ms is quite accurate enough for NTP though. NTP has trouble on the network getting below 1ms of accuracy, especially when there are any hops at all in the topology. It won't be the best NTP server in the world, but it will be accurate enough for most things. If you need more accuracy, get better hardware... To those saying 'fix NMEA mode to be better': You can't. The characters that spit this code out aren't guaranteed to be at top of second any more than approximately... The exact timing varies from receiver to receiver, and if USB is involved, the same silly delays are present there too, only worse, because the message spans USB packets (or likely would, since it is just short of 100 characters long IIRC)... And even if you get those issues out of the way, I also believe there's ambiguity in the NMEA standard about the 'on time' point for the NMEA messages. Is it the start of the message, or the end? Is it the first transition of the first bit of the message, or the end of the first character? Since it isn't considered a precision signal, nobody times it exactly (or didn't a few years ago). It is useful, at best, for knowing what time the external PPS is about to be or just was... So adding support to ucom isn't a horrible idea, as long as expectations are managed... Warner
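To make the Allan-variance point above concrete, here is the standard (non-overlapping) Allan deviation computed from phase samples. It characterizes frequency stability over an averaging time tau, not the absolute accuracy of any single pulse, which is exactly the distinction being drawn. The sample data is synthetic:

```python
# Non-overlapping Allan deviation from phase (time-error) samples x_i,
# spaced tau seconds apart:
#   sigma_y(tau) = sqrt( sum (x[i+2] - 2*x[i+1] + x[i])^2 / (2*(N-2)) ) / tau

def allan_deviation(phase, tau):
    n = len(phase)
    s = sum((phase[i + 2] - 2 * phase[i + 1] + phase[i]) ** 2
            for i in range(n - 2))
    return (s / (2 * (n - 2))) ** 0.5 / tau

# A clock with a pure, constant frequency offset has perfect *stability*
# even though every pulse is increasingly late:
steady = [i * 1e-6 for i in range(100)]   # 1 ppm offset, tau = 1 s
print(allan_deviation(steady, 1.0))       # essentially zero

# A clock that jitters between two phase errors is unstable:
jittery = [1e-3 * (i % 2) for i in range(100)]
print(allan_deviation(jittery, 1.0))      # much larger
```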
Re: NetBSD port for AT91SAM9G20?
On Sep 5, 2012, at 12:43 AM, Jukka Marin wrote: Hi, I have asked this before, but got no replies. We are making AT91SAM9G20-based hardware and I would love to run NetBSD on it. However, I can't find the time to port NetBSD to this MCU and hardware. Is there anyone with some spare time and interest in this kind of a project? I could provide the hardware and documentation required. I might even sponsor some $'s to the NetBSD project if I could run my favourite OS on our hardware. The main features of our current hardware are:
- AT91SAM9G20 MCU (400 MHz)
- 64 MB RAM
- 128 MB NAND FLASH
- 8 MB NOR FLASH
- hardware watchdog
- RTC with battery backup
- 4 x 10/100 Mbps Ethernet (with a switch)
- 3 x RS232
- 2 x RS485
- 3G GSM modem
- digital inputs (opto isolated)
- relay outputs
- LEDs
- USB host / device ports
- expansion slot
- power supply 9...30 VDC
- 19" rack mount case (or a smaller metal case)
Apart from a few clocks, this should work with the AT91SAM9260 support that's in the tree. The device tables/trees are the same, and the errata for the devices are quite similar. Warner
Re: NetBSD port for AT91SAM9G20?
On Sep 5, 2012, at 11:50 AM, vinc...@labri.fr wrote: Warner Losh i...@bsdimp.com writes: On Sep 5, 2012, at 12:43 AM, Jukka Marin wrote: The main features of our current hardware are: - AT91SAM9G20 MCU (400 MHz) [...] Apart from a few clocks, this should work with the AT91SAM9260 support that's in the tree. The device tables/trees are the same, and the errata for the devices are quite similar. Yes, maybe you'll have to add the CPU id to at91/at91dbgu* for it to be recognized correctly, but that should be it. You might want to look at FreeBSD's CPU identification. I think I have all the SAM9 CPUs plus the RM9200 accounted for, plus autoprobing for the dbgu unit, which differs from SoC to SoC. However, be warned that the ethernet MAC will not work, because the code currently in-tree is for another type of AT91 processor which has sufficiently different characteristics wrt the number of RX and TX buffers and maximum sizes of these, although it will probably attach correctly and might be able to send short packets by chance. I'd forgotten that detail. FreeBSD's driver copes with both (plus there's a driver that just talks to the new hardware). I started a rewrite of it for the at91sam9260 but ran out of time. If other developers want smaller hardware than what Jukka offers to experiment with, Propox sells small cards with an option for an at91sam9g20 CPU. http://www.propox.com/products/t_232.html http://www.propox.com/products/t_231.html I couldn't figure out how to buy these in the US. They are about US$110 if I read things right. I recently got a nice little board from Glomation that ships quickly and is cheap (starting at US$55 for the low end up to about $95 for the high end in Q1 lot sizes). http://www.glomationinc.com/products.html The GESBC-9G20u is the $55 one. You have to write for a price list for the other boards, but they are very responsive to email. Warner
Re: Path to kernel modules (second attempt)
On Jul 8, 2012, at 10:20 AM, Matthew Mondor wrote: On Sun, 8 Jul 2012 17:57:00 +0200 Edgar Fuß e...@math.uni-bonn.de wrote: Please not /kernel as it was already mentioned, it is too similar to /kern. What about /netbsd? E.g. /netbsd/6.0_BETA/{modules,kernel,firmware}. /netbsd/amd64/6.0/GENERIC/{modules,kernel,firmware} :) ? One more note about FreeBSD's structure. In addition to looking in /boot/$KERNNAME, it will also look in /boot/modules. This is done so that you can have multiple different kernels of the same version, that might use different internal KBIs that 3rd party drivers don't use. You can install your 3rd party driver into /boot/modules and load it with successive kernels (we move /boot/kernel to /boot/kernel.old before recreating the /boot/kernel to install the new kernel and modules). This works well for similar versions (eg 9.0, 9.1), but works less well with 8.x-9.x. But can the kernel easily detect that its image was booted in a particular directory, and use that as base directory to look for modules? Also, how more complex would this be for the bootloader that also needs to preload a few modules to be able to boot? FreeBSD's boot loader passes this in... Warner
Re: Path to kernel modules (second attempt)
On Jul 7, 2012, at 4:17 PM, Matthew Mondor wrote: On Sat, 07 Jul 2012 22:46:50 +0200 Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 07.07.2012 21:57, Mindaugas Rasiukevicius wrote: Hello, Regarding the PR/38724, I propose to change the path to /kernel/. Can we reach some consensus quickly for netbsd-6? /kernel is way too close to /kern, and they serve different purposes. IMHO that will raise confusion. Perhaps /kmod, or /modules like dholland suggests? Technically modules are not libraries, but maybe /libdata/module is a good option? We already have firmware in /libdata/firmware, and that gets used by the kernel. That also makes sense. But it kinda fails with multiple kernels. On FreeBSD, we went with /boot/$KERNNAME/kernel for the kernel, with all the modules associated with it in /boot/$KERNNAME. By default, we load /boot/kernel/kernel and the loader may also choose to load other things. The reason we put it in /boot was because we have a secondary boot loader (/boot/loader) and on some platforms we were looking at, you needed a separate boot partition to do things correctly. This layout allows for that as well as transparently supporting multiple kernels. I know on one of my MIPS boards, I can read kernels or the boot loader off of FAT partitions, so my /boot there is a FAT file system, with the rest of the system in a UFS file system on separate partitions/slices on my CF. Just something to think about before you go stuffing it into /libdata/module or something... Warner