Re: Bug in fs/cd9660 raises questions about inode number computing
On Sat, 10 May 2014 08:11:40 +0200 Thomas Schmitt scdbac...@gmx.net wrote: kern/48787 can be counted as a successful one. kern/48797 demonstrates that I need to free myself more from expectations which occupied my mind when studying isofs of a different kernel. Thanks to Martin Husemann for posing the right questions. Thanks for working on this, -- Matt
Re: Bug in fs/cd9660 raises questions about inode number computing
On Tue, 06 May 2014 12:20:53 +0200 Thomas Schmitt scdbac...@gmx.net wrote: How to properly submit them ? A PR (Problem Report) in the kern category with an attached unified diff would seem adequate if you cannot commit the changes yourself. Sorry if that is already obvious to you. Unfortunately I'm not personally familiar enough with iso9660 to confirm that the fixes are right, or to answer the other questions, though; hopefully others will. -- Matt
Re: Vnode API change: add global vnode cache
On Wed, 30 Apr 2014 17:15:16 +0200 J. Hannken-Illjes hann...@eis.cs.tu-bs.de wrote: vcache_get(mp, key, key_len, vpp) to lookup and possibly load a vnode. vcache_lookup(mp, key, key_len, vpp) to lookup a vnode. vcache_remove(mp, key, key_len) to remove a vnode from the cache. VFS_LOAD_NODE(mp, vp, key, key_len, new_key) to initialise a vnode. Updated diff at http://www.netbsd.org/~hannken/vnode-pass6-4.diff One small question: Is it expected in vcache_common() for the interlock to remain held even if returning an error? Thanks, -- Matt
Re: Vnode API change: add global vnode cache
On Sat, 10 May 2014 01:29:47 + Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote: Is it expected in vcache_common() for the interlock to remain held even if returning an error? vget unconditionally drops the interlock, so it will never remain held, error or not. Oh, thanks. I can now see that vget() must be called with it held, and indeed drops it itself. -- Matt
Re: Panic when deleting large number of files inside DomU
On Wed, 19 Sep 2012 12:00:45 +0200 Roger Pau Monne roger@citrix.com wrote: Yes, WAPBL enabled. I will file a PR about this if there is no news. Was a PR already filed for this, or was the reason discovered and fixed since? A quick search showed one of your closed Xen related PRs but it seems to be a different issue, unless I'm mistaken. Thanks, -- Matt
Re: NetBSD-5 appears to have forgotten how to execute 0.9A binaries
On Tue, 11 Sep 2012 09:45:22 -0700 buh...@lothlorien.nfbcal.org (Brian Buhrow) wrote: provide further results. I assume a fix would want to be pulled up, assuming I find it, on the grounds that it's a security fix. I'll also see about trying -current and NetBSD-6, but I'm guessing those are vulnerable as well, given Matthew's test with my binary under NetBSD-6 yesterday. Was a PR for this ever filed, or the problem fixed since? Any relation to SA2013-013? Thanks, -- Matt
Re: Does options P1003_1B_SEMAPHORE still exist?
On Mon, 17 Sep 2012 10:42:49 -0700 (PDT) Paul Goyette p...@whooppee.com wrote: Sorry for the long delay, I'm slowly catching up on tech-kern mail. I recently noticed that there is a built-in ksem module that includes sys/kern/uipc_sem.c The man page for sem(4) states that this code should be included in the kernel only if options P1003_1B_SEMAPHORE is defined. Yet a search of the kernel sources shows no usage for this option anywhere, and the uipc_sem.c file is unconditionally included by sys/conf/files So, I have a few questions: 1. Should sem(4) really be in manual section 4? It doesn't appear to be a device driver! (Maybe a more detailed man page should be written for section 9?) I have the impression that those syscalls should all be documented in a section 2 manual page instead (kern/37427). Not totally related but misc/38979 would have similar results for the scheduler control related syscalls. I now realize that I probably don't have a PR for these ones, but the mqueue and setaffinity related syscalls are also undocumented. At the time I filed the PRs they were contested by AD because the libc counterparts were already documented, with the syscalls considered the private interface. I personally believe that all syscalls should be documented in NetBSD (and recently I have learned that I'm not the only one to think they should be, so perhaps I should eventually write these manual pages, after all). 2. Should the man page be updated to remove the reference to the option? 
A quick grep on netbsd-6 here only shows:

share/man/man4/options.4:.It Cd options P1003_1B_SEMAPHORE
share/man/man4/sem.4:.Cd options P1003_1B_SEMAPHORE
sys/compat/freebsd/freebsd_syscall.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscalls.c:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_sysent.c:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/syscalls.master:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/netbsd32/netbsd32_syscall.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscalls.c:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) defined(_LIBC))
sys/compat/netbsd32/netbsd32_sysent.c:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) defined(_LIBC))
sys/compat/netbsd32/syscalls.master:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) defined(_LIBC))
sys/kern/init_sysctl.c:#if defined(MODULAR) || defined(P1003_1B_SEMAPHORE)
sys/modules/compat_netbsd32/Makefile:CPPFLAGS+= -DP1003_1B_SEMAPHORE -DCOREDUMP -DKERN_SA

3. If the code is truly unconditional, should it really be a module? If so, could it be made to auto-load when needed? Could it also be auto unloaded? It seems that other POSIX librt components such as message queues, scheduler control, cpu affinity, etc, are not optional. 
I don't know why those semaphores should be, thus they could probably remain as part of the base kernel with the option removed, unless we'd want all of RT components to be optional and in a module, perhaps? But librt of course wouldn't be usable then, unless it's loaded... Anyone remember a particular reason why these semaphores might be unwanted in custom kernels, but the rest of librt wanted anyway? -- Matt
Re: NetBSD-5 appears to have forgotten how to execute 0.9A binaries
On Tue, 6 May 2014 07:56:22 -0700 Brian Buhrow buh...@nfbcal.org wrote: hello. There was a fix implemented for the original problem by Chuck Silvers and tested by me. I'll look to see if I can find the commits. I'm not sure if it was documented in a pr or not or if it got pulled up to NetBSD-6. I'm pretty sure it's in -current and I know it's in -5 as a pullup. If you want to have a look, it happened in the first half of September 2012. Unfortunately I couldn't locate the exact change or pullup tickets. But considering the change was pulled up to netbsd-5, and that 6.0 was released around October, I guess that if netbsd-6 needed the change it was also fixed then. Thanks, -- Matt
Re: resource leak in linux emulation?
On Mon, 5 May 2014 15:43:56 +1200 Mark Davies m...@ecs.vuw.ac.nz wrote: On Mon, 05 May 2014, Christos Zoulas wrote: I wrote: So can someone suggest where exactly the patch should go. And isn't proc_lock held at this point (entered at line 344, exit at line 569)? How about this? Seems good to me and can confirm that its fixed the increasing proc count problem. Can someone commit and pull up to 6? I also see emulation-code specific exit hooks support, I've not checked if it's really possible, but could that linux-specific case be solved there instead of in the generic code if so? Thanks, -- Matt
Re: asymmetric smp
On Mon, 5 May 2014 01:10:24 -0400 Matthew Mondor mm_li...@pulsar-zone.net wrote: which some CPUs might have trouble with (i.e. RAS)... I think that what I meant was CAS -- Matt
Re: asymmetric smp
On Wed, 02 Apr 2014 17:21:02 +0200 Johnny Billquist b...@softjar.se wrote: On 2014-04-02 16:10, John Nemeth wrote: On Apr 2, 1:55pm, Johnny Billquist wrote: } The root fs in on nfs, as I'm running the machine diskless. Disk is } served from a -current NetBSD/alpha system sitting right next to it. And } I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k } block size for NFS. Login is obviously already running, since that is } what also prompts for the username, and doing it twice should even put } some stuff in local cache. Uh, actually getty does the initial prompt for username on the console. After collecting the username, getty execs login. Hmm. My mistake in that case. So we have image activation at that point. Hmm... Possibly other things to verify would be /etc/passwd.conf (you'll likely need to also regenerate passwords if you change those settings), and if VAX has specialized lock code or uses the new generic atomic operations which some CPUs might have trouble with (i.e. RAS)... -- Matt
Re: 6.0_BETA-6.0_BETA2 rename
On Mon, 30 Jul 2012 16:59:14 +0200 Edgar Fuß e...@math.uni-bonn.de wrote: Just out of curiosity: Why was 6.0_BETA renamed 6.0_BETA2 recently? The release of second beta binaries: http://blog.netbsd.org/tnf/entry/netbsd_6_0_beta2_binaries After the beta series, release candidates might be expected i.e. RC1, RC2 etc until official release, at which point the netbsd-6 branch will become 6.0_STABLE. -- Matt
Re: Core statement on directory naming for kernel modules
On Fri, 27 Jul 2012 17:28:14 -0700 jnem...@victoria.tc.ca (John Nemeth) wrote: On Dec 17, 1:58pm, Matthew Mondor wrote: } This reminds me though: why/how does sysctl/kern.module.autoload } default to 1 for non-MODULAR kernels (at least on netbsd-6)? Or an } alternative question: are these sysctl knobs useful at all with } non-MODULAR kernels, or are they then artifacts? Good question. Non-MODULAR kernels still have parts of the MODULAR subsystem in order to initialise built-in modules. However, the linking code isn't there, so it would be impossible to load a module. I'll make a note to trim some of the excess stuff in non-MODULAR kernels. Indeed the linker isn't there, which was confirmed using nm when I initially noticed those knobs. Thank you for looking into this, -- Matt
Re: Core statement on directory naming for kernel modules
On Fri, 27 Jul 2012 13:57:52 + (UTC) Geoff Wing ma...@primenet.com.au wrote: John Nemeth jnem...@victoria.tc.ca typed: : .. Being able to properly unload a built-in module would be a nice : feature. This sounds a bit like a possible security problem, though presumably/hopefully limited by the current security level and AAA. Do you mean in the case an external module could then be loaded instead of a built-in one? Probably that someone who wants to prevent the kernel from loading external modules would use a kernel without MODULAR, or change the runlevel. This reminds me though: why/how does sysctl/kern.module.autoload default to 1 for non-MODULAR kernels (at least on netbsd-6)? Or an alternative question: are these sysctl knobs useful at all with non-MODULAR kernels, or are they then artifacts? Thanks, -- Matt
Re: Quota on tmpfs
On Tue, 17 Jul 2012 20:54:28 + (UTC) mlel...@serpens.de (Michael van Elst) wrote: I would also guess that sparse files are very rarely used. But for disk usage purposes you want to consider real disk usage including overhead because the quotas are mostly used to partition the available space. That doesn't work if your quotas allow you to write a few thousand files of 1 byte length that account together as a single block when they really occupy a few thousand blocks. A scenario in which they're frequently used is block-based file system transfer protocols (especially distributed ones where blocks may download in random order, including bittorrent), also by download managers that support download optimization where multiple connections will be made to transfer multiple file sections at a time (i.e. the DownloadThemAll Firefox extension). Another common usage of sparse files is for live file system images. The cost of creation (open/creat + truncate/lseek + newfs) is small compared to writing a full image of zeros, then the blocks can be lazily allocated and written when needed. Apparently some database storage formats use sparse files, but the ones I'm currently using don't seem to... -- Matt
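The cheap creation step described above, and the gap between a file's logical size and the space it really occupies (the figure a quota would want to count), can be sketched roughly as follows. This is a minimal illustration assuming a POSIX system whose st_blocks is counted in 512-byte units, as on the BSDs and Linux; the function name make_sparse_image is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) `path` as a sparse file of `size` bytes without
 * writing any data, then report its logical size and the number of
 * bytes actually allocated.  Returns 0 on success, -1 on error.
 */
int
make_sparse_image(const char *path, off_t size, off_t *logical,
    off_t *allocated)
{
	struct stat st;
	int fd;

	assert(path != NULL && logical != NULL && allocated != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return -1;
	/* Extending the file leaves a hole; no data blocks are written. */
	if (ftruncate(fd, size) == -1 || fstat(fd, &st) == -1) {
		close(fd);
		return -1;
	}
	close(fd);

	*logical = st.st_size;
	/* Assumption: st_blocks is in 512-byte units. */
	*allocated = (off_t)st.st_blocks * 512;
	return 0;
}
```

On a file system that supports holes, the allocated figure is expected to stay far below the logical size until blocks are written; on one that does not, the two may match.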
Re: Quota on tmpfs
On Tue, 17 Jul 2012 21:26:44 -0400 Matthew Mondor mm_li...@pulsar-zone.net wrote: A scenario in which they're frequently used is block-based file system s/file system/file/ :) -- Matt
Re: Quota on tmpfs
On Fri, 13 Jul 2012 07:54:07 + David Holland dholland-t...@netbsd.org wrote: On Thu, Jul 12, 2012 at 09:33:42PM -0400, Matthew Mondor wrote: Yet another hack would be to create a sparse ffs image under a tmpfs, mounted with quotas via vnd, but evaluating its ideal size might be difficult, and you'd have to re-apply quota settings in the script that creates the image at boot time... :) Using mfs instead of tmpfs is probably a better bet here. mfs brings in enough of ufs that adding quota support to it shouldn't be particularly complicated. I was also wondering initially if mfs didn't actually already support quotas because of this similarity, but it doesn't seem so at the moment indeed Thanks, -- Matt
Re: Quota on tmpfs
On Fri, 13 Jul 2012 08:03:42 + David Holland dholland-t...@netbsd.org wrote: I believe the situation with both mfs and lfs is that some pieces of the support are in place but not others. It was clear when hacking up the code that neither had actually been tried by anyone in a long, long time... I admit myself not having tried LFS again after the advent of WAPBL, and only having used MFS to boot small custom userlands using crunchgen(1) long ago (floppy disks :) -- Matt
Re: Quota on tmpfs
On Thu, 12 Jul 2012 16:17:42 +0200 Edgar Fuß e...@math.uni-bonn.de wrote: How do I enable new quota on a tmpfs? A possible solution might be a per-user tmpfs, each limited using -s... of course, it's more complex to manage though. If I remember there is some optional support for symbolic links to resolve to user-specific targets, but I forgot the details. With that /tmp/ could potentially be a symbolic link pointing to say, /tmpfs/user/ I think. Yet another hack would be to create a sparse ffs image under a tmpfs, mounted with quotas via vnd, but evaluating its ideal size might be difficult, and you'd have to re-apply quota settings in the script that creates the image at boot time... :) -- Matt
Re: Path to kernel modules (second attempt)
On Sun, 8 Jul 2012 17:57:00 +0200 Edgar Fuß e...@math.uni-bonn.de wrote: Please not /kernel as it was already mentioned, it is too similar to /kern. What about /netbsd? E.g. /netbsd/6.0_BETA/{modules,kernel,firmware}. /netbsd/amd64/6.0/GENERIC/{modules,kernel,firmware} :) ? But can the kernel easily detect that its image was booted in a particular directory, and use that as base directory to look for modules? Also, how much more complex would this be for the bootloader that also needs to preload a few modules to be able to boot? -- Matt
Re: Path to kernel modules (second attempt)
On Sat, 07 Jul 2012 22:46:50 +0200 Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 07.07.2012 21:57, Mindaugas Rasiukevicius wrote: Hello, Regarding the PR/38724, I propose to change the path to /kernel/. Can we reach some consensus quickly for netbsd-6? /kernel is way too close to /kern, and they serve different purposes. IMHO that will raise confusion. Perhaps /kmod, or /modules like dholland suggests? Technically modules are not libraries, but maybe /libdata/module is a good option? We already have firmwares in /libdata/firmware, and those get used by the kernel. That also makes sense -- Matt
Re: Path to kernel modules (second attempt)
On Sat, 7 Jul 2012 20:54:12 -0600 Warner Losh i...@bsdimp.com wrote: But it kinda fails with multiple kernels. On FreeBSD, we went with /boot/$KERNNAME/kernel for the kernel, with all the modules associated with it in /boot/$KERNNAME. By default, we load /boot/kernel/kernel and the loader may also choose to load other things. The reason we put it in /boot was because we have a secondary boot loader (/boot/loader) and on some platforms we were looking at you needed a separate boot partition to do things correctly. this layout allows for that as well as transparently supporting multiple kernels. I know on one of my MIPS boards, I can read kernels or the boot loader off of FAT partitions, so my /boot there is a FAT file system, with the rest of the system in a UFS file system on separate partitions/slices on my CF. I think that the version and arch directories would be maintained. But you're right, and when I think of it, it's actually one of the reasons I use monolithic kernels. If modules and kernels always corresponded well and were closely coupled in a directory, it'd be much less trouble for me to test and move kernels around, or maintain multiple versions of them on the same host. At the moment, single monolithic files do this much better. Some kernel configuration changes not only affect the main image, but also the modules, and full ABI compatibility would be a difficult problem. It might not matter for someone who wants to avoid using a custom kernel (I agree that modules should help a lot in this case for the end user, no matter their arrangement). But if we eventually begin to see modules under non-BSD licenses which can only be distributed as modules, more tech users might likely want modules as well... 
Or it might not matter at all, if an admin can simply link together all modules in a single kernel image, and keep the non-distributable image private in the organization (I think there is some work in this area, other than the traditional monolithic builds)? So something like /kmod/amd64/6.0/GENERIC/, or a layout where /netbsd-GENERIC/ could be a directory, /netbsd-GENERIC/image the kernel, /netbsd-GENERIC/modules/ its corresponding modules, would be nice. In the latter case, perhaps also a /netbsd symlink pointing to the corresponding /foo/image, somewhat like the vmlinuz link of some Linux distributions? Thanks for sharing your experience, -- Matt
Re: Problem with chown
On Wed, 27 Jun 2012 23:20:36 - David Lord net...@lordynet.org wrote: I tried NetBSD-6-BETA2 but had too many problems. Attempted reinstalls of NetBSD-5 have all obviously failed. Indeed, downgrading is usually more problematic, postinstall not being of much use in this case -- Matt
Re: per-mount maxvnodes
On Thu, 7 Jun 2012 17:50:58 +0200 Manuel Bouyer bou...@antioche.eu.org wrote: On Thu, Jun 07, 2012 at 11:09:26AM -0400, Mouse wrote: Therefore comes the idea to have a per-mount maxvnodes. I tried implementing it, the biggest problem is how to set the value. sysctl kern./usr/local.maxvnodes? It's a little ambiguous, in that it's possible - or at least it was last time I tried it - to have multiple mounts with the same mounted-on string. But that's definitely an unusual case, and I see nothing wrong with accessing the topmost mount in that case; that's what normal filesystem accesses will do, after all. No, I think this should be a mount option. Maybe it's time to revisit the mount(2) interface (proplist anyone ? :) If mounts had an ID (like processes), then it'd be easier to use sysctl for them (commands such as mount and df might want to also export such IDs, so possibly also statvfs(2))... There are device IDs, but I'm not sure these could serve this purpose properly. This also reminds me of the thread about possibly allowing to temporarily enable noatime for a particular operation such as a backup or find... Perhaps such options should eventually be dynamically scoped such that a particular process or lwp could temporarily bind another value for its own use (if it has the necessary privileges, of course)? I'm not sure how far-fetched this is relative to the code, as I'm not very familiar with the FS code. -- Matt
Re: Rump FS throughput
On Fri, 1 Jun 2012 22:30:10 +0200 Thomas Klausner w...@netbsd.org wrote: On Thu, May 31, 2012 at 01:45:53PM -0400, Matthew Mondor wrote: Although it's useful to mount random media more safely than it would be using kernel-space, I noticed that using 64KB reads, the kernel cd9660 will gladly read ~20MB/s from a DVD, but that rump_cd9660 using 64KB reads is limited to approximately 4MB/s at most, even if the system is mostly idle during those transfers (on netbsd-6/amd64 and 4 3.3GHz cores). Some suggestions from Antti via email proxy: Maybe he is using the block device (/dev/cd0a) instead of the raw device (/dev/rcd0a). IIRC the former has some pretty serious performance problems for userspace I/O. Also in the maybe department, libp2k should probably detect and autoadjust a block device to raw device. Or, someone could just fix the bdev stuff. Thanks for forwarding his suggestions, If I try using the raw device (rcd0a), the speed is about 1.2MB/s (I can't hear the DVD drive motor spin up either), while with the block device (cd0a) the speed is about 4MB/s (in this case it spins up to a higher speed). With the same DVD and cd0a mounted using the kernel FS implementation and the same command (cat /cdrom/* > /dev/null), I get from 10 (start) to 20 (end) MB/s. These tests were on NetBSD-6. I'm not familiar enough with libp2k or bdev to know what needs to be done, but I could certainly take a look eventually. But I probably also should verify if an ISO-9660 FUSE implementation exists, and perhaps try to port it eventually, and see if performance is adequate for general use. Thanks again, -- Matt
Re: link-sets in modules
On Mon, 28 May 2012 06:51:43 -0700 (PDT) Paul Goyette p...@whooppee.com wrote: I _do_ like part 2 of your proposal - linking the core kernel first, and then re-linking with selected modules. I also think that this would be very nice -- Matt
Re: Should kqueue descriptors work outside of the creating process?
On Thu, 31 May 2012 10:38:38 -0400 (EDT) Mouse mo...@rodents-montreal.org wrote: Recently we found out (PR kern/46463) that kqueue() file descriptors, which originally were designed to be local process only objects, could be passed with SCM_RIGHTS messages to other processes. [...] I propose to not allow sending kqueue file descriptors [...] Or are there any legit uses for foreign kqueue()s? It seems to me, for what it may be worth, that this is asking the wrong question. Rather, I would ask whether there are illegitimate uses for `foreign' kqueue descriptors, and, if not, fix them to be passable like any other descriptors. It's true that it's normally the parent's responsibility to decide which FDs to close or set close-on-exec before fork(2)... Was there a design decision not to inherit kqueue descriptors for security or complexity reasons? Since signals, signal mask, signal stack and restart/interrupt flags are also inherited according to sigaction(2), an EVFILT_SIGNAL filter would probably still be valid... But how about EVFILT_TIMER? timer_create(2) timers are not inherited, setitimer(2) doesn't specify, but it also uses the same ptimers pool timer_create(2) uses. EVFILT_TIMER appears to use its own system though. For EVFILT_PROC, it appears to be for the specified process, so I guess it might still work if inherited? And there is also EVFILT_VNODE... who knows what other filters might be added in the future? What I can see is that the implications of inheriting this special descriptor are considerably more complex than for normal FDs... Which makes me think that it very well could be a design decision not to inherit these, in which case I don't object to also preventing it from being passed via an SCM_RIGHTS ancillary message. -- Matt
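For context, the SCM_RIGHTS mechanism under discussion can be sketched as below. This is a minimal illustration, assuming a UNIX-domain socket and an ordinary descriptor such as a pipe end (it deliberately does not use kqueue); send_fd and recv_fd are hypothetical helper names, not part of any system API.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor `fd` over UNIX-domain socket `sock`; 0 on success. */
int
send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;		/* at least one data byte is sent */
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;	/* payload is an array of descriptors */
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns it, or -1 on error. */
int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number referring to the same open file, which is why passing an object whose state is tied to the creating process raises the questions discussed in this thread.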
Re: Should kqueue descriptors work outside of the creating process?
On Thu, 31 May 2012 14:40:44 -0400 Matthew Mondor mm_li...@pulsar-zone.net wrote: What I can see is that the implications of inheriting this special descriptor are considerably more complex than for normal FDs... Which makes me think that it very well could be a design decision not to inherit these, in which case I don't object to also preventing it from being passed via an SCM_RIGHTS ancillary message. When catching up with mail, I unfortunately read the PR thread after writing this (as well as Christos's concerns about treating some FDs differently than others). What came to my mind was that kqueue could have used another type of special object instead of a descriptor, but it's too late for a change of API, and although I see some other interfaces using such integers which aren't necessarily file descriptors (i.e. timer_create(2)), kqueue's API expects close(2) to clean it up... -- Matt
Re: CVS commit: src/tests/modules
On Wed, 21 Mar 2012 21:47:31 + David Holland dholland-t...@netbsd.org wrote: But, how about kern.module.supported or kern.module.canload or something? I like the kern.module.supported, or perhaps kern.module.enabled, as I have systems built without module loading support yet still have a few module sysctls around under that same hierarchy, and module.modular also seems ambiguous and redundant... -- Matt
Re: Rewriting kernfs and procfs - GSoC'12
On Tue, 20 Mar 2012 10:35:13 +0900 Julio Merino j...@julipedia.org wrote: Personally, I'd also like to see this project done. It was at one point an idea I wanted to work on, but then lost the time to do so and forgotten about it completely. I was initially reluctant to reply to this thread at this time, because some details might be out of the scope of the GSoC project. But I think that those questions are important to consider in the design of a new procfs implementation, and the project description was very brief, so I decided to post them anyway: It was nice to be able to mount procfs with -o linux when I used Linux binary compatibility. Are there other scenarios where it is required? If not, should a new implementation simply be as compatible as possible with Linux, such that -o linux not be necessary? Even some supposedly portable software occasionally now expects a Linux-compatible procfs tree. Otherwise, I think that currently NetBSD doesn't make use of it, as kernfs and procfs are not mounted on my systems. Is there functionality that it should provide which sysctl/vmstat/pmap/fstat/drvctl don't? While on Linux it's used as a central repository for a lot of information, I regularly stumble on ad-hoc parsers in a number of applications that query from it, wondering why they didn't export that information via sysctl... If it should diverge from Linux and still support -o linux, is there a particular hierarchical direction it should respect, and suggested file format(s), i.e. plist is an example, which applications could parse using a supplied library? Or should the data be in a format designed for human reading only, with sysctl used for software? I doubt that a new implementation needs to remain compatible with the traditional 4.4BSD procfs hierarchy, as it's not really being used by software yet. I once thought that it might be useful to export procfs via NFS, but our current implementation doesn't support it. 
Is this something that a new implementation should allow? Thanks, -- Matt
netbsd-6/amd64 and TLS
Hello, I stumbled upon something interesting tonight when testing a new unstable ECL (Embeddable Common Lisp). When built with TLS support (--with-__threads=yes), a noticeable slowdown can be experienced compared to a build with --with-__threads=no. For now, I'm not sure yet if it has to do with a bug in ECL or in NetBSD; I should check the TLS/non-TLS code paths whenever I have more time. But I wanted to meanwhile share this, in case someone else also noticed something similar, or has a clue as to why this happens. The system was built using DBG='-g -O2'. Thanks, -- Matt
Re: Problem with install of NetBSD-6 from cd on i386 siside
On Wed, 07 Mar 2012 15:14:52 - David Lord net...@lordynet.org wrote: I have since obtained netbsd-6 src via cvs on a different system, built a release, copied sets over network and updated target pc to NetBSD-6. I am able to mount the cdrom and tar -tzvf comp.tgz initially gave same error as above but then completed ok. Seems the drive isn't being allowed to spin up. Just a note: beware about the missing -p option when extracting sets. Permissions will not be restored properly and things like setuid binaries will not be working (a common issue would be for instance, su(1) not working after installing base.tgz). This might not matter for the comp.tgz set, though. -- Matt
Re: Respawn crashed PUFFS filesystems?
On Sun, 12 Feb 2012 01:02:38 -0500 (EST) Mouse mo...@rodents-montreal.org wrote: Of course the feature would be broken in some cases, but we could make the thing optional using a vfs.puffs.respawn sysctl, which would contain a colon-separated mount points subjected to respawn. What happens if a mount point contains a colon? More to the point, I think this puts the information in the wrong place. Is there any way it could be set as an option at mount time? (That's a serious question; I don't know puffs enough to answer it.) I also think that a mount respawn option would be elegant -- Matt
Re: extattr namespaces
On Mon, 6 Feb 2012 09:51:19 + Emmanuel Dreyfus m...@netbsd.org wrote: We have two extended attributes API in tree: one from FreeBSD and one from Linux. We are about to toss the FreeBSD one in favor of the Linux one. That is easy now since we never had working extended attributes in a release. One thing that I'm wondering: what are the character constraints on those class names in the Linux API? The reason is that if UTF8 is allowed, it'd be possible for two names to show as an equivalent representation to humans, while they'd be different for the system, and this could have security implications if we ever use these to support extended permissions such as ACLs in the future. In the FreeBSD API, namespaces are int. There are two namespaces defined: system and user. There is no way to add other namespaces, though I have no idea what happens if one uses an int value different than system or user. For performance and security, integers make more sense to me than strings. However, I don't think there'd be a problem if internally they're integers, yet shown to userland with a strings interface (we traditionally do this for user and group IDs, in which case tools such as id or ls can show the IDs as well as names). Or if names were restricted as necessary if IDs were dropped. At least for namespace name strings and the SYSTEM namespace attribute name strings, they should probably be restricted to a-z (or A-Z). I don't think that this would matter much for user namespace attributes, though. -- Matt
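The restriction suggested above (namespace names limited to a-z so that no two names can look alike while differing for the system) could be checked along these lines. This is only a sketch: the function name and the length limit are hypothetical and do not correspond to any existing extattr interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XATTR_NS_MAXLEN 32	/* hypothetical limit, for illustration */

/*
 * Return true if `name` is a non-empty string of at most
 * XATTR_NS_MAXLEN characters, all in the ASCII range a-z.
 * Any byte of a multi-byte UTF-8 sequence is rejected.
 */
bool
xattr_ns_name_ok(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; name[i] != '\0'; i++) {
		if (i >= XATTR_NS_MAXLEN)
			return false;
		if (name[i] < 'a' || name[i] > 'z')
			return false;
	}
	return i > 0;
}
```

With such a check, "user" and "system" pass, while mixed-case or non-ASCII look-alikes are refused before they can reach any permission logic.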
Re: Adding an option to avoid SIGPIPE for all file descriptors
On Wed, 25 Jan 2012 12:25:46 -0500 Steven Bellovin s...@cs.columbia.edu wrote: On Jan 23, 2012, at 11:05 58PM, Matt Thomas wrote: On Jan 23, 2012, at 7:58 PM, Steven Bellovin wrote: I also wonder whether we should also have a note that disabled SIGPIPE. similar to what paxctl does. You mean a system-wide flag? That would worry me; I think it would have bad effects, since anything that did a | b paxctl sets a note in the executable. I don't like that, either, but on philosophical grounds. The problem I have is that the semantics of the execution now depend on something not in the source code; however, the code needs to know about it in order to cope properly. (Setuid is somewhat different, since it also reflects the policy of the site.) I also don't see the point, as opposed to a system call to set the flag. A system-wide flag would mess with applications that expect the traditional SIGPIPE behaviour, and I also find it rather awkward to depend on an ELF note for this. The use of ELF notes for paxctl is less questionable but still awkward: at application upgrade the admin must remember to also set the special paxctl flag again on the new executable, vs a vnode flag. Applications already can use signal(3) or sigaction(2) if they don't want it (and now the FD-specific setsockopt(2)/fcntl(2), which I see no problem with). But if I understand, Matt's suggestion is to be able to disable SIGPIPE signaling for some of them behind their back? Then how about a process/PID-specific nosigpipe sysctl(3) perhaps (we have things like stopfork/stopexec/stopexit), or a more general way to control if/which signals are ignored for a process via sysctl? Or something like nohup(1) but for SIGPIPE, nosigpipe(1), or a more general nosig(1) allowing to specify which signals to ignore? Thanks, -- Matt
Re: Possible incorrect usage of STACKALIGN in kern_exec
On Tue, 24 Jan 2012 21:01:49 +0100 Martin Husemann mar...@duskware.de wrote: On Tue, Jan 24, 2012 at 08:21:42PM +0100, Paul Fleischer wrote: Is the usage of STACKALIGN indeed incorrect in this situation, or am I missing the big picture? I stumbled across this when revamping execve1 for posix_spawn recently. The intention seems to be to align the stack on an 8 byte boundary (where arm usually only requires 4 byte alignment). I did not dig in the ARM ABI docs deep enough to see why this would be needed. However, the current implementation seems to be broken - the macro works on the stack pointer but not on a length variable, as you noted. Can anyone explain why arm would need 8 byte alignment? Do some architectures (i.e. x86) have better performance if the stack is 16-byte aligned? If so, perhaps this could be MI, satisfying both 8-byte (or 4-byte) alignment, by aligning stacks at 16 bytes? Would this be considered wasteful? Of course, x86-64 MD code could also be used... There is also a related PR, though it concerns thread stack alignment: lib/39465 Thanks, -- Matt
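To make the pointer-versus-length distinction concrete, here is a small sketch. The names and the 16-byte value are assumptions for illustration; these are not NetBSD's STACKALIGN or ALIGN definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MI alignment; must be a power of two. */
#define SKETCH_STACK_ALIGN	16

/* Round an address down, as wanted for a stack pointer on a stack growing down. */
static uintptr_t
sketch_align_sp(uintptr_t sp)
{
	return sp & ~((uintptr_t)SKETCH_STACK_ALIGN - 1);
}

/* Round a length up, so that the space reserved is never too small. */
static size_t
sketch_round_len(size_t len)
{
	return (len + SKETCH_STACK_ALIGN - 1) & ~((size_t)SKETCH_STACK_ALIGN - 1);
}
```

Applying the round-down form to a length can shrink it below what is needed, which matches the kind of misuse described above; and any value aligned to 16 is also aligned to 8 and to 4.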
Re: Reduce KAUTH_GENERIC_ISSUSER usage (batch 1)
On Tue, 17 Jan 2012 20:36:35 -0500 Elad Efrat e...@netbsd.org wrote: Attached is a diff that reduces the use of KAUTH_GENERIC_ISSUSER. I plan to commit it a week or so after the branch. Thanks for working on this. While I understand most changes, after looking at the diff I wondered: anyone know what is special about pxg(4) that requires a special MACHDEP_PXG check as opposed to MACHDEP_UNMANAGEDMEM? Thanks, -- Matt
Re: buffer cache ufs changes (preliminary ffsv2 extattr support)
On Sun, 15 Jan 2012 15:21:40 -0500 (EST) Mouse mo...@rodents-montreal.org wrote: However, I think that constitutes a good implementation of a bad idea. This makes a file no longer a long list of octets; it becomes multiple long lists of octets. The Mac did this, with resource forks and data forks, and you may note OS X doesn't do it any longer. I suspect these will seem like a good idea for a while, until people start discovering all the things they break, or that break them, and realize that they didn't learn from history and thus had to repeat it. I didn't know that Apple dropped the idea, but I have always found the idea flaky myself (and sorry for the rant): - Applications may still implement and maintain metadata as they wish without the feature - Requires changes to support in OS, FS, and many file manipulation tools - No standard API for these, few, incompatible, restricted solutions/formats for archival - Security implications (scanning tools which aren't aware might skip hidden/extended data; if ACLs are eventually implemented and are using these, the implementation should not only support a system domain, but also use IDs rather than strings (or at least severely sanity-check a restricted string format)) - Inevitable eventual loss of the extended data, possibly because of backup procedures not aware of it, moving/copying/editing files with non-aware/third-party tools, etc (also consider editors that save to another file to then rename) - An administrative nightmare when tools such as find/locate/grep/diff won't disclose data that the admin might be looking for but is now in an extended attribute But this is only the opinion of a user, and I could keep the feature disabled on my systems, of course, so I don't necessarily object to optional support for it. -- Matt
Re: PUFFS and existing file that get ENOENT
On Mon, 16 Jan 2012 10:56:33 + (UTC) y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote: when the kernel wants to cache other files. ie. whenever the kernel decides to reclaim it. :-) you can increase the chance by running while :;do sysctl -w kern.maxvnodes=0; done or something like that. Wouldn't the performance also drop significantly with a permanently low maxvnodes, though? Thanks, -- Matt
Re: NetBSD/usermode (Was: CVS commit: src)
On Sat, 31 Dec 2011 17:20:16 + David Holland dholland-t...@netbsd.org wrote: The other obvious approach is to add one or more new ptrace operations to provide proper/adequate/better support for intercepting system calls. This is probably a more useful facility in the long run, and it *can* be made leakproof, but it'll be more work. Could this also eventually allow systrace-style functionality that'd be safer than the previous implementation? Thanks, -- Matt
Re: close and ERESTART
On Mon, 26 Dec 2011 05:19:22 + Taylor R Campbell campbell+net...@mumble.net wrote:
+
+	error = fd_close(SCARG(uap, fd));
+	if (error == ERESTART)
+		error = EINTR;
+
+	return error;
If it's also guaranteed that the file descriptor state is closed in the event of an ERESTART error, I like this, personally. -- Matt
Re: cloning device close race?
On Sun, 18 Dec 2011 23:40:33 -0500 Matthew Mondor mm_li...@pulsar-zone.net wrote: On Sun, 18 Dec 2011 22:34:03 -0500 Thor Lancelot Simon t...@panix.com wrote: If you run 10 or so copies at once on a multiprocessor system with DIAGNOSTIC, you'll see a lot of this message emitted: vrelel: missing VOP_CLOSE(): vnode @ 0xfe801e73cb28, flags (0x800030MPSAFE,LOCKSWORK,INACTNOW) tag VT_UFS(1), type VCHR(4), usecount 2, writecount 0, holdcount 0 freelisthd 0x0, mount 0x800024235000, data 0xfe801de01f00 lock 0xfe801e73cc38 tag VT_UFS, ino 46213, on dev 4, 0 flags 0x0, nlink 1 mode 020644, owner 0, group 0, size 0 I am guessing the problem also exists with other cloning pseudodevices, not just the new /dev/random implementation. This just reminds me that a friend yesterday pointed me to this article about close(2)'s POSIX semantics: http://www.daemonology.net/blog/2011-12-17-POSIX-close-is-broken.html In case someone else was also interested in this, I was informed off-list that NetBSD ensures that the file descriptor be in closed state after close(2), in the event where it is interrupted and errors with EINTR. In another discussion with the person who originally forwarded me the above URL, I was told that according to her investigation, Linux also does this. Thanks, -- Matt
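Given that guarantee, userland code on such systems should not retry close(2) after EINTR, since the descriptor number may already have been reused by another thread. A minimal sketch, assuming the platform behaves as reported above (the wrapper name close_noretry() is made up):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Close fd exactly once.  EINTR is treated as success on the assumption
 * that the descriptor is already closed when close(2) is interrupted;
 * this assumption does not hold on every POSIX system.
 */
static int
close_noretry(int fd)
{
	if (close(fd) == -1 && errno != EINTR)
		return -1;
	return 0;
}
```

On a system that leaves the descriptor open after EINTR, this wrapper would leak it instead, which is the ambiguity the linked article complains about.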
Re: cloning device close race?
On Sun, 18 Dec 2011 22:34:03 -0500 Thor Lancelot Simon t...@panix.com wrote: If you run 10 or so copies at once on a multiprocessor system with DIAGNOSTIC, you'll see a lot of this message emitted: vrelel: missing VOP_CLOSE(): vnode @ 0xfe801e73cb28, flags (0x800030MPSAFE,LOCKSWORK,INACTNOW) tag VT_UFS(1), type VCHR(4), usecount 2, writecount 0, holdcount 0 freelisthd 0x0, mount 0x800024235000, data 0xfe801de01f00 lock 0xfe801e73cc38 tag VT_UFS, ino 46213, on dev 4, 0 flags 0x0, nlink 1 mode 020644, owner 0, group 0, size 0 I am guessing the problem also exists with other cloning pseudodevices, not just the new /dev/random implementation. This just reminds me that a friend yesterday pointed me to this article about close(2)'s POSIX semantics: http://www.daemonology.net/blog/2011-12-17-POSIX-close-is-broken.html I then only checked the close(2) manual page so far, which indeed lists EINTR as a possible errno value on error. But since the article also mentions that some OSs decided to ensure that EINTR never be returned to avoid the problems, I wondered: does NetBSD already do something to ensure that either: 1) EINTR not be possible or atomically be restated transparently in the same LWP, or 2) the state of an FD after an interrupted close(2) never be inconsistent? The latter solution might still allow race conditions in multithreaded code, possibly. Thanks, -- Matt
Re: [RFC] getgroups2 system call
On Wed, 14 Dec 2011 07:04:06 +0100 m...@netbsd.org (Emmanuel Dreyfus) wrote: - a fixed length header is highly desirable for performance optimization. For instance glusterfs fetches the header and the data using readv(2) with an iovec that has two slots. That way it gets write data aligned on a page boundary. What does NFS do in this case? I seem to remember that it also imposes a sane size limit, possibly even below NGROUPS_MAX; is that really the case? If so, would this also be acceptable? -- Matt
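The two-slot readv(2) technique mentioned in the quoted text can be sketched like this. The struct msg_hdr layout and the function name are invented for illustration and have nothing to do with the actual glusterfs wire format.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical fixed-length header; the fields are for illustration only. */
struct msg_hdr {
	uint32_t type;
	uint32_t data_len;
};

/*
 * Read one header plus up to data_cap bytes of payload with a single
 * readv(2).  Slot 0 receives the header; slot 1 is a page-aligned buffer,
 * so the payload lands on a page boundary without an extra copy.
 * On success *datap must be released with free(3).
 * Returns the readv(2) byte count, or -1 on error.
 */
static ssize_t
read_hdr_and_data(int fd, struct msg_hdr *hdr, void **datap, size_t data_cap)
{
	struct iovec iov[2];
	long pagesize = sysconf(_SC_PAGESIZE);
	void *data;
	ssize_t n;

	assert(hdr != NULL && datap != NULL);
	if (pagesize <= 0 ||
	    posix_memalign(&data, (size_t)pagesize, data_cap) != 0)
		return -1;

	iov[0].iov_base = hdr;
	iov[0].iov_len = sizeof(*hdr);
	iov[1].iov_base = data;
	iov[1].iov_len = data_cap;

	n = readv(fd, iov, 2);
	if (n == -1) {
		free(data);
		return -1;
	}
	*datap = data;
	return n;
}
```

This only works because the header length is known in advance; with a variable-length header the split point between the two slots could not be fixed before the read.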
Re: Lost file-system story
On Fri, 9 Dec 2011 22:12:25 -0500 Donald Allen donaldcal...@gmail.com wrote: Linux systems do periodically write ext2 meta-data to the disk. And ext2 fsck has always been very good, and has gotten better over the years, due to the efforts of Ted Ts'o. I first installed Linux in 1993, almost 20 years ago, and have been using it continuously ever since. I have *never* lost an ext2 filesystem and I've never mounted one sync. I'm not sure if it's the case on Linux with ext2, but by default NetBSD FFS mounts are not sync, nor async; metadata is sync and data blocks are async. In async mode, all data is asynchronously written, including the metadata, and in sync mode everything is written synchronously (the default OpenBSD uses, if I recall). I just wanted to specify this as you mentioned not mounting your ext2 systems in sync mode, but a default NetBSD FFS mount will not be in sync mode either. Other available options with FFS are using soft dependencies (softdep) or WAPBL metadata journalling (log), with which it is possible to have increased performance vs the default mode, without really sacrificing reliability, unlike with the fully async mode. In those modes, metadata is written asynchronously as well. Sorry if what I said is already obvious to you, -- Matt
Re: Use consistent errno for read(2) failure on directories
On Fri, 9 Dec 2011 09:33:54 +0100 Nicolas Joly nj...@pasteur.fr wrote: According to the online OpenGroup specification for read(2) available at [1], read(2) on directories is implementation-dependent. If unsupported, it shall fail with EISDIR. In the case of sys/rump/librump/rumpvfs/rumpfs.c, is it possible that the underlying implementation could previously decide if it could support read(2) on directories, and this would no longer be the case with this patch? Thanks, -- Matt
Re: Use consistent errno for read(2) failure on directories
On Fri, 9 Dec 2011 11:56:32 +0100 Nicolas Joly nj...@pasteur.fr wrote: On Fri, Dec 09, 2011 at 04:36:55AM -0500, Matthew Mondor wrote: In the case of sys/rump/librump/rumpvfs/rumpfs.c, is it possible that the underlying implementation could previously decide if it could support read(2) on directories, and this would no longer be the case with this patch? No. This only impacts the rump fs itself (used as the root file system in applications); it does not matter while accessing other fs through rump. Thanks for the explanation, -- Matt
Re: Lost file-system story
On Fri, 9 Dec 2011 15:50:35 -0500 Donald Allen donaldcal...@gmail.com wrote: were not designed to do this. The reason I'm beating on this is that I would have liked to use NetBSD for the application I have in mind, but I need the performance improvement that async provides (my tests show this; the tests also show that NetBSD async is about as fast as Linux, much faster than OpenBSD async, at least for doing a lot of writing, such as un-tarring a large tar file). This is practical if the joint The speed and reliability WAPBL provides have been enough for my uses personally; are the few seconds saved using async really worth the trouble? Also, if raw speed is needed to do many installations on identical systems, dd with large blocks to mirror the system might be a faster alternative... I'm not arguing that fsck shouldn't be able to recover though; it ideally should, but the problem seems to be that too much metadata is missing when crashing while writing in async mode. OpenBSD's async mode is probably slightly slower because it flushes metadata more often. Perhaps having a sysctl to control flushing would be a good thing, though. Thanks, -- Matt
Re: emap
On Mon, 5 Dec 2011 04:19:13 + (UTC) y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote: Although I didn't think it'd be necessary to say so until this point, I admit that I myself didn't really understand what Takashi said about recommending amd64 over i386. If the hardware is 32-bit, or on constrained memory devices, i386 definitely needs to be supported. it isn't my recommendation. rmind@'s. Sorry about that, I should have rechecked upthread instead of looking at the quoting mess :) -- Matt
Re: secmodel_register(9) API
On Tue, 29 Nov 2011 02:51:38 +0100 Jean-Yves Migeon j...@netbsd.org wrote: Reviews before merge welcome. If nobody raises his voice, I'll proceed to commit it at the end of the week. Hello, I admit not having audited the kauth and secmodel code recently, the last time being shortly after Elad's initial implementation, please bear with me if some observations are irrelevant: There are various ad-hoc calls to printf() which could probably be replaced by a more generic function call also resolving the error number to a string matching the constant i.e. secmodel_perr(int errno, const char *function); or similar, possibly wrapped by a macro using __FUNCTION__ avoiding the redundant function names The initialization functions, such as secmodel_keylock_init(), will report an error in the dmesg but do not propagate errors (they're void functions, suggesting the caller will not suspect anything). Should not the system panic for similar security critical failures? I think that I saw a similar situation under the various case MODULE_CMD_INIT. When seeing the strcasecmp() calls in the eval_* functions for names such as is-securelevel-above or is-root, my first impression was that integer constants, macros, or even a system of interned strings and references would be nice. Then it struck me that if the goal was to export these, exporting actual variables might be best (although in any case, exporting these seem to somewhat defeat kauth-style centralization. But if I understand, this is not for general use in the kernel, but for use by other security models? If so, it's not so much out of scope in the sense that it remains in sys/secmodel)... Note that the following is not criticism on your patch, but pipe-dreaming and opinion. It's also outside the scope of the existing kauth implementation, but I couldn't resist, considering it was slightly on-topic: Having a main area to look for security related decisions is a good thing, and kauth was a good step in that direction. 
It's also nice to permit an administrator or organization to tweak the system for their needs using an elegant architecture. However, I've always found its design to be slightly too dynamic, perhaps too much of an interpreter (and those eval_* functions make it even more so). Then there is all the C code dedicated to attaching and detaching parts of the program tree at runtime, etc. Although I'm not familiar enough with the original Darwin implementation, that is probably similar there. Since this is security related, it would not be unreasonable if the only possible runtime changes were user/admin configuration (module-specific sysctl configuration knobs, file system permissions, PaX flags, etc). This means that the final runtime security system could be statically generated at compile-time. Dreaming ahead along that path (this part could still be possible with an interpreter-like model though), it might be possible to create a similar system, centralized yet modular (not at runtime, but for human-friendly organization), to design and use a simple mostly declarative language to edit and represent security models, then compile that representation to static code. The input could be more elegant (also more easily allowing to define the domains and their authorize interface, any hierarchies, etc), the output could permit a more efficient runtime (generating unrolled code where wanted rather than loops running among hooks lists)... And of course there could be specialized static analysis and test tools to warn model designers of possible shortcomings in their designs. With finally a preprocessor tool so that it'd be possible to embed the language with C code, where necessary... But then again, I'm only pipe dreaming, and that's always easier than implementing any of that, of course :) -- Matt
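The error-reporting helper suggested earlier in this message could look roughly like the sketch below. Everything here is hypothetical: secmodel_perr_fmt(), SECMODEL_PERR() and the short table of names are illustrations, not existing kernel interfaces, and the parameter is called error because errno is a macro in userland. It formats into a buffer so that it can be tried outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Map a few error numbers to the name of their constant (sketch only). */
static const char *
sketch_errname(int error)
{
	switch (error) {
	case EPERM:	return "EPERM";
	case ENOENT:	return "ENOENT";
	case ENOMEM:	return "ENOMEM";
	case EEXIST:	return "EEXIST";
	case EINVAL:	return "EINVAL";
	default:	return "E???";
	}
}

/* Hypothetical helper: "<function>: error <n> (<NAME>)" written into buf. */
static int
secmodel_perr_fmt(char *buf, size_t len, int error, const char *function)
{
	assert(buf != NULL && function != NULL);
	return snprintf(buf, len, "%s: error %d (%s)", function, error,
	    sketch_errname(error));
}

/* Wrapper macro that supplies the calling function's name automatically. */
#define SECMODEL_PERR(buf, len, error) \
	secmodel_perr_fmt((buf), (len), (error), __func__)
```

A kernel version would presumably call printf(9) directly rather than fill a buffer; the point is only that the function name and the symbolic error come from one place instead of being repeated in each ad-hoc printf().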
Re: emap
On Fri, 25 Nov 2011 23:25:24 +0400 Aleksej Saushev a...@inbox.ru wrote: Thor Lancelot Simon t...@panix.com writes: On Fri, Nov 25, 2011 at 12:50:58PM +0400, Aleksej Saushev wrote: Mindaugas Rasiukevicius rm...@netbsd.org writes: y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote: hi, what's the status of emap and pipe? ... and encourage our users to use amd64 instead of i386. I'm sorry to intervene, what about WINE? Unless we're going to have it functional on amd64, encouraging is useless. I don't understand your comment. Are you suggesting that a large fraction of NetBSD/i386 users use WINE and therefore would not be able to switch to the amd64 port? I mean that those users who could switch most probably have switched already. And one of serious reasons to stay on i386 is functional WINE. Although I didn't think it'd be necessary to say so until this point, I admit that I myself didn't really understand what Takashi said about recommending amd64 over i386. If the hardware is 32-bit, or on constrained memory devices, i386 definitely needs to be supported. But then again, I'm not familiar with the emap code; from the bits I read in this thread, it could serve to optimize pipes? That pipes can be better optimized on amd64 than on i386 is no problem to me, so I assumed that he was talking about encouraging users to use amd64 if they want to take advantage of a particular feature, not that i386 would get deprecated and start to become unsupported. It would be nice if someone who knows better could explain better what was meant, or confirm what I said above (if I understood correctly), considering that it caused some worries... Thanks, -- Matt
Re: puffs netbsd-5 (was VOP_GETATTR: locking protocol change proposal)
On Mon, 21 Nov 2011 08:04:46 + Emmanuel Dreyfus m...@netbsd.org wrote: FWIW I spent weeks tracking down a file corruption bug on growing files in PUFFS because VOP_GETATTR operates on an unlocked vnode. If the VOP_GETATTR request follows a not yet completed VOP_FSYNC (as done by ioflush thread), I got toasted: VOP_FSYNC causes PUFFS to send a SETATTR to the filesystem, and on completion VOP_GETATTR gets from the filesystem a file size smaller than what VOP_FSYNC just set. This causes the file to be truncated by the kernel, and data written between VOP_FSYNC and VOP_GETATTR to be discarded and replaced by a chunk of zeroed bytes. I had to add a lock on file size modification in PUFFS to fix the problem. I seem to remember you previously writing about using puffs/rump on netbsd-5, is that still on netbsd-5? The reason I ask is that I'm seeing various bugs when using psshfs (and had various problems when mounting CDs using rump_cd9660); at the time when I corresponded with Pooka about it he told me that it wasn't ready for production use on netbsd-5 and recommended -current. One of the problems is that the process can suddenly start to consume as much CPU time as it can, while operations become really slow or lock up. Another issue had to do with inconsistencies between the rump-seen state and actual on-disk state, possibly due to cache invalidation issues or the like... A few days back I still had the psshfs process locked in a loop (I didn't use it often enough to investigate where it loops yet). This might not be related at all to the locking issues you're having, though. Thanks, -- Matt
Re: puffs netbsd-5 (was VOP_GETATTR: locking protocol change proposal)
On Mon, 21 Nov 2011 08:45:52 + Emmanuel Dreyfus m...@netbsd.org wrote: On Mon, Nov 21, 2011 at 03:26:35AM -0500, Matthew Mondor wrote: I seem to remember you previously writing about using puffs/rump on netbsd-5, is that still on netbsd-5? I use PUFFS on netbsd-5, and fixed a few bugs in it, so you definitely need latest netbsd-5 to avoid bugs. I never used rump, and indeed Pooka told me that it was not production-ready on netbsd-5. My systems are fairly recent; what I'll do then is update again and use psshfs some more, so that I can file a PR when I again get the busy looping issue. My two older PRs related to rump/puffs on NetBSD-5 were kern/43589 and kern/43590, which were unrelated problems. Thanks, -- Matt
Re: fs-independent quotas (binary plists)
On Thu, 17 Nov 2011 10:50:17 +0100 Manuel Bouyer bou...@antioche.eu.org wrote: In this context, text format means a key/value pair format, in which some keys are optional and values can be of arbitrary types. Maybe you can do this with a binary format too, but it doesn't exist yet. This reminds me that years ago someone implemented support to save plists in a binary format[1] (this doesn't necessarily mean that it would help solve this problem, though). But I'm surprised that after all these years the support wasn't added; anyone know if there is general resistance to an optional compact and portable binary format, and if so, the reasons? If such a format was supported, it wouldn't be harder to machine or human-process (proplib could be used as it is now for code, and bplists could be easily exported to an xml format as requested to edit in an editor, i.e. via a viplist, plistctl or such command (which also could use advisory locking, of course, and save back to binary format if the system is configured to use a binary format)). In theory, it could also increase performance, and a binary format would be simpler to parse by the kernel than xml, minimizing bugs... [1] ftp://ftp.netbsd.org/pub/NetBSD/misc/freza/bplist-2007-10-27.diff Thanks, -- Matt
Re: MAXNAMLEN vs NAME_MAX
On Sun, 13 Nov 2011 23:08:30 + David Holland dholland-t...@netbsd.org wrote: I was recently talking to some people who'd been working with some (physicists, I think) doing data-intensive simulation of some kind, and that reminded me: for various reasons, many people who are doing serious data collection or simulation tend to encode vast amounts of metadata in the names of their data files. Arguably this is a bad way of doing things, but there are reasons for it and not so many clear alternatives... anyway, 256 character filenames often aren't enough in that context. It's only my opinion, but they really should be using multiple files or a database for the metadata with as necessary a link to an actual file for data. But I also tend to think the same of software relying on extended attributes, resource forks and the like (with the possible exception of a specialized facility for extended permissions :) (This sort of usage also often involves things like 50,000 files in one directory, so the columnizing behavior of ls is far from the top of the list of relevant issues.) This reminds me, does anyone know about the current state of UFS_DIRHASH? I remember reading about some issues with it and ending up disabling it on my kernels, yet huge directories can occur in a number of scenarios (probably a more pressing issue than extending file names, actually)... The 255 limit was just because that's how many bytes a one byte length field permitted, not because anyone thought names that long made sense. But if you're going to increase it, why stop at 511? That number means nothing - the next logical limit would be 65535 wouldn't it? Well... yes but there are other considerations. As you noted, going past one physical sector is problematic; going past one filesystem block very problematic. 
Plus, as long as MMU pages remain 4K, allocating contiguous kernel virtual space for path buffers (since if NAME_MAX were raised to 64K, PATH_MAX would have to be at least that large) could start to be a problem. I agree, especially with all the software that allocates path/file name buffers on the stack (but even on the heap it could be a general memory waste with 64KB, other than the memory management performance issues). -- Matt
Re: sysctl(7) knob to allow users to control CPU affinity
On Thu, 03 Nov 2011 17:01:48 +1100 matthew green m...@eterna.com.au wrote: Since the default is to not allow affinity control, it's not of utmost importance, but it could allow a compromise between total restriction and total freedom... I have no objection to that sysctl personally. i think the default should be changed, but user-specified affinity shouldn't be considered an absolute rule, just a preference. i'm not sure i understand exactly what sort of issue you're envisioning. I assumed there could be issues since pset(3) is restricted to the superuser (as well as pthread_setaffinity_np(3) now), but when rethinking about it I admit not seeing a problem as non-privileged processes cannot change the process priority beyond their class' priority. The only other case that comes to my mind would be a dmover(9) like system eventually reserving processor(s) for dedicated tasks, but I guess that in this case the reserved cores would simply be made unavailable in cpuctl(8)/pset(3)/etc... -- Matt
Re: sysctl(7) knob to allow users to control CPU affinity
On Thu, 03 Nov 2011 01:50:49 +0100 Jean-Yves Migeon j...@netbsd.org wrote: Here's a proposal for a sysctl(7) knob to easily allow non-superusers to set the CPU affinity of processes and threads they own: security.secmodel.suser.usersetaffinity (resembles the one already existing to allow for user mounts) Would it be acceptable to modify current secmodel_suser(9) to allow this? This issue comes regularly on various tech-* MLs, motivated by the fact that people expect this behavior based on what they encounter on other OS. Just out of curiosity, but is it possible for the superuser to still reserve wanted CPU/cores, such that non-privileged users could, if that sysctl is enabled, work with the non-reserved ones? Or, can the sysadmin specify CPU/cores and/or limits for non-privileged users? Since the default is to not allow affinity control, it's not of utmost importance, but it could allow a compromise between total restriction and total freedom... I have no objection to that sysctl personally. Thanks, -- Matt
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Mon, 31 Oct 2011 19:58:27 -0400 Greg Troxel g...@ir.bbn.com wrote: Obligatory actual netbsd tech-kern content: It seems like we really need a sync_synchronous(2) system call that guarantees that all file system operations that have completed (syscall returned) before the issuance of the sync_synchronous call are on disk before sync_synchronous returns. It seems odd that for sync, there is no waiting, fsync seems to wait, and fsync_range can flush or not flush caches, more or less. Hmm since in sync(2), the non-synchronous issue is noted as a bug: BUGS sync() may return before the buffers are completely flushed. Does this mean that sync(2) should normally be synchronous and fixed to be, such that sync_synchronous(2) not be necessary? -- Matt
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Fri, 28 Oct 2011 20:33:29 -0400 Greg Troxel g...@ir.bbn.com wrote: So, I'm inclined to patch rdiff-backup not to fsync, since it seems excessive, and the backup is toast if the machine crashes before it is finished -- in that case rdiff-backup just rolls back. Opinions? I also wonder why fsync would be used for every file, especially if you consider a whole run a single transaction, even more so if using snapshots (although you don't mention using them). In which case it simply should report failure and abort on any open/write/rename/close error, and at the end, fsync once, also checking for error. If at that point everything was successful, the transaction is committed (as far as software is concerned, of course, hardware buffers might still need flushing), otherwise everything should be rolled back, unless an inconsistent state is allowed (where the next full backup might fix that). I'm however wondering if the excessive fsync(2)s weren't eventually added because of issues with ext4, as I somehow remember unix semantic exceptions with it, and know that some have lost files using it as they'd normally safely use other file systems (and I haven't followed progress to know if it's since fixed). But if rdiff-backup cannot optionally avoid those, adding an option to tell it not to fsync at every file as you suggested would be very sane IMO (it still could default to sync mode, in case there's upstream resistance)... I can understand the need for some transaction-logging applications to call fdatasync(2) regularly, but that's another matter (and even then it's usually configurable after how many bytes or seconds to call it to allow the administrator to tweak performance). -- Matt
Re: Extended attributes Linux interface
On Fri, 21 Oct 2011 00:29:12 -0400 Matthew Mondor mm_li...@pulsar-zone.net wrote: If unicode strings are possible, I think that it'd be possible for a string to look like system but to actually be something else to an auditing administrator, unless all tools clearly showed those non-ASCII bytes in an escaped format. If the above theory is true, if we eventually supported extended permissions such as access lists, they could possibly be implemented in a special empty string class, with a special empty string key, and a single structured object value specifying the permissions, rather than relying on various keys within the system class. Yet ideally for performance and security, it'd be ideal if the interface only presented integer IDs for the class, and reserved integer key attributes for the i.e. EXTATTR_SYSTEM class (just like our groups are really gids). The Linux compatibility interface, if preserved, could be oblivious to system class attributes and only be useful for the general purpose user attributes... The problem here would be that user tools using only the Linux API would not be able to backup the full state (in this case, the extended permissions, unfortunately)... -- Matt
Extended attributes Linux interface
Hello, There were previously discussions, started by Emmanuel, concerning the extended attributes, including on the various available APIs and which to support etc. At the time I read them I was catching up with a lot of mail and had written down a small note about a potential security implication that crossed my mind if we used the Linux interface. Perhaps someone can (dis)confirm: Strings are used instead of IDs to distinguish the class of an extended attribute, i.e. system etc. My question is then: must those be limited to ASCII or can they support arbitrary bytes, or UTF-8? If unicode strings are possible, I think that it'd be possible for a string to look like system but to actually be something else to an auditing administrator, unless all tools clearly showed those non-ASCII bytes in an escaped format. Of course, if the kernel wanted to match system, it wouldn't match then, but the fact that it may _appear_ to be correct to an admin may introduce a security issue if extended permissions were ever implemented on top of that system. Perhaps that this problem could also exist with the key names in case they're part of permission descriptions? Thanks, -- Matt
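The escaped display mentioned above, for tools that list attribute names, could be done along these lines. This is a sketch only: escape_name() is a made-up helper, and the \xHH notation is just one possible convention.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy name into out, leaving printable ASCII (except backslash and
 * space) untouched and rendering every other byte as \xHH, so that
 * non-ASCII look-alike characters become visible to an auditor.
 * Output is silently truncated if out is too small.  Returns the
 * number of bytes written, not counting the terminating NUL.
 */
static size_t
escape_name(const char *name, char *out, size_t outlen)
{
	static const char hex[] = "0123456789abcdef";
	size_t o = 0;

	assert(name != NULL && out != NULL && outlen > 0);
	for (; *name != '\0'; name++) {
		unsigned char c = (unsigned char)*name;

		if (c >= 0x21 && c <= 0x7e && c != '\\') {
			if (o + 1 >= outlen)
				break;
			out[o++] = (char)c;
		} else {
			if (o + 4 >= outlen)
				break;
			out[o++] = '\\';
			out[o++] = 'x';
			out[o++] = hex[c >> 4];
			out[o++] = hex[c & 0x0f];
		}
	}
	out[o] = '\0';
	return o;
}
```

A class name whose "y" is really a two-byte Cyrillic letter would then be listed as s\xd1\x83stem rather than as something indistinguishable from system.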
Re: UNIX kernel notification system
On Mon, 3 Oct 2011 00:40:46 -0700 Erik Fair f...@netbsd.org wrote: Why not a classification/taxonomy of kernel missives? This doesn't mean we can't continue to have relatively free form (and possibly amusing) text for those conditions we're not yet prepared to classify/codify yet ('cause they're rare, or debug, or ... whatever). The potential for win is in making (or retaining) software parse-ability to enable software response. Interestingly this very paragraph reminds me of Common Lisp signals and restarts; signals can be conditions or errors and hold structure (and inheritance), blocks of code may ignore or catch them, uncaught exceptions may be handled by software including the invocation of restarts, or left alone to be routed to the debugger (which is even overridable through a hook), and there is support for stack-unwind protected code which gets executed no matter if an exception causes a long jump out. Of course, all of this seems overkill for our purposes, but probably worth mentioning for inspiration... -- Matt
Re: UNIX kernel notification system
On Mon, 3 Oct 2011 11:31:17 -0700 Erik Fair f...@netbsd.org wrote: less(1) (or more(1)) doesn't take care of you? The nice thing about such formatting is that the text can be wrapped at relatively arbitrary word boundaries, making it more readably displayable on a wider range of display widths (e.g. mobile phones, tablets). Just as all the world's not a VAX (cried the PDP-11 users), so also is the world rather more than just 80 by 24. Sorry to have to add anything to this off-topic discussion; One issue is that a message may be a mix of text to be wrapped and text to be left as-is (code for instance), so every paragraph/line must be able to be auto-wrap annotated. Of course there is the possibility of HTML mail (with its own issues) and multiple MIME parts, but it's traditionally fine on tech lists to mix code and text inline, with the only exceptions that I see being in Apple Mail posts. Another issue is that readers that will wrap such paragraphs don't usually have a configuration option to specify the width of auto-flowed paragraphs, so for instance in the client I use (a GTK2 client), those paragraphs extend far right until the end of the window (which means much more than 80 columns), so reading them is unfortunately harder. But, with the recent proliferation of Apple Mail posts on the mailing lists I try to throttle my complaints about it (my last one being http://mail-index.netbsd.org/tech-userlevel/2010/10/30/msg004119.html :) -- Matt
Re: (off topic) mail line wrapping
On Tue, 4 Oct 2011 09:35:16 +0200 Alan Barrett a...@cequrux.com wrote: (flowed paragraph follows) Ignoring special cases, the rules are roughly this: The sender marks soft-wrapped paragraphs by ending every line except the last with a space. The sender marks hard-wrapped lines by not ending them with a space. (A paragraph of only one line cannot be soft wrapped.) Fortunately, your auto-flowed paragraphs are still properly wrapped so that even clients that don't support it will display them properly, though. -- Matt
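The two rules quoted above can be sketched in a few lines of C. This is only an illustration under the same "ignoring special cases" assumption (no handling of quoting, space-stuffing or the "-- " signature separator), and the function name unflow() is made up for the example.

```c
#include <stddef.h>

/*
 * Join soft-wrapped lines: a newline preceded by a space is a soft
 * break and is dropped (the space stays as the word separator); any
 * other newline is a hard break and is kept.
 * Returns the output length, or (size_t)-1 if `out` is too small.
 */
size_t
unflow(const char *in, char *out, size_t outlen)
{
	size_t o = 0;

	if (outlen == 0)
		return (size_t)-1;
	for (size_t i = 0; in[i] != '\0'; i++) {
		if (in[i] == '\n' && i > 0 && in[i - 1] == ' ')
			continue;	/* soft line break */
		if (o + 1 >= outlen)
			return (size_t)-1;
		out[o++] = in[i];
	}
	out[o] = '\0';
	return o;
}
```

A client that does not know about the convention simply shows the input as-is, which is why properly pre-wrapped flowed text still displays fine everywhere.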
Re: Multiboot a NetBSD kernel with Grub2: it works
On Tue, 13 Sep 2011 19:36:03 +0200 Emmanuel Kasper emman...@libera.cc wrote: I have just posted a detailed install from GRUB howto on netbsd-users. Did the documentation you proposed get committed into the official docs somewhere since? If not, please consider filing a PR with the information, so that it doesn't get lost. The bit about needing to pass /netbsd twice so command line arguments get passed to the kernel is also worthy of mention... Thanks, -- Matt
Re: Changing the gpio(4) API/ABI
On Fri, 23 Sep 2011 12:38:13 +0200 Marc Balmer m...@msys.ch wrote: With gpio(4) we still carry an old API with us, which I want to remove. While working on it, I will also introduce a third locator to device drivers that attach to gpio pins, flags. It will be needed for e.g. gpioiic(4) to invert the SDA/SCL pin numbers. Will documenting the changes be enough? Perhaps only one other question: Is there any advantage to keeping compatibility with OpenBSD (from which gpio(4) was initially ported); are there tools from OpenBSD that can be used because of this compatibility? Or has gpio(4) stalled on OpenBSD? Another option would be to allow a full redesign under a new device name/copy, if that's a concern. Personally, although I've seen gpio in the releases I've used for quite a while, I've never used it, and I doubt that I used any code relying on it... Thanks, -- Matt
Re: pty(4) 1024 bytes buffer limit
On Fri, 9 Sep 2011 09:38:31 -0400 Matthew Mondor mm_li...@pulsar-zone.net wrote: On Fri, 9 Sep 2011 00:26:43 + (UTC) chris...@astron.com (Christos Zoulas) wrote: Please file a PR about this. I've been meaning to fix it. Thanks, I will. For reference and to close this thread, the relevant PR was kern/45352, which was fixed and closed, thanks to Christos for the fixes and to the others who posted hints. -- Matt
Re: KAUTH_PROCESS_SCHEDULER_*AFFINITY restricted to root in default secmodel?
On Mon, 29 Aug 2011 01:07:52 +0200 Alistair Crooks a...@pkgsrc.org wrote: Sorry for replying to an old thread, I'm still catching up with mail :) i've found this somewhat annoying. IMO, we should have a way to say let normal users do this. i'm not sure sysctl is the right place, but maybe an overlay secmodel? on some of my machines, i don't want to have to be root to do this. it's annoying to have to use root to get the highest performance i can out of an application. the current default is fine, however. Something analogous to our friends: % sysctl -a | grep mount vfs.generic.usermount = 0 security.models.suser.usermount = 0 % And/or like security.models.bsd44.curtain, etc; I think that a sysctl for this would be nice too. Also, I'm not sure if this is doable (an annoyance if users and scripts have been using the old knobs), but I tend to think that sysctls that affect the default secmodel (bsd44) should ideally all be under security.models.bsd44.? -- Matt
Re: 5.1 USB panic on second removal of memory stick
On Wed, 15 Jun 2011 20:04:23 -0700 Bob Lee g...@force10networks.com wrote: Hello Bob, I'm working on a PowerPC system, and have a problem when I remove the usb memory stick the second time. That is insert memory stick, remove memory stick, insert memory stick, and remove memory stick. At this point the system panics, with 'ehci_rem_qh: ED not found'. Anyone else seen anything like this? If your problem still occurs, please file a PR, along with the backtrace and dmesg, so that it doesn't get lost. I think that it should be filed in the kern category. Thanks, -- Matt
Re: hot swap storage devices
Sorry to reply to such an old thread (I'm catching up with ml mail). On Mon, 27 Jun 2011 12:35:48 -0700 Erik Fair f...@netbsd.org wrote: With regard to hot swap storage devices, we really have two choices which are not mutually exclusive: 1. Treat as now, but with some additional code in the kernel which yells, hey! put that back! I have data to write on it! when a device goes away without prior notice (umount), and hold on to (rather than discard) the data in the I/O buffer cache, in the hope that the user notices and heeds the directive. Timeout to discard? Probably depends upon how much RAM utilization pressure you're under. I think minutes would be a good unit here. I think that this is the best solution; This is basically what AmigaOS did, and it was nice, but it also had a unified interface where even console was implemented on top of graphics, with intuition.library resident in ROM, making it possible to pop-up requesters at any time. And it was designed for single-user... This is more tricky in our case though; as the kernel should then be able to forcefully trigger a requester, which ideally shouldn't interrupt running processes and from which it must be possible to resume working, on whatever currently active interface (console, possibly in tmux/screen, or X11). I wonder what the feasibility of this could be: reserve a wscons VT (where possible) for this type of requester; when the kernel must use it, remember which is the active VT, switch to the requester VT in text mode where the requester is shown. Depending on configuration, this behaviour could be enabled or disabled, and possibly a timeout could be configured. Once the timeout expired or the needed user action was performed (user selects cancel, retry, inserts a requested device, etc), return back to the previous VT. 
But this still does not deal with device identification; on AmigaOS disks had labels and the system would verify upon insert/connect if the label corresponded to such a pending requester... Thanks, -- Matt
Re: pty(4) 1024 bytes buffer limit
On Fri, 9 Sep 2011 00:26:43 + (UTC) chris...@astron.com (Christos Zoulas) wrote: Please file a PR about this. I've been meaning to fix it. Thanks, I will. -- Matt
Re: pty(4) 1024 bytes buffer limit
On Fri, 09 Sep 2011 08:30:51 +1000 matthew green m...@eterna.com.au wrote: I looked at the various tty(4) termios(4) and pty(4) without finding an option to change the buffer size. Is there a way at all to change it? there's no option. infact, it's all hard coded as magic 1024 constants in about 4 places in sys/kern. i kept meaning to fix that, but haven't gotten around to it. Thanks for the confirmation, -- Matt
pty(4) 1024 bytes buffer limit
Hello, I've been wondering if it was possible to change the pty(4) internal buffer size, as I noticed that ppp tunnels cannot use a larger frame size. Because of this, it seems that the optimal MTU is 856, which is so small that context switches become the bottleneck. It would be nice, for instance, to be able to use an MTU of 3000 so that there are fewer context switches, but unfortunately tracing the processes shows that 1024 bytes are read from the pty devices at most. I looked at the various tty(4), termios(4) and pty(4) manual pages without finding an option to change the buffer size. Is there a way at all to change it? Thanks, -- Matt
Re: sys/dev/isa/fd.c FDUNIT/FDTYPE
On Wed, 4 May 2011 19:54:37 -0700 jnem...@victoria.tc.ca (John Nemeth) wrote: This doesn't mean we should be doing hack jobs. NetBSD is about doing things right. Can postinstall fix/recreate specific buggy devices? Or could it warn that /dev/fd* might need to be recreated? Otherwise, at least it should be mentioned in UPDATING, and that would be all right, IMO. -- Matt
Re: extent-patch and overview of what is supposed to follow
On Sat, 2 Apr 2011 11:49:14 +0200 Martin Husemann mar...@duskware.de wrote: On Sat, Apr 02, 2011 at 11:30:16AM +0200, Manuel Bouyer wrote: AFAIK dtrace doesn't work on non-modular kernels ... Nor on most of our archs, and AFAICT there is not even a document describing the (maybe nontrivial amount of) work needed to make it work there. I don't think that we should leave the tracking for a hypothetical future; it'd be better if the interface, or implementation, allowed to do such tracking -- Matt
Re: Status and future of 3rd party ABI compatibility layer
On Wed, 23 Mar 2011 16:06:07 +0100 Joerg Sonnenberger jo...@britannica.bec.de wrote: As such, I want to propose moving the last two categories into the Attic for further dusting. It makes sense to me, -- Matt
Re: Status and future of 3rd party ABI compatibility layer
On Wed, 2 Mar 2011 00:40:44 + Andrew Doran a...@netbsd.org wrote: With modules now basically working we should either retire or move some of these items to pkgsrc so that the interested parties maintain them. An awful lot of the compat stuff is now very compartmentalised, with not much more work to do. Is all compat code i386 specific? Otherwise, do modules really work on all architectures involved? Can a module built from third-party code be linked statically to a monolithic kernel without hassle, for systems on which enabling loadable modules is not allowed? Thanks, -- Matt
Re: mpt Serious performance issues
On Fri, 4 Feb 2011 09:17:01 +0100 Stephan stephan...@googlemail.com wrote: Now this is REALLY strange. I was wondering about why the read speed is sometimes high (~70MB/s) and sometimes very slow (~2MB/s). So I repeated the test utilizing find / -exec cat {} \; > /dev/null to read everything from the filesystem while watching the physical disks with my eyes and the throughput with sysstat. The findings are -that sometimes the upper disk is 100% busy while the lower disk is NOT being accessed at all, and the read speed is ~2MB/s -then sometimes the adapter switches to the lower disk while the upper disk isn't utilised anymore, and the read speed increases to ~70MB/s -until the adapter again switches to the upper disk which leads to the massive decrease in speed So what do you think about that? Just in case, none of those disks show any reallocated sectors using atactl smart status? I'm asking because I've seen very inconsistent speeds on some drives whenever the remapping logic had to be turned on. Also, nothing in dmesg about read error retries? I've also seen brand new disks with very high read error rates but otherwise normal smart stats. They too would crawl when reading certain areas. Unfortunately I'm seeing this latter defect more often recently. -- Matt
Re: freebsd 5.99.41 as XEN3_DOMU
On Sun, 19 Dec 2010 20:54:26 +0100 Manuel Bouyer bou...@antioche.eu.org wrote: Well, in the current state, modules are not enabled in the Xen kernels (modules should be built specifically for Xen, but the build tools do not allow this right now). So you have to compile all that you need in a monolithic kernel. But ZFS is only available as a module, so unfortunately this means no ZFS for xen. One way around it is to run NetBSD in a HVM guest. Is it common for modules not to be able to be statically linked in a monolithic kernel? I understand providing ZFS as a module is convenient for licensing reasons, but it probably shouldn't be too hard to somehow optionally link such a module to a kernel image at build time, and call an init/load hook at boot runtime? I tend to think that other than allowing to optionally dynamically load code, another advantage to modules would probably be that they also can optionally be included monolithically, with ideally no code changes... Thanks, -- Matt
Re: New apple keymap variant or keymap in /usr/share/wscons/keymaps?
On Sun, 28 Nov 2010 21:04:54 +0100 Frank Wille fr...@phoenix.owl.de wrote: I came to the conclusion that it might be easier and less intrusive to create a new keymap file (e.g. called ukbd.apple.powerbook) for those function keys. So they can easily be added to any national keyboard layout. But I realized that wsconsctl is unable to process a mapping-line with just one Cmd_*, or a Cmd followed by Cmd_Function in it. When there is no good reason that those are rejected I will fix it in the wsconsctl-parser now. When a while ago I posted PRs with a new keymap to be added to the kernel, I was told that they now should ideally be added as userland keymaps. When later supplying a userland keymap (the FR_CA one), I noticed that the interface wasn't as friendly or powerful as it could have been. In case you intend to also enhance the keymap infrastructure and interface, I have an old pending PR (misc/26720) with a few enhancements for it, but I never got back to update the diffs for a recent -current or to keep enhancing it. Those are userland changes though, possibly tech-userlevel is a better place to continue the thread in this direction. But other than encoding= support, it might also be nice to be able to have include support like include= as well, after which it would be possible to restructure the keymaps and move common parts together; and if such include support allowed conditionals, parts could be loaded conditionally and automatically depending on machine model (assuming that would become available via sysctl), etc. What demotivated me from keeping to work on it back then was the low interest of the developers about that PR, but most importantly that I'm usually using X11 terminals and ssh myself, with the default EN-US wscons keymap being fine when I'm really at the console (and that almost exclusively occurs at installation time). 
If we want to pipe-dream, for the future, now that there's Lua in base, it's even possible to redo the whole userland keymap loading/management part with a more powerful language than sh. This last part being back on topic with tech-kern, with the advent of the kernel-lua project, it might even be possible to eventually allow user translation mechanics in the form of Lua scripts... :) -- Matt
Re: vmpage race and deadlock
On Sun, 28 Nov 2010 09:30:44 +0100 Juergen Hannken-Illjes hann...@eis.cs.tu-bs.de wrote: Usually within hours I get a deadlock where a thread is waiting on genput but the page in question is neither BUSY nor WANTED. I suppose I tracked (*1) it down to three places, where we change page flags without holding the object lock. With this diff (*2) in place the test runs for 48 hours. This is a nice find, which most probably also deserves a PR, as netbsd-5 also lacks proper synchronization there. Thanks, -- Matt
Re: mlock() issues
On Fri, 22 Oct 2010 10:18:52 +0100 Sad Clouds cryintotheblue...@googlemail.com wrote: A pipelined request, say for 10 small files can be served with a single writev() system call (provided those files are cached in RAM), if you rely on kernel file cache, you need to issue 10 write() system calls. Is this also true if the 10 iovecs point to mmap(2)ed files/buffers whose pages were recently accessed? -- Matt
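To make the question concrete, here is a rough sketch of what I mean by iovecs pointing at mmap(2)ed files: each file is mapped read-only and all of them go out with one writev(2). The function name send_mapped() and the MAXFILES limit are made up for the example; short writes are left to the caller and empty files are simply rejected.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAXFILES 16	/* arbitrary limit for this sketch */

/*
 * Map each file read-only and send them all to `outfd` with a single
 * writev(2).  Returns the number of bytes written, or -1 on error.
 */
ssize_t
send_mapped(int outfd, const char *const paths[], int n)
{
	struct iovec iov[MAXFILES];
	ssize_t ret = -1;
	int i, mapped = 0;

	if (n < 1 || n > MAXFILES)
		return -1;
	for (i = 0; i < n; i++) {
		struct stat st;
		void *p;
		int fd = open(paths[i], O_RDONLY);

		if (fd == -1)
			goto done;
		if (fstat(fd, &st) == -1 || st.st_size == 0) {
			close(fd);
			goto done;
		}
		p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
		    fd, 0);
		close(fd);	/* the mapping outlives the descriptor */
		if (p == MAP_FAILED)
			goto done;
		iov[mapped].iov_base = p;
		iov[mapped].iov_len = (size_t)st.st_size;
		mapped++;
	}
	ret = writev(outfd, iov, n);
done:
	/* Unmap whatever was mapped, on success and on failure. */
	for (i = 0; i < mapped; i++)
		munmap(iov[i].iov_base, iov[i].iov_len);
	return ret;
}
```

A real server would of course keep the mappings around between requests instead of mapping and unmapping each time; that is where the OS page cache would be doing the work of an application-level cache.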
Re: mlock() issues
On Fri, 22 Oct 2010 12:06:37 +0100 Sad Clouds cryintotheblue...@googlemail.com wrote: Well if you're allocating memory yourself, then you've just created your own application cache. Say many files were mapped in the process's address space, the OS would still be responsible for keeping the pages of frequently used ones active, possibly swapping out long-unused ones, unless of course MAP_WIRED was used. A syscall per access would be eliminated however, i.e. read(2), and I think that zero-copy could be used (with page loaning) if writing 64KB blocks out to a socket from a memory-mapped file. On the other hand if you mmap() those files directly, what happens if another process truncates some of those files while you're reading them? I didn't do a test (it's definitely worth testing), but I think that a SIGSEGV could occur if a previously available page disappeared unless MAP_COPY was used, and the file would need to be remapped. I could see a problem where a siginfo-provided address might need to be easily matched with the file so that the process can efficiently know which file to remap... and for many files the current kqueue(2) EVFILT_VNODE isn't very useful either to detect that a file was recently modified, as it'd require too many open file descriptors :( There was some discussion years ago about a kqueue(2) filter that could be set on a directory under which any modified file (possibly for the whole involved filesystem for the superuser) would generate an event with information about which file is modified by inode, but this seems non-trivial and hasn't been implemented yet. There also are issues with inode to file string lookup (multiple files could point to a common destination, and a reverse name cache is needed). Anyway, I like this kind of discussion and have nothing against NIH personally (it fuels variety and competition, in fact), so thanks for sharing your custom cache experiments and performance numbers. 
If you happen to achieve interesting performance along the above lines with mmap(2) as well, I'd also like to know how it went. Thanks, -- Matt
Re: kernel module loading vs securelevel
On Mon, 18 Oct 2010 14:51:03 +0200 Jean-Yves Migeon jeanyves.mig...@free.fr wrote: *lurker mode off* IIRC, part of agc work with netpgp is to integrate signature verification within kernel. *lurker mode on* Thanks, that's nice to know, I didn't look at netpgp yet but might eventually check if its RSA implementation (if any) can eventually be worked into common/lib/libc/rsa, which would be a major step forward to allow the kernel to verify signatures. I started writing a task list to have an idea of what needs to be done, and it's not trivial (http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/netbsd/signed_modules.txt?rev=1.5;content-type=text%2Fplain). I might give an implementation a try during my next vacations, but no timeline or guarantee (disclaimer!). Motivation is also a factor as my current (very simple) solution to the various MODULAR issues I've faced (mostly maintenance related) has been so far to use monolithic kernels. -- Matt
Re: kernel module loading vs securelevel
On Mon, 18 Oct 2010 09:31:32 -0400 Steven Bellovin s...@cs.columbia.edu wrote: Signatures provide *authentication*; what is needed here is *authorization*. While I agree, there also are situations where both can be welcome... Another solution someone proposed which I like is hashing the modules, to then rehash a module at load time and match it against the hash set, which would be a simpler, shorter-term solution. I think that embedding the hash set in the kernel image would be safer than using a file, however. Unfortunately, this makes developing, installing or upgrading a module less friendly as the kernel image has to be refreshed and the system rebooted. -- Matt
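As a rough sketch of the load-time matching step only: assume the digest of the module has already been computed by whatever hash function is chosen (160 bits here, as with SHA-1), and that the set of trusted digests is a read-only table generated at kernel build time. The table contents and the name module_digest_trusted() are placeholders for illustration, not existing kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 20	/* 160-bit digest, e.g. SHA-1 */

/*
 * Hypothetical embedded table; in a real kernel it would be generated
 * at build time.  The values below are placeholders (remaining bytes
 * are zero).
 */
static const uint8_t trusted_digests[][DIGEST_LEN] = {
	{ 0x01, 0x02, 0x03 },
	{ 0xaa, 0xbb, 0xcc },
};

/* Return 1 if `digest` matches an entry of the embedded set, else 0. */
int
module_digest_trusted(const uint8_t digest[DIGEST_LEN])
{
	size_t n = sizeof(trusted_digests) / sizeof(trusted_digests[0]);

	for (size_t i = 0; i < n; i++)
		if (memcmp(trusted_digests[i], digest, DIGEST_LEN) == 0)
			return 1;
	return 0;
}
```

Since the table is part of the image, adding or rebuilding a module means regenerating it and rebooting, which is the inconvenience mentioned above.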
Re: kernel module loading vs securelevel
On Sat, 16 Oct 2010 13:58:19 -0400 Thor Lancelot Simon t...@panix.com wrote: 2) Finish the asymmetric operation support in cryptodev and actually require modules to be signed. This is basically a superset of #1 above that could get about as complicated as one wanted it to (ugh) but might be worthwhile if kept simple. You seem to now agree with me that this could be a solution. It indeed requires more work, but it also has advantages: not having to care about module location or their mutability; allowing delegation (multiple trusted public keys allow verifying signatures of various trusted third parties), among others. A developer working on a module only has to sign it without any further trouble to test it (assuming he included his public key in the kernel image). No need to go change the flags of a hashes file (a plausible point of failure anyway), update it, make it immutable again, etc. Of course a serious problem would still exist if the kernel's database of trusted keys could be modified. An effort could be made so that these cannot be modified at runtime but only at kernel image build time, requiring a reboot, and those that care can manage to load the kernel image from a read-only source. To simplify things, couldn't X.509 parsing strictly be done by the userland build infrastructure? The list of trusted keys can be stored in a simple binary format as part of the kernel image, and the module signature can also be stored in a simple binary format as part of that module. If you want to be able to revoke an existing key at runtime, support the use of subkeys and CAs and the like, things suddenly become more complex, but I don't think it's necessary for this. Even a simpler system with no trusted entities list could make use of this: a random key pair could be generated at build time, the public part of it the only stored key in the kernel image, and all modules signed with the private part of that key, which then gets discarded... 
Although the only advantage over veriexec-like hashes in this case would be reduced kernel image read-only data segment (i.e. one 1024-bit public key stored instead of n * 160-bit hashes). -- Matt
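To put numbers on that comparison, a trivial sketch using the sizes from the text (only the key modulus is counted, and the per-module signatures are ignored since they would live in the modules rather than in the kernel image). The function names are just for the example.

```c
#include <stddef.h>

#define KEY_BITS	1024	/* one public key modulus, as in the text */
#define HASH_BITS	160	/* one digest per module */

/* Bytes of read-only data needed for a hash set covering n modules. */
size_t
hash_set_bytes(size_t n)
{
	return n * (HASH_BITS / 8);
}

/* Bytes needed for the single public key (modulus only). */
size_t
pubkey_bytes(void)
{
	return KEY_BITS / 8;
}
```

That is 128 bytes for the key against 20 bytes per hashed module, so the single key only becomes the smaller option from 7 modules upwards (6 * 20 = 120, 7 * 20 = 140).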
Re: [ANN] Lunatik -- NetBSD kernel scripting with Lua (GSoC project results)
On Sun, 10 Oct 2010 19:45:41 -0600 Samuel Greear l...@evilcode.net wrote: I didn't like the fact that the only option for loading a script into the kernel was to load the script source. I would make loading pre-compiled scripts the preferential method. In fact, I would probably tear eval out of the kernel lua implementation and only support loading of precompiled byte-code into the kernel. If the tokenizer is considered heavy, or a potential source of exploit, or if scripts are expected to frequently be loaded and a performance bottleneck exists, I also think that loading pre-tokenized bytecode would be a good idea. However, there are several things to consider: some systems (i.e. Java) do important sanity checks at tokenization time. Is this important for Lua? Secondly, is the Lua bytecode using a stable, well defined instruction set which is unlikely to change? Otherwise as it improves and gets updated any pre-tokenized scripts might need to be regenerated. Of course, that's probably not an issue if everything is part of the base system and always gets rebuilt together. Thanks, -- Matt
audio/video capture timestamping
Hello, Since I have an old Brooktree878 card which NetBSD supports, which I successfully used in the past with custom software using bktr(4) as part of a security suite, I thought I'd give it a new life and try to convert rare VHS which were rotting in a drawer to a digital format. I tried mencoder and ffmpeg, at first encoding in real-time, and had a/v sync problems, so I then tried simply capturing the stream to an interleaved avi file without compression, but unfortunately the issues still persisted. With mencoder, the video would often skip a bit to resync with the smooth audio, and with ffmpeg the audio would often skip to resync with the smooth video. Capturing from bktr(4) alone, or from audio(4) alone, is fine. If the source was not a tape, it would probably be possible to dump the video and audio streams separately and multiplex them afterwards, but this would probably be useless because of the mechanical issues causing slight speed variations. Encoding both streams with mencoder and the same card worked on Linux anyway, and I was then curious as to what was wrong on NetBSD. I also tried with another audio chip on NetBSD, without success. After reading a thread (http://www.mail-archive.com/po...@openbsd.org/msg26418.html) about it, it seems that the main problem would be our capture devices not supporting timestamps, which if available could be used by an encoder to more accurately synchronize audio/video. I'd like to know if someone already thought about those issues on NetBSD or already started some work to allow this. Indeed, on Linux with the same card and software the synchronization is better, and the timestamps are probably the reason. ALSA appears to support querying the timestamp for a recorded buffer, and v4l2 also seems to support timestamps. We do have a v4l2 partial video(4) implementation, although I didn't yet try capture with a uvideo(4) through it, and didn't yet read enough of its code to see if timestamps are supported. 
In any case, it probably means that it'd be possible to eventually hook bktr(4) through video(4) as well via video(9) and provide timestamps for userland to use... As for audio(4), I don't know how it could safely be extended to support timestamps without backwards compatibility issues, other than perhaps allowing an ioctl(2) to be used to request that timestamps be enabled, and yet another ioctl(2) to request the latest timestamp for the latest read buffer, which might also be considered quite hackish. Was there a WIP for another audio interface already for NetBSD, which also supports timestamps? Or does anyone have suggestions on how audio(4) could provide timestamps decently? I honestly have no idea how much time I myself could put into working on this, but it'd already be nice to determine what is really wanted in this area for the future, so that if I (or any other coder) have enough interest and time, progress could be made... so I'm asking for opinions and ideas. Thanks, -- Matt
Re: Length of wmesg for condvar?
On Sun, 8 Aug 2010 17:23:23 -0700 (PDT) Paul Goyette p...@whooppee.com wrote: Should these be changed? Are there any adverse effects from having a wmesg longer than 8 characters? It seems to me that the exporter of those use strncpy() (i.e. kern/init_sysctl.c) and that the structures use WMESGLEN and KI_WMESGLEN both defined as 8. So other than inadvertently truncated names it at least should not cause corruption, but I think that truncated names could also be problematic when trying to distinguish two strings starting with the same 8 characters (is that likely now)? Especially when the only thing that differs between two states is some suffix like rd and rw... After all, those are intended for humans :) -- Matt
Re: Length of wmesg for condvar?
On Mon, 9 Aug 2010 22:21:02 +0100 David Laight da...@l8s.co.uk wrote: On Mon, Aug 09, 2010 at 02:02:51PM -0700, Paul Goyette wrote: Does anyone object to my going through and coming up with shorter names (<= 8 chars) for these condvars? It is worth checking whether they are displayed with a %.8s format (or similar) so that they don't need to be 0 terminated. Otherwise the names must be strictly less than 8 bytes. David -- David Laight: da...@l8s.co.uk That is worthy of concern, so I checked top and ps: top uses char wmesg[KI_WMESGLEN + 1]; strlcpy(wmesg, pp->p_wmesg, sizeof(wmesg)); ps uses strprintorsetwidth(v, l->l_wmesg, mode); v->width = min(v->width, KI_WMESGLEN); Thanks, -- Matt
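For illustration, the effect of copying into a KI_WMESGLEN + 1 buffer can be reproduced in userland as below. This is a sketch and not the actual top or ps code: export_wmesg() is a made-up name, snprintf() with a precision stands in for strlcpy() to stay portable, and the two condvar names in the note are invented.

```c
#include <stdio.h>

#define KI_WMESGLEN 8	/* value quoted in the thread */

/*
 * Copy at most KI_WMESGLEN bytes of a wait message and always
 * NUL-terminate, which is the effect of strlcpy() into a
 * KI_WMESGLEN + 1 byte buffer.
 */
void
export_wmesg(char dst[KI_WMESGLEN + 1], const char *wmesg)
{
	snprintf(dst, KI_WMESGLEN + 1, "%.*s", KI_WMESGLEN, wmesg);
}
```

With hypothetical names such as "examplecv_rd" and "examplecv_rw", both come out as "examplec", so the suffix that would have told the two states apart is exactly what gets lost.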
Re: Using coccinelle for (quick?) syntax fixing
On Sun, 08 Aug 2010 18:05:11 +0200 Jean-Yves Migeon jeanyves.mig...@free.fr wrote: Opinions? Any interest in it? My intent is to put NetBSD specific scripts on wiki.n.o, and provide links for more generic ones. That seems like a handy tool to save time and avoid a number of typos, if it's used right. Thanks for sharing, I personally didn't know Coccinelle. And example scripts can often be more useful than plain documentation, especially if it's in a WIP state (I liked that they showed in a few lines why it's better than sed :)) -- Matt
Re: Preserving early console output (pre-Copyright stuff)
On Thu, 1 Jul 2010 06:00:41 -0700 (PDT) Paul Goyette p...@whooppee.com wrote: That's what I thought I'd get for an answer! :) There is a serial port, but I haven't figured out yet how to make it work in the BIOS. And while I do have other machines with serial ports I've never used those ports and don't even have serial cables! (The last time I used a serial cable was way back in the days of modems and dial-up 'net access!) Sometimes I've been thinking about this, as more and more hardware doesn't ship with RS232 anymore. Is there a relatively common BIOS interface which would allow, even if inefficiently, using a USB port as a serial device without too much code? If so, perhaps a special usb-serial bootblock could use that sometime in the future? If there is no common BIOS interface, I can see it's a problem because of all the driver code that'd be needed at boot time... Thanks, -- Matt
Re: why not remove AF_LOCAL sockets on last close?
On Thu, 24 Jun 2010 22:55:51 -0400 Thor Simon t...@coyotepoint.com wrote: Can anyone tell me why, exactly, we shouldn't remove bound AF_LOCAL sockets from the filesystem on last close? The following test program produces second socket bind failed on every system I've tested it on, and seems to cover the only possible use case for this feature... I initially had the impression that leaving the socket around was a feature to allow re-binding to the same file by an unprivileged process after first creating the socket node as root (i.e. at a location where unprivileged processes cannot create new files such as /var/run/) to then set its permissions in a way to permit the unprivileged user or group to bind(2) it. However, I wrote a small test program and realized that despite SO_REUSEADDR this doesn't work, and indeed after checking the kernel code SO_REUSEADDR is ignored in the AF_LOCAL unp_bind() code.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct sockaddr_un sun;
	int s, opt;

	if (argc != 2)
		errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
	if ((s = socket(PF_LOCAL, SOCK_DGRAM, 0)) == -1)
		err(EXIT_FAILURE, "socket()");
	/* Request address reuse; unp_bind() ignores it for AF_LOCAL. */
	opt = 1;
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(int)) == -1)
		err(EXIT_FAILURE, "setsockopt(SO_REUSEADDR)");
	sun.sun_family = AF_UNIX;
	sun.sun_len = sizeof(sun);
	(void)strlcpy(sun.sun_path, argv[1], sizeof(sun.sun_path));
	if (bind(s, (struct sockaddr *)&sun, sun.sun_len) != 0)
		err(EXIT_FAILURE, "bind()");
	(void)close(s);
	return EXIT_SUCCESS;
}

$ cc -o test test.c
$ ./test /tmp/foo.sock
$ ./test /tmp/foo.sock
test: bind(): Address already in use

So to do what I described above, one has to create a directory in /var/run instead, with permissions such that the unprivileged process can create a file there. 
Then I'm unsure why we leave those sockets dangling around, although it's quite easy to explicitly unlink them at close time... -- Matt
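For reference, the usual userland workaround looks roughly like the sketch below: remove any stale node before bind(2), and unlink it again at close time. The names bind_local() and close_local() are made up for the example. Note that blindly unlinking before bind would also remove the node of a socket which another process still has bound, so this is only a sketch of the common idiom, not a recommendation.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a datagram AF_LOCAL socket bound to `path`, first removing a
 * stale node possibly left behind by a previous run.
 * Returns the descriptor, or -1 on error.
 */
int
bind_local(const char *path)
{
	struct sockaddr_un addr;
	int s;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	memcpy(addr.sun_path, path, strlen(path) + 1);

	if ((s = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;
	/* A missing node is fine; any other unlink(2) failure is not. */
	if (unlink(path) == -1 && errno != ENOENT) {
		(void)close(s);
		return -1;
	}
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		(void)close(s);
		return -1;
	}
	return s;
}

/* Close the socket and remove its filesystem node. */
void
close_local(int s, const char *path)
{
	(void)close(s);
	(void)unlink(path);
}
```

With bind_local() the second run of a program succeeds even if the first one exited without cleaning up, which is the case where the earlier test program fails with "Address already in use".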
Re: why not remove AF_LOCAL sockets on last close?
On Fri, 25 Jun 2010 14:51:45 +0200 Joerg Sonnenberger jo...@britannica.bec.de wrote: On Thu, Jun 24, 2010 at 10:55:51PM -0400, Thor Simon wrote: Can anyone tell me why, exactly, we shouldn't remove bound AF_LOCAL sockets from the filesystem on last close? If you want to do that, wouldn't it be easier to just go the Linux route and move them into a separate (virtual) namespace completely? Could this not pose security risks in certain scenarios? Or would such a namespace also support permissions? Thanks, -- Matt
Re: why not remove AF_LOCAL sockets on last close?
On Fri, 25 Jun 2010 09:19:03 -0400 Thor Simon t...@coyotepoint.com wrote:

I think this is (always has been) a considerable blind spot on the part of BSD partisans. Sure, we're happy to gripe about persistent SysV IPC objects every time we have to remember how to use ipcrm, but bound AF_UNIX sockets have the same issue, and we just ignore it.

I don't think most people have trouble with SysV IPC, considering that those persistent resources were often used by short-lived but frequently run commands/processes, which relied on both the permissions and the persistence of the resources (and NetBSD allows the admin to set the limits of the various SysV resources precisely); admittedly we can now do the same using files, mmap and advisory locks, though. But I agree that if leaving the sockets around permits no interesting feature whatsoever (i.e. it doesn't even serve for SO_REUSEADDR), it very well could be a design or implementation bug, even if common software already explicitly unlinks AF_LOCAL sockets to account for this issue... -- Matt
Re: why not remove AF_LOCAL sockets on last close?
On Fri, 25 Jun 2010 08:59:18 -0400 Matthew Mondor mm_li...@pulsar-zone.net wrote:

However, I wrote a small test program and realized that despite SO_REUSEADDR this doesn't work, and indeed after checking the kernel code SO_REUSEADDR is ignored in the AF_LOCAL unp_bind() code.

Out of curiosity, I modified the test to see if immediately unlinking the socket node after bind(2) would leave it around until it's closed, a feature which some software expects for files on certain OS/FS combinations. However, the socket node is immediately deleted at unlink(2) even if the socket is still open and bound, so an application shouldn't rely on this either.
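The modified test was not posted; a minimal sketch of such a check, under the assumption that stat(2) on the path is an adequate way to see whether the node is still there, might look like this. The function name node_survives_unlink() is made up for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical check: bind a socket to `path`, unlink the node while
 * the socket is still open and bound, then look for the node again.
 * Returns 1 if the node is still visible, 0 if it is gone, -1 on error.
 */
int
node_survives_unlink(const char *path)
{
	struct sockaddr_un sa;
	struct stat st;
	int s, result;

	if (strlen(path) >= sizeof(sa.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	memcpy(sa.sun_path, path, strlen(path) + 1);

	if ((s = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;

	/* The node must exist right after bind(2) for the test to mean anything. */
	if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) == -1 ||
	    stat(path, &st) == -1 ||
	    unlink(path) == -1) {
		(void)close(s);
		return -1;
	}

	/* The descriptor is still open and bound at this point. */
	if (stat(path, &st) == 0)
		result = 1;
	else if (errno == ENOENT)
		result = 0;
	else
		result = -1;

	(void)close(s);
	return result;
}
```

With the behaviour described above, the function is expected to return 0, meaning the name disappears at unlink(2) time rather than at close(2) time.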
Re: updating COMPAT_LINUX for linux 2.6.x support (take 2)
On Thu, 17 Jun 2010 10:25:59 + Andrew Doran a...@netbsd.org wrote:

This is mainly down to the fact that we need kernel_lock to bracket legacy sections of code that aren't preemption safe. I think MULTIPROCESSOR should be sent off to the glue factory but that's another discussion :-).

Is there any way to preserve performance in the uniprocessor case, where some synchronization/preemption-safe blocks are unnecessary, without making the code conditional on MULTIPROCESSOR? Or is it that the same precautions are now always required for uniprocessor on -current? Thanks, -- Matt
Re: Writing to multiple descriptors with one system call
On Thu, 18 Mar 2010 21:36:47 +0100 Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

Pretty much all servers use the accept loop thing and fork/pthread right after, but this was not my point.

High-performance servers that are not single-threaded often maintain a pool of persistent processes or threads which accept(2) concurrently, either in blocking mode or with polling (generally polling, to allow listening on multiple addresses/interfaces). But indeed this doesn't change much for this thread...

Having 80% system time passed in write() calls is not negligible, but if you send the data byte after byte, I hardly see why it would be the syscall's fault here. You will have to assess that the overhead does indeed come from the context switch, and not from queuing up packets for the PHY, block I/Os, or moving data around the IP stack. There is a big mess behind a write(2), and the context switch is just one small part of it. Instrument. You can't control what you can't measure.

Agreed -- Matt
Re: Writing to multiple descriptors with one system call
On Wed, 17 Mar 2010 16:22:44 + Sad Clouds cryintotheblue...@googlemail.com wrote: On Wed, 17 Mar 2010 16:01:28 + Quentin Garnier c...@cubidou.net wrote:

Do you have a real world use for that? For instance, I wouldn't call a web server that sends the same data to all its clients *at the same time* realistic.

Why? Because it never happens? I think it happens quite often. Another example is a server that is sending live data, i.e. audio playback, video stream, etc. If you can't use multicasting over a WAN, then you have a situation where you are streaming the same data to a large number of clients.

In the past I wrote a custom httpd which read security camera frames broadcast on the LAN and relayed them to connected HTTP clients; since clients remain connected with keep-alive, it has to iterate through the connections to send each new packet. (http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/tests/bktr_httpd/) However, clients which cannot cope with the sending speed are throttled so that some packets are skipped, which makes things a little more complex than a simple "send this message to all FDs"... kqueue(2)/kevent(2) were used for polling, and in my case the available bandwidth was always the bottleneck.

I also have a question: did your test really use non-blocking sockets for writing, with an efficient polling mechanism like kqueue or libevent, disabling write polling when the sendq is empty, enabling it again when there's data to send, and only sending data when a poll event indicates that writing is allowed? Otherwise, I assume that the LWP would block on write(2). If a broadcast writev(2) variant for multiple FDs existed, it would possibly have to present an interface similar to that of kevent, or be tied in as a new protocol over kqueue, because of the FD-specific errors/events... libevent, for instance, also supports transfer buffer queues and could possibly be adapted to support such a feature too.
However I'm also unsure if this would really help or just move some userland and syscall overhead up to kernel overhead and achieve a similar overall performance. A test implementation might indeed be needed, to really know :( -- Matt
Re: (Semi-random) thoughts on device tree structure and devfs
On Tue, 9 Mar 2010 21:59:23 + (UTC) chris...@astron.com (Christos Zoulas) wrote: In article 70f62c5e1003091104l20b98c5ex66842f01e6f17...@mail.gmail.com, Masao Uebayashi uebay...@gmail.com wrote:

Wow, that sucks. Not being able to change permissions (and less importantly, mv or rm the device files) would definitely be a problem.

Could you show me use cases how it sucks? I need more use cases.

- I want to present a subset of devices to a chrooted devfs.
- I want to give a different set of permissions than the default.
- I want to be able to call a device by a different (symbolic name) without using symlinks.
- I want to prevent access to the device completely by not providing a device node.
- I want to preserve those changes across boots.
- I want to be able to move all my disk devices to a subdirectory.

I have had to deal with each of those scenarios, and never could stand the existing devfs implementations on my systems. I previously participated in a thread about devfs, with ideas and suggestions for a possibly less broken pipe-dream implementation, but it simply taught me how complex a decent implementation would have to be, IMO. I do however like the idea of simply having additional symlinks created automatically to redirect unique names to the actual existing nodes (possibly the best implementation of this would be a virtual fs controlled by the kernel, mounted under /dev/uuid/ or the like?). This wouldn't affect the target device node permissions, at least, and might solve most of the hotplug issues for users who need automount or can't track dmesg to then manually mount a device... Of course, if a removable device is supposed to move around a few sb* nodes depending on when/where it's plugged in, then at least the admin can set permissions for all devices in that class, in addition to the permissions for the fs in /etc/fstab, just as is traditionally done. -- Matt