Re: Bug in fs/cd9660 raises questions about inode number computing

2014-05-10 Thread Matthew Mondor
On Sat, 10 May 2014 08:11:40 +0200
"Thomas Schmitt"  wrote:

> kern/48787 can be counted as a successful one.
> kern/48797 demonstrates that i need to free myself more from
> expectations which occupied my mind when studying isofs of
> a different kernel.
> Thanks to Martin Husemann for posing the right questions.

Thanks for working on this,
-- 
Matt


Re: Vnode API change: add global vnode cache

2014-05-09 Thread Matthew Mondor
On Sat, 10 May 2014 01:29:47 +
Taylor R Campbell  wrote:

>Is it expected in vcache_common() for the interlock to remain held even
>if returning an error?
> 
> vget unconditionally drops the interlock, so it will never remain
> held, error or not.

Oh, thanks.  I can now see that vget() must be called with it held, and
indeed drops it itself.
-- 
Matt


Re: Vnode API change: add global vnode cache

2014-05-09 Thread Matthew Mondor
On Wed, 30 Apr 2014 17:15:16 +0200
"J. Hannken-Illjes"  wrote:

> > vcache_get(mp, key, key_len, vpp) to lookup and possibly load a vnode.
> > vcache_lookup(mp, key, key_len, vpp) to lookup a vnode.
> > vcache_remove(mp, key, key_len) to remove a vnode from the cache.
> > VFS_LOAD_NODE(mp, vp, key, key_len, new_key) to initialise a vnode.
> 
> Updated diff at http://www.netbsd.org/~hannken/vnode-pass6-4.diff

One small question:

Is it expected in vcache_common() for the interlock to remain held even
if returning an error?

Thanks,
-- 
Matt


Re: Bug in fs/cd9660 raises questions about inode number computing

2014-05-09 Thread Matthew Mondor
On Tue, 06 May 2014 12:20:53 +0200
"Thomas Schmitt"  wrote:

> How to properly submit them ?

A PR (Problem Report) in the kern category with an attached unified
diff would seem adequate if you cannot commit the changes yourself.
Sorry if that is already obvious to you.

Unfortunately I'm not personally familiar enough with iso9660 to
confirm that the fixes are right, or to answer the other questions,
though; hopefully others will.

-- 
Matt


Re: NetBSD-5 appears to have forgotten how to execute 0.9A binaries

2014-05-06 Thread Matthew Mondor
On Tue, 6 May 2014 07:56:22 -0700
Brian Buhrow  wrote:

>   hello.  There was a fix implemented for the original problem by Chuck
> Silvers and tested by me.   I'll look to see if I can find the commits.
> I'm not sure if it was documented in a pr or not or if it got pulled up to
> NetBSD-6.  I'm pretty sure it's in -current and I know it's in -5 as a
> pullup.  If you want to have a look, it happened in the first half
> of September 2012.

Unfortunately I couldn't locate the exact change or pullup tickets.
But considering the change was pulled up to netbsd-5, and that 6.0 was
released around October, I guess that if netbsd-6 needed the change it
was also fixed then.

Thanks,
-- 
Matt


Re: Does "options P1003_1B_SEMAPHORE" still exist?

2014-05-06 Thread Matthew Mondor
On Mon, 17 Sep 2012 10:42:49 -0700 (PDT)
Paul Goyette  wrote:

Sorry for the long delay; I'm slowly catching up on tech-kern mail.

> I recently noticed that there is a built-in "ksem" module that includes 
> sys/kern/uipc_sem.c
> 
> The man page for sem(4) states that this code should be included in the 
> kernel only if "options P1003_1B_SEMAPHORE" is defined.  Yet a search of 
> the kernel sources shows no usage for this option anywhere, and the 
> uipc_sem.c file is unconditionally included by sys/conf/files
> 
> So, I have a few questions:
> 
> 1. Should sem(4) really be in manual section 4?  It doesn't appear to be 
> a device driver!  (Maybe a more detailed man page should be written for 
> section 9?)

I have the impression that those syscalls should all be documented in a
section 2 manual page instead (kern/37427).  Not totally related but
misc/38979 would have similar results for the scheduler control related
syscalls.  I now realize that I probably don't have a PR for these ones,
but the mqueue and setaffinity related syscalls are also undocumented.

At the time I filed the PRs they were contested by AD because the libc
counterparts were already documented, with the syscalls considered the
private interface.  I personally believe that all syscalls should be
documented in NetBSD (and recently I have learned that I'm not the only
one to think they should be, so perhaps I should eventually write these
manual pages, after all).

> 2. Should the man page be updated to remove the reference to the option?

A quick grep on netbsd-6 here only shows:

share/man/man4/options.4:.It Cd options P1003_1B_SEMAPHORE
share/man/man4/sem.4:.Cd "options P1003_1B_SEMAPHORE"
sys/compat/freebsd/freebsd_syscall.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscalls.c:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_sysent.c:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/syscalls.master:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/netbsd32/netbsd32_syscall.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscalls.c:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_sysent.c:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/syscalls.master:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/kern/init_sysctl.c:#if defined(MODULAR) || defined(P1003_1B_SEMAPHORE)
sys/modules/compat_netbsd32/Makefile:CPPFLAGS+= -DP1003_1B_SEMAPHORE -DCOREDUMP -DKERN_SA

> 3. If the code is truly unconditional, should it really be a module?  If 
> so, could it be made to auto-load when needed?  Could it also be auto 
> unloaded?

It seems that other POSIX librt components such as message queues,
scheduler control, cpu affinity, etc, are not optional.  I don't know
why those semaphores should be, thus they could probably remain as part
of the base kernel with the option removed, unless we'd want all of RT
components to be optional and in a module, perhaps?  But librt of
course wouldn't be usable then, unless it's loaded...

Anyone remember a particular reason why these semaphores might be
unwanted in custom kernels, but the rest of librt wanted anyway?
-- 
Matt


Re: NetBSD-5 appears to have forgotten how to execute 0.9A binaries

2014-05-06 Thread Matthew Mondor
On Tue, 11 Sep 2012 09:45:22 -0700
buh...@lothlorien.nfbcal.org (Brian Buhrow) wrote:

> provide further results.  I assume a fix would want to be pulled
> up, assuming I find it, on the grounds that it's a security fix.  I'll also
> see about trying -current and NetBSD-6, but I'm guessing those are
> vulnerable as well, given Matthew's test with my binary under NetBSD-6
> yesterday.

Was a PR for this ever filed, or the problem fixed since?  Any relation
to SA2013-013?

Thanks,
-- 
Matt


Re: Panic when deleting large number of files inside DomU

2014-05-06 Thread Matthew Mondor
On Wed, 19 Sep 2012 12:00:45 +0200
Roger Pau Monne  wrote:

> Yes, WAPBL enabled. I will fill a PR about this if there are no news.

Was a PR already filed for this, or was the reason discovered and fixed
since? A quick search showed one of your closed Xen related PRs but it
seems to be a different issue, unless I'm mistaken.

Thanks,
-- 
Matt


Re: asymmetric smp

2014-05-05 Thread Matthew Mondor
On Mon, 5 May 2014 01:10:24 -0400
Matthew Mondor  wrote:

> which some CPUs might have trouble with (i.e. RAS)...

I think that what I meant was CAS.

-- 
Matt


Re: resource leak in linux emulation?

2014-05-04 Thread Matthew Mondor
On Mon, 5 May 2014 15:43:56 +1200
Mark Davies  wrote:

> On Mon, 05 May 2014, Christos Zoulas wrote:
> > I wrote:
> > >So can someone suggest where exactly the patch should go.  And
> > >isn't proc_lock held at this point (entered at line 344, exit at
> > >line 569)?
> > 
> > How about this?
> 
> Seems good to me and can confirm that its fixed the increasing proc 
> count problem.  Can someone commit and pull up to 6?

I also see support for emulation-specific exit hooks.  I have not
checked whether it's really possible, but if so, could that
Linux-specific case be solved there instead of in the generic code?

Thanks,
-- 
Matt


Re: asymmetric smp

2014-05-04 Thread Matthew Mondor
On Wed, 02 Apr 2014 17:21:02 +0200
Johnny Billquist  wrote:

> On 2014-04-02 16:10, John Nemeth wrote:
> > On Apr 2,  1:55pm, Johnny Billquist wrote:
> > } The root fs in on nfs, as I'm running the machine diskless. Disk is
> > } served from a -current NetBSD/alpha system sitting right next to it. And
> > } I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k
> > } block size for NFS. Login is obviously already running, since that is
> > } what also prompts for the username, and doing it twice should even put
> > } some stuff in local cache.
> >
> >   Uh, actually getty does the initial prompt for username on
> > the console.  After collecting the username, getty execs login.
> 
> Hmm. My mistake in that case. So we have image activation at that point. 
> Hmm...

Possibly other things to verify would be /etc/passwd.conf (you'll likely
need to also regenerate passwords if you change those settings), and if
VAX has specialized lock code or uses the new generic atomic operations
which some CPUs might have trouble with (i.e. RAS)...
-- 
Matt


Re: PostgreSQL 9.2 benchmarks

2012-08-29 Thread Matthew Mondor
On Wed, 29 Aug 2012 14:15:21 +0200
Francois Tigeot  wrote:

> On Tue, Aug 28, 2012 at 08:51:43PM +0100, Mindaugas Rasiukevicius wrote:
> > 
> > If the kernel is before 15th of August, netbsd-6 still had DIAGNOSTIC
> > option enabled by default, which would affect the performance.  Did you
> > use a kernel with the option disabled?
> 
> I'm not sure, I used a snapshot iso from the 16th and uname -a only reports
> NetBSD 6.0_BETA2

config -x /netbsd should let you view its kernel configuration file,
and uname should report when the kernel was built.
-- 
Matt


Re: 6.0_BETA->6.0_BETA2 rename

2012-07-30 Thread Matthew Mondor
On Mon, 30 Jul 2012 16:59:14 +0200
Edgar Fuß  wrote:

> Just out of curiosity: Why was 6.0_BETA renamed 6.0_BETA2 recently?

The release of second beta binaries:
http://blog.netbsd.org/tnf/entry/netbsd_6_0_beta2_binaries

After the beta series, release candidates can be expected (RC1, RC2,
etc.) until the official release, at which point the netbsd-6 branch
will become 6.0_STABLE.
-- 
Matt


Re: Core statement on directory naming for kernel modules

2012-07-28 Thread Matthew Mondor
On Fri, 27 Jul 2012 17:28:14 -0700
jnem...@victoria.tc.ca (John Nemeth) wrote:

> On Dec 17,  1:58pm, Matthew Mondor wrote:
> } This reminds me though: why/how does sysctl/kern.module.autoload
> } default to 1 for non-MODULAR kernels (at least on netbsd-6)?  Or an
> } alternative question: are these sysctl knobs useful at all with
> } non-MODULAR kernels, or are they then artifacts?
> 
>  Good question.  Non-MODULAR kernels still have parts of the MODULAR
> subsystem in order to initialise built-in modules.  However, the linking
> code isn't there, so it would be impossible to load a module.  I'll make
> a note to trim some of the excess stuff in non-MODULAR kernels.

Indeed the linker isn't there, which was confirmed using nm when I
initially noticed those knobs.

Thank you for looking into this,
-- 
Matt


Re: Core statement on directory naming for kernel modules

2012-07-27 Thread Matthew Mondor
On Fri, 27 Jul 2012 13:57:52 + (UTC)
Geoff Wing  wrote:

> John Nemeth  typed:
> : .. Being able to properly unload a built-in module would be a nice
> : feature.
> 
> This sounds a bit like a possible security problem, though 
> presumably/hopefully
> limited by the current security level and AAA.

Do you mean in the case an external module could then be loaded instead
of a built-in one?  Presumably someone who wants to prevent the
kernel from loading external modules would use a kernel without
MODULAR, or raise the securelevel.

This reminds me though: why/how does sysctl/kern.module.autoload
default to 1 for non-MODULAR kernels (at least on netbsd-6)?  Or an
alternative question: are these sysctl knobs useful at all with
non-MODULAR kernels, or are they then artifacts?

Thanks,
-- 
Matt


Re: How to get CPU-Specific Info?

2012-07-19 Thread Matthew Mondor
On Thu, 19 Jul 2012 13:02:13 -0400
Frank Kastenholz  wrote:

> There is /proc/cpuinfo --- but this seems to be
> oriented more towards things like versions/features/
> etc of the actual silicon.  I could add the stuff
> I want to this.  Or I could make a new /proc file
> (eg "/proc/cpucsrs")

We also have /dev/cpuctl which cpuctl(8) uses (/proc is optional except
for COMPAT_LINUX), but that's also mostly to report features; I also
think that sysctl would be suitable for exporting special CPU
registers...
-- 
Matt


Re: Quota on tmpfs

2012-07-17 Thread Matthew Mondor
On Tue, 17 Jul 2012 21:26:44 -0400
Matthew Mondor  wrote:

> A scenario in which they're frequently used is block-based file system

s/file system/file/ :)
-- 
Matt


Re: Quota on tmpfs

2012-07-17 Thread Matthew Mondor
On Tue, 17 Jul 2012 20:54:28 + (UTC)
mlel...@serpens.de (Michael van Elst) wrote:

> I would also guess that sparse files are very rarely used. But for
> disk usage purposes you want to consider real disk usage including
> overhead because the quotas are mostly used to partition the available
> space. That doesn't work if your quotas allow you to write a few
> thousand files of 1 byte length that account together as a single
> single block when they really occupy a few thousand blocks.

A scenario in which they're frequently used is block-based file system
transfer protocols (especially distributed ones where blocks may
download in random order, including bittorrent), also by download
managers that support "download optimization" where multiple
connections will be made to transfer multiple file sections at a time
(i.e. the DownloadThemAll Firefox extension).

Another common usage of sparse files is for live file system images.
The cost of creation (open/creat + truncate/lseek + newfs) is small
compared to writing a full image of zeros, then the blocks can be
lazily allocated and written when needed.
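
As an illustration of how cheap the creation step is, here is a minimal
Python sketch (not taken from any NetBSD tool; the function names are
made up for the example).  It extends an empty file to its final size
with truncate, then reports the apparent size next to the space that is
actually allocated.  It assumes that st_blocks is counted in 512-byte
units, as on the BSDs and Linux, and that the file system supports
holes.

```python
import os
import tempfile


def create_sparse_image(path, size):
    """Create a file with the given apparent size without writing data."""
    with open(path, "wb") as f:
        # Extending via truncate leaves a hole; blocks are allocated
        # only when something later writes to them.
        f.truncate(size)


def usage(path):
    """Return (apparent size in bytes, bytes actually allocated)."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units (true on BSDs and Linux).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "fs.img")  # hypothetical image name
        create_sparse_image(image, 64 * 1024 * 1024)
        apparent, allocated = usage(image)
        print("apparent:", apparent, "allocated:", allocated)
```

On a file system with sparse-file support the allocated figure should
stay far below the apparent size until the image is actually written to.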

Apparently some database storage formats use sparse files, but the ones
I'm currently using don't seem to...
-- 
Matt


Re: Quota on tmpfs

2012-07-13 Thread Matthew Mondor
On Fri, 13 Jul 2012 08:03:42 +
David Holland  wrote:

> I believe the situation with both mfs and lfs is that some pieces of
> the support are in place but not others. It was clear when hacking up
> the code that neither had actually been tried by anyone in a long,
> long time...

I admit I haven't tried LFS again since the advent of WAPBL, and have
only used MFS to boot small custom userlands built with crunchgen(1),
long ago (floppy disks :)
-- 
Matt


Re: Quota on tmpfs

2012-07-13 Thread Matthew Mondor
On Fri, 13 Jul 2012 07:54:07 +
David Holland  wrote:

> On Thu, Jul 12, 2012 at 09:33:42PM -0400, Matthew Mondor wrote:
>  > Yet another hack would be to create a sparse ffs image under a tmpfs,
>  > mounted with quotas via vnd, but evaluating its ideal size might be
>  > difficult, and you'd have to re-apply quota settings in the script that
>  > creates the image at boot time... :)
> 
> Using mfs instead of tmpfs is probably a better bet here. mfs brings
> in enough of ufs that adding quota support to it shouldn't be
> particularly complicated.

I initially wondered whether mfs might already support quotas because
of this similarity, but indeed it doesn't seem to at the moment.

Thanks,
-- 
Matt


Re: Quota on tmpfs

2012-07-13 Thread Matthew Mondor
On Fri, 13 Jul 2012 08:40:54 +0200
Thomas Klausner  wrote:

> On Thu, Jul 12, 2012 at 09:33:42PM -0400, Matthew Mondor wrote:
> > If I remember there is some optional support for symbolic links to
> > resolve to user-specific targets, but I forgot the details.  With
> > that /tmp/ could potentially be a symbolic link pointing to
> > say, /tmpfs// I think.
> 
> I think you mean the MAGIC SYMLINKS section in "man 7 symlink".

That's exactly it, thanks for the reference
-- 
Matt


Re: Quota on tmpfs

2012-07-12 Thread Matthew Mondor
On Thu, 12 Jul 2012 16:17:42 +0200
Edgar Fuß  wrote:

> How do I enable new quota on a tmpfs?

A possible solution might be a per-user tmpfs, each limited using -s...
of course, it's more complex to manage though.

If I remember there is some optional support for symbolic links to
resolve to user-specific targets, but I forgot the details.  With
that /tmp/ could potentially be a symbolic link pointing to
say, /tmpfs// I think.

Yet another hack would be to create a sparse ffs image under a tmpfs,
mounted with quotas via vnd, but evaluating its ideal size might be
difficult, and you'd have to re-apply quota settings in the script that
creates the image at boot time... :)
-- 
Matt


Re: Path to kernel modules (second attempt)

2012-07-08 Thread Matthew Mondor
On Sun, 8 Jul 2012 17:57:00 +0200
Edgar Fuß  wrote:

> > Please not /kernel as it was already mentioned, it is too similar to
> > /kern.
> What about /netbsd? E.g. /netbsd/6.0_BETA/{modules,kernel,firmware}.

/netbsd/amd64/6.0/GENERIC/{modules,kernel,firmware} :) ?

But can the kernel easily detect that its image was booted in a
particular directory, and use that as base directory to look for
modules?  Also, how much more complex would this be for the bootloader,
which also needs to preload a few modules to be able to boot?
-- 
Matt


Re: Path to kernel modules (second attempt)

2012-07-07 Thread Matthew Mondor
On Sat, 7 Jul 2012 20:54:12 -0600
Warner Losh  wrote:

> But it kinda fails with multiple kernels.  On FreeBSD, we went with 
> /boot/$KERNNAME/kernel for the kernel, with all the modules associated with 
> it in /boot/$KERNNAME. By default, we load /boot/kernel/kernel and the loader 
> may also choose to load other things.  The reason we put it in /boot was 
> because we have a secondary boot loader (/boot/loader) and on some platforms 
> we were looking at you needed a separate boot partition to do things 
> correctly.  this layout allows for that as well as transparently supporting 
> multiple kernels.  I know on one of my MIPS boards, I can read kernels or the 
> boot loader off of FAT partitions, so my /boot there is a FAT file system, 
> with the rest of the system in a UFS file system on separate 
> partitions/slices on my CF.

I think that the version and arch directories would be maintained.

But you're right, and when I think of it, it's actually one of the
reasons I use monolithic kernels.  If modules and kernels always
corresponded well and were closely coupled in a directory, it'd be much
less trouble for me to test and move kernels around, or maintain
multiple versions of them on the same host.  At the moment, single
monolithic files do this much better.  Some kernel configuration
changes not only affect the main image, but also the modules, and full
ABI compatibility would be a difficult problem.

It might not matter for someone who wants to avoid using a custom
kernel (I agree that modules should help a lot in this case for the end
user, no matter their arrangement).  But if we eventually begin to see
modules under non-BSD licenses which can only be distributed as
modules, more technical users would likely want modules as well...  Or it
might not matter at all, if an admin can simply link together all
modules in a single kernel image, and keep the non-distributable image
private in the organization (I think there is some work in this area,
other than the traditional monolithic builds)?

So something like /kmod/amd64/6.0/GENERIC/, or a layout
where /netbsd-GENERIC/ could be a directory, /netbsd-GENERIC/image the
kernel, /netbsd-GENERIC/modules/ its corresponding modules, would be
nice.  In the latter case, perhaps also a /netbsd symlink pointing to
the corresponding /foo/image, somewhat like the vmlinuz link of some
Linux distributions?
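
To make the second layout concrete, here is a small Python sketch of
how a userland tool could locate the modules belonging to a kernel by
following the /netbsd link.  The directory names follow the layout
proposed above; module_dir_for is a hypothetical helper, not an
existing interface, and whether the kernel or bootloader could do the
equivalent cheaply remains the open question.

```python
import os


def module_dir_for(kernel_link):
    """Follow the kernel symlink and return the sibling modules directory.

    Assumes the proposed layout: <dir>/image is the kernel and
    <dir>/modules/ holds the modules built for it.
    """
    image = os.path.realpath(kernel_link)  # e.g. /netbsd-GENERIC/image
    return os.path.join(os.path.dirname(image), "modules")
```

With /netbsd pointing to /netbsd-GENERIC/image, module_dir_for("/netbsd")
would give /netbsd-GENERIC/modules, so moving or renaming the directory
keeps a kernel and its modules together.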

Thanks for sharing your experience,
-- 
Matt


Re: Path to kernel modules (second attempt)

2012-07-07 Thread Matthew Mondor
On Sat, 07 Jul 2012 22:46:50 +0200
Jean-Yves Migeon  wrote:

> On 07.07.2012 21:57, Mindaugas Rasiukevicius wrote:
> > Hello,
> > 
> > Regarding the PR/38724, I propose to change the path to "/kernel/".
> > Can we reach some consensus quickly for netbsd-6?
> 
> /kernel is way to close to /kern, and they serve different purposes.
> IMHO that will raise confusion.

Perhaps /kmod, or /modules like dholland suggests?

> Technically modules are not libraries, but maybe /libdata/module is a
> good option? We already have firmwares in /libdata/firmware, and those
> get used by the kernel.

That also makes sense
-- 
Matt


Re: Problem with "chown"

2012-06-27 Thread Matthew Mondor
On Wed, 27 Jun 2012 23:20:36 -
"David Lord"  wrote:

> I tried NetBSD-6-BETA2 but had too many problems. 
> Attempted reinstalls of NetBSD-5 have all obviously
> failed.

Indeed, downgrading is usually more problematic, postinstall not being
of much use in this case
-- 
Matt


Re: per-mount maxvnodes

2012-06-10 Thread Matthew Mondor
On Thu, 7 Jun 2012 17:50:58 +0200
Manuel Bouyer  wrote:

> On Thu, Jun 07, 2012 at 11:09:26AM -0400, Mouse wrote:
> > > Therefore comes the idea to have a per-mount maxvnodes.
> > 
> > > I tried implementing it, the biggest problem is how to set the value.
> > 
> > sysctl kern./usr/local.maxvnodes?
> > 
> > It's a little ambiguous, in that it's possible - or at least it was
> > last time I tried it - to have multiple mounts with the same mounted-on
> > string.  But that's definitely an unusual case, and I see nothing wrong
> > with accessing the topmost mount in that case; that's what normal
> > filesystem accesses will do, after all.
> 
> No, I think this should be a mount option. Maybe it's time to revisit the
> mount(2) interface (proplist anyone ? :)

If mounts had an ID (like processes), then it'd be easier to use sysctl
for them (commands such as mount and df might want to also export such
IDs, so possibly also statvfs(2))... There are device IDs, but I'm not
sure these could serve this purpose properly.

This also reminds me of the thread about possibly allowing to
temporarily enable noatime for a particular operation such as a backup
or find... Perhaps such options should eventually be dynamically
scoped such that a particular process or lwp could temporarily bind
another value for its own use (if it has the necessary privileges, of
course)?  I'm not sure how far-fetched this is with respect to the
code, as I'm not very familiar with the FS code.
-- 
Matt


Re: Rump FS throughput

2012-06-02 Thread Matthew Mondor
On Fri, 1 Jun 2012 22:30:10 +0200
Thomas Klausner  wrote:

> On Thu, May 31, 2012 at 01:45:53PM -0400, Matthew Mondor wrote:
> > Although it's useful to mount random media more safely than it would be
> > using kernel-space, I noticed that using 64KB reads, the kernel cd9660
> > will gladly read ~20MB/s from a DVD, but that rump_cd9660 using
> > 64KB reads is limited to approximately 4MB/s at most, even if the system
> > is mostly idle during those transfers (on netbsd-6/amd64 and 4 3.3GHz
> > cores).
> 
> Some suggestions from Antti via email proxy:
> Maybe he is using the block device (/dev/cd0a) instead of the raw device
> (/dev/rcd0a).  IIRC the former has some pretty serious performance
> problems for userspace I/O.  Also in the "maybe" department, libp2k
> should probably detect and autoadjust a block device to raw device.
> Or, someone could just fix the bdev stuff.

Thanks for forwarding his suggestions,

If I try using the raw device (rcd0a), the speed is about 1.2MB/s (I
can't hear the DVD drive motor spin up either), while with the block
device (cd0a) the speed is about 4MB/s (in this case it spins up to a
higher speed).  With the same DVD and cd0a mounted using the
kernel FS implementation and the same command
(cat /cdrom/* >/dev/null), I get from 10 (start) to 20 (end) MB/s.
These tests were on NetBSD-6.

I'm not familiar enough with libp2k or bdev to know what needs to be
done, but I could certainly take a look eventually.  But I probably
also should verify if an ISO-9660 FUSE implementation exists, and
perhaps try to port it eventually, and see if performance is adequate
for general use.

Thanks again,
-- 
Matt


Re: Should kqueue descriptors work outsid of the creating process?

2012-05-31 Thread Matthew Mondor
On Thu, 31 May 2012 14:40:44 -0400
Matthew Mondor  wrote:

> What I can see is that the implications of inheriting this special
> descriptor are quite more complex than for normal FDs...  Which makes
> me think that it very well could be a design decision not to inherit
> these, in which case I don't object to also prevent passing it via
> SCM_RIGHTS ancillary message.

When catching up with mail, I unfortunately read the PR thread after
writing this (as well as Christos's concerns about treating some FDs
differently than others).  What came to my mind was that kqueue could
have used another type of special object instead of a descriptor, but
it's too late for a change of API, and although I see some other
interfaces using such integers which aren't necessarily file descriptors
(i.e. timer_create(2)), kqueue's API expects close(2) to clean it
up...
-- 
Matt


Re: Should kqueue descriptors work outsid of the creating process?

2012-05-31 Thread Matthew Mondor
On Thu, 31 May 2012 10:38:38 -0400 (EDT)
Mouse  wrote:

> > Recently we found out (PR kern/46463) that kqueue() file descriptors,
> > which originaly were designed to be "local process only" objects,
> > could be passed with SCM_RIGHTS messages to other processes.  [...]
> 
> > I propose to not allow sending kqueue file descriptors [...]
> 
> > Or are there any legit uses for "foreign" kqueue()s?
> 
> It seems to me, for what it may be worth, that this is asking the
> wrong question.  Rather, I would ask whether there are illegitimate
> uses for `foreign' kqueue descriptors, and, if not, fix them to be
> passable like any other descriptors.

It's true that it's normally the parent's responsibility to decide which
FDs to close or set close-on-exec before fork(2)... Was there a design
decision not to inherit kqueue descriptors for security or complexity
reasons?

Since signals, signal mask, signal stack and restart/interrupt flags
are also inherited according to sigaction(2), an EVFILT_SIGNAL filter
would probably still be valid...

But how about EVFILT_TIMER?  timer_create(2) timers are not inherited,
setitimer(2) doesn't specify, but it also uses the same ptimers pool
timer_create(2) uses.  EVFILT_TIMER appears to use its own system though.

For EVFILT_PROC, it appears to be for the specified process, so I guess
it might still work if inherited?

And there is also EVFILT_VNODE... who knows what other filters might be
added in the future?

What I can see is that the implications of inheriting this special
descriptor are quite more complex than for normal FDs...  Which makes
me think that it very well could be a design decision not to inherit
these, in which case I don't object to also prevent passing it via
SCM_RIGHTS ancillary message.
-- 
Matt


Re: link-sets in modules

2012-05-31 Thread Matthew Mondor
On Mon, 28 May 2012 06:51:43 -0700 (PDT)
Paul Goyette  wrote:

> I _do_ like part 2 of your proposal - linking the "core" kernel first, 
> and then re-linking with selected modules.

I also think that this would be very nice
-- 
Matt


Rump FS throughput

2012-05-31 Thread Matthew Mondor
Hello,

Although it's useful to mount random media more safely than it would be
using kernel-space, I noticed that using 64KB reads, the kernel cd9660
will gladly read ~20MB/s from a DVD, but that rump_cd9660 using
64KB reads is limited to approximately 4MB/s at most, even if the system
is mostly idle during those transfers (on netbsd-6/amd64 and 4 3.3GHz
cores).
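
A minimal Python sketch of one way to take such a measurement (not
necessarily how the figures above were obtained): read a file
sequentially with unbuffered 64KB reads and report the rate.
read_throughput is a name made up for this example.

```python
import time


def read_throughput(path, bufsize=64 * 1024):
    """Read a file sequentially in bufsize chunks.

    Returns (total bytes read, rate in MB/s).
    """
    total = 0
    start = time.monotonic()
    # buffering=0 gives a raw file object, so each read() maps to at
    # most one read(2) call of bufsize bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

The numbers are only meaningful on a cold cache, so the medium should be
freshly mounted before each run when comparing two implementations.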

This also reminds me of pty related issues with the previously small
buffer size, and whenever the buffer could be made larger, throughput
was much better.  However, this is already using a 64KB buffer, which
seems fairly large.  I didn't investigate it but I suspect that
frequent context switches might be the problem, or perhaps some rump HZ
or virtual-interrupts frequency issue.

It's not a critical problem (I can simply use the pure kernel FS
implementation, and I understand that Rump is still useful for
testing/debugging), but I wondered if anyone already knew exactly what
limits the throughput, and if there's an easy fix...

An alternative might be to try a Puffs/FUSE ISO-9660 implementation,
but I didn't find one under pkgsrc/filesystems/ (and from previous
experience, porting FUSE filesystems is sometimes non-trivial).

I haven't tested recently, but I think that I remember rump_msdos also
being slow on USB flash devices compared to using mount_msdos.

Thanks,
-- 
Matt


Re: GSOC 2012 project clarification

2012-04-02 Thread Matthew Mondor
On Mon, 2 Apr 2012 22:02:35 +0200
Matthias Drochner  wrote:

> I'm not the originator of that project, but I've recently worked on
> support for SSD trim/erase commands which is quite similar technically.
> If this project materializes, I do offer my help. For now just two
> comments:
> 
> On Sun, 1 Apr 2012 13:04:13 -0400
> Sanket Padawe  wrote:
> > whenever that flag is set and a file/folder
> > gets deleted i.e. at the point of unlink system call we just need to
> > rewrite all the blocks of that file with some random data and then
> > release those blocks.
> 
> The deletion is generally not at the time of "unlink". It happens when
> the file isn't referenced by anything anymore. It needs to happen
> after that point, but before the file system's allocation management
> gets notified that the data blocks can be reused.
> For performance, this needs to be done asynchronously. It needs some
> knowledge about kernel locking and signaling mechanisms to implement.
> 
> > So by
> > generating some pseudo random numbers we can erase the previous secure
> > data.
> 
> I'm not sure that pseudo-random numbers help security in the general
> case, compared to just zeros. For a plain harddisk, either one is
> good enough. For a SSD, both are useless. A difference would be
> if the device was an encrypted disk because all-0 would be a perfect
> "known plaintext". It should be configurable.

For reference, see what the rm(1) -P option does (and GNU's
shred(1)), which is a commonly used technique: overwrite with 0xff,
overwrite with 0x00, then with some pseudo-random data.  I'm not sure
if the last step is necessary, but it's generally recommended to not
just overwrite with 0x00 but also with 0xff first.
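
The pass order described above can be sketched in userland as follows.
This is an illustration in Python, not the rm(1) source;
overwrite_passes is a made-up name, and os.urandom stands in for
whatever pseudo-random source a real implementation would use.  As
noted earlier in the thread, overwriting in place says nothing about
what an SSD does underneath.

```python
import os


def overwrite_passes(path, chunk=64 * 1024):
    """Overwrite a file in place: 0xff, then 0x00, then random bytes."""
    size = os.path.getsize(path)
    patterns = (
        lambda n: b"\xff" * n,   # pass 1: all ones
        lambda n: b"\x00" * n,   # pass 2: all zeros
        os.urandom,              # pass 3: random data (stand-in source)
    )
    with open(path, "r+b") as f:
        for make in patterns:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(make(n))
                remaining -= n
            # Push each pass to the device before starting the next one.
            f.flush()
            os.fsync(f.fileno())
```

An in-kernel version would of course work on the file's blocks rather
than through the file API, but the sequence of passes would be the same.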

rm(1) tells where to read more:
 The -P option attempts to conform to U.S. DoD 5220-22.M, "National
 Industrial Security Program Operating Manual" ("NISPOM") as updated by
...

Some hardware also supports the feature, and as a second step it might
be nice to be able to use this feature where available...
-- 
Matt


Re: CVS commit: src/tests/modules

2012-03-22 Thread Matthew Mondor
On Wed, 21 Mar 2012 21:47:31 +
David Holland  wrote:

> But, how about "kern.module.supported" or "kern.module.canload" or
> something?

I like the kern.module.supported, or perhaps kern.module.enabled, as I
have systems built without module loading support yet still have a few
module sysctls around under that same hierarchy, and module.modular
also seems ambiguous and redundant...
-- 
Matt


Re: "Rewriting kernfs and procfs" - GSoC'12

2012-03-20 Thread Matthew Mondor
On Tue, 20 Mar 2012 10:35:13 +0900
Julio Merino  wrote:

> Personally, I'd also like to see this project done.  It was at one point 
> an idea I wanted to work on, but then lost the time to do so and 
> forgotten about it completely.

I was initially hesitant to reply to this thread at this time, because
some details might be out of the scope of the GSoC project.  But I
think that those questions are important to consider in the design of a
new procfs implementation, and the project description was very
brief, so I decided to post them anyway:

It was nice to be able to mount procfs with -o linux when I used Linux
binary compatibility.  Are there other scenarios where it is required?
If not, should a new implementation simply be as compatible as possible
with Linux, such that -o linux is not necessary?  Even some supposedly
portable software now occasionally expects a Linux-compatible procfs
tree.

Otherwise, I think that currently NetBSD doesn't make use of it, as
kernfs and procfs are not mounted on my systems.  Is there
functionality that it should provide which
sysctl/vmstat/pmap/fstat/drvctl don't?  While on Linux it's used as a
central repository for a lot of information, I regularly stumble on
ad-hoc parsers in a number of applications that query from it,
wondering why they didn't export that information via sysctl...

If it should diverge from Linux and still support -o linux, is there
a particular hierarchical direction it should respect, and suggested
file format(s), i.e. plist is an example, which applications could
parse using a supplied library?  Or should the data be in a format
designed for human reading only, with sysctl used for software?  I
doubt that a new implementation needs to remain compatible with the
traditional 4.4BSD procfs hierarchy, as it's not really being used by
software yet.

I once thought that it might be useful to export procfs via NFS,
but our current implementation doesn't support it.  Is this something
that a new implementation should allow?

Thanks,
-- 
Matt


netbsd-6/amd64 and TLS

2012-03-18 Thread Matthew Mondor
Hello,

I stumbled upon something interesting tonight when testing a new
unstable ECL (Embeddable Common Lisp).  When built with TLS support
(--with-__threads=yes), a noticeable slowdown can be experienced
compared to with --with-__threads=no.  I'm not sure yet whether it
has to do with a bug in ECL or in NetBSD; I should check the
TLS/non-TLS code paths whenever I have more time.

But I wanted to meanwhile share this, in case someone else also noticed
something similar, or has a clue as to why this happens.

The system was built using DBG='-g -O2'.

Thanks,
-- 
Matt


Re: Recent 6.0_BETA crash

2012-03-08 Thread Matthew Mondor
On Thu, 08 Mar 2012 14:04:49 +0300
Aleksey Cheusov  wrote:

>  >> A few minutes ago I updated the kernel and modules on my 6.0_BETA
>  >> to the latest netbsd-6 sources and enabled debugging kernel options.
>  >> 
>  >>    options         DEBUG           # expensive debugging checks/support
>  >>    makeoptions     DEBUG="-g"      # compile full symbol table
>  >> 
>  >> Userlevel was not updated.
>  >> After login in xdm the system crashed.
>  >> at /srv/src_netbsd_current/sys/arch/x86/x86/pmap.c:3326
>  >> 
>  >> 3326    KASSERT(uvm_page_locked_p(pg));
>  >> 
>  >> Stacktrace is below. Any clue?
> 
> > It looks like the same problem as what makes the kernel occasionally panic
> > while running tests. It looks like a race condition causing memory
> > corruption,
> > but this is hard to track down ...
> 
> On my system this crash is reproducible. At least it repeated three
> times with xdm login.  Does it make sense to send PR?

Other cases where I can reproduce a crash or freeze:

- Using tmpfs a lot, i.e. cvs co src and xsrc and build a full release
  on it (using -j4 on a 4-core amd64).  The tmpfs shows 13GB free
  before I start, so I'm not sure the issue is related to a memory
  shortage.
- Playing a movie from a DVD using a puffs_cd9660, takes a while but
  eventually crashes.

Unfortunately, I couldn't get a crash dump (although I do have a dump
device configured), so I'm considering connecting an RS232 cable to
this MB (it fortunately supports it) and digging out my null modem
cable again so that I can try to get into ddb instead...

Sometimes the system just reboots, but most of the time it remains
frozen with the HDD light on until I reset.

I'm also unsure if I should file one PR for each case, at least until I
get more useful information...
-- 
Matt


Re: Problem with install of NetBSD-6 from cd on i386 siside

2012-03-07 Thread Matthew Mondor
On Wed, 07 Mar 2012 15:14:52 -
"David Lord"  wrote:

> I have since obtained netbsd-6 src via cvs on a different system,
> built a release, copied sets over network and updated target pc
> to NetBSD-6. I am able to mount the cdrom and "tar -tzvf comp.tgz"
> initially gave same error as above but then completed ok. Seems
> the drive isn't being allowed to spin up.

Just a note: beware of the missing -p option when extracting sets.
Without it, permissions will not be restored properly and things like
setuid binaries will not work (a common symptom is su(1) not working
after installing base.tgz).  This might not matter for the comp.tgz
set, though.
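
As a quick way to check for the symptom after extraction, the setuid bit of a binary can be tested with stat(2).  The following is only a small sketch; has_setuid() is a name invented for this example, and which paths to check (su(1) being the obvious one) is left to the caller.

```c
#include <sys/stat.h>

/* Return 1 if path has the setuid bit set, 0 if not, -1 on error. */
int
has_setuid(const char *path)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return -1;
	return (st.st_mode & S_ISUID) != 0;
}
```

If this returns 0 for a binary that is expected to be setuid, the set was most likely extracted without preserving permissions.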
-- 
Matt


Re: Respawn crashed PUFFS filesystems?

2012-02-11 Thread Matthew Mondor
On Sun, 12 Feb 2012 01:02:38 -0500 (EST)
Mouse  wrote:

> > Of course the feature would be broken in some cases, but we could
> > make the thing optional using a vfs.puffs.respawn sysctl, which would
> > contain a colon-separated list of mount points subject to respawn.
> 
> What happens if a mount point contains a colon?
> 
> More to the point, I think this puts the information in the wrong
> place.  Is there any way it could be set as an option at mount time?
> (That's a serious question; I don't know puffs enough to answer it.)

I also think that a mount respawn option would be elegant.
-- 
Matt


Re: extattr namespaces

2012-02-06 Thread Matthew Mondor
On Mon, 6 Feb 2012 09:51:19 +
Emmanuel Dreyfus  wrote:

> We have two extended attribute APIs in tree: one from FreeBSD and one from 
> Linux. We are about to toss the FreeBSD one in favor of the Linux one. 
> That is easy now since we never had working extended attributes in a 
> release.

One thing that I'm wondering: what are the character constraints on
those class names in the Linux API?

The reason is that if UTF-8 is allowed, it'd be possible for two names
to look identical to humans while being different for the system, and
this could have security implications if we ever use these to support
extended permissions such as ACLs in the future.
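
To illustrate the concern, here is a small sketch in C with two UTF-8 byte strings that would normally both render as "café": one uses the precomposed U+00E9, the other a plain 'e' followed by the combining acute accent U+0301.  The variable and function names are made up for the example, and the byte-wise comparison is only an assumption about how a kernel treating names as opaque strings would behave.

```c
#include <string.h>

/* "é" as the single precomposed code point U+00E9 (bytes C3 A9). */
static const char name_precomposed[] = "caf\xc3\xa9";
/* "é" as 'e' plus combining acute accent U+0301 (bytes CC 81). */
static const char name_decomposed[] = "cafe\xcc\x81";

/* Byte-wise comparison, with no Unicode normalization applied. */
int
names_equal(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Here names_equal(name_precomposed, name_decomposed) reports a mismatch, although a user looking at a listing would likely see two identical names.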

> In the FreeBSD API, namespaces are int. There are two namespaces defined:
> system and user. There is no way to add other namespaces, though I have
> no idea what happens if one uses an int value different than system or user.

For performance and security, integers make more sense to me than
strings.  However, I don't think there'd be a problem if internally
they're integers, yet shown to userland through a string interface (we
traditionally do this for user and group IDs, in which case tools such
as id or ls can show the IDs as well as names).  Or if names were
restricted as necessary if IDs were dropped.

At least for namespace name strings and the SYSTEM namespace attribute
name strings, they should probably be restricted to a-z (or A-Z).  I
don't think that this would matter much for user namespace attributes,
though.
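
A rough sketch of what I mean, in C: names are restricted to a-z and mapped to small integers, which are what the rest of the code would compare.  The NS_* constants, the table and the function names are all hypothetical and only serve the illustration; they are not taken from either existing API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace IDs; the values are for illustration only. */
enum { NS_USER = 1, NS_SYSTEM = 2 };

static const struct {
	int id;
	const char *name;
} ns_table[] = {
	{ NS_USER, "user" },
	{ NS_SYSTEM, "system" },
};

/*
 * Accept only non-empty names made of a-z.  Anything else, including
 * any byte of a multi-byte UTF-8 sequence, is rejected.
 */
int
ns_name_valid(const char *s)
{
	if (*s == '\0')
		return 0;
	for (; *s != '\0'; s++)
		if (*s < 'a' || *s > 'z')
			return 0;
	return 1;
}

/* Map a name to its integer ID, or -1 if invalid or unknown. */
int
ns_lookup(const char *name)
{
	if (!ns_name_valid(name))
		return -1;
	for (size_t i = 0; i < sizeof(ns_table) / sizeof(ns_table[0]); i++)
		if (strcmp(ns_table[i].name, name) == 0)
			return ns_table[i].id;
	return -1;
}
```

With such a scheme, lookalike names never reach the comparison step, and permission checks operate on integers only.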
-- 
Matt


Re: Adding an option to avoid SIGPIPE for all file descriptors

2012-01-25 Thread Matthew Mondor
On Wed, 25 Jan 2012 12:25:46 -0500
Steven Bellovin  wrote:

> 
> On Jan 23, 2012, at 11:05 58PM, Matt Thomas wrote:
> 
> > 
> > On Jan 23, 2012, at 7:58 PM, Steven Bellovin wrote:
> > 
> >>> I also wonder whether we should also have a note that disabled SIGPIPE.
> >>> similar to what paxctl does.
> >>> 
> >> You mean a system-wide flag?  That would worry me; I think it would have
> >> bad effects, since anything that did
> >> 
> >>a | b 
> > 
> > paxctl sets a note in the executable.
> > 
> I don't like that, either, but on philosophical grounds.
> 
> The problem I have is that the semantics of the execution now depend on
> something not in the source code; however, the code needs to know about
> it in order to cope properly.  (Setuid is somewhat different, since it
> also reflects the policy of the site.)  I also don't see the point, as
> opposed to a system call to set the flag.  

A system-wide flag would mess with applications that expect the
traditional SIGPIPE behaviour, and I also find it rather awkward to
depend on an
ELF note for this.  The use of ELF notes for paxctl is less
questionable but still awkward: at application upgrade the admin must
remember to also set the special paxctl flag again on the new
executable, vs a vnode flag.

Applications already can use signal(3) or sigaction(2) if they don't
want it (and now the FD-specific setsockopt(2)/fcntl(2), which I see no
problem with).

But if I understand correctly, Matt's suggestion is to be able to
disable SIGPIPE signaling for some applications behind their back?
Then how about a process/PID-specific nosigpipe sysctl(3) perhaps (we
have things like stopfork/stopexec/stopexit), or a more general way to
control if/which signals are ignored for a process via sysctl?  Or
something like nohup(1) but for SIGPIPE, nosigpipe(1), or a more
general nosig(1) allowing one to specify which signals to ignore?
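
The nohup(1)-like idea relies on ignored signal dispositions being preserved across exec.  Below is a minimal sketch of the core of such a hypothetical nosigpipe(1), written as a function so that it can be called from other code; run_nosigpipe() is a made-up name, and a real tool would most likely set the disposition and exec directly rather than fork and wait.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the command in cmd[] with SIGPIPE ignored and return its exit
 * status, or -1 on failure or if the child was killed by a signal.
 */
int
run_nosigpipe(char *const cmd[])
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* An ignored disposition is inherited by the new image. */
		signal(SIGPIPE, SIG_IGN);
		execvp(cmd[0], cmd);
		_exit(127);	/* exec failed */
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The executed program can still install its own SIGPIPE handler afterwards, so this only changes the default it starts with.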

Thanks,
-- 
Matt


Re: Possible incorrect usage of STACKALIGN in kern_exec

2012-01-24 Thread Matthew Mondor
On Tue, 24 Jan 2012 21:55:30 +0100
Joerg Sonnenberger  wrote:

> On Tue, Jan 24, 2012 at 03:30:37PM -0500, Matthew Mondor wrote:
> > There is also a related PR but which is for threads stack alignment:
> > lib/39465
> 
> That bug is wrong. NetBSD uses SYSV ABI and that mandates 4 Bytes
> alignment for the stack. GCC versions before ~4.5 or so are just
> completely broken in this regard.

I know that some OSs do it for i386 to reduce the overhead of SSE
instructions setup, but I wasn't aware that this could be problematic
with the i386 SYSV ABI (since 16 is a multiple of 4).  Having just
looked at both i386 and x86-64 SYSV ABI docs now, I think that you're
right for i386.

It's in the x86-64 case that stack frames are 16-byte aligned, with
arrays larger than 16 bytes also needing to be 16-byte aligned
(possibly including the stack)...

Thanks,
-- 
Matt


Re: Possible incorrect usage of STACKALIGN in kern_exec

2012-01-24 Thread Matthew Mondor
On Tue, 24 Jan 2012 21:01:49 +0100
Martin Husemann  wrote:

> On Tue, Jan 24, 2012 at 08:21:42PM +0100, Paul Fleischer wrote:
> > Is the usage of STACKALIGN indeed incorrect in this situation, or am I
> > missing the big picture?
> 
> I stumbled across this when revamping execve1 for posix_spawn recently.
> 
> The intention seems to be to align the stack on an 8 byte boundary
> (where arm usually only requires 4 byte alignment). I did not dig in the
> ARM ABI docs deep enough to see why this would be needed.
> 
> However, the current implementation seems to be broken - the macro works
> on the stack pointer but not on a length variable, as you noted.
> 
> Can anyone explain why arm would need 8 byte alignment?

Do some architectures (i.e. x86) have better performance if the stack
is 16-byte aligned?  If so, perhaps this could be MI, satisfying both
the 8-byte (or 4-byte) requirement by aligning stacks at 16 bytes?
Would this be considered wasteful?  Of course, x86-64 MD code could
also be used...
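
To show what I mean by one alignment satisfying the others, here is a small sketch in C.  It is not the kernel's STACKALIGN macro, whose definition I am not reproducing here; align_down() and is_aligned() are illustrative names, and the rounding is applied to the address itself, assuming a stack that grows toward lower addresses.

```c
#include <stdint.h>

/*
 * Round an address down to a multiple of `align`, which must be a
 * power of two.
 */
static inline uintptr_t
align_down(uintptr_t addr, uintptr_t align)
{
	return addr & ~(align - 1);
}

/* Nonzero if addr is a multiple of align (a power of two). */
static inline int
is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}
```

Any address returned by align_down(sp, 16) also passes is_aligned() for 8 and 4, at the cost of losing up to 15 bytes of stack.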

There is also a related PR, although it is about thread stack
alignment: lib/39465

Thanks,
-- 
Matt


Re: Reduce KAUTH_GENERIC_ISSUSER usage (batch 1)

2012-01-18 Thread Matthew Mondor
On Tue, 17 Jan 2012 20:36:35 -0500
Elad Efrat  wrote:

> Attached is a diff that reduces the use of KAUTH_GENERIC_ISSUSER. I
> plan to commit it a week or so after the branch.

Thanks for working on this.

While I understand most changes, after looking at the diff I wondered:
does anyone know what is special about pxg(4) that requires a special
MACHDEP_PXG check as opposed to MACHDEP_UNMANAGEDMEM?

Thanks,
-- 
Matt


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Matthew Mondor
On Mon, 16 Jan 2012 22:26:30 + (UTC)
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

> hi,
> 
> > On Mon, 16 Jan 2012 10:56:33 + (UTC)
> > y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> > 
> >> when the kernel wants to cache other files.
> >> ie. whenever the kernel decides to reclaim it. :-)
> >> you can increase the chance by running
> >>while :;do sysctl -w kern.maxvnodes=0; done
> >> or something like that.
> > 
> > Wouldn't the performance also drop significantly with a permanently low
> > maxvnodes, though?
> 
> it does never succeed.
> anyway the performance is not a priority when trying to reproduce a bug.

Oh, I had missed the context, thanks for the explanation.
-- 
Matt


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Matthew Mondor
On Mon, 16 Jan 2012 10:56:33 + (UTC)
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

> when the kernel wants to cache other files.
> ie. whenever the kernel decides to reclaim it. :-)
> you can increase the chance by running
>   while :;do sysctl -w kern.maxvnodes=0; done
> or something like that.

Wouldn't the performance also drop significantly with a permanently low
maxvnodes, though?

Thanks,
-- 
Matt


Re: buffer cache & ufs changes (preliminary ffsv2 extattr support)

2012-01-16 Thread Matthew Mondor
On Sun, 15 Jan 2012 15:21:40 -0500 (EST)
Mouse  wrote:

> However, I think that constitutes a good implementation of a bad idea.
> This makes a file no longer a long list of octets; it becomes multiple
> long lists of octets.  The Mac did this, with resource forks and data
> forks, and you may note OS X doesn't do it any longer.  I suspect these
> will seem like a good idea for a while, until people start discovering
> all the things they break, or that break them, and realize that they
> didn't learn from history and thus had to repeat it.

I didn't know that Apple dropped the idea, but I have always found the
idea flaky myself (and sorry for the "rant"):

- Applications may still implement and maintain metadata as they wish
  without the feature
- Requires changes to support in OS, FS, and many file manipulation
  tools
- No standard API for these, few, incompatible, restricted
  solutions/formats for archival
- Security implications (scanning tools which aren't aware might skip
  "hidden/extended" data; if ACLs are eventually implemented and are
  using these, the implementation should not only support a system
  domain, but also use IDs rather than strings (or at least severely
  sanity-check a restricted string format))
- Inevitable eventual loss of the extended data, possibly because of
  backup procedures not aware of it, moving/copying/editing files with
  non-aware/third-party tools, etc (also consider editors that save to
  another file to then rename)
- An administrative nightmare when tools such as find/locate/grep/diff
  won't disclose data that the admin might be looking for but is now in
  an extended attribute

But this is only the opinion of a user, and I could keep the feature
disabled on my systems, of course, so I don't necessarily object to
optional support for it.
-- 
Matt


Re: NetBSD/usermode (Was: CVS commit: src)

2011-12-31 Thread Matthew Mondor
On Sat, 31 Dec 2011 17:20:16 +
David Holland  wrote:

> The other obvious approach is to add one or more new ptrace operations
> to provide proper/adequate/better support for intercepting system
> calls. This is probably a more useful facility in the long run, and it
> *can* be made leakproof, but it'll be more work.

Could this also eventually allow systrace-style functionality that'd be
safer than the previous implementation?

Thanks,
-- 
Matt


Re: close and ERESTART

2011-12-26 Thread Matthew Mondor
On Mon, 26 Dec 2011 05:19:22 +
Taylor R Campbell  wrote:

> +
> + error = fd_close(SCARG(uap, fd));
> + if (error == ERESTART)
> + error = EINTR;
> +
> + return error;

If it's also guaranteed that the file descriptor state is closed in the
event of an ERESTART error, I like this, personally.
-- 
Matt


Re: cloning device close race?

2011-12-19 Thread Matthew Mondor
On Sun, 18 Dec 2011 23:40:33 -0500
Matthew Mondor  wrote:

> On Sun, 18 Dec 2011 22:34:03 -0500
> Thor Lancelot Simon  wrote:
> 
> > If you run 10 or so copies at once on a multiprocessor system
> > with DIAGNOSTIC, you'll see a lot of this message emitted:
> > 
> > vrelel: missing VOP_CLOSE(): vnode @ 0xfe801e73cb28, flags 
> > (0x800030)
> > tag VT_UFS(1), type VCHR(4), usecount 2, writecount 0, holdcount 0
> > freelisthd 0x0, mount 0x800024235000, data 0xfe801de01f00 lock 
> > 0xfe801e73cc38
> > tag VT_UFS, ino 46213, on dev 4, 0 flags 0x0, nlink 1
> > mode 020644, owner 0, group 0, size 0
> > 
> > I am guessing the problem also exists with other cloning
> > pseudodevices, not just the new /dev/random implementation.
> 
> This just reminds me that a friend yesterday pointed me to this article
> about close(2)'s POSIX semantics:
> 
> http://www.daemonology.net/blog/2011-12-17-POSIX-close-is-broken.html

In case someone else was also interested in this, I was informed
off-list that NetBSD ensures that the file descriptor is in the closed
state after close(2), even when the call is interrupted and fails
with EINTR.  In another discussion with the person who originally
forwarded me the above URL, I was told that according to her
investigation, Linux also does this.
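
Under that assumption (descriptor already closed when close(2) fails with EINTR), application code should not retry the call, since another thread may have reused the descriptor number in the meantime.  A minimal sketch, with close_noretry() being a name invented here; this is not portable to systems where the descriptor stays open after EINTR:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Close fd without retrying on EINTR.  Assumes, as reported in this
 * thread for NetBSD and Linux, that the descriptor is already closed
 * when close(2) fails with EINTR.
 */
int
close_noretry(int fd)
{
	if (close(fd) == -1 && errno != EINTR)
		return -1;
	return 0;
}
```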

Thanks,
-- 
Matt


Re: cloning device close race?

2011-12-18 Thread Matthew Mondor
On Sun, 18 Dec 2011 22:34:03 -0500
Thor Lancelot Simon  wrote:

> If you run 10 or so copies at once on a multiprocessor system
> with DIAGNOSTIC, you'll see a lot of this message emitted:
> 
> vrelel: missing VOP_CLOSE(): vnode @ 0xfe801e73cb28, flags 
> (0x800030)
>   tag VT_UFS(1), type VCHR(4), usecount 2, writecount 0, holdcount 0
>   freelisthd 0x0, mount 0x800024235000, data 0xfe801de01f00 lock 
> 0xfe801e73cc38
>   tag VT_UFS, ino 46213, on dev 4, 0 flags 0x0, nlink 1
>   mode 020644, owner 0, group 0, size 0
> 
> I am guessing the problem also exists with other cloning
> pseudodevices, not just the new /dev/random implementation.

This just reminds me that a friend yesterday pointed me to this article
about close(2)'s POSIX semantics:

http://www.daemonology.net/blog/2011-12-17-POSIX-close-is-broken.html

I then only checked the close(2) manual page so far, which indeed lists
EINTR as a possible errno value on error.  But since the article also
mentions that some OSs decided to ensure that EINTR never be returned
to avoid the problems, I wondered: does NetBSD already do something to
ensure that either: 1) EINTR is not possible, or the call is atomically
and transparently restarted in the same LWP, or 2) the state of an FD
after an interrupted close(2) is never inconsistent?  The latter
solution might still allow race conditions in multithreaded code,
possibly.

Thanks,
-- 
Matt


Re: [RFC] getgroups2 system call

2011-12-13 Thread Matthew Mondor
On Wed, 14 Dec 2011 07:04:06 +0100
m...@netbsd.org (Emmanuel Dreyfus) wrote:

> - a fixed length header is highly desirable for performance
> optimization. For instance glusterfs fetches the header and the data
> using readv(2) with an iovec that has two slots. That way it gets write
> data aligned on a page boundary.

What does NFS do in this case?  I seem to remember that it also imposes
a sane size limit, possibly even below NGROUPS_MAX, is it really the
case?  If so, would this also be acceptable?
-- 
Matt


Re: Lost file-system story

2011-12-11 Thread Matthew Mondor
On Fri, 9 Dec 2011 22:12:25 -0500
Donald Allen  wrote:

> Linux systems do periodically write ext2 meta-data to the disk. And
> ext2 fsck has always been very good, and has gotten better over the
> years, due to the efforts of Ted Ts'o. I first installed Linux in
> 1993, almost 20 years ago, and have been using it continuously ever
> since. I have *never* lost an ext2 filesystem and I've never mounted
> one sync.

I'm not sure if it's the case on Linux with ext2, but by default NetBSD
FFS mounts are not sync, nor async; metadata is sync and data blocks
are async.  In async mode, all data is asynchronously written, including
the metadata, and in sync mode everything is written synchronously (the
default OpenBSD uses, if I recall).  I just wanted to specify this as
you mentioned not mounting your ext2 systems in sync mode, but a
default NetBSD FFS mount will not be in sync mode either.

Other available options with FFS are using soft dependencies (softdep)
or WAPBL metadata journalling (log), with which it is possible to have
increased performance vs the default mode, without really sacrificing
reliability, unlike with the fully async mode.  In those modes,
metadata is written asynchronously as well.

Sorry if what I said is already obvious to you,
-- 
Matt


Re: Lost file-system story

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 15:50:35 -0500
Donald Allen  wrote:

> were not designed to do this. The reason I'm beating on this is that I
> would have liked to use NetBSD for the application I have in mind, but
> I need the performance improvement that async provides (my tests show
> this; the tests also show that NetBSD async is about as fast as Linux,
> much faster than OpenBSD async, at least for doing a lot of writing,
> such as un-tarring a large tar file). This is practical if the joint

The speed and reliability WAPBL provides have been enough for my uses
personally; are the few seconds saved using async really worth the
trouble?  Also, if raw speed is needed to do many installations on
identical systems, dd with large blocks to mirror the system might be a
faster alternative...

I'm not arguing that fsck shouldn't be able to recover though; it
ideally should, but the problem seems to be that too much metadata is
missing when crashing while writing in async mode.

OpenBSD's async mode would be slightly slower while flushing metadata
more often, probably.  Perhaps having a sysctl to control flushing
would be a good thing, though.

Thanks,
-- 
Matt


Re: Use consistent errno for read(2) failure on directories

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 11:56:32 +0100
Nicolas Joly  wrote:

> On Fri, Dec 09, 2011 at 04:36:55AM -0500, Matthew Mondor wrote:
> > In the case of sys/rump/librump/rumpvfs/rumpfs.c, is it possible that
> > the underlying implementation could previously decide if it could
> > support read(2) on directories, and this would no longer be the case
> > with this patch?
> 
> No. This only impacts the rump fs itself (used as the root file system
> in applications); it does not matter while accessing other fs through
> rump.

Thanks for the explanation,
-- 
Matt


Re: Use consistent errno for read(2) failure on directories

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 09:33:54 +0100
Nicolas Joly  wrote:

> According to the online OpenGroup specification for read(2) available
> at [1], read(2) on directories is implementation dependent. If
> unsupported, it shall fail with EISDIR.

In the case of sys/rump/librump/rumpvfs/rumpfs.c, is it possible that
the underlying implementation could previously decide if it could
support read(2) on directories, and this would no longer be the case
with this patch?

Thanks,
-- 
Matt


Re: emap

2011-12-04 Thread Matthew Mondor
On Mon,  5 Dec 2011 04:19:13 + (UTC)
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

> > Although I didn't think it'd be necessary to say so until this point, I
> > admit that I myself didn't really understand what Takashi said about
> > recommending amd64 over i386.  If the hardware is 32-bit, or on
> > constrained memory devices, i386 definitely needs to be supported.
> 
> it isn't my recommendation.  rmind@'s.

Sorry about that, I should have rechecked upthread instead of looking
at the quoting mess :)
-- 
Matt


Re: secmodel_register(9) API

2011-11-29 Thread Matthew Mondor
On Tue, 29 Nov 2011 02:51:38 +0100
Jean-Yves Migeon  wrote:

> Reviews before merge welcome. If nobody raises his voice, I'll proceed 
> to commit it at the end of the week.

Hello,

I admit not having audited the kauth and secmodel code recently, the
last time being shortly after Elad's initial implementation, please
bear with me if some observations are irrelevant:

  There are various ad-hoc calls to printf() which could probably be
  replaced by a more generic function call that also resolves the error
  number to a string matching the constant, i.e. secmodel_perr(int
  errno, const char *function); or similar, possibly wrapped by a macro
  using __FUNCTION__ to avoid the redundant function names.

  The initialization functions, such as secmodel_keylock_init(), will
  report an error in the dmesg but do not propagate errors (they're
  void functions, suggesting the caller will not suspect anything).
  Should not the system panic for similar security critical failures?
  I think that I saw a similar situation under the various "case
  MODULE_CMD_INIT".

  When seeing the strcasecmp() calls in the eval_* functions for names
  such as "is-securelevel-above" or "is-root", my first impression was
  that integer constants, macros, or even a system of interned strings
  and references would be nice.  Then it struck me that if the goal
  was to export these, exporting actual variables might be best
  (although in any case, exporting these seems to somewhat defeat
  kauth-style centralization.  But if I understand, this is not for
  general use in the kernel, but for use by other security models?  If
  so, it's not so much out of scope in the sense that it remains in
  sys/secmodel)...
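
To make the first observation above more concrete, here is a userland sketch of the kind of helper I had in mind.  Everything in it is hypothetical: secmodel_perr_fmt() and SECMODEL_PERR() do not exist in the tree, the sketch formats into a buffer instead of printing, it uses the standard C99 __func__ rather than __FUNCTION__, and a kernel version would need its own errno-to-string table since strerror(3) is a userland facility.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<function>: <error string>" into buf.
 * Returns what snprintf(3) returns.
 */
int
secmodel_perr_fmt(char *buf, size_t len, int error, const char *function)
{
	return snprintf(buf, len, "%s: %s", function, strerror(error));
}

/* Wrapper that supplies the calling function's name automatically. */
#define SECMODEL_PERR(buf, len, error) \
	secmodel_perr_fmt((buf), (len), (error), __func__)
```

A caller would then write SECMODEL_PERR(buf, sizeof(buf), error) without repeating its own name in a format string.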


Note that the following is not criticism on your patch, but
pipe-dreaming and opinion.  It's also outside the scope of the existing
kauth implementation, but I couldn't resist, considering it was slightly
on-topic:

Having a main area to look for security related decisions is a good
thing, and kauth was a good step in that direction.  It's also nice to
permit an administrator or organization to tweak the system for their
needs using an elegant architecture.

However, I've always found its design to be slightly too dynamic,
perhaps too much of an interpreter (and those eval_* functions make it
even more so).  Then there is all the C code dedicated to attaching,
detaching parts to the "program tree" at runtime, etc.  Although I'm
not familiar enough with the original Darwin implementation, that is
probably similar there.

Since this is security related, it would not be unreasonable if the only
possible runtime changes were user/admin configuration (module-specific
sysctl configuration knobs, file system permissions, PaX flags, etc).
This means that the final runtime security system could be statically
generated at compile-time.

Dreaming ahead along that path (this part could still be possible with
an interpreter-like model though), it might be possible to create a
similar system, centralized yet modular (not at runtime, but for
human-friendly organization), to design and use a simple mostly
declarative language to edit and represent security models, then
compile that representation to static code.  The input could be more
elegant (also more easily allowing to define the domains and their
authorize interface, any hierarchies, etc), the output could permit a
more efficient runtime (generating unrolled code where wanted rather
than loops running among hooks lists)...  And of course there could be
specialized static analysis and test tools to warn model designers of
possible shortcomings in their designs.  With finally a preprocessor
tool so that it'd be possible to embed the language with C code, where
necessary...

But then again, I'm only pipe dreaming, and that's always easier than
implementing any of that, of course :)
-- 
Matt


Re: emap

2011-11-25 Thread Matthew Mondor
On Fri, 25 Nov 2011 23:25:24 +0400
Aleksej Saushev  wrote:

> Thor Lancelot Simon  writes:
> 
> > On Fri, Nov 25, 2011 at 12:50:58PM +0400, Aleksej Saushev wrote:
> >> Mindaugas Rasiukevicius  writes:
> >> 
> >> > y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> >> >> hi,
> >> >> 
> >> >> what's the status of emap and pipe?
> >> >> 
> >> >
> >> > ... and encourage our users to use amd64 instead
> >> > of i386.
> >> 
> >> I'm sorry to intervene, what about WINE? Unless we're going to have it
> >> functional on amd64, encouraging is useless.
> >
> > I don't understand your comment.  Are you suggesting that a large fraction 
> > of
> > NetBSD/i386 users use WINE and therefore would not be able to switch to the
> > amd64 port?
> 
> I mean that those users who could switch most probably have switched already.
> And one of serious reasons to stay on i386 is functional WINE.

Although I didn't think it'd be necessary to say so until this point, I
admit that I myself didn't really understand what Takashi said about
recommending amd64 over i386.  If the hardware is 32-bit, or on
constrained memory devices, i386 definitely needs to be supported.

But then again, I'm not familiar with the emap code; from the bits I
read in this thread, it could serve to optimize pipes?  That pipes can
be better optimized on amd64 than on i386 is no problem to me, so I
assumed that he was talking about encouraging users to use amd64 if
they want to take advantage of a particular feature, not that i386
would get deprecated and start to become unsupported.

It would be nice if someone who knows better could explain better what
was meant, or confirm what I said above (if I understood correctly),
considering that it caused some worries...

Thanks,
-- 
Matt


Re: puffs & netbsd-5 (was VOP_GETATTR: locking protocol change proposal)

2011-11-21 Thread Matthew Mondor
On Mon, 21 Nov 2011 08:45:52 +
Emmanuel Dreyfus  wrote:

> On Mon, Nov 21, 2011 at 03:26:35AM -0500, Matthew Mondor wrote:
> > I seem to remember you previously writing about using puffs/rump on
> > netbsd-5, is that still on netbsd-5?
> 
> I use PUFFS on netbsd-5, and fixed a few bugs in it, so you definitely
> need latest netbsd-5 to avoid bugs. I never used rump, and indeed Pooka
> told me that it was not production-ready on netbsd-5.

My systems are fairly recent; what I'll do then is update again and use
psshfs some more, so that I can file a PR when I again get the busy
looping issue.

My two older PRs related to rump/puffs on NetBSD-5 were kern/43589 and
kern/43590, which were unrelated problems.

Thanks,
-- 
Matt


Re: puffs & netbsd-5 (was VOP_GETATTR: locking protocol change proposal)

2011-11-21 Thread Matthew Mondor
On Mon, 21 Nov 2011 08:04:46 +
Emmanuel Dreyfus  wrote:

> FWIW I spent weeks tracking down a file corruption bug on growing files
> in PUFFS because VOP_GETATTR operates on an unlocked vnode. If the 
> VOP_GETATTR request follows a not yet completed VOP_FSYNC (as done by 
> ioflush thread), I got toasted: VOP_FSYNC causes PUFFS to send a SETATTR
> to the filesystem, and on completion VOP_GETATTR gets from the filesystem
> a file size smaller than what VOP_FSYNC just set. This causes the file
> to be truncated by the kernel, and data written between VOP_FSYNC
> and VOP_GETATTR to be discarded and replaced by a chunk of zeroed bytes.
> 
> I had to add a lock on file size modification in PUFFS to fix the problem.

I seem to remember you previously writing about using puffs/rump on
netbsd-5, is that still on netbsd-5?

The reason I ask is that I'm seeing various bugs when using psshfs (and
had various problems when mounting CDs using rump_cd9660); at the time
when I corresponded with Pooka about it he told me that it wasn't ready
for production use on netbsd-5 and recommended -current.  One of the
problems is the process can suddenly start to consume as much CPU
time as it can, while operations become really slow or lock.  Another
issue had to do with inconsistencies between the rump-seen state and
actual on-disk state, possibly due to cache invalidation issues or the
like...

A few days back I still had the psshfs process locked in a loop (I
didn't use it often enough to investigate where it loops yet).
This might not be related at all to the locking issues you're having,
though.

Thanks,
-- 
Matt


Re: fs-independent quotas (binary plists)

2011-11-17 Thread Matthew Mondor
On Thu, 17 Nov 2011 10:50:17 +0100
Manuel Bouyer  wrote:

> In this context, "text format" means a key/value pair format, in which
> some keys are optional and values can be of arbitrary types. Maybe you can
> do this with a binary format too, but it doesn't exist yet.

This reminds me that years ago someone implemented support to save
plists in a binary format[1] (this doesn't necessarily mean that it
would help solve this problem, though).  But I'm surprised that in all
these years the support wasn't added; does anyone know if there is
general resistance to an optional compact and portable binary format,
and if so, the reasons?

If such a format were supported, it wouldn't be harder to process by
machine or by hand: proplib could be used as it is now for code, and
bplists could easily be exported to an XML format on request for
editing, e.g. via a viplist, plistctl or similar command (which could
also use advisory locking, of course, and save back to binary format if
the system is configured to use one).  In theory, it could also
increase performance, and a binary format would be simpler for the
kernel to parse than XML, minimizing bugs...
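
To make the idea of a compact, typed key/value encoding a bit more
concrete, here is a minimal sketch in Python.  The tag values and the
layout are made up for illustration; this is not the format used by the
bplist patch mentioned above, nor anything proplib implements.  It only
handles a flat dictionary with bool, integer, string and data values.

```python
import struct

# Hypothetical one-byte type tags; not taken from proplib or any patch.
TAG_BOOL, TAG_INT, TAG_STR, TAG_DATA = 1, 2, 3, 4


def _pack_bytes(raw):
    # Length-prefixed byte string: 32-bit big-endian length, then payload.
    return struct.pack(">I", len(raw)) + raw


def encode(d):
    """Serialize a flat dict (str keys; bool/int/str/bytes values)."""
    out = [struct.pack(">I", len(d))]
    for key, value in d.items():
        out.append(_pack_bytes(key.encode("utf-8")))
        if isinstance(value, bool):          # test bool before int
            out.append(struct.pack(">BB", TAG_BOOL, int(value)))
        elif isinstance(value, int):
            out.append(struct.pack(">Bq", TAG_INT, value))
        elif isinstance(value, str):
            out.append(bytes([TAG_STR]) + _pack_bytes(value.encode("utf-8")))
        elif isinstance(value, bytes):
            out.append(bytes([TAG_DATA]) + _pack_bytes(value))
        else:
            raise TypeError("unsupported value type: %r" % type(value))
    return b"".join(out)


def decode(buf):
    """Inverse of encode(); raises ValueError on malformed input."""
    pos = 0

    def take(fmt):
        nonlocal pos
        vals = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return vals

    def take_bytes():
        nonlocal pos
        (n,) = take(">I")
        raw = buf[pos:pos + n]
        if len(raw) != n:
            raise ValueError("truncated input")
        pos += n
        return raw

    (count,) = take(">I")
    result = {}
    for _ in range(count):
        key = take_bytes().decode("utf-8")
        (tag,) = take(">B")
        if tag == TAG_BOOL:
            result[key] = bool(take(">B")[0])
        elif tag == TAG_INT:
            result[key] = take(">q")[0]
        elif tag == TAG_STR:
            result[key] = take_bytes().decode("utf-8")
        elif tag == TAG_DATA:
            result[key] = take_bytes()
        else:
            raise ValueError("unknown tag %d" % tag)
    return result
```

A decoder like this has no tokenizer and no entity handling, which is
the kind of simplification the paragraph above is hoping for; whether a
real format would stay this simple is of course another question.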

[1] ftp://ftp.netbsd.org/pub/NetBSD/misc/freza/bplist-2007-10-27.diff

Thanks,
-- 
Matt


Re: MAXNAMLEN vs NAME_MAX

2011-11-14 Thread Matthew Mondor
On Mon, 14 Nov 2011 16:58:02 +
David Holland  wrote:

> On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
>  > > I was recently talking to some people who'd been working with some
>  > > (physicists, I think) doing data-intensive simulation of some kind,
>  > > and that reminded me: for various reasons, many people who are doing
>  > > serious data collection or simulation tend to encode vast amounts of
>  > > metadata in the names of their data files. Arguably this is a bad way
>  > > of doing things, but there are reasons for it and not so many clear
>  > > alternatives... anyway, 256 character filenames often aren't enough in
>  > > that context.
>  > 
>  > It's only my opinion, but they really should be using multiple files or
>  > a database for the metadata with, as necessary, a "link" to an actual
>  > file for data.
> 
> Perhaps, but telling people they should be working a different way
> usually doesn't help. (Have you ever done any stuff like this? Even if
> you have only a few settings and only a couple hundred output files,
> there's still no decent way to arrange it but name the output files
> after the settings.)

I agree that if they already started on the wrong path it's hard to
tell them to change their methods, but it was probably not ideal to
expect that file name length was an unlimited resource...

The situations where I had to deal with such were web sites, with media
stored as files and metadata in databases (with file names either being
hashes or a serial number); another instance was in camera security
software saving stills and archiving videos as files, with the
directory and file names being based on a type of time stamp.  Another
case is mmmail where mail is stored in a custom format in files, backed
by a postgresql database.

It works well, but it can be tricky not to leak files (in the case of a
web application using postgresql for instance, delete trigger functions
can be used to insert entries in a table for files to be deleted, with
a scheduled event or daemon cleaning those up).  The few instances
where I've seen leaked files were after abnormal crashes/reboots
though; some recovery/cleanup software is then useful.
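
As a rough illustration of the delete-trigger approach, here is a small
sketch.  It uses Python's sqlite3 module in place of postgresql so that
it is self-contained, and the table and trigger names (media,
pending_unlink, media_deleted) are invented for the example.

```python
import os
import sqlite3
import tempfile


def open_db():
    """In-memory database with a delete trigger feeding a cleanup queue."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE media (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
        CREATE TABLE pending_unlink (path TEXT NOT NULL);
        CREATE TRIGGER media_deleted AFTER DELETE ON media
        BEGIN
            INSERT INTO pending_unlink (path) VALUES (OLD.path);
        END;
    """)
    return db


def cleanup(db):
    """Remove queued files; what a scheduled job or daemon would run."""
    removed = 0
    rows = db.execute("SELECT rowid, path FROM pending_unlink").fetchall()
    for rowid, path in rows:
        try:
            os.unlink(path)
            removed += 1
        except FileNotFoundError:
            pass                      # already gone, nothing left to leak
        db.execute("DELETE FROM pending_unlink WHERE rowid = ?", (rowid,))
    db.commit()
    return removed


if __name__ == "__main__":
    db = open_db()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    db.execute("INSERT INTO media (path) VALUES (?)", (path,))
    db.execute("DELETE FROM media WHERE path = ?", (path,))
    print(cleanup(db), os.path.exists(path))
```

The queue row is only dropped after the unlink was attempted, so a crash
between the two steps leaves the entry in place for the next cleanup run
rather than leaking the file.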

I guess that this also gives an answer you expected however: that
it's more complex to DTRT, as user software must create the link
between two loosely coupled systems :)

> Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the
> stack regardless. Even at only 1K each, it's really easy to blow a 4k
> kernel stack with them. (In practice you can generally get away with
> one; but two, like you need for rename, link, symlink, etc. is too
> many.)
> 
> Or I guess you don't mean in the kernel, do you...

Oh, yes I meant userland indeed; as kernel code should minimize stack
use...
-- 
Matt


Re: MAXNAMLEN vs NAME_MAX

2011-11-14 Thread Matthew Mondor
On Sun, 13 Nov 2011 23:08:30 +
David Holland  wrote:

> I was recently talking to some people who'd been working with some
> (physicists, I think) doing data-intensive simulation of some kind,
> and that reminded me: for various reasons, many people who are doing
> serious data collection or simulation tend to encode vast amounts of
> metadata in the names of their data files. Arguably this is a bad way
> of doing things, but there are reasons for it and not so many clear
> alternatives... anyway, 256 character filenames often aren't enough in
> that context.

It's only my opinion, but they really should be using multiple files or
a database for the metadata with, as necessary, a "link" to an actual
file for data.
But I also tend to think the same of software relying on extended
attributes, resource forks and the like (with the possible exception of
a specialized facility for extended permissions :)

> (This sort of usage also often involves things like 50,000 files in
> one directory, so the columnizing behavior of ls is far from the top
> of the list of relevant issues.)

This reminds me, does anyone know about the current state of
UFS_DIRHASH?  I remember reading about some issues with it and ending up
disabling it on my kernels, yet huge directories can occur in a number
of scenarios (probably a more pressing issue than extending file names,
actually)...

>  > The 255 limit was just because that's how many bytes a one byte length
>  > field permitted, not because anyone thought names that long made sense.
>  > But if you're going to increase it, why  stop at 511?  That number
>  > means nothing - the next logical limit would be 65535 wouldn't it?
> 
> Well... yes but there are other considerations. As you noted, going
> past one physical sector is problematic; going past one filesystem
> block very problematic. Plus, as long as MMU pages remain 4K,
> allocating contiguous kernel virtual space for path buffers (since if
> NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
> large) could start to be a problem.

I agree, especially with all the software that allocates path/file name
buffers on the stack (but even on the heap it could be a general memory
waste with 64KB, in addition to the memory management performance issues).
-- 
Matt


Re: sysctl(7) knob to allow users to control CPU affinity

2011-11-03 Thread Matthew Mondor
On Thu, 03 Nov 2011 17:01:48 +1100
matthew green  wrote:

> > Since the default is to not allow affinity control, it's not of utmost
> > importance, but it could allow a compromise between total restriction
> > and total freedom...  I have no objection to that sysctl personally.
> 
> i think the default should be changed, but user-specified affinity
> shouldn't be considered an absolute rule, just a preference.  i'm not
> sure i understand exactly what sort of issue you're envisioning.

I assumed there could be issues since pset(3) is restricted to the
superuser (as well as pthread_setaffinity_np(3) now), but when
rethinking about it I admit not seeing a problem as non-privileged
processes cannot change the process priority beyond their class'
priority.

The only other case that comes to my mind would be a dmover(9) like
system eventually reserving processor(s) for dedicated tasks, but I
guess that in this case the reserved cores would simply be made
unavailable in cpuctl(8)/pset(3)/etc...
-- 
Matt


Re: sysctl(7) knob to allow users to control CPU affinity

2011-11-02 Thread Matthew Mondor
On Thu, 03 Nov 2011 01:50:49 +0100
Jean-Yves Migeon  wrote:

> Here's a proposal for a sysctl(7) knob to easily allow non-superusers to 
> set the CPU affinity of processes and threads they own:
> 
> security.secmodel.suser.usersetaffinity
> 
> (resembles the one already existing to allow for user mounts)
> 
> Would it be acceptable to modify current secmodel_suser(9) to allow this?
> 
> This issue comes regularly on various tech-* MLs, motivated by the fact 
> that people expect this behavior based on what they encounter on other OS.

Just out of curiosity, but is it possible for the superuser to still
reserve wanted CPU/cores, such that non-privileged users could, if that
sysctl is enabled, work with the non-reserved ones?  Or, can the
sysadmin specify CPU/cores and/or limits for non-privileged users?

Since the default is to not allow affinity control, it's not of utmost
importance, but it could allow a compromise between total restriction
and total freedom...  I have no objection to that sysctl personally.
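
The compromise asked about above could look roughly like the following
policy check.  This is only a sketch to illustrate the question: the
function, its parameters and the notion of a "reserved" CPU set are
hypothetical, and it says nothing about how secmodel_suser(9) would
actually implement the proposed knob.

```python
def may_set_affinity(caller_uid, target_uid, requested, reserved,
                     usersetaffinity):
    """Hypothetical policy check; none of these names exist in NetBSD."""
    if caller_uid == 0:
        return True                       # superuser: unrestricted
    if not usersetaffinity:
        return False                      # knob off: the current default
    if caller_uid != target_uid:
        return False                      # only processes the user owns
    # Knob on: allow only CPUs the administrator has not reserved.
    return bool(requested) and not (set(requested) & set(reserved))
```

With an empty reserved set this degenerates to the plain on/off knob
from the proposal.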

Thanks,
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-31 Thread Matthew Mondor
On Mon, 31 Oct 2011 19:58:27 -0400
Greg Troxel  wrote:

> Obligatory actual netbsd tech-kern content: It seems like we really need
> a sync_synchronous(2) system call that guarantees that all file system
> operations that have completed (syscall returned) before the issuance of
> the sync_synchronous call are on disk before sync_synchronous returns.
> It seems odd that for sync, there is no waiting, fsync seems to wait,
> and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

BUGS
 sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous and fixed to
be, so that sync_synchronous(2) would not be necessary?
-- 
Matt


Re: fs-independent quotas

2011-10-31 Thread Matthew Mondor
On Sun, 30 Oct 2011 13:28:18 -0400
"James K. Lowden"  wrote:

> Unless someone suggests a good word for "limited thing", maybe the best
> option is to invent a term of art and *define* it to mean what you
> want, after the manner of Humpty Dumpty.  To that end I suggest
> "quotar" or "quoton".   They're both short, easy to remember, and mean
> nothing obvious.   The latter kinda sorta sounds like "quota on", which
> might be helpful.  

Reminds me of BDB's DBT (Database Thing :) so there is some precedent

Or maybe something like quota_item or quota_object?
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Matthew Mondor
On Fri, 28 Oct 2011 20:33:29 -0400
Greg Troxel  wrote:

> So, I'm inclined to patch rdiff-backup not to fsync, since it seems
> excessive, and the backup is toast if the machine crashes before it is
> finished -- in that case rdiff-backup just rolls back.  Opinions?

I also wonder why fsync would be used for every file, especially if you
consider a whole run a single "transaction", even more so if using
snapshots (although you don't mention using them).  In which case it
simply should report failure and abort on any open/write/rename/close
error, and at the end, fsync once, also checking for error.  If at
that point everything was successful, the "transaction" is committed (as
far as software is concerned, of course, hardware buffers might still
need flushing), otherwise everything should be rolled back, unless an
inconsistent state is allowed (where the next full backup might fix
that).
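
One way to read "fsync once at the end" is sketched below in Python.
This is not how rdiff-backup works; backup_run() and its arguments are
invented for the example, and it assumes the platform allows fsync on a
read-only descriptor and on a directory (true on the usual BSD and Linux
systems, as far as I know).

```python
import os
import tempfile


def backup_run(dest_dir, files):
    """Write `files` (name -> bytes) into dest_dir as one "transaction"."""
    written = []
    try:
        for name, data in files.items():
            path = os.path.join(dest_dir, name)
            with open(path, "xb") as f:   # "x": fail if it already exists
                written.append(path)
                f.write(data)             # no fsync per file
        # End of run: flush each file once, then the directory itself.
        for path in written:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        dfd = os.open(dest_dir, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    except OSError:
        for path in written:              # roll back the whole run
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(backup_run(d, {"a.dat": b"one", "b.dat": b"two"}))
```

Any open, write, close or fsync error aborts the run and removes only
the files this run created, so a pre-existing file that caused the
failure is left untouched.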

I'm however wondering if the excessive fsync(2)s weren't eventually
added because of issues with ext4, as I somehow remember unix semantic
exceptions with it, and know that some have lost files using it as
they'd normally safely use other file systems (and I haven't followed
progress to know if it's since fixed).

But if rdiff-backup cannot optionally avoid those, adding an option to
tell it not to fsync at every file as you suggested would be very sane
IMO (it still could default to sync mode, in case there's upstream
resistance)...

I can understand the need for some transaction-logging applications to
call fdatasync(2) regularly, but that's another matter (and even then
it's usually configurable after how many bytes or seconds to call it to
allow the administrator to tweak performance).
-- 
Matt


Re: Extended attributes Linux interface

2011-10-21 Thread Matthew Mondor
On Fri, 21 Oct 2011 00:29:12 -0400
Matthew Mondor  wrote:

> If unicode strings are possible, I think that it'd be possible for a
> string to look like "system" but to actually be something else to an
> auditing administrator, unless all tools clearly showed those non-ASCII
> bytes in an escaped format.

If the above theory is true, if we eventually supported extended
permissions such as access lists, they could possibly be implemented in
a special empty string class, with a special empty string key, and a
single structured object value specifying the permissions, rather than
relying on various keys within the "system" class.

Yet for performance and security, it'd be ideal if the interface only
presented integer IDs for the class, and reserved integer key
attributes for e.g. the EXTATTR_SYSTEM class (just like our
groups are really gids).  The Linux compatibility interface, if
preserved, could be oblivious to system class attributes and only be
useful for the general purpose user attributes...  The problem here
would be that user tools using only the Linux API would not be able to
back up the full state (in this case, the extended permissions,
unfortunately)...
-- 
Matt


Extended attributes Linux interface

2011-10-20 Thread Matthew Mondor
Hello,

There were previously discussions, started by Emmanuel, concerning the
extended attributes, including on the various available APIs and which
to support etc.

At the time I read them I was catching up with a lot of mail and had
written down a small note about a potential security implication that
crossed my mind if we used the Linux interface.  Perhaps someone can
(dis)confirm:

Strings are used instead of IDs to distinguish the class of an extended
attribute, i.e. "system" etc.  My question is then: must those be
limited to ASCII or can they support arbitrary bytes, or UTF-8?

If unicode strings are possible, I think that it'd be possible for a
string to look like "system" but to actually be something else to an
auditing administrator, unless all tools clearly showed those non-ASCII
bytes in an escaped format.

Of course, if the kernel wanted to match "system", it wouldn't match
then, but the fact that it may _appear_ to be correct to an admin may
introduce a security issue if extended permissions were ever
implemented on top of that system.  Perhaps this problem could
also exist with the key names in case they're part of permission
descriptions?
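
To illustrate the concern, here is a small Python sketch of a name that
renders like "system" but is a different byte string, together with the
kind of escaped display that would expose it.  show_name() is a made-up
helper, not part of any existing tool.

```python
def show_name(name: bytes) -> str:
    """Render an attribute name with every suspicious byte escaped."""
    out = []
    for b in name:
        # Keep printable ASCII, minus space and backslash; escape the rest.
        if 0x20 < b < 0x7f and b != 0x5c:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out)


genuine = "system".encode("utf-8")
lookalike = "s\u0443stem".encode("utf-8")   # U+0443 CYRILLIC SMALL LETTER U

if __name__ == "__main__":
    print(genuine == lookalike)
    print(show_name(genuine))
    print(show_name(lookalike))
```

A byte-wise comparison in the kernel would never confuse the two; the
escaped form only matters for what an administrator sees when auditing.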

Thanks,
-- 
Matt


Re: UNIX kernel notification system

2011-10-05 Thread Matthew Mondor
On Wed, 5 Oct 2011 07:18:50 +
David Holland  wrote:

> Trying to enumerate all possible kernel messages and assign code
> numbers to them, so that they're machine readable, is a lost cause and
> also probably a bad idea. I've used systems that tried to do that.
> What happens is that (1) when you really want it (e.g., system is
> hosed, running single user, etc.) the daemon that's supposed to read
> the message codes and print them out in human-readable form isn't
> running, or dies, or gets behind and drops information/loses data, so
> you end up having to read the raw codes and translate them with the
> (paper) reference manual in hand. Enough of this and you don't need
> the manual any more, but that's not a state anyone wants to aspire to.

I agree that this is not the way to go; however I think that messages
could have more structure without necessarily being only an enumeration
(and the code to human-print them for the debugger or higher-level
processes could be part of the kernel services, although custom
formatters may be wanted for GUI applications, perhaps).
Unfortunately, this would probably also require at least some minimal
dynamic tagged-typed system (string, number, address/pointer,
condition-type, print-function, etc) so it has its own
complexity, without mentioning how to gracefully memory-manage
these objects...  There is proplist, but it seems overkill with
unnecessary parsing overhead.

If wanting to be minimal, why would a simple category integer
enumeration be a problem?  There would be much fewer categories than
message types, and the category could help to determine at which level
to route the message.  If I understand, this is what Erik had in mind?
A libc function could also easily map a category integer to a
human-readable uppercase string for debugging and syslog.
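
A minimal sketch of such a mapping, written in Python rather than as the
libc function suggested above.  The category names and values are
invented for the example.

```python
from enum import IntEnum


class Category(IntEnum):
    """Hypothetical message categories; the values are illustrative only."""
    GENERIC = 0
    STORAGE = 1
    NETWORK = 2
    SECURITY = 3


def category_name(code):
    """Map a category integer to an uppercase name for logs."""
    try:
        return Category(code).name
    except ValueError:
        return "UNKNOWN(%d)" % code
```

An unknown code still yields something printable, so an older userland
would not choke on a category added by a newer kernel.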

> As for receiving messages, as I think I've said from time to time
> before the right way to do this is to have a per-user process that
> connects to the message source and displays messages as they come in.
> This is true not just for kernel messages but also other random stuff
> that gets blatted to ttys: urgent syslogs, shutdown walls, biff
> notices, talk requests, etc. If there's no receiver process, writing
> to the tty is the fallback state; but remember that a desktop (let
> alone a phone) may have no ttys open even when the user is working
> away and willing to receive messages.

I think that this would be a good approach; it also reminds me
of dbus, although I'm not particularly a fan of its implementation
(it also embeds Rexx-like facilities for registering services and RPC,
something I find redundant on unix, and the broadcasting model has
security implications).

Assuming we had such a userspace message router, kernel messages would
also be tagged with the category, uid, pid, lwp (and possibly -1 where
one of those isn't applicable)?
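
As a sketch of what such a tagged message and a per-user filter might
look like (Python; the record layout and wanted_by() are assumptions
made for illustration, including the choice to show messages without a
uid to every subscriber of the category):

```python
from dataclasses import dataclass

NA = -1   # field not applicable to this message


@dataclass(frozen=True)
class KernelMessage:
    category: int
    text: str
    uid: int = NA
    pid: int = NA
    lwp: int = NA


def wanted_by(msg, uid, categories):
    """Would a per-user receiver for `uid` display this message?"""
    if msg.category not in categories:
        return False
    # Messages not tied to a user go to everyone subscribed to the category.
    return msg.uid == NA or msg.uid == uid
```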

> As I've also said from time to time, there really ought to be a
> system-level publish/subscribe system for carrying such messages; to
> really get such messages right you need to have multiple channels and
> the ability to filter or subscribe selectively or whatever. Such a
> service can be used for other things too.

The category enum I was talking about earlier could also be called a
channel, I guess.
-- 
Matt


Re: (off topic) mail line wrapping

2011-10-04 Thread Matthew Mondor
On Tue, 4 Oct 2011 09:35:16 +0200
Alan Barrett  wrote:

(flowed paragraph follows)

> Ignoring special cases, the rules are roughly this:  The sender 
> marks soft-wrapped paragraphs by ending every line except the 
> last with a space.  The sender marks hard-wrapped lines by not 
> ending them with a space.  (A paragraph of only one line cannot 
> be soft wrapped.)

Fortunately, your auto-flowed paragraphs are still properly wrapped so
that even clients that don't support it will display them properly,
though.
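
For reference, the simplified rule quoted above can be sketched in a few
lines of Python.  This deliberately ignores the special cases of
format=flowed (RFC 3676) such as quote depth and space-stuffing, and
only treats the "-- " signature separator as never flowed.

```python
def unflow(text):
    """Join soft-wrapped lines (those ending in a space) into paragraphs."""
    paragraphs = []
    current = ""
    for line in text.split("\n"):
        if line == "-- ":                 # signature separator: never flowed
            if current:
                paragraphs.append(current)
                current = ""
            paragraphs.append(line)
        elif line.endswith(" "):
            current += line               # soft break: keep the space, go on
        else:
            paragraphs.append(current + line)   # hard break ends paragraph
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

A client that ignores the trailing spaces simply shows the lines as
sent, which is why such paragraphs still display properly when the
sender wrapped them at a sane width.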
-- 
Matt


Re: UNIX kernel notification system

2011-10-03 Thread Matthew Mondor
On Mon, 3 Oct 2011 11:31:17 -0700
Erik Fair  wrote:

> less(1) (or more(1)) doesn't take care of you? The nice thing about such 
> formatting is that the text can be wrapped at relatively arbitrary word 
> boundaries, making it more readably displayable on a wider range of display 
> widths (e.g. mobile phones, tablets). Just as all the world's not a VAX 
> (cried the PDP-11 users), so also is the world rather more than just 80 by 24.

Sorry to have to add anything to this off-topic discussion;

One issue is that a message may be a mix of text to be wrapped and text to
be left as-is (code for instance), so every paragraph/line must be able
to be auto-wrap annotated.  Of course there is the possibility of HTML
mail (with its own issues) and multiple MIME parts, but it's
traditionally fine on tech lists to mix code and text inline, with the
only exceptions that I see being in Apple Mail posts.

Another issue is that readers that will wrap such paragraphs don't
usually have a configuration option to specify the width of auto-flowed
paragraphs, so for instance in the client I use (a GTK2 client), those
paragraphs extend far right until the end of the window (which means
much more than 80 columns), so reading them is unfortunately harder.

But, with the recent proliferation of Apple Mail posts on the mailing
lists I try to throttle my complaints about it (my last one being
http://mail-index.netbsd.org/tech-userlevel/2010/10/30/msg004119.html :)
-- 
Matt


Re: UNIX kernel notification system

2011-10-03 Thread Matthew Mondor
On Mon, 3 Oct 2011 00:40:46 -0700
Erik Fair  wrote:

> Why not a classification/taxonomy of kernel missives? This doesn't mean we 
> can't continue to have relatively free form (and possibly amusing) text for 
> those conditions we're not yet prepared to classify/codify yet ('cause 
> they're rare, or debug, or ... whatever). The potential for win is in making 
> (or retaining) software parse-ability to enable software response.

Interestingly this very paragraph reminds me of Common Lisp signals
and restarts; signals can be conditions or errors and hold structure
(and inheritance), blocks of code may ignore or catch them, uncaught
exceptions may be handled by software including the invocation of
restarts, or left alone to be routed to the debugger (which is even
overridable through a hook), and there is support for stack-unwind
protected code which gets executed no matter if an exception causes a
long jump out.

Of course, all of this seems overkill for our purposes, but probably
worth mentioning for inspiration...
-- 
Matt


Re: Changing the gpio(4) API/ABI

2011-09-26 Thread Matthew Mondor
On Fri, 23 Sep 2011 12:38:13 +0200
Marc Balmer  wrote:

> With gpio(4) we still carry an old API with us, which I want to remove.  
> While working on it, I will also introduce a third locator to device drivers 
> that attach to gpio pins, flags.  It will be needed for e.g. gpioiic(4) to 
> invert the SDA/SCL pin numbers.
> 
> Will documenting the changes be enough?

Perhaps only one other question: Is there any advantage to keep
compatibility with OpenBSD (from which gpio(4) was initially ported);
are there tools from OpenBSD that can be used because of this
compatibility?  Or has gpio(4) stalled on OpenBSD?

Another option would be to allow a full redesign under a new device
name/copy, if that's a concern.  Personally, although I've seen gpio in
the releases I used since quite a while, I've never used it, and I
doubt that I used any code relying on it...

Thanks,
-- 
Matt


Re: Multiboot a NetBSD kernel with Grub2: it works

2011-09-26 Thread Matthew Mondor
On Tue, 13 Sep 2011 19:36:03 +0200
Emmanuel Kasper  wrote:

> I have just posted a detailed "install from GRUB howto" on netbsd-users.

Did the documentation you proposed get committed into the official docs
somewhere since?  If not, please consider filing a PR with the
information, so that it doesn't get lost.

The bit about needing to pass /netbsd twice so command line
arguments get passed to the kernel is also worthy of mention...

Thanks,
-- 
Matt


Re: KAUTH_PROCESS_SCHEDULER_*AFFINITY restricted to root in default secmodel?

2011-09-25 Thread Matthew Mondor
On Mon, 29 Aug 2011 01:07:52 +0200
Alistair Crooks  wrote:

Sorry for replying to an old thread, I'm still catching up with mail :)

> > i've found this somewhat annoying.  IMO, we should have a way to say
> > "let normal users do this".  i'm not sure sysctl is the right place, but
> > maybe an overlay secmodel?  on some of my machines, i don't want to have
> > to be root to do this.  it's annoying to have to use root to get the
> > highest performance i can out of an application.
> > 
> > the current default is fine, however.
> 
> Something analogous to our friends:
> 
> % sysctl -a | grep mount
> vfs.generic.usermount = 0
> security.models.suser.usermount = 0
> %

And/or like security.models.bsd44.curtain, etc.; I think that a
sysctl for this would be nice too.

Also, I'm not sure if this is doable (an annoyance if users and scripts
have been using the old knobs), but I tend to think that sysctls that
affect the default secmodel (bsd44) should ideally all be under
security.models.bsd44.?
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-25 Thread Matthew Mondor
On Fri, 9 Sep 2011 09:38:31 -0400
Matthew Mondor  wrote:

> On Fri, 9 Sep 2011 00:26:43 + (UTC)
> chris...@astron.com (Christos Zoulas) wrote:
> 
> > Please file a PR about this. I've been meaning to fix it.
> 
> Thanks, I will.

For reference and to close this thread, the relevant PR was kern/45352,
which was fixed and closed, thanks to Christos for the fixes and to the
others who posted hints.
-- 
Matt


Re: "hot swap" storage devices

2011-09-19 Thread Matthew Mondor
Sorry to reply to such an old thread (I'm catching up with ml mail).

On Mon, 27 Jun 2011 12:35:48 -0700
Erik Fair  wrote:

> With regard to "hot swap" storage devices, we really have two choices which 
> are not mutually exclusive:
> 
> 1. Treat as now, but with some additional code in the kernel which yells, 
> "hey! put that back! I have data to write on it!" when a device goes away 
> without prior notice (umount), and hold on to (rather than discard) the data 
> in the I/O buffer cache, in the hope that the user notices and heeds the 
> directive. Timeout to discard? Probably depends upon how much RAM utilization 
> pressure you're under. I think "minutes" would be a good unit here.

I think that this is the best solution;

This is basically what AmigaOS did, and it was nice, but it also had a
unified interface where even console was implemented on top of
graphics, with intuition.library resident in ROM, making it possible to
pop-up requesters at any time.  And it was designed for single-user...

This is more tricky in our case though; as the kernel should then be
able to forcefully trigger a requester, which ideally shouldn't
interrupt running processes and from which it must be possible to
resume working, on whatever currently active interface (console,
possibly in tmux/screen, or X11).

I wonder what the feasibility of this could be: reserve a wscons VT
(where possible) for this type of requester; when the kernel must use
it, remember which is the active VT, switch to the requester VT in text
mode where the requester is shown.  Depending on configuration, this
behaviour could be enabled or disabled, and possibly a timeout could be
configured.  Once the timeout expired or the needed user action was
performed (user selects cancel, retry, inserts a requested device,
etc), return back to the previous VT.

But this still does not deal with device identification; on AmigaOS
disks had labels and the system would verify upon insert/connect if the
label corresponded to such a pending requester...

Thanks,
-- 
Matt


Re: 5.1 USB panic on second removal of memory stick

2011-09-19 Thread Matthew Mondor
On Wed, 15 Jun 2011 20:04:23 -0700
Bob Lee  wrote:

Hello Bob,

>   I'm working on a PowerPC system, and have a problem when I remove the
> usb memory stick the second time.  That is insert memory stick, remove
> memory stick, insert memory stick, and remove memory stick.
>   At this point the system panics, with 'ehci_rem_qh: ED not found'.
>   Anyone else seen anything like this?

If your problem still occurs, please file a PR, along with the
backtrace and dmesg, so that it doesn't get lost.  I think that it
should be filed in the "kern" category.

Thanks,
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-09 Thread Matthew Mondor
On Fri, 09 Sep 2011 08:30:51 +1000
matthew green  wrote:

> > I looked at the various tty(4) termios(4) and pty(4) without finding an
> > option to change the buffer size.  Is there a way at all to change it?
> 
> there's no option.  in fact, it's all hard coded as magic 1024 constants
> in about 4 places in sys/kern.  i kept meaning to fix that, but haven't
> gotten around to it.

Thanks for the confirmation,
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-09 Thread Matthew Mondor
On Fri, 9 Sep 2011 00:26:43 + (UTC)
chris...@astron.com (Christos Zoulas) wrote:

> Please file a PR about this. I've been meaning to fix it.

Thanks, I will.
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-09 Thread Matthew Mondor
On Thu, 8 Sep 2011 17:45:38 -0400
Thor Lancelot Simon  wrote:

> On Thu, Sep 08, 2011 at 11:26:29AM -0400, Matthew Mondor wrote:
> > 
> > It would be nice to for instance be able to use an MTU of 3000 so that
> > there are fewer context switches, but unfortunately tracing the
> > processes shows that 1024 bytes are read from the pty devices at most.
> 
> Are you sure using an MTU of 3000 would do much of anything?  Since
> almost all peers are connected by Ethernet somewhere along the line,
> you are unlikely to ever see packets larger than 1500 minus Ethernet
> framing size.

Indeed I could even avoid IP fragmentation with a low enough MTU, which
is what I tried in the initial setup (and am still using, because of
the 1024 bytes limit).

> How did you determine that the bottleneck for your application was
> context switches?  That the 1024-byte read size you're seeing is
> actually internal to the tty layer or ppp rather than application
> imposed in userspace?

I'm not sure that the bottleneck really is user context switches but I
highly suspect it, as the forwarding daemon is mostly idle while it
can't seem to send faster than about 178KB/sec (when using an MTU small
enough to avoid the 1024 bytes limit, without which performance drops
even more).  If I could test with higher MTU to move more work down
into the kernel and network, I could confirm or disprove :)

I wrote the application as a test, so am controlling the buffer size,
and am invoking pppd with the wanted mru setting.  While it's not
impossible that pppd imposes the limit, I've found some threads when
searching with people complaining about the same pty limit on NetBSD
and OpenBSD with it being absent on Linux.  But pppd also uses the
in-kernel ppp support I think, which is probably different than Linux's.

Also, although I didn't inspect carefully the whole if_ppp code, I
didn't see anything suggesting 1024 would be a limit, yet in the pty
code I do see TTYHOG:

/usr/include/sys/tty.h:#define  TTYHOG  1024

To definitely test if it's really a pty/tty limitation I could write a
small program and see, though; probably the best thing to confirm.
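
Such a test program could be as small as the following Python sketch.
It is only a rough probe: max_single_read() is a made-up name, the
default of 4000 bytes is arbitrary, and the number it returns depends on
the kernel's tty queue size and, on some systems, on timing, so a single
run proves little either way.

```python
import os
import tty


def max_single_read(nbytes=4000):
    """Write nbytes to a pty master, return the size of one slave read."""
    master, slave = os.openpty()
    try:
        tty.setraw(slave)                 # no line editing, no echo
        os.set_blocking(master, False)    # do not hang if the queue fills
        try:
            written = os.write(master, b"x" * nbytes)
        except BlockingIOError:
            written = 0
        if written == 0:
            return 0
        # One read only: how much does the tty layer hand over at once?
        return len(os.read(slave, nbytes))
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(max_single_read())
```

If the result never exceeds 1024 regardless of nbytes, that would point
at the tty layer rather than at pppd or the forwarding daemon.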
-- 
Matt


pty(4) 1024 bytes buffer limit

2011-09-08 Thread Matthew Mondor
Hello,

I've been wondering if it was possible to change the pty(4) internal
buffer size, as I noticed that ppp tunnels cannot use a larger frame
size.  Because of this, it seems that the optimal MTU is 856, which is
so small that context switches become the bottleneck.

It would be nice to for instance be able to use an MTU of 3000 so that
there are fewer context switches, but unfortunately tracing the
processes shows that 1024 bytes are read from the pty devices at most.

I looked at the various tty(4) termios(4) and pty(4) without finding an
option to change the buffer size.  Is there a way at all to change it?

Thanks,
-- 
Matt


Re: Seventh Edition(V7) filesystem support.

2011-05-25 Thread Matthew Mondor
On Tue, 24 May 2011 22:48:40 +0900 (JST)
UCHIYAMA Yasushi  wrote:

> The purpose I intended for this filesystem is exchanging files with small
> computer systems (such as H8/300, ARM7TDMI...), as an alternative to FAT.
> It also has tri-endian support, and it can mount a PDP-11 V7 disk image.
> 
> http://www.vnop.net/~uch/h8/w/tools/patch/netbsd5.99.24-v7fs110524.patch

Nice!
-- 
Matt


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-07 Thread Matthew Mondor
On Wed, 4 May 2011 19:54:37 -0700
jnem...@victoria.tc.ca (John Nemeth) wrote:

>  This doesn't mean we should be doing hack jobs.  NetBSD is about
> doing things right.

Can postinstall fix/recreate specific buggy devices?  Or could it warn
that /dev/fd* might need to be recreated?  Otherwise, it should at
least be mentioned in UPDATING, and that would be all right, IMO.
-- 
Matt


Re: extent-patch and overview of what is supposed to follow

2011-04-02 Thread Matthew Mondor
On Sat, 2 Apr 2011 11:49:14 +0200
Martin Husemann  wrote:

> On Sat, Apr 02, 2011 at 11:30:16AM +0200, Manuel Bouyer wrote:
> > AFAIK dtrace doesn't work on non-modular kernels ...
> 
> Nor on most of our archs, and AFAICT there is not even a document 
> describing the (maybe nontrivial amount of) work needed to make it
> work there.

I don't think that we should leave the tracking for a hypothetical
future; it'd be better if the interface, or implementation, allowed
such tracking.
-- 
Matt


Re: GSoC 2011 project proposal [Add kqueue support to GIO]

2011-04-01 Thread Matthew Mondor
On Tue, 29 Mar 2011 10:17:26 -0500
David Young  wrote:

> Are you sure that kqueue supports monitoring trees?  I know that
> it's possible for to watch a directory with kqueue for nodes
> added/deleted/renamed, however, when a process watches directory x/ with
> kqueue it won't necessarily wake up if there is a change under x/y/ or
> x/y/z/.

I remember participating in an older thread on this subject here years
ago, and indeed kqueue lacks the ability to monitor trees.  If I
remember correctly, one of the conversations revolved around the
possibility for kqueue to eventually be able to report events for a
whole filesystem, although if this were done the security model would
have to be thought through carefully.  An event could for instance hold
the following fields: inode, event, nparams, params[] or the like.
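
As a rough illustration of such an event record, here is a sketch in
Python.  Only the field names inode, event, nparams and params come
from the paragraph above; FsEvent, FsEventKind and the bit values are
hypothetical and do not correspond to any existing kqueue(2) interface.

```python
from dataclasses import dataclass, field
from enum import IntFlag
from typing import List


class FsEventKind(IntFlag):
    """Hypothetical event bits; not the real NOTE_* values from kqueue(2)."""
    WRITE = 1
    DELETE = 2
    RENAME = 4


@dataclass(frozen=True)
class FsEvent:
    inode: int                    # inode number of the affected file
    event: FsEventKind            # what happened to it
    params: List[str] = field(default_factory=list)  # event-specific arguments

    @property
    def nparams(self) -> int:
        """Number of parameters carried by this event."""
        return len(self.params)


# A rename would plausibly carry the old and new names as parameters.
ev = FsEvent(inode=4711, event=FsEventKind.RENAME, params=["old", "new"])
```

Keeping params variable-length lets different event types carry
different arguments without changing the record layout.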

I'm not sure if this is a requirement for inotify, but another
potential issue is the reverse mapping of inodes to paths; also long
ago I implemented a userland daemon to do this and realized that for
large trees the memory requirements were rather large (I didn't bother
implementing compression for it though).  Of course this gets
complicated by the fact that multiple paths may lead to a common node,
but it wouldn't be very hard for the kernel to remember one of the
paths to every active vnode, possibly...  which would also benefit
fstat(1) and procfs...
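
For illustration, the userland reverse mapping could be sketched as
follows.  This is not the daemon mentioned above, only a minimal
assumption of how one might look: build_inode_map is a hypothetical
helper that walks a tree once and records every path found for each
(device, inode) pair, so hard links show up as multiple entries.

```python
import os
from collections import defaultdict


def build_inode_map(root):
    """Map (st_dev, st_ino) to every path under root that names that node."""
    paths_by_inode = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)          # do not follow symlinks
            paths_by_inode[(st.st_dev, st.st_ino)].append(path)
    return paths_by_inode
```

Every path string is kept in memory uncompressed, which is where the
large memory footprint for big trees comes from.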
-- 
Matt


Re: Status and future of 3rd party ABI compatibility layer

2011-04-01 Thread Matthew Mondor
On Wed, 23 Mar 2011 16:06:07 +0100
Joerg Sonnenberger  wrote:

> As such, I want to propose moving the last two categories into the Attic
> for further dusting.

It makes sense to me,
-- 
Matt


Re: Status and future of 3rd party ABI compatibility layer

2011-03-03 Thread Matthew Mondor
On Wed, 2 Mar 2011 00:40:44 +
Andrew Doran  wrote:

> With modules now basically working we should either retire or move
> some of these items to pkgsrc so that the interested parties maintain them.
> An awful lot of the compat stuff is now very compartmentalised, with not
> much more work to do.

Is all compat code i386 specific?  Otherwise, do modules really work on
all architectures involved?  Can a module built from third-party code
be linked statically to a monolithic kernel without hassle, for systems
on which enabling loadable modules is not allowed?

Thanks,
-- 
Matt


Re: mpt Serious performance issues

2011-02-04 Thread Matthew Mondor
On Fri, 4 Feb 2011 09:17:01 +0100
Stephan  wrote:

> Now this is REALLY strange. I was wondering about why the read speed
> is sometimes high (~70MB/s) and sometimes very slow (~2MB/s). So I
> repeated the test utilizing
> 
> find / -exec cat {} \; > /dev/null &
> 
> to read everything from the filesystem while watching the physical
> disks with my eyes and the throughput with sysstat. The findings is
> 
> -that sometimes the upper disks is 100% busy while the lower disk is
> NOT being accessed at all, and the read speed is ~2MB/s
> -then sometimes the adapter switches to the lower disk while the upper
> disk isn´t utilised anymore, and the read speed increases to ~70MB/s
> -until the adapter again switches to the upper disk which leads to the
> massive decrease in speed
> 
> So what do you think about that?

Just in case: do any of those disks show reallocated sectors using
atactl smart status?  I'm asking because I've seen very inconsistent
speeds on some drives whenever the remapping logic had to be turned
on.  Also, is there anything in dmesg about read error retries?  I've
also seen brand new disks with very high read error rates but otherwise
normal smart stats.  They too would crawl when reading certain areas.
Unfortunately I'm seeing this latter defect more often recently.
-- 
Matt


Re: Problems with raidframe under NetBSD-5.1/i386

2011-01-06 Thread Matthew Mondor
On Thu, 6 Jan 2011 10:05:17 -0800
buh...@lothlorien.nfbcal.org (Brian Buhrow) wrote:

>   Do you know the magic to turn off -werror for individual kernel
> builds?

Perhaps try defining NOGCCERROR (found by looking at
src/share/mk/bsd.README).
-- 
Matt


Re: freebsd 5.99.41 as XEN3_DOMU

2010-12-24 Thread Matthew Mondor
On Sun, 19 Dec 2010 20:54:26 +0100
Manuel Bouyer  wrote:

> Well, in the current state, modules are a not enabled in the Xen kernels
> (modules should be built specifically for Xen, but the build tools do not
> allow this right now). So you have to compile all what you need in a
> monolitic kernel. But ZFS is only available as module, so unfortunably
> this means no ZFS for xen.
> One way around it is to run NetBSD in a HVM guest.

Is it common for modules not to be able to be statically linked into a
monolithic kernel?  I understand that providing ZFS as a module is
convenient for licensing reasons, but it probably shouldn't be too hard
to somehow optionally link such a module into a kernel image at build
time, and call an init/load hook at boot time?

I tend to think that other than allowing code to be optionally loaded
dynamically, another advantage of modules would probably be that they
can also optionally be included monolithically, ideally with no code
changes...

Thanks,
-- 
Matt


Re: Heads up: moving some uvmexp stat to being per-cpu

2010-12-15 Thread Matthew Mondor
On Tue, 14 Dec 2010 20:49:14 -0800
Matt Thomas  wrote:

> I have a fairly large but mostly simple patch which changes the stats 
> collected in
> uvmexp for faults, intrs, softs, syscalls, and traps from 32 bit to 64 bits 
> and
> puts them in cpu_data (in cpu_info).  This makes more accurate and a little 
> cheaper
> to update on 64bit systems.

I like the cleanliness of the changes;

A potential issue I see is how heavy this becomes on some 32-bit CPUs
i.e. m68k, where I see for instance 1 instruction being replaced by 9
instructions (including registers save/restore) to increment a
counter.  I'm not sure if in practice this will really affect
performance, or if it's worth benchmarking for those architectures,
however.

If it turned out to be a problem, I could see two possible solutions:
an option to disable some stat counters on slow systems (values could
simply remain 0 in that case), or a new counter type, say
cpustatcount_t, and macros defined by the MD code to use 32-bit
cpu-specific counters where necessary, getting compiled/exported to
userland as 64-bit at statistics request time to avoid
compat/userland complications...

Thanks,
-- 
Matt


Re: vmpage race and deadlock

2010-11-28 Thread Matthew Mondor
On Sun, 28 Nov 2010 09:30:44 +0100
Juergen Hannken-Illjes  wrote:

> Usually within hours I get a deadlock where a thread is waiting on "genput"
> but the page in question is neither BUSY nor WANTED.  I suppose I tracked (*1)
> it down to three places, where we change page flags without holding the
> object lock.  With this diff (*2) in place the test runs for > 48 hours.

This is a nice find, which most probably also deserves a PR, as
netbsd-5 also lacks proper synchronization there.

Thanks,
-- 
Matt


Re: New apple keymap variant or keymap in /usr/share/wscons/keymaps?

2010-11-28 Thread Matthew Mondor
On Sun, 28 Nov 2010 21:04:54 +0100
Frank Wille  wrote:

> I came to the conclusion that it might be easier and less intrusive to
> create a new keymap file (e.g. called "ukbd.apple.powerbook") for those
> function keys. So they can easily be added to any national keyboard layout.
> 
> But I realized that wsconsctl is unable to process a mapping-line with just
> one Cmd_*, or a Cmd followed by Cmd_Function in it. When there is no good
> reason that those are rejected I will fix it in the wsconsctl-parser now.

When a while ago I posted PRs with a new keymap to be added to the
kernel, I was told that they now should ideally be added as userland
keymaps.  When later supplying a userland keymap (the FR_CA one), I
noticed that the interface wasn't as friendly or powerful as it could
have been.

In case you intend to also enhance the keymap infrastructure and
interface, I have an old pending PR (misc/26720) with a few
enhancements for it, but I never got back to update the diffs for a
recent -current or to keep enhancing it.  Those are userland changes
though, so tech-userlevel is possibly a better place to continue the
thread in this direction.

But other than encoding= support, it might also be nice to be able to
have "include" support like include= as well, after which it would be
possible to restructure the keymaps and move common parts together; and
if such "include" support allowed conditionals, parts could be loaded
conditionally and automatically depending on machine model (assuming
that would become available via sysctl), etc.

What demotivated me from continuing to work on it back then was the low
interest of the developers in that PR, but most importantly that I'm
usually using X11 terminals and ssh myself, with the default EN-US
wscons keymap being fine when I'm really at the console (and that almost
exclusively occurs at installation time).

If we want to pipe-dream about the future, now that there's Lua in
base, it's even possible to redo the whole userland keymap
loading/management part with a more powerful language than sh.  And to
get back on topic for tech-kern: with the advent of the kernel-lua
project, it might even be possible to eventually allow user translation
mechanics in the form of Lua scripts... :)
-- 
Matt


Re: mlock() issues

2010-10-22 Thread Matthew Mondor
On Fri, 22 Oct 2010 12:06:37 +0100
Sad Clouds  wrote:

> Well if you're allocating memory yourself, then you've just created your
> own application cache.

Say many files were mapped in the process's address space: the OS would
still be responsible for keeping the pages of frequently used ones
active, possibly swapping out long-unused ones, unless of course
MAP_WIRED was used.  A syscall per access, i.e. read(2), would be
eliminated however, and I think that zero-copy could be used (with page
loaning) if writing 64KB blocks out to a socket from a memory-mapped
file.

> On the other hand if you mmap() those files
> directly, what happens if another process truncates some of those files
> while you're reading them?

I didn't do a test (it's definitely worth testing), but I think that a
SIGSEGV could occur if a previously available page disappeared, unless
MAP_COPY was used, and the file would need to be remapped.

I could see a problem where a siginfo-provided address might need to be
easily matched with the file so that the process can efficiently know
which file to remap...  and for many files the current kqueue(2)
EVFILT_VNODE isn't very useful either to detect that a file was
recently modified, as it'd require too many open file descriptors :(

There was some discussion years ago about a kqueue(2) filter that
could be set on a directory under which any modified file (possibly for
the whole involved filesystem for the superuser) would generate an
event with information about which file is modified by inode, but this
seems non-trivial and hasn't been implemented yet.  There also are
issues with inode to file string lookup (multiple files could point to
a common destination, and a reverse name cache is needed).

Anyway, I like this kind of discussion and have nothing against NIH
personally (it fuels variety and competition, in fact), so thanks for
sharing your custom cache experiments and performance numbers.  If you
happen to achieve interesting performance along the above lines with
mmap(2) as well, I'd also like to know how it went.

Thanks,
-- 
Matt


Re: mlock() issues

2010-10-22 Thread Matthew Mondor
On Fri, 22 Oct 2010 10:18:52 +0100
Sad Clouds  wrote:

> A pipelined request, say for 10 small files can be served with a single
> writev() system call (provided those files are cached in RAM), if you
> rely on kernel file cache, you need to issue 10 write() system calls.

Is this also true if the 10 iovecs point to mmap(2)ed files/buffers
whose pages were recently accessed?
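
The scenario in question could be sketched like this in Python:
several files are memory-mapped and handed to a single writev(2) call
as the iovec list.  writev_mapped is a hypothetical helper; whether the
kernel avoids copying the mapped pages is not something this sketch can
show, it only demonstrates that one system call is enough.

```python
import mmap
import os


def writev_mapped(out_fd, paths):
    """Gather several memory-mapped files into a single writev(2) call."""
    maps = []
    try:
        for path in paths:
            with open(path, "rb") as f:
                # Length 0 maps the whole file; empty files cannot be mapped.
                maps.append(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))
        return os.writev(out_fd, maps)   # one system call for all buffers
    finally:
        for m in maps:
            m.close()
```

A real server would keep the mappings around between requests instead
of recreating them on every call, and would handle short writes.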
-- 
Matt


Re: kernel module loading vs securelevel

2010-10-18 Thread Matthew Mondor
On Mon, 18 Oct 2010 09:31:32 -0400
Steven Bellovin  wrote:

> Signatures provide *authentication*; what is needed here is *authorization*.

While I agree, there also are situations where both can be welcome...

Another solution someone proposed, which I like, is hashing the modules
and then, at load time, rehashing a module and matching it against the
hash set, which would be a simpler, shorter-term solution.  I think
that embedding the hash set in the kernel image would be safer than
using a file, however.  Unfortunately, this makes developing,
installing or upgrading a module less friendly, as the kernel image has
to be refreshed and the system rebooted.
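
The hash-set idea could be sketched as follows, in userland Python
purely for illustration.  SHA-256 and the helper names file_digest,
build_hash_set and may_load are assumptions; nothing here reflects how
the kernel module loader actually works.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_hash_set(module_paths):
    """Computed once, at a trusted time (e.g. when the kernel is built)."""
    return frozenset(file_digest(p) for p in module_paths)


def may_load(path, hash_set):
    """Re-hash the module at load time and check it against the set."""
    return file_digest(path) in hash_set
```

Any change to a module file changes its digest, so a modified module is
refused until the hash set itself is regenerated, which is the
inconvenience described above.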
-- 
Matt

