Re: Bug in fs/cd9660 raises questions about inode number computing

2014-05-10 Thread Matthew Mondor
On Sat, 10 May 2014 08:11:40 +0200
Thomas Schmitt scdbac...@gmx.net wrote:

 kern/48787 can be counted as a successful one.
 kern/48797 demonstrates that I need to free myself more from
 expectations which occupied my mind when studying isofs of
 a different kernel.
 Thanks to Martin Husemann for posing the right questions.

Thanks for working on this,
-- 
Matt


Re: Bug in fs/cd9660 raises questions about inode number computing

2014-05-09 Thread Matthew Mondor
On Tue, 06 May 2014 12:20:53 +0200
Thomas Schmitt scdbac...@gmx.net wrote:

 How to properly submit them ?

A PR (Problem Report) in the kern category with an attached unified
diff would seem adequate if you cannot commit the changes yourself.
Sorry if that is already obvious to you.

Unfortunately I'm not personally familiar enough with iso9660 to
confirm that the fixes are right, or to answer the other questions,
though; hopefully others will.

-- 
Matt


Re: Vnode API change: add global vnode cache

2014-05-09 Thread Matthew Mondor
On Wed, 30 Apr 2014 17:15:16 +0200
J. Hannken-Illjes hann...@eis.cs.tu-bs.de wrote:

  vcache_get(mp, key, key_len, vpp) to lookup and possibly load a vnode.
  vcache_lookup(mp, key, key_len, vpp) to lookup a vnode.
  vcache_remove(mp, key, key_len) to remove a vnode from the cache.
  VFS_LOAD_NODE(mp, vp, key, key_len, new_key) to initialise a vnode.
 
 Updated diff at http://www.netbsd.org/~hannken/vnode-pass6-4.diff

One small question:

Is it expected in vcache_common() for the interlock to remain held even
if returning an error?

Thanks,
-- 
Matt


Re: Vnode API change: add global vnode cache

2014-05-09 Thread Matthew Mondor
On Sat, 10 May 2014 01:29:47 +
Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:

Is it expected in vcache_common() for the interlock to remain held even
if returning an error?
 
 vget unconditionally drops the interlock, so it will never remain
 held, error or not.

Oh, thanks.  I can now see that vget() must be called with it held, and
indeed drops it itself.
-- 
Matt


Re: Panic when deleting large number of files inside DomU

2014-05-06 Thread Matthew Mondor
On Wed, 19 Sep 2012 12:00:45 +0200
Roger Pau Monne roger@citrix.com wrote:

 Yes, WAPBL enabled. I will file a PR about this if there is no news.

Was a PR already filed for this, or was the reason discovered and fixed
since? A quick search showed one of your closed Xen related PRs but it
seems to be a different issue, unless I'm mistaken.

Thanks,
-- 
Matt


Re: NetBSD-5 appears to have forgotten how to execute 0.9A binaries

2014-05-06 Thread Matthew Mondor
On Tue, 11 Sep 2012 09:45:22 -0700
buh...@lothlorien.nfbcal.org (Brian Buhrow) wrote:

 provide further results.  I assume a fix would want to be pulled
 up, assuming I find it, on the grounds that it's a security fix.  I'll also
 see about trying -current and NetBSD-6, but I'm guessing those are
 vulnerable as well, given Matthew's test with my binary under NetBSD-6
 yesterday.

Was a PR for this ever filed, or the problem fixed since?  Any relation
to SA2013-013?

Thanks,
-- 
Matt


Re: Does options P1003_1B_SEMAPHORE still exist?

2014-05-06 Thread Matthew Mondor
On Mon, 17 Sep 2012 10:42:49 -0700 (PDT)
Paul Goyette p...@whooppee.com wrote:

Sorry for the long delay, I'm slowly catching up on tech-kern mail.

 I recently noticed that there is a built-in ksem module that includes 
 sys/kern/uipc_sem.c
 
 The man page for sem(4) states that this code should be included in the 
 kernel only if options P1003_1B_SEMAPHORE is defined.  Yet a search of 
 the kernel sources shows no usage for this option anywhere, and the 
 uipc_sem.c file is unconditionally included by sys/conf/files
 
 So, I have a few questions:
 
 1. Should sem(4) really be in manual section 4?  It doesn't appear to be 
 a device driver!  (Maybe a more detailed man page should be written for 
 section 9?)

I have the impression that those syscalls should all be documented in a
section 2 manual page instead (kern/37427).  Not totally related but
misc/38979 would have similar results for the scheduler control related
syscalls.  I now realize that I probably don't have a PR for these ones,
but the mqueue and setaffinity related syscalls are also undocumented.

At the time I filed the PRs they were contested by AD because the libc
counterparts were already documented, with the syscalls considered the
private interface.  I personally believe that all syscalls should be
documented in NetBSD (and recently I have learned that I'm not the only
one to think they should be, so perhaps I should eventually write these
manual pages, after all).

 2. Should the man page be updated to remove the reference to the option?

A quick grep on netbsd-6 here only shows:

share/man/man4/options.4:.It Cd options P1003_1B_SEMAPHORE
share/man/man4/sem.4:.Cd options P1003_1B_SEMAPHORE
sys/compat/freebsd/freebsd_syscall.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_syscalls.c:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/freebsd_sysent.c:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/freebsd/syscalls.master:#if defined(P1003_1B_SEMAPHORE) || !defined(_KERNEL)
sys/compat/netbsd32/netbsd32_syscall.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscallargs.h:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_syscalls.c:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/netbsd32_sysent.c:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/compat/netbsd32/syscalls.master:#if defined(P1003_1B_SEMAPHORE) || (!defined(_KERNEL_OPT) && defined(_LIBC))
sys/kern/init_sysctl.c:#if defined(MODULAR) || defined(P1003_1B_SEMAPHORE)
sys/modules/compat_netbsd32/Makefile:CPPFLAGS+= -DP1003_1B_SEMAPHORE -DCOREDUMP -DKERN_SA

 3. If the code is truly unconditional, should it really be a module?  If 
 so, could it be made to auto-load when needed?  Could it also be auto 
 unloaded?

It seems that other POSIX librt components such as message queues,
scheduler control, cpu affinity, etc, are not optional.  I don't know
why those semaphores should be, thus they could probably remain as part
of the base kernel with the option removed, unless we'd want all of RT
components to be optional and in a module, perhaps?  But librt of
course wouldn't be usable then, unless it's loaded...

Anyone remember a particular reason why these semaphores might be
unwanted in custom kernels, but the rest of librt wanted anyway?
-- 
Matt


Re: NetBSD-5 appears to have forgotten how to execute 0.9A binaries

2014-05-06 Thread Matthew Mondor
On Tue, 6 May 2014 07:56:22 -0700
Brian Buhrow buh...@nfbcal.org wrote:

   hello.  There was a fix implemented for the original problem by Chuck
 Silvers and tested by me.   I'll look to see if I can find the commits.
 I'm not sure if it was documented in a pr or not or if it got pulled up to
 NetBSD-6.  I'm pretty sure it's in -current and I know it's in -5 as a
 pullup.  If you want to have a look, it happened in the first half
 of September 2012.

Unfortunately I couldn't locate the exact change or pullup tickets.
But considering the change was pulled up to netbsd-5, and that 6.0 was
released around October, I guess that if netbsd-6 needed the change it
was also fixed then.

Thanks,
-- 
Matt


Re: resource leak in linux emulation?

2014-05-05 Thread Matthew Mondor
On Mon, 5 May 2014 15:43:56 +1200
Mark Davies m...@ecs.vuw.ac.nz wrote:

 On Mon, 05 May 2014, Christos Zoulas wrote:
  I wrote:
  So can someone suggest where exactly the patch should go.  And
  isn't proc_lock held at this point (entered at line 344, exit at
  line 569)?
  
  How about this?
 
 Seems good to me and can confirm that it's fixed the increasing proc 
 count problem.  Can someone commit and pull up to 6?

I also see support for emulation-specific exit hooks.  I've not checked
whether it's really possible, but could that Linux-specific case be solved
there instead of in the generic code?

Thanks,
-- 
Matt


Re: asymmetric smp

2014-05-05 Thread Matthew Mondor
On Mon, 5 May 2014 01:10:24 -0400
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 which some CPUs might have trouble with (i.e. RAS)...

I think that what I meant was CAS

-- 
Matt


Re: asymmetric smp

2014-05-04 Thread Matthew Mondor
On Wed, 02 Apr 2014 17:21:02 +0200
Johnny Billquist b...@softjar.se wrote:

 On 2014-04-02 16:10, John Nemeth wrote:
  On Apr 2,  1:55pm, Johnny Billquist wrote:
  } The root fs is on nfs, as I'm running the machine diskless. Disk is
  } served from a -current NetBSD/alpha system sitting right next to it. And
  } I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k
  } block size for NFS. Login is obviously already running, since that is
  } what also prompts for the username, and doing it twice should even put
  } some stuff in local cache.
 
Uh, actually getty does the initial prompt for username on
  the console.  After collecting the username, getty execs login.
 
 Hmm. My mistake in that case. So we have image activation at that point. 
 Hmm...

Possibly other things to verify would be /etc/passwd.conf (you'll likely
need to also regenerate passwords if you change those settings), and if
VAX has specialized lock code or uses the new generic atomic operations
which some CPUs might have trouble with (i.e. RAS)...
-- 
Matt


Re: 6.0_BETA-6.0_BETA2 rename

2012-07-30 Thread Matthew Mondor
On Mon, 30 Jul 2012 16:59:14 +0200
Edgar Fuß e...@math.uni-bonn.de wrote:

 Just out of curiosity: Why was 6.0_BETA renamed 6.0_BETA2 recently?

The release of second beta binaries:
http://blog.netbsd.org/tnf/entry/netbsd_6_0_beta2_binaries

After the beta series, release candidates might be expected (e.g. RC1,
RC2, etc.) until the official release, at which point the netbsd-6 branch
will become 6.0_STABLE.
-- 
Matt


Re: Core statement on directory naming for kernel modules

2012-07-28 Thread Matthew Mondor
On Fri, 27 Jul 2012 17:28:14 -0700
jnem...@victoria.tc.ca (John Nemeth) wrote:

 On Dec 17,  1:58pm, Matthew Mondor wrote:
 } This reminds me though: why/how does sysctl/kern.module.autoload
 } default to 1 for non-MODULAR kernels (at least on netbsd-6)?  Or an
 } alternative question: are these sysctl knobs useful at all with
 } non-MODULAR kernels, or are they then artifacts?
 
  Good question.  Non-MODULAR kernels still have parts of the MODULAR
 subsystem in order to initialise built-in modules.  However, the linking
 code isn't there, so it would be impossible to load a module.  I'll make
 a note to trim some of the excess stuff in non-MODULAR kernels.

Indeed the linker isn't there, which was confirmed using nm when I
initially noticed those knobs.

Thank you for looking into this,
-- 
Matt


Re: Core statement on directory naming for kernel modules

2012-07-27 Thread Matthew Mondor
On Fri, 27 Jul 2012 13:57:52 + (UTC)
Geoff Wing ma...@primenet.com.au wrote:

 John Nemeth jnem...@victoria.tc.ca typed:
 : .. Being able to properly unload a built-in module would be a nice
 : feature.
 
 This sounds a bit like a possible security problem, though
 presumably/hopefully limited by the current security level and AAA.

Do you mean in the case an external module could then be loaded instead
of a built-in one?  Presumably someone who wants to prevent the
kernel from loading external modules would use a kernel without
MODULAR, or raise the securelevel.

This reminds me though: why/how does sysctl/kern.module.autoload
default to 1 for non-MODULAR kernels (at least on netbsd-6)?  Or an
alternative question: are these sysctl knobs useful at all with
non-MODULAR kernels, or are they then artifacts?

Thanks,
-- 
Matt


Re: Quota on tmpfs

2012-07-17 Thread Matthew Mondor
On Tue, 17 Jul 2012 20:54:28 + (UTC)
mlel...@serpens.de (Michael van Elst) wrote:

 I would also guess that sparse files are very rarely used. But for
 disk usage purposes you want to consider real disk usage including
 overhead because the quotas are mostly used to partition the available
 space. That doesn't work if your quotas allow you to write a few
 thousand files of 1 byte length that account together as a single
 block when they really occupy a few thousand blocks.

A scenario in which they're frequently used is block-based file system
transfer protocols (especially distributed ones where blocks may
download in random order, including bittorrent), also by download
managers that support download optimization where multiple
connections will be made to transfer multiple file sections at a time
(e.g. the DownThemAll Firefox extension).

Another common usage of sparse files is for live file system images.
The cost of creation (open/creat + truncate/lseek + newfs) is small
compared to writing a full image of zeros, then the blocks can be
lazily allocated and written when needed.
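
The creation step described above can be sketched in C as follows.  This is
only a minimal illustration, assuming a file system that supports holes; the
function names sparse_create() and sparse_usage() are hypothetical, and the
512-byte unit for st_blocks is what the BSDs and Linux use rather than
something every system guarantees.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) a file and extend it to `size' bytes without
 * writing any data, so that no data blocks need to be allocated yet.
 * Returns 0 on success, -1 on error.
 */
int
sparse_create(const char *path, off_t size)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd == -1)
		return -1;
	if (ftruncate(fd, size) == -1) {
		close(fd);
		unlink(path);
		return -1;
	}
	return close(fd);
}

/*
 * Report the logical size of a file and the space actually allocated
 * to it.  st_blocks is assumed to be counted in 512-byte units.
 * Returns 0 on success, -1 on error.
 */
int
sparse_usage(const char *path, off_t *logical, off_t *allocated)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return -1;
	*logical = st.st_size;
	*allocated = (off_t)st.st_blocks * 512;
	return 0;
}
```

Comparing the two values returned by sparse_usage() right after
sparse_create() shows the gap between apparent size and real usage that
quota accounting has to deal with.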

Apparently some database storage formats use sparse files, but the ones
I'm currently using don't seem to...
-- 
Matt


Re: Quota on tmpfs

2012-07-17 Thread Matthew Mondor
On Tue, 17 Jul 2012 21:26:44 -0400
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 A scenario in which they're frequently used is block-based file system

s/file system/file/ :)
-- 
Matt


Re: Quota on tmpfs

2012-07-13 Thread Matthew Mondor
On Fri, 13 Jul 2012 07:54:07 +
David Holland dholland-t...@netbsd.org wrote:

 On Thu, Jul 12, 2012 at 09:33:42PM -0400, Matthew Mondor wrote:
   Yet another hack would be to create a sparse ffs image under a tmpfs,
   mounted with quotas via vnd, but evaluating its ideal size might be
   difficult, and you'd have to re-apply quota settings in the script that
   creates the image at boot time... :)
 
 Using mfs instead of tmpfs is probably a better bet here. mfs brings
 in enough of ufs that adding quota support to it shouldn't be
 particularly complicated.

I was also wondering initially whether mfs already supported quotas
because of this similarity, but indeed it doesn't seem to at the
moment.

Thanks,
-- 
Matt


Re: Quota on tmpfs

2012-07-13 Thread Matthew Mondor
On Fri, 13 Jul 2012 08:03:42 +
David Holland dholland-t...@netbsd.org wrote:

 I believe the situation with both mfs and lfs is that some pieces of
 the support are in place but not others. It was clear when hacking up
 the code that neither had actually been tried by anyone in a long,
 long time...

I admit I haven't tried LFS again since the advent of WAPBL, and have
only used MFS to boot small custom userlands built with
crunchgen(1) long ago (floppy disks :)
-- 
Matt


Re: Quota on tmpfs

2012-07-12 Thread Matthew Mondor
On Thu, 12 Jul 2012 16:17:42 +0200
Edgar Fuß e...@math.uni-bonn.de wrote:

 How do I enable new quota on a tmpfs?

A possible solution might be a per-user tmpfs, each limited using -s...
of course, it's more complex to manage though.

If I remember correctly, there is some optional support for symbolic links to
resolve to user-specific targets, but I forgot the details.  With
that /tmp/ could potentially be a symbolic link pointing to
say, /tmpfs/user/ I think.

Yet another hack would be to create a sparse ffs image under a tmpfs,
mounted with quotas via vnd, but evaluating its ideal size might be
difficult, and you'd have to re-apply quota settings in the script that
creates the image at boot time... :)
-- 
Matt


Re: Path to kernel modules (second attempt)

2012-07-08 Thread Matthew Mondor
On Sun, 8 Jul 2012 17:57:00 +0200
Edgar Fuß e...@math.uni-bonn.de wrote:

  Please not /kernel as it was already mentioned, it is too similar to
  /kern.
 What about /netbsd? E.g. /netbsd/6.0_BETA/{modules,kernel,firmware}.

/netbsd/amd64/6.0/GENERIC/{modules,kernel,firmware} :) ?

But can the kernel easily detect that its image was booted in a
particular directory, and use that as base directory to look for
modules?  Also, how much more complex would this be for the bootloader that
also needs to preload a few modules to be able to boot?
-- 
Matt


Re: Path to kernel modules (second attempt)

2012-07-07 Thread Matthew Mondor
On Sat, 07 Jul 2012 22:46:50 +0200
Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

 On 07.07.2012 21:57, Mindaugas Rasiukevicius wrote:
  Hello,
  
  Regarding the PR/38724, I propose to change the path to /kernel/.
  Can we reach some consensus quickly for netbsd-6?
 
 /kernel is way too close to /kern, and they serve different purposes.
 IMHO that will raise confusion.

Perhaps /kmod, or /modules like dholland suggests?

 Technically modules are not libraries, but maybe /libdata/module is a
 good option? We already have firmwares in /libdata/firmware, and those
 get used by the kernel.

That also makes sense
-- 
Matt


Re: Path to kernel modules (second attempt)

2012-07-07 Thread Matthew Mondor
On Sat, 7 Jul 2012 20:54:12 -0600
Warner Losh i...@bsdimp.com wrote:

 But it kinda fails with multiple kernels.  On FreeBSD, we went with 
 /boot/$KERNNAME/kernel for the kernel, with all the modules associated with 
 it in /boot/$KERNNAME. By default, we load /boot/kernel/kernel and the loader 
 may also choose to load other things.  The reason we put it in /boot was 
 because we have a secondary boot loader (/boot/loader) and on some platforms 
 we were looking at you needed a separate boot partition to do things 
 correctly.  this layout allows for that as well as transparently supporting 
 multiple kernels.  I know on one of my MIPS boards, I can read kernels or the 
 boot loader off of FAT partitions, so my /boot there is a FAT file system, 
 with the rest of the system in a UFS file system on separate 
 partitions/slices on my CF.

I think that the version and arch directories would be maintained.

But you're right, and when I think of it, it's actually one of the
reasons I use monolithic kernels.  If modules and kernels always
corresponded well and were closely coupled in a directory, it'd be much
less trouble for me to test and move kernels around, or maintain
multiple versions of them on the same host.  At the moment, single
monolithic files do this much better.  Some kernel configuration
changes not only affect the main image, but also the modules, and full
ABI compatibility would be a difficult problem.

It might not matter for someone who wants to avoid using a custom
kernel (I agree that modules should help a lot in this case for the end
user, no matter their arrangement).  But if we eventually begin to see
modules under non-BSD licenses which can only be distributed as
modules, more tech users might likely want modules as well...  Or it
might not matter at all, if an admin can simply link together all
modules in a single kernel image, and keep the non-distributable image
private in the organization (I think there is some work in this area,
other than the traditional monolithic builds)?

So something like /kmod/amd64/6.0/GENERIC/, or a layout
where /netbsd-GENERIC/ could be a directory, /netbsd-GENERIC/image the
kernel, /netbsd-GENERIC/modules/ its corresponding modules, would be
nice.  In the latter case, perhaps also a /netbsd symlink pointing to
the corresponding /foo/image, somewhat like the vmlinuz link of some
Linux distributions?

Thanks for sharing your experience,
-- 
Matt


Re: Problem with chown

2012-06-28 Thread Matthew Mondor
On Wed, 27 Jun 2012 23:20:36 -
David Lord net...@lordynet.org wrote:

 I tried NetBSD-6-BETA2 but had too many problems. 
 Attempted reinstalls of NetBSD-5 have all obviously
 failed.

Indeed, downgrading is usually more problematic, postinstall not being
of much use in this case.
-- 
Matt


Re: per-mount maxvnodes

2012-06-10 Thread Matthew Mondor
On Thu, 7 Jun 2012 17:50:58 +0200
Manuel Bouyer bou...@antioche.eu.org wrote:

 On Thu, Jun 07, 2012 at 11:09:26AM -0400, Mouse wrote:
   Therefore comes the idea to have a per-mount maxvnodes.
  
   I tried implementing it, the biggest problem is how to set the value.
  
  sysctl kern./usr/local.maxvnodes?
  
  It's a little ambiguous, in that it's possible - or at least it was
  last time I tried it - to have multiple mounts with the same mounted-on
  string.  But that's definitely an unusual case, and I see nothing wrong
  with accessing the topmost mount in that case; that's what normal
  filesystem accesses will do, after all.
 
 No, I think this should be a mount option. Maybe it's time to revisit the
 mount(2) interface (proplist anyone ? :)

If mounts had an ID (like processes), then it'd be easier to use sysctl
for them (commands such as mount and df might want to also export such
IDs, so possibly also statvfs(2))... There are device IDs, but I'm not
sure these could serve this purpose properly.
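
For reference, the identifiers that userland can already obtain for a mounted
file system can be read as in the following sketch.  It only illustrates the
existing POSIX fields (f_fsid from statvfs(2) and st_dev from stat(2)); the
helper name mount_ids() is hypothetical, and whether either value would be
suitable as a sysctl key is exactly the open question above.

```c
#include <sys/stat.h>
#include <sys/statvfs.h>

/*
 * Fetch the file system ID and the device ID of the file system that
 * contains `path'.  Returns 0 on success, -1 on error.
 */
int
mount_ids(const char *path, unsigned long *fsid, unsigned long long *dev)
{
	struct statvfs sv;
	struct stat st;

	if (statvfs(path, &sv) == -1 || stat(path, &st) == -1)
		return -1;
	*fsid = sv.f_fsid;			/* file system ID */
	*dev = (unsigned long long)st.st_dev;	/* device ID */
	return 0;
}
```

Two paths on the same mount yield the same pair of values, while the
mounted-on string itself plays no part in them.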

This also reminds me of the thread about possibly allowing noatime to
be temporarily enabled for a particular operation such as a backup
or find... Perhaps such options should eventually be dynamically
scoped such that a particular process or lwp could temporarily bind
another value for its own use (if it has the necessary privileges, of
course)?  I'm not sure how far-fetched this is relative to the
code, as I'm not very familiar with the FS code.
-- 
Matt


Re: Rump FS throughput

2012-06-02 Thread Matthew Mondor
On Fri, 1 Jun 2012 22:30:10 +0200
Thomas Klausner w...@netbsd.org wrote:

 On Thu, May 31, 2012 at 01:45:53PM -0400, Matthew Mondor wrote:
  Although it's useful to mount random media more safely than it would be
  using kernel-space, I noticed that using 64KB reads, the kernel cd9660
  will gladly read ~20MB/s from a DVD, but that rump_cd9660 using
  64KB reads is limited to approximately 4MB/s at most, even if the system
  is mostly idle during those transfers (on netbsd-6/amd64 and 4 3.3GHz
  cores).
 
 Some suggestions from Antti via email proxy:
 Maybe he is using the block device (/dev/cd0a) instead of the raw device
 (/dev/rcd0a).  IIRC the former has some pretty serious performance
 problems for userspace I/O.  Also in the maybe department, libp2k
 should probably detect and autoadjust a block device to raw device.
 Or, someone could just fix the bdev stuff.

Thanks for forwarding his suggestions,

If I try using the raw device (rcd0a), the speed is about 1.2MB/s (I
can't hear the DVD drive motor spin up either), while with the block
device (cd0a) the speed is about 4MB/s (in this case it spins up to a
higher speed).  With the same DVD and cd0a mounted using the
kernel FS implementation and the same command
(cat /cdrom/* > /dev/null), I get from 10 (start) to 20 (end) MB/s.
These tests were on NetBSD-6.

I'm not familiar enough with libp2k or bdev to know what needs to be
done, but I could certainly take a look eventually.  But I probably
also should verify if an ISO-9660 FUSE implementation exists, and
perhaps try to port it eventually, and see if performance is adequate
for general use.

Thanks again,
-- 
Matt


Re: link-sets in modules

2012-05-31 Thread Matthew Mondor
On Mon, 28 May 2012 06:51:43 -0700 (PDT)
Paul Goyette p...@whooppee.com wrote:

 I _do_ like part 2 of your proposal - linking the core kernel first, 
 and then re-linking with selected modules.

I also think that this would be very nice
-- 
Matt


Re: Should kqueue descriptors work outside of the creating process?

2012-05-31 Thread Matthew Mondor
On Thu, 31 May 2012 10:38:38 -0400 (EDT)
Mouse mo...@rodents-montreal.org wrote:

  Recently we found out (PR kern/46463) that kqueue() file descriptors,
  which originally were designed to be local process only objects,
  could be passed with SCM_RIGHTS messages to other processes.  [...]
 
  I propose to not allow sending kqueue file descriptors [...]
 
  Or are there any legit uses for foreign kqueue()s?
 
 It seems to me, for what it may be worth, that this is asking the
 wrong question.  Rather, I would ask whether there are illegitimate
 uses for `foreign' kqueue descriptors, and, if not, fix them to be
 passable like any other descriptors.

It's true that it's normally the parent's responsibility to decide which
FDs to close or set close-on-exec before fork(2)... Was there a design
decision not to inherit kqueue descriptors for security or complexity
reasons?
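
As a reminder of what that responsibility looks like for ordinary
descriptors, here is a minimal sketch of marking one close-on-exec with
fcntl(2).  The helper names fd_set_cloexec() and fd_is_cloexec() are
hypothetical; only the standard F_GETFD/F_SETFD commands and the FD_CLOEXEC
flag are used.

```c
#include <fcntl.h>

/*
 * Mark `fd' close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 on error.
 */
int
fd_set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/*
 * Returns 1 if `fd' is marked close-on-exec, 0 if it is not,
 * and -1 on error.
 */
int
fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return (flags & FD_CLOEXEC) != 0;
}
```

Note that the flag only takes effect at exec time; a descriptor marked this
way is still duplicated into the child by fork(2).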

Since signals, signal mask, signal stack and restart/interrupt flags
are also inherited according to sigaction(2), an EVFILT_SIGNAL filter
would probably still be valid...

But how about EVFILT_TIMER?  timer_create(2) timers are not inherited,
setitimer(2) doesn't specify, but it also uses the same ptimers pool
timer_create(2) uses.  EVFILT_TIMER appears to use its own system though.

For EVFILT_PROC, it appears to be for the specified process, so I guess
it might still work if inherited?

And there is also EVFILT_VNODE... who knows what other filters might be
added in the future?

What I can see is that the implications of inheriting this special
descriptor are considerably more complex than for normal FDs...  Which
makes me think that it very well could be a design decision not to inherit
these, in which case I don't object to also preventing them from being
passed via SCM_RIGHTS ancillary messages.
-- 
Matt


Re: Should kqueue descriptors work outside of the creating process?

2012-05-31 Thread Matthew Mondor
On Thu, 31 May 2012 14:40:44 -0400
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 What I can see is that the implications of inheriting this special
 descriptor are quite more complex than for normal FDs...  Which makes
 me think that it very well could be a design decision not to inherit
 these, in which case I don't object to also prevent passing it via
 SCM_RIGHTS ancillary message.

When catching up with mail, I unfortunately read the PR thread after
writing this (as well as Christos's concerns about treating some FDs
differently than others).  What came to my mind was that kqueue could
have used another type of special object instead of a descriptor, but
it's too late for a change of API, and although I see some other
interfaces using such integers which aren't necessarily file descriptors
(i.e. timer_create(2)), kqueue's API expects close(2) to clean it
up...
-- 
Matt


Re: CVS commit: src/tests/modules

2012-03-22 Thread Matthew Mondor
On Wed, 21 Mar 2012 21:47:31 +
David Holland dholland-t...@netbsd.org wrote:

 But, how about kern.module.supported or kern.module.canload or
 something?

I like the kern.module.supported, or perhaps kern.module.enabled, as I
have systems built without module loading support yet still have a few
module sysctls around under that same hierarchy, and module.modular
also seems ambiguous and redundant...
-- 
Matt


Re: Rewriting kernfs and procfs - GSoC'12

2012-03-20 Thread Matthew Mondor
On Tue, 20 Mar 2012 10:35:13 +0900
Julio Merino j...@julipedia.org wrote:

 Personally, I'd also like to see this project done.  It was at one point 
 an idea I wanted to work on, but then lost the time to do so and 
 forgotten about it completely.

I was initially reluctant to reply to this thread at this time, because
some details might be out of the scope of the GSoC project.  But I
think that those questions are important to consider in the design of a
new procfs implementation, and the project description was very
brief, so I decided to post them anyway:

It was nice to be able to mount procfs with -o linux when I used Linux
binary compatibility.  Are there other scenarios where it is required?
If not, should a new implementation simply be as compatible as possible
with Linux, such that -o linux not be necessary?  Even some supposedly
portable software now occasionally expects a Linux-compatible procfs
tree.

Otherwise, I think that currently NetBSD doesn't make use of it, as
kernfs and procfs are not mounted on my systems.  Is there
functionality that it should provide which
sysctl/vmstat/pmap/fstat/drvctl don't?  While on Linux it's used as a
central repository for a lot of information, I regularly stumble on
ad-hoc parsers in a number of applications that query from it,
wondering why they didn't export that information via sysctl...

If it should diverge from Linux and still support -o linux, is there
a particular hierarchical direction it should respect, and suggested
file format(s), i.e. plist is an example, which applications could
parse using a supplied library?  Or should the data be in a format
designed for human reading only, with sysctl used for software?  I
doubt that a new implementation needs to remain compatible with the
traditional 4.4BSD procfs hierarchy, as it's not really being used by
software yet.

I once thought that it might be useful to export procfs via NFS,
but our current implementation doesn't support it.  Is this something
that a new implementation should allow?

Thanks,
-- 
Matt


netbsd-6/amd64 and TLS

2012-03-18 Thread Matthew Mondor
Hello,

I stumbled upon something interesting tonight when testing a new
unstable ECL (Embeddable Common Lisp).  When built with TLS support
(--with-__thread=yes), a noticeable slowdown can be experienced
compared to a build with --with-__thread=no.  I'm not yet sure whether
it has to do with a bug in ECL or in NetBSD; I should check the
TLS/non-TLS code paths whenever I have more time.

But I wanted to meanwhile share this, in case someone else also noticed
something similar, or has a clue as to why this happens.

The system was built using DBG='-g -O2'.

Thanks,
-- 
Matt


Re: Problem with install of NetBSD-6 from cd on i386 siside

2012-03-07 Thread Matthew Mondor
On Wed, 07 Mar 2012 15:14:52 -
David Lord net...@lordynet.org wrote:

 I have since obtained netbsd-6 src via cvs on a different system,
 built a release, copied sets over network and updated target pc
 to NetBSD-6. I am able to mount the cdrom and tar -tzvf comp.tgz
 initially gave same error as above but then completed ok. Seems
 the drive isn't being allowed to spin up.

Just a note: beware of the missing -p option when extracting sets.
Without it, permissions will not be restored properly and things like
setuid binaries will not work (a common issue would be, for instance,
su(1) not working after installing base.tgz).  This might not matter
for the comp.tgz set, though.
-- 
Matt


Re: Respawn crashed PUFFS filesystems?

2012-02-11 Thread Matthew Mondor
On Sun, 12 Feb 2012 01:02:38 -0500 (EST)
Mouse mo...@rodents-montreal.org wrote:

  Of course the feature would be broken in some cases, but we could
  make the thing optional using a vfs.puffs.respawn sysctl, which would
  contain a colon-separated mount points subjected to respawn.
 
 What happens if a mount point contains a colon?
 
 More to the point, I think this puts the information in the wrong
 place.  Is there any way it could be set as an option at mount time?
 (That's a serious question; I don't know puffs enough to answer it.)

I also think that a mount respawn option would be elegant
-- 
Matt


Re: extattr namespaces

2012-02-06 Thread Matthew Mondor
On Mon, 6 Feb 2012 09:51:19 +
Emmanuel Dreyfus m...@netbsd.org wrote:

 We have two extended attribute APIs in tree: one from FreeBSD and one from 
 Linux. We are about to toss the FreeBSD one in favor of the Linux one. 
 That is easy now since we never had working extended attributes in a 
 release.

One thing that I'm wondering: what are the character constraints on
those class names in the Linux API?

The reason is that if UTF8 is allowed, it'd be possible for two names
to show as an equivalent representation to humans, while they'd be
different for the system, and this could have security implications if
we ever use these to support extended permissions such as ACLs in the
future.
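
To make the concern concrete, the following sketch shows two UTF-8 byte
strings that a terminal would normally render identically, yet which compare
as different names.  The strings and the helper names_equal_bytes() are
illustrative assumptions; a bytewise comparison is assumed here, and nothing
in this example reflects how either extended attribute API actually compares
names.

```c
#include <string.h>

/* "caf" followed by U+00E9 (precomposed e with acute accent) */
static const char name_precomposed[] = "caf\xc3\xa9";

/* "cafe" followed by U+0301 (combining acute accent) */
static const char name_decomposed[] = "cafe\xcc\x81";

/*
 * Bytewise name comparison, with no Unicode normalization applied.
 * Returns 1 if the names are byte-identical, 0 otherwise.
 */
int
names_equal_bytes(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Both strings display as the same word, but names_equal_bytes() treats them
as distinct, so two attributes could coexist under visually identical names.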

 In the FreeBSD API, namespaces are int. There are two namespaces defined:
 system and user. There is no way to add other namespaces, though I have
 no idea what happens if one uses an int value different than system or user.

For performance and security, integers make more sense to me than
strings.  However, I don't think there'd be a problem if internally
they're integers, yet shown to userland through a string interface (we
traditionally do this for user and group IDs, in which case tools such
as id or ls can show the IDs as well as names).  Alternatively, if IDs
were dropped, names could be restricted as necessary.

At least for namespace name strings and the SYSTEM namespace attribute
name strings, they should probably be restricted to a-z (or A-Z).  I
don't think that this would matter much for user namespace attributes,
though.
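
As a rough illustration of the two ideas above (integer IDs internally,
strictly restricted names at the interface), here is a minimal userland
sketch in C.  The xns_* names and the enum values are hypothetical and
are not taken from the FreeBSD or Linux APIs:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace IDs, for illustration only. */
enum xattr_ns { XNS_INVALID = -1, XNS_USER = 1, XNS_SYSTEM = 2 };

static const struct {
	const char	*name;
	enum xattr_ns	 id;
} xns_table[] = {
	{ "user",   XNS_USER },
	{ "system", XNS_SYSTEM },
};

/* Accept only non-empty names made of ASCII a-z; reject everything else. */
int
xns_name_ok(const char *name)
{
	size_t i;

	if (name == NULL || name[0] == '\0')
		return 0;
	for (i = 0; name[i] != '\0'; i++)
		if (name[i] < 'a' || name[i] > 'z')
			return 0;
	return 1;
}

/* Map a validated name to its integer ID, or XNS_INVALID. */
enum xattr_ns
xns_from_name(const char *name)
{
	size_t i;

	if (!xns_name_ok(name))
		return XNS_INVALID;
	for (i = 0; i < sizeof(xns_table) / sizeof(xns_table[0]); i++)
		if (strcmp(xns_table[i].name, name) == 0)
			return xns_table[i].id;
	return XNS_INVALID;
}

/* Map an integer ID back to its display name, or NULL if unknown. */
const char *
xns_to_name(enum xattr_ns id)
{
	size_t i;

	for (i = 0; i < sizeof(xns_table) / sizeof(xns_table[0]); i++)
		if (xns_table[i].id == id)
			return xns_table[i].name;
	return NULL;
}
```

With this, a name that merely looks like "system" but contains non-ASCII
bytes is rejected before any lookup, and only the integer ID needs to be
compared afterwards.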
-- 
Matt


Re: Adding an option to avoid SIGPIPE for all file descriptors

2012-01-25 Thread Matthew Mondor
On Wed, 25 Jan 2012 12:25:46 -0500
Steven Bellovin s...@cs.columbia.edu wrote:

 
 On Jan 23, 2012, at 11:05 58PM, Matt Thomas wrote:
 
  
  On Jan 23, 2012, at 7:58 PM, Steven Bellovin wrote:
  
  I also wonder whether we should also have a note that disables SIGPIPE,
  similar to what paxctl does.
  
  You mean a system-wide flag?  That would worry me; I think it would have
  bad effects, since anything that did
  
 a | b 
  
  paxctl sets a note in the executable.
  
 I don't like that, either, but on philosophical grounds.
 
 The problem I have is that the semantics of the execution now depend on
 something not in the source code; however, the code needs to know about
 it in order to cope properly.  (Setuid is somewhat different, since it
 also reflects the policy of the site.)  I also don't see the point, as
 opposed to a system call to set the flag.  

A system-wide flag would mess with applications that expect the
traditional SIGPIPE behaviour, and I also find it rather awkward to depend
on an ELF note for this.  The use of ELF notes for paxctl is less
questionable but still awkward: at application upgrade the admin must
remember to also set the special paxctl flag again on the new
executable, vs a vnode flag.

Applications already can use signal(3) or sigaction(2) if they don't
want it (and now the FD-specific setsockopt(2)/fcntl(2), which I see no
problem with).
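
For reference, the application-side approach mentioned above can be
sketched as follows, using only sigaction(2).  The helper names are
mine; the second function merely demonstrates that a write to a pipe
with no reader then fails with EPIPE instead of killing the process:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide; returns 0 on success, -1 on error. */
int
ignore_sigpipe(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_IGN;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGPIPE, &sa, NULL);
}

/*
 * Write one byte to a pipe whose read end is already closed.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
int
write_to_broken_pipe(void)
{
	int fds[2], saved;
	ssize_t n;

	if (pipe(fds) == -1)
		return -1;
	close(fds[0]);
	n = write(fds[1], "x", 1);
	saved = (n == -1) ? errno : 0;
	close(fds[1]);
	return saved;
}
```

Calling ignore_sigpipe() first and then write_to_broken_pipe() should
yield EPIPE; without the first call the default action for SIGPIPE
terminates the process.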

But if I understand, Matt's suggestion is to be able to disable
SIGPIPE signaling for some of them behind their back?  Then how about a
process/PID-specific nosigpipe sysctl(3) perhaps (we have things like
stopfork/stopexec/stopexit), or a more general way to control if/which
signals are ignored for a process via sysctl?  Or something like
nohup(1) but for SIGPIPE, nosigpipe(1), or a more general nosig(1)
that lets one specify which signals to ignore?

Thanks,
-- 
Matt


Re: Possible incorrect usage of STACKALIGN in kern_exec

2012-01-24 Thread Matthew Mondor
On Tue, 24 Jan 2012 21:01:49 +0100
Martin Husemann mar...@duskware.de wrote:

 On Tue, Jan 24, 2012 at 08:21:42PM +0100, Paul Fleischer wrote:
  Is the usage of STACKALIGN indeed incorrect in this situation, or am I
  missing the big picture?
 
 I stumbled across this when revamping execve1 for posix_spawn recently.
 
 The intention seems to be to align the stack on an 8 byte boundary
 (where arm usually only requires 4 byte alignment). I did not dig in the
 ARM ABI docs deep enough to see why this would be needed.
 
 However, the current implementation seems to be broken - the macro works
 on the stack pointer but not on a length variable, as you noted.
 
 Can anyone explain why arm would need 8 byte alignment?

Do some architectures (e.g. x86) have better performance if the stack
is 16-byte aligned?  If so, perhaps this could be MI, satisfying
both 8-byte (or 4-byte) alignment by aligning stacks at 16 bytes?
Would this be considered wasteful?  Of course, x86-64 MD code could
also be used...
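
To make the arithmetic concrete, here is a small sketch, assuming a
downward-growing stack and a power-of-two alignment.  It is not the
kernel's STACKALIGN macro, only an illustration that an address aligned
to 16 bytes is also aligned to 8 and 4:

```c
#include <stdint.h>

/* Round addr down to a multiple of align; align must be a power of two. */
uintptr_t
align_down(uintptr_t addr, uintptr_t align)
{
	return addr & ~(align - 1);
}

/* Nonzero if addr is a multiple of align (align is a power of two). */
int
is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}
```

The cost is at most 15 bytes of padding per aligned stack pointer,
which is what the "wasteful" question comes down to.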

There is also a related PR, though it concerns thread stack alignment:
lib/39465

Thanks,
-- 
Matt


Re: Reduce KAUTH_GENERIC_ISSUSER usage (batch 1)

2012-01-18 Thread Matthew Mondor
On Tue, 17 Jan 2012 20:36:35 -0500
Elad Efrat e...@netbsd.org wrote:

 Attached is a diff that reduces the use of KAUTH_GENERIC_ISSUSER. I
 plan to commit it a week or so after the branch.

Thanks for working on this.

While I understand most changes, after looking at the diff I wondered:
does anyone know what is special about pxg(4) that requires a special
MACHDEP_PXG check as opposed to MACHDEP_UNMANAGEDMEM?

Thanks,
-- 
Matt


Re: buffer cache ufs changes (preliminary ffsv2 extattr support)

2012-01-16 Thread Matthew Mondor
On Sun, 15 Jan 2012 15:21:40 -0500 (EST)
Mouse mo...@rodents-montreal.org wrote:

 However, I think that constitutes a good implementation of a bad idea.
 This makes a file no longer a long list of octets; it becomes multiple
 long lists of octets.  The Mac did this, with resource forks and data
 forks, and you may note OS X doesn't do it any longer.  I suspect these
 will seem like a good idea for a while, until people start discovering
 all the things they break, or that break them, and realize that they
 didn't learn from history and thus had to repeat it.

I didn't know that Apple dropped the idea, but I have always found the
idea flaky myself (and sorry for the rant):

- Applications may still implement and maintain metadata as they wish
  without the feature
- Requires changes to support in OS, FS, and many file manipulation
  tools
- No standard API for these, few, incompatible, restricted
  solutions/formats for archival
- Security implications (scanning tools which aren't aware might skip
  hidden/extended data; if ACLs are eventually implemented and are
  using these, the implementation should not only support a system
  domain, but also use IDs rather than strings (or at least severely
  sanity-check a restricted string format))
- Inevitable eventual loss of the extended data, possibly because of
  backup procedures not aware of it, moving/copying/editing files with
  non-aware/third-party tools, etc (also consider editors that save to
  another file to then rename)
- An administrative nightmare when tools such as find/locate/grep/diff
  won't disclose data that the admin might be looking for but is now in
  an extended attribute

But this is only the opinion of a user, and I could keep the feature
disabled on my systems, of course, so I don't necessarily object to
optional support for it.
-- 
Matt


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Matthew Mondor
On Mon, 16 Jan 2012 10:56:33 + (UTC)
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

 when the kernel wants to cache other files.
 ie. whenever the kernel decides to reclaim it. :-)
 you can increase the chance by running
   while :;do sysctl -w kern.maxvnodes=0; done
 or something like that.

Wouldn't the performance also drop significantly with a permanently low
maxvnodes, though?

Thanks,
-- 
Matt


Re: NetBSD/usermode (Was: CVS commit: src)

2011-12-31 Thread Matthew Mondor
On Sat, 31 Dec 2011 17:20:16 +
David Holland dholland-t...@netbsd.org wrote:

 The other obvious approach is to add one or more new ptrace operations
 to provide proper/adequate/better support for intercepting system
 calls. This is probably a more useful facility in the long run, and it
 *can* be made leakproof, but it'll be more work.

Could this also eventually allow systrace-style functionality that'd be
safer than the previous implementation?

Thanks,
-- 
Matt


Re: close and ERESTART

2011-12-26 Thread Matthew Mondor
On Mon, 26 Dec 2011 05:19:22 +
Taylor R Campbell campbell+net...@mumble.net wrote:

 +
 + error = fd_close(SCARG(uap, fd));
 + if (error == ERESTART)
 + error = EINTR;
 +
 + return error;

If it's also guaranteed that the file descriptor state is closed in the
event of an ERESTART error, I like this, personally.
-- 
Matt


Re: cloning device close race?

2011-12-19 Thread Matthew Mondor
On Sun, 18 Dec 2011 23:40:33 -0500
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 On Sun, 18 Dec 2011 22:34:03 -0500
 Thor Lancelot Simon t...@panix.com wrote:
 
  If you run 10 or so copies at once on a multiprocessor system
  with DIAGNOSTIC, you'll see a lot of this message emitted:
  
  vrelel: missing VOP_CLOSE(): vnode @ 0xfe801e73cb28, flags 
  (0x800030MPSAFE,LOCKSWORK,INACTNOW)
  tag VT_UFS(1), type VCHR(4), usecount 2, writecount 0, holdcount 0
  freelisthd 0x0, mount 0x800024235000, data 0xfe801de01f00 lock 
  0xfe801e73cc38
  tag VT_UFS, ino 46213, on dev 4, 0 flags 0x0, nlink 1
  mode 020644, owner 0, group 0, size 0
  
  I am guessing the problem also exists with other cloning
  pseudodevices, not just the new /dev/random implementation.
 
 This just reminds me that a friend yesterday pointed me to this article
 about close(2)'s POSIX semantics:
 
 http://www.daemonology.net/blog/2011-12-17-POSIX-close-is-broken.html

In case someone else was also interested in this, I was informed
off-list that NetBSD ensures that the file descriptor is in the closed
state after close(2), even when the call is interrupted and fails
with EINTR.  In another discussion with the person who originally
forwarded me the above URL, I was told that according to her
investigation, Linux also does this.
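
Under that assumption (the descriptor is already closed when close(2)
fails with EINTR), userland code should not retry the call.  A minimal
sketch, with a helper name of my own choosing:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Close fd exactly once.  Assumes the behaviour described above: after
 * EINTR the descriptor is already closed, so a retry could close an
 * unrelated descriptor that another thread has just opened.
 * Returns 0 if closed cleanly, 1 if closed but interrupted (EINTR),
 * and -1 with errno set for any other error.
 */
int
close_once(int fd)
{
	if (close(fd) == 0)
		return 0;
	if (errno == EINTR)
		return 1;
	return -1;
}
```

On a system where an interrupted close(2) leaves the descriptor open,
this would leak it, so the assumption matters.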

Thanks,
-- 
Matt


Re: cloning device close race?

2011-12-18 Thread Matthew Mondor
On Sun, 18 Dec 2011 22:34:03 -0500
Thor Lancelot Simon t...@panix.com wrote:

 If you run 10 or so copies at once on a multiprocessor system
 with DIAGNOSTIC, you'll see a lot of this message emitted:
 
 vrelel: missing VOP_CLOSE(): vnode @ 0xfe801e73cb28, flags 
 (0x800030MPSAFE,LOCKSWORK,INACTNOW)
   tag VT_UFS(1), type VCHR(4), usecount 2, writecount 0, holdcount 0
   freelisthd 0x0, mount 0x800024235000, data 0xfe801de01f00 lock 
 0xfe801e73cc38
   tag VT_UFS, ino 46213, on dev 4, 0 flags 0x0, nlink 1
   mode 020644, owner 0, group 0, size 0
 
 I am guessing the problem also exists with other cloning
 pseudodevices, not just the new /dev/random implementation.

This just reminds me that a friend yesterday pointed me to this article
about close(2)'s POSIX semantics:

http://www.daemonology.net/blog/2011-12-17-POSIX-close-is-broken.html

So far I have only checked the close(2) manual page, which indeed lists
EINTR as a possible errno value on error.  But since the article also
mentions that some OSs decided to ensure that EINTR never be returned
to avoid the problems, I wondered: does NetBSD already do something to
ensure that either: 1) EINTR not be possible, or the call be atomically
restarted transparently in the same LWP, or 2) the state of an FD after an
interrupted close(2) never be inconsistent?  The latter solution might
still allow race conditions in multithreaded code, possibly.

Thanks,
-- 
Matt


Re: [RFC] getgroups2 system call

2011-12-13 Thread Matthew Mondor
On Wed, 14 Dec 2011 07:04:06 +0100
m...@netbsd.org (Emmanuel Dreyfus) wrote:

 - a fixed length header is highly desirable for performance
 optimization. For instance glusterfs fetches the header and the data
 using readv(2) with an iovec that has two slots. That way it gets write
 data aligned on a page boundary.

What does NFS do in this case?  I seem to remember that it also imposes
a sane size limit, possibly even below NGROUPS_MAX, is it really the
case?  If so, would this also be acceptable?
-- 
Matt


Re: Lost file-system story

2011-12-11 Thread Matthew Mondor
On Fri, 9 Dec 2011 22:12:25 -0500
Donald Allen donaldcal...@gmail.com wrote:

 Linux systems do periodically write ext2 meta-data to the disk. And
 ext2 fsck has always been very good, and has gotten better over the
 years, due to the efforts of Ted T'so. I first installed Linux in
 1993, almost 20 years ago, and have been using it continuously ever
 since. I have *never* lost an ext2 filesystem and I've never mounted
 one sync.

I'm not sure if it's the case on Linux with ext2, but by default NetBSD
FFS mounts are not sync, nor async; metadata is sync and data blocks
are async.  In async mode, all data is asynchronously written, including
the metadata, and in sync mode everything is written synchronously (the
default OpenBSD uses, if I recall).  I just wanted to specify this as
you mentioned not mounting your ext2 systems in sync mode, but a
default NetBSD FFS mount will not be in sync mode either.

Other available options with FFS are using soft dependencies (softdep)
or WAPBL metadata journalling (log), with which it is possible to have
increased performance vs the default mode, without really sacrificing
reliability, unlike with the fully async mode.  In those modes,
metadata is written asynchronously as well.

Sorry if what I said is already obvious to you,
-- 
Matt


Re: Use consistent errno for read(2) failure on directories

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 09:33:54 +0100
Nicolas Joly nj...@pasteur.fr wrote:

 According to the online OpenGroup specification for read(2) available
 at [1], read(2) on directories is implementation dependent. If
 unsupported, it shall fail with EISDIR.

In the case of sys/rump/librump/rumpvfs/rumpfs.c, is it possible that
the underlying implementation could previously decide if it could
support read(2) on directories, and this would no longer be the case
with this patch?

Thanks,
-- 
Matt


Re: Use consistent errno for read(2) failure on directories

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 11:56:32 +0100
Nicolas Joly nj...@pasteur.fr wrote:

 On Fri, Dec 09, 2011 at 04:36:55AM -0500, Matthew Mondor wrote:
  In the case of sys/rump/librump/rumpvfs/rumpfs.c, is it possible that
  the underlying implementation could previously decide if it could
  support read(2) on directories, and this would no longer be the case
  with this patch?
 
 No. This only impacts the rump fs itself (used as the root file system
 in applications); it does not matter while accessing other fs through
 rump.

Thanks for the explanation,
-- 
Matt


Re: Lost file-system story

2011-12-09 Thread Matthew Mondor
On Fri, 9 Dec 2011 15:50:35 -0500
Donald Allen donaldcal...@gmail.com wrote:

 were not designed to do this. The reason I'm beating on this is that I
 would have liked to use NetBSD for the application I have in mind, but
 I need the performance improvement that async provides (my tests show
 this; the tests also show that NetBSD async is about as fast as Linux,
 much faster than OpenBSD async, at least for doing a lot of writing,
 such as un-tarring a large tar file). This is practical if the joint

The speed and reliability WAPBL provides have been enough for my uses
personally; are the few seconds saved using async really worth the
trouble?  Also, if raw speed is needed to do many installations on
identical systems, dd with large blocks to mirror the system might be a
faster alternative...

I'm not arguing that fsck shouldn't be able to recover though; it
ideally should, but the problem seems to be that too much metadata is
missing when crashing while writing in async mode.

OpenBSD's async mode would be slightly slower while flushing metadata
more often, probably.  Perhaps having a sysctl to control flushing
would be a good thing, though.

Thanks,
-- 
Matt


Re: emap

2011-12-04 Thread Matthew Mondor
On Mon,  5 Dec 2011 04:19:13 + (UTC)
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

  Although I didn't think it'd be necessary to say so until this point, I
  admit that I myself didn't really understand what Takashi said about
  recommending amd64 over i386.  If the hardware is 32-bit, or on
  constrained memory devices, i386 definitely needs to be supported.
 
 it isn't my recommendation.  rmind@'s.

Sorry about that, I should have rechecked upthread instead of looking
at the quoting mess :)
-- 
Matt


Re: secmodel_register(9) API

2011-11-29 Thread Matthew Mondor
On Tue, 29 Nov 2011 02:51:38 +0100
Jean-Yves Migeon j...@netbsd.org wrote:

 Reviews before merge welcome. If nobody raises his voice, I'll proceed 
 to commit it at the end of the week.

Hello,

I admit not having audited the kauth and secmodel code recently, the
last time being shortly after Elad's initial implementation; please
bear with me if some observations are irrelevant:

  There are various ad-hoc calls to printf() which could probably be
  replaced by a more generic function call also resolving the error
  number to a string matching the constant i.e. secmodel_perr(int
  errno, const char *function); or similar, possibly wrapped by a macro
  using __FUNCTION__ avoiding the redundant function names

  The initialization functions, such as secmodel_keylock_init(), will
  report an error in the dmesg but do not propagate errors (they're
  void functions, suggesting the caller will not suspect anything).
  Should not the system panic for similar security critical failures?
  I think that I saw a similar situation under the various case
  MODULE_CMD_INIT.

  When seeing the strcasecmp() calls in the eval_* functions for names
  such as is-securelevel-above or is-root, my first impression was
  that integer constants, macros, or even a system of interned strings
  and references would be nice.  Then it struck me that if the goal
  was to export these, exporting actual variables might be best
  (although in any case, exporting these seems to somewhat defeat
  kauth-style centralization.  But if I understand, this is not for
  general use in the kernel, but for use by other security models?  If
  so, it's not so much out of scope in the sense that it remains in
  sys/secmodel)...
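
To illustrate the first observation, here is a userland sketch of such
a reporting helper.  Everything in it is hypothetical: the names
secmodel_perr_fmt and SECMODEL_PERR, the small errno table, and the
choice to format into a buffer (kernel code would print instead).  It
uses the standard C99 __func__ rather than __FUNCTION__:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Small illustrative table mapping error numbers to constant names. */
static const struct {
	int		 num;
	const char	*name;
} err_names[] = {
	{ EPERM,  "EPERM"  },
	{ ENOMEM, "ENOMEM" },
	{ EEXIST, "EEXIST" },
	{ EINVAL, "EINVAL" },
};

/* Return the constant name for an error number, or "E?" if unknown. */
const char *
err_name(int error)
{
	size_t i;

	for (i = 0; i < sizeof(err_names) / sizeof(err_names[0]); i++)
		if (err_names[i].num == error)
			return err_names[i].name;
	return "E?";
}

/* Format "<function>: <ERRNAME>" into buf; returns snprintf's result. */
int
secmodel_perr_fmt(char *buf, size_t len, int error, const char *function)
{
	return snprintf(buf, len, "%s: %s", function, err_name(error));
}

/* Wrapper that supplies the calling function's name automatically. */
#define SECMODEL_PERR(buf, len, error) \
	secmodel_perr_fmt((buf), (len), (error), __func__)
```

The macro removes the need to repeat the function name at each call
site, which is the redundancy mentioned above.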


Note that the following is not criticism on your patch, but
pipe-dreaming and opinion.  It's also outside the scope of the existing
kauth implementation, but I couldn't resist, considering it was slightly
on-topic:

Having a main area to look for security related decisions is a good
thing, and kauth was a good step in that direction.  It's also nice to
permit an administrator or organization to tweak the system for their
needs using an elegant architecture.

However, I've always found its design to be slightly too dynamic,
perhaps too much of an interpreter (and those eval_* functions make it
even more so).  Then there is all the C code dedicated to attaching,
detaching parts to the program tree at runtime, etc.  Although I'm
not familiar enough with the original Darwin implementation, that is
probably similar there.

Since this is security related, it would not be unreasonable if the only
possible runtime changes were user/admin configuration (module-specific
sysctl configuration knobs, file system permissions, PaX flags, etc).
This means that the final runtime security system could be statically
generated at compile-time.

Dreaming ahead along that path (this part could still be possible with
an interpreter-like model though), it might be possible to create a
similar system, centralized yet modular (not at runtime, but for
human-friendly organization), to design and use a simple mostly
declarative language to edit and represent security models, then
compile that representation to static code.  The input could be more
elegant (also more easily allowing to define the domains and their
authorize interface, any hierarchies, etc), the output could permit a
more efficient runtime (generating unrolled code where wanted rather
than loops running among hooks lists)...  And of course there could be
specialized static analysis and test tools to warn model designers of
possible shortcomings in their designs.  With finally a preprocessor
tool so that it'd be possible to embed the language with C code, where
necessary...

But then again, I'm only pipe dreaming, and that's always easier than
implementing any of that, of course :)
-- 
Matt


Re: emap

2011-11-25 Thread Matthew Mondor
On Fri, 25 Nov 2011 23:25:24 +0400
Aleksej Saushev a...@inbox.ru wrote:

 Thor Lancelot Simon t...@panix.com writes:
 
  On Fri, Nov 25, 2011 at 12:50:58PM +0400, Aleksej Saushev wrote:
  Mindaugas Rasiukevicius rm...@netbsd.org writes:
  
   y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
   hi,
   
   what's the status of emap and pipe?
   
  
   ... and encourage our users to use amd64 instead
   of i386.
  
  I'm sorry to intervene, what about WINE? Unless we're going to have it
  functional on amd64, encouraging is useless.
 
  I don't understand your comment.  Are you suggesting that a large fraction
  of NetBSD/i386 users use WINE and therefore would not be able to switch to
  the amd64 port?
 
 I mean that those users who could switch have most probably switched already.
 And one of the serious reasons to stay on i386 is functional WINE.

Although I didn't think it'd be necessary to say so until this point, I
admit that I myself didn't really understand what Takashi said about
recommending amd64 over i386.  If the hardware is 32-bit, or on
constrained memory devices, i386 definitely needs to be supported.

But then again, I'm not familiar with the emap code; from the bits I
read in this thread, it could serve to optimize pipes?  That pipes can
be better optimized on amd64 than on i386 is no problem to me, so I
assumed that he was talking about encouraging users to use amd64 if
they want to take advantage of a particular feature, not that i386
would get deprecated and start to become unsupported.

It would be nice if someone who knows better could explain better what
was meant, or confirm what I said above (if I understood correctly),
considering that it caused some worries...

Thanks,
-- 
Matt


Re: puffs netbsd-5 (was VOP_GETATTR: locking protocol change proposal)

2011-11-21 Thread Matthew Mondor
On Mon, 21 Nov 2011 08:04:46 +
Emmanuel Dreyfus m...@netbsd.org wrote:

 FWIW I spent weeks tracking down a file corruption bug on growing files
 in PUFFS because VOP_GETATTR operates on an unlocked vnode. If the
 VOP_GETATTR request follows a not yet completed VOP_FSYNC (as done by
 ioflush thread), I got toasted: VOP_FSYNC causes PUFFS to send a SETATTR
 to the filesystem, and on completion VOP_GETATTR gets from the filesystem
 a file size smaller than what VOP_FSYNC just set. This causes the file
 to be truncated by the kernel, and data written between VOP_FSYNC
 and VOP_GETATTR to be discarded and replaced by a chunk of zeroed bytes.
 
 I had to add a lock on file size modification in PUFFS to fix the problem.

I seem to remember you previously writing about using puffs/rump on
netbsd-5, is that still on netbsd-5?

The reason I ask is that I'm seeing various bugs when using psshfs (and
had various problems when mounting CDs using rump_cd9660); at the time
when I corresponded with Pooka about it he told me that it wasn't ready
for production use on netbsd-5 and recommended -current.  One of the
problems is that the process can suddenly start to consume as much CPU
time as it can, while operations become really slow or lock up.  Another
issue had to do with inconsistencies between the rump-seen state and
actual on-disk state, possibly due to cache invalidation issues or the
like...

A few days back I still had the psshfs process locked in a loop (I
didn't use it often enough to investigate where it loops yet).
This might not be related at all to the locking issues you're having,
though.

Thanks,
-- 
Matt


Re: puffs netbsd-5 (was VOP_GETATTR: locking protocol change proposal)

2011-11-21 Thread Matthew Mondor
On Mon, 21 Nov 2011 08:45:52 +
Emmanuel Dreyfus m...@netbsd.org wrote:

 On Mon, Nov 21, 2011 at 03:26:35AM -0500, Matthew Mondor wrote:
  I seem to remember you previously writing about using puffs/rump on
  netbsd-5, is that still on netbsd-5?
 
 I use PUFFS on netbsd-5, and fixed a few bugs in it, so you definitely
 need latest netbsd-5 to avoid bugs. I never used rump, and indeed Pooka
 told me that it was not production-ready on netbsd-5.

My systems are fairly recent; what I'll do then is update again and use
psshfs some more, so that I can file a PR when I again get the busy
looping issue.

My two older PRs related to rump/puffs on NetBSD-5 were kern/43589 and
kern/43590, which were unrelated problems.

Thanks,
-- 
Matt


Re: fs-independent quotas (binary plists)

2011-11-17 Thread Matthew Mondor
On Thu, 17 Nov 2011 10:50:17 +0100
Manuel Bouyer bou...@antioche.eu.org wrote:

 In this context, text format means a key/value pair format, in which
 some keys are optional and values can be of arbitrary types. Maybe you can
 do this with a binary format too, but it doesn't exist yet.

This reminds me that years ago someone implemented support to save
plists in a binary format[1] (this doesn't necessarily mean that it
would help solve this problem, though).  But I'm surprised that after
all these years the support wasn't added; does anyone know if there is
general resistance to an optional compact and portable binary format,
and if so, the reasons?

If such a format was supported, it wouldn't be harder to machine or
human-process: proplib could be used as it is now for code, and bplists
could be easily exported to an XML format on request for editing in an
editor, e.g. via a viplist, plistctl or similar command (which could also
use advisory locking, of course, and save back to binary format if the
system is configured to use a binary format).  In theory, it could also
increase performance, and a binary format would be simpler to parse by
the kernel than XML, minimizing bugs...

[1] ftp://ftp.netbsd.org/pub/NetBSD/misc/freza/bplist-2007-10-27.diff

Thanks,
-- 
Matt


Re: MAXNAMLEN vs NAME_MAX

2011-11-14 Thread Matthew Mondor
On Sun, 13 Nov 2011 23:08:30 +
David Holland dholland-t...@netbsd.org wrote:

 I was recently talking to some people who'd been working with some
 (physicists, I think) doing data-intensive simulation of some kind,
 and that reminded me: for various reasons, many people who are doing
 serious data collection or simulation tend to encode vast amounts of
 metadata in the names of their data files. Arguably this is a bad way
 of doing things, but there are reasons for it and not so many clear
 alternatives... anyway, 256 character filenames often aren't enough in
 that context.

It's only my opinion, but they really should be using multiple files or
a database for the metadata with, as necessary, a link to an actual
file for data.
But I also tend to think the same of software relying on extended
attributes, resource forks and the like (with the possible exception of
a specialized facility for extended permissions :)

 (This sort of usage also often involves things like 50,000 files in
 one directory, so the columnizing behavior of ls is far from the top
 of the list of relevant issues.)

This reminds me, does anyone know about the current state of
UFS_DIRHASH?  I remember reading about some issues with it and ending up
disabling it on my kernels, yet huge directories can occur in a number
of scenarios (probably a more pressing issue than extending file names,
actually)...

   The 255 limit was just because that's how many bytes a one byte length
   field permitted, not because anyone thought names that long made sense.
   But if you're going to increase it, why  stop at 511?  That number
   means nothing - the next logical limit would be 65535 wouldn't it?
 
 Well... yes but there are other considerations. As you noted, going
 past one physical sector is problematic; going past one filesystem
 block very problematic. Plus, as long as MMU pages remain 4K,
 allocating contiguous kernel virtual space for path buffers (since if
 NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
 large) could start to be a problem.

I agree, especially with all the software that allocates path/file name
buffers on the stack (but even on the heap it could be a general memory
waste with 64KB, other than the memory management performance issues).
-- 
Matt


Re: sysctl(7) knob to allow users to control CPU affinity

2011-11-03 Thread Matthew Mondor
On Thu, 03 Nov 2011 17:01:48 +1100
matthew green m...@eterna.com.au wrote:

  Since the default is to not allow affinity control, it's not of utmost
  importance, but it could allow a compromise between total restriction
  and total freedom...  I have no objection to that sysctl personally.
 
 i think the default should be changed, but user-specified affinity
 shouldn't be considered an absolute rule, just a preference.  i'm not
 sure i understand exactly what sort of issue you're envisioning.

I assumed there could be issues since pset(3) is restricted to the
superuser (as well as pthread_setaffinity_np(3) now), but on
rethinking it I admit I don't see a problem, as non-privileged
processes cannot change the process priority beyond their class'
priority.

The only other case that comes to my mind would be a dmover(9) like
system eventually reserving processor(s) for dedicated tasks, but I
guess that in this case the reserved cores would simply be made
unavailable in cpuctl(8)/pset(3)/etc...
-- 
Matt


Re: sysctl(7) knob to allow users to control CPU affinity

2011-11-02 Thread Matthew Mondor
On Thu, 03 Nov 2011 01:50:49 +0100
Jean-Yves Migeon j...@netbsd.org wrote:

 Here's a proposal for a sysctl(7) knob to easily allow non-superusers to 
 set the CPU affinity of processes and threads they own:
 
 security.secmodel.suser.usersetaffinity
 
 (resembles the one already existing to allow for user mounts)
 
 Would it be acceptable to modify current secmodel_suser(9) to allow this?
 
 This issue comes up regularly on various tech-* MLs, motivated by the fact
 that people expect this behavior based on what they encounter on other OSes.

Just out of curiosity, but is it possible for the superuser to still
reserve wanted CPU/cores, such that non-privileged users could, if that
sysctl is enabled, work with the non-reserved ones?  Or, can the
sysadmin specify CPU/cores and/or limits for non-privileged users?

Since the default is to not allow affinity control, it's not of utmost
importance, but it could allow a compromise between total restriction
and total freedom...  I have no objection to that sysctl personally.

Thanks,
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-31 Thread Matthew Mondor
On Mon, 31 Oct 2011 19:58:27 -0400
Greg Troxel g...@ir.bbn.com wrote:

 Obligatory actual netbsd tech-kern content: It seems like we really need
 a sync_synchronous(2) system call that guarantees that all file system
 operations that have completed (syscall returned) before the issuance of
 the sync_synchronous call are on disk before sync_synchronous returns.
 It seems odd that for sync, there is no waiting, fsync seems to wait,
 and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

BUGS
 sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous, and be fixed
to be so, such that sync_synchronous(2) would not be necessary?
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Matthew Mondor
On Fri, 28 Oct 2011 20:33:29 -0400
Greg Troxel g...@ir.bbn.com wrote:

 So, I'm inclined to patch rdiff-backup not to fsync, since it seems
 excessive, and the backup is toast if the machine crashes before it is
 finished -- in that case rdiff-backup just rolls back.  Opinions?

I also wonder why fsync would be used for every file, especially if you
consider a whole run a single transaction, even more so if using
snapshots (although you don't mention using them).  In which case it
simply should report failure and abort on any open/write/rename/close
error, and at the end, fsync once, also checking for error.  If at
that point everything was successful, the transaction is committed (as
far as software is concerned, of course, hardware buffers might still
need flushing), otherwise everything should be rolled back, unless an
inconsistent state is allowed (where the next full backup might fix
that).

I'm however wondering if the excessive fsync(2)s weren't eventually
added because of issues with ext4, as I somehow remember exceptions to
Unix semantics with it, and know that some have lost files when using it
the way they'd normally safely use other file systems (and I haven't
followed progress to know if it's since fixed).

But if rdiff-backup cannot optionally avoid those, adding an option to
tell it not to fsync at every file as you suggested would be very sane
IMO (it still could default to sync mode, in case there's upstream
resistance)...

I can understand the need for some transaction-logging applications to
call fdatasync(2) regularly, but that's another matter (and even then
it's usually configurable after how many bytes or seconds to call it to
allow the administrator to tweak performance).
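
To illustrate the "after how many bytes" idea, here is a minimal sketch
of a log writer that only calls fdatasync(2) once a configurable number
of bytes has accumulated.  The names struct synclog, synclog_append()
and the nsyncs counter are hypothetical, invented for this example and
not taken from any real application; a time-based threshold would work
along the same lines.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical log writer state; every name here is illustrative only. */
struct synclog {
	int		fd;		/* open log file descriptor */
	size_t		sync_bytes;	/* administrator-tunable threshold */
	size_t		pending;	/* bytes written since the last flush */
	unsigned long	nsyncs;		/* flush count, kept for illustration */
};

/*
 * Append a record, and only call fdatasync(2) once at least sync_bytes
 * bytes have accumulated since the previous flush.
 * Returns 0 on success, -1 on error (errno is left as set by the
 * failing call; EINTR is not retried in this sketch).
 */
static int
synclog_append(struct synclog *sl, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(sl->fd, p, len);

		if (n == -1)
			return -1;
		p += n;
		len -= (size_t)n;
		sl->pending += (size_t)n;
	}
	if (sl->pending >= sl->sync_bytes) {
		if (fdatasync(sl->fd) == -1)
			return -1;
		sl->pending = 0;
		sl->nsyncs++;
	}
	return 0;
}
```

With sync_bytes set to 10, three 4-byte records result in a single
flush, after the third one.  A larger threshold trades durability of
the most recent records for fewer flushes.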
-- 
Matt


Re: Extended attributes Linux interface

2011-10-21 Thread Matthew Mondor
On Fri, 21 Oct 2011 00:29:12 -0400
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 If unicode strings are possible, I think that it'd be possible for a
 string to look like system but to actually be something else to an
 auditing administrator, unless all tools clearly showed those non-ASCII
 bytes in an escaped format.

If the above theory is true, if we eventually supported extended
permissions such as access lists, they could possibly be implemented in
a special empty string class, with a special empty string key, and a
single structured object value specifying the permissions, rather than
relying on various keys within the system class.

Yet for performance and security, it'd be ideal if the
interface only presented integer IDs for the class, and reserved
integer key attributes for e.g. the EXTATTR_SYSTEM class (just like our
groups are really gids).  The Linux compatibility interface, if
preserved, could be oblivious to system class attributes and only be
useful for the general purpose user attributes...  The problem here
would be that user tools using only the Linux API would not be able to
backup the full state (in this case, the extended permissions,
unfortunately)...
-- 
Matt


Extended attributes Linux interface

2011-10-20 Thread Matthew Mondor
Hello,

There were previously discussions, started by Emmanuel, concerning the
extended attributes, including on the various available APIs and which
to support etc.

At the time I read them I was catching up with a lot of mail and had
written down a small note about a potential security implication that
crossed my mind if we used the Linux interface.  Perhaps someone can
(dis)confirm:

Strings are used instead of IDs to distinguish the class of an extended
attribute, i.e. system etc.  My question is then: must those be
limited to ASCII or can they support arbitrary bytes, or UTF-8?

If unicode strings are possible, I think that it'd be possible for a
string to look like system but to actually be something else to an
auditing administrator, unless all tools clearly showed those non-ASCII
bytes in an escaped format.

Of course, if the kernel wanted to match system, it wouldn't match
then, but the fact that it may _appear_ to be correct to an admin may
introduce a security issue if extended permissions were ever
implemented on top of that system.  Perhaps this problem could
also exist with the key names in case they're part of permission
descriptions?
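
As a rough sketch of what "clearly showing non-ASCII bytes in an
escaped format" could mean for a listing tool, assuming names are
arbitrary byte strings: every byte outside printable ASCII (and the
backslash itself, to keep the output unambiguous) is shown as \xNN.
The function name extattr_name_escape() is hypothetical and not part
of any existing interface.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Store an escaped copy of name into out: printable ASCII bytes are
 * copied as-is, anything else (and the backslash) becomes \xNN.
 * Returns the number of characters stored, not counting the
 * terminating NUL, or -1 if out is too small.
 * Hypothetical helper, for illustration only.
 */
static int
extattr_name_escape(char *out, size_t outlen, const unsigned char *name,
    size_t namelen)
{
	size_t i, o = 0;

	if (outlen == 0)
		return -1;
	for (i = 0; i < namelen; i++) {
		unsigned char c = name[i];

		if (c >= 0x20 && c <= 0x7e && c != '\\') {
			if (o + 2 > outlen)	/* byte + NUL */
				return -1;
			out[o++] = (char)c;
		} else {
			if (o + 5 > outlen)	/* \xNN + NUL */
				return -1;
			(void)snprintf(out + o, 5, "\\x%02x", (unsigned int)c);
			o += 4;
		}
	}
	out[o] = '\0';
	return (int)o;
}
```

A name made of the bytes 's', 0xd1, 0x83, 's', 't', 'e', 'm' (which a
UTF-8 terminal may render very much like "system") would then be shown
as s\xd1\x83stem, while the genuine "system" is left untouched.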

Thanks,
-- 
Matt


Re: UNIX kernel notification system

2011-10-04 Thread Matthew Mondor
On Mon, 3 Oct 2011 00:40:46 -0700
Erik Fair f...@netbsd.org wrote:

 Why not a classification/taxonomy of kernel missives? This doesn't mean we 
 can't continue to have relatively free form (and possibly amusing) text for 
 those conditions we're not yet prepared to classify/codify yet ('cause 
 they're rare, or debug, or ... whatever). The potential for win is in making 
 (or retaining) software parse-ability to enable software response.

Interestingly this very paragraph reminds me of Common Lisp signals
and restarts; signals can be conditions or errors and hold structure
(and inheritance), blocks of code may ignore or catch them, uncaught
exceptions may be handled by software including the invocation of
restarts, or left alone to be routed to the debugger (which is even
overridable through a hook), and there is support for stack-unwind
protected code which gets executed no matter if an exception causes a
long jump out.

Of course, all of this seems overkill for our purposes, but probably
worth mentioning for inspiration...
-- 
Matt


Re: UNIX kernel notification system

2011-10-04 Thread Matthew Mondor
On Mon, 3 Oct 2011 11:31:17 -0700
Erik Fair f...@netbsd.org wrote:

 less(1) (or more(1)) doesn't take care of you? The nice thing about such 
 formatting is that the text can be wrapped at relatively arbitrary word 
 boundaries, making it more readably displayable on a wider range of display 
 widths (e.g. mobile phones, tablets). Just as all the world's not a VAX 
 (cried the PDP-11 users), so also is the world rather more than just 80 by 24.

Sorry to have to add anything to this off-topic discussion;

One issue is that a message may be a mix of text to be wrapped and text to
be left as-is (code for instance), so every paragraph/line must be able
to be auto-wrap annotated.  Of course there is the possibility of HTML
mail (with its own issues) and multiple MIME parts, but it's
traditionally fine on tech lists to mix code and text inline, with the
only exceptions that I see being in Apple Mail posts.

Another issue is that readers that will wrap such paragraphs don't
usually have a configuration option to specify the width of auto-flowed
paragraphs, so for instance in the client I use (a GTK2 client), those
paragraphs extend far right until the end of the window (which means
much more than 80 columns), so reading them is unfortunately harder.

But, with the recent proliferation of Apple Mail posts on the mailing
lists I try to throttle my complaints about it (my last one being
http://mail-index.netbsd.org/tech-userlevel/2010/10/30/msg004119.html :)
-- 
Matt


Re: (off topic) mail line wrapping

2011-10-04 Thread Matthew Mondor
On Tue, 4 Oct 2011 09:35:16 +0200
Alan Barrett a...@cequrux.com wrote:

(flowed paragraph follows)

 Ignoring special cases, the rules are roughly this:  The sender 
 marks soft-wrapped paragraphs by ending every line except the 
 last with a space.  The sender marks hard-wrapped lines by not 
 ending them with a space.  (A paragraph of only one line cannot 
 be soft wrapped.)

Fortunately, your auto-flowed paragraphs are still properly wrapped so
that even clients that don't support it will display them properly,
though.
-- 
Matt


Re: Multiboot a NetBSD kernel with Grub2: it works

2011-09-26 Thread Matthew Mondor
On Tue, 13 Sep 2011 19:36:03 +0200
Emmanuel Kasper emman...@libera.cc wrote:

 I have just posted a detailed install from GRUB howto on netbsd-users.

Did the documentation you proposed get committed into the official docs
somewhere since?  If not, please consider filing a PR with the
information, so that it doesn't get lost.

The bit about needing to pass /netbsd twice so command line
arguments get passed to the kernel is also worthy of mention...

Thanks,
-- 
Matt


Re: Changing the gpio(4) API/ABI

2011-09-26 Thread Matthew Mondor
On Fri, 23 Sep 2011 12:38:13 +0200
Marc Balmer m...@msys.ch wrote:

 With gpio(4) we still carry an old API with us, which I want to remove.  
 While working on it, I will also introduce a third locator to device drivers 
 that attach to gpio pins, flags.  It will be needed for e.g. gpioiic(4) to 
 invert the SDA/SCL pin numbers.
 
 WIll documenting the changes be enough?

Perhaps only one other question: Is there any advantage to keep
compatibility with OpenBSD (from which gpio(4) was initially ported);
are there tools from OpenBSD that can be used because of this
compatibility?  Or has gpio(4) stalled on OpenBSD?

Another option would be to allow a full redesign under a new device
name/copy, if that's a concern.  Personally, although I've seen gpio in
the releases I used since quite a while, I've never used it, and I
doubt that I used any code relying on it...

Thanks,
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-25 Thread Matthew Mondor
On Fri, 9 Sep 2011 09:38:31 -0400
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 On Fri, 9 Sep 2011 00:26:43 + (UTC)
 chris...@astron.com (Christos Zoulas) wrote:
 
  Please file a PR about this. I've been meaning to fix it.
 
 Thanks, I will.

For reference and to close this thread, the relevant PR was kern/45352,
which was fixed and closed, thanks to Christos for the fixes and to the
others who posted hints.
-- 
Matt


Re: KAUTH_PROCESS_SCHEDULER_*AFFINITY restricted to root in default secmodel?

2011-09-25 Thread Matthew Mondor
On Mon, 29 Aug 2011 01:07:52 +0200
Alistair Crooks a...@pkgsrc.org wrote:

Sorry for replying to an old thread, I'm still catching up with mail :)

  i've found this some what annoying.  IMO, we should have a a way to say
  let normal users do this.  i'm not sure sysctl is the right place, but
  maybe an overlay secmodel?  on some of my machines, i don't want to have
  to be root to do this.  it's annoying to have to use root to get the
  highest performance i can out of an application.
  
  the current default is fine, however.
 
 Something analogous to our friends:
 
 % sysctl -a | grep mount
 vfs.generic.usermount = 0
 security.models.suser.usermount = 0
 %

And/or like   security.models.bsd44.curtain,  etc; I think that a
sysctl for this would be nice too.

Also, I'm not sure if this is doable (an annoyance if users and scripts
have been using the old knobs), but I tend to think that sysctls that
affect the default secmodel (bsd44) should ideally all be under
security.models.bsd44.?
-- 
Matt


Re: 5.1 USB panic on second removal of memory stick

2011-09-19 Thread Matthew Mondor
On Wed, 15 Jun 2011 20:04:23 -0700
Bob Lee g...@force10networks.com wrote:

Hello Bob,

   I'm working on a PowerPC system, and have a problem when I remove the
 usb memory stick the second time.  That is insert memory stick, remove
 memory stick, insert memory stick, and remove memory stick.
   At this point the system panics, with 'ehci_rem_qh: ED not found'.
   Anyone else seen anything like this?

If your problem still occurs, please file a PR, along with the
backtrace and dmesg, so that it doesn't get lost.  I think that it
should be filed in the kern category.

Thanks,
-- 
Matt


Re: hot swap storage devices

2011-09-19 Thread Matthew Mondor
Sorry to reply to such an old thread (I'm catching up with ml mail).

On Mon, 27 Jun 2011 12:35:48 -0700
Erik Fair f...@netbsd.org wrote:

 With regard to hot swap storage devices, we really have two choices which 
 are not mutually exclusive:
 
 1. Treat as now, but with some additional code in the kernel which yells, 
 hey! put that back! I have data to write on it! when a device goes away 
 without prior notice (umount), and hold on to (rather than discard) the data 
 in the I/O buffer cache, in the hope that the user notices and heeds the 
 directive. Timeout to discard? Probably depends upon how much RAM utilization 
 pressure you're under. I think minutes would be a good unit here.

I think that this is the best solution;

This is basically what AmigaOS did, and it was nice, but it also had a
unified interface where even console was implemented on top of
graphics, with intuition.library resident in ROM, making it possible to
pop-up requesters at any time.  And it was designed for single-user...

This is more tricky in our case though; as the kernel should then be
able to forcefully trigger a requester, which ideally shouldn't
interrupt running processes and from which it must be possible to
resume working, on whatever currently active interface (console,
possibly in tmux/screen, or X11).

I wonder what the feasibility of this could be: reserve a wscons VT
(where possible) for this type of requester; when the kernel must use
it, remember which is the active VT, switch to the requester VT in text
mode where the requester is shown.  Depending on configuration, this
behaviour could be enabled or disabled, and possibly a timeout could be
configured.  Once the timeout expired or the needed user action was
performed (user selects cancel, retry, inserts a requested device,
etc), return back to the previous VT.

But this still does not deal with device identification; on AmigaOS
disks had labels and the system would verify upon insert/connect if the
label corresponded to such a pending requester...

Thanks,
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-09 Thread Matthew Mondor
On Fri, 9 Sep 2011 00:26:43 + (UTC)
chris...@astron.com (Christos Zoulas) wrote:

 Please file a PR about this. I've been meaning to fix it.

Thanks, I will.
-- 
Matt


Re: pty(4) 1024 bytes buffer limit

2011-09-09 Thread Matthew Mondor
On Fri, 09 Sep 2011 08:30:51 +1000
matthew green m...@eterna.com.au wrote:

  I looked at the various tty(4) termios(4) and pty(4) without finding an
  option to change the buffer size.  Is there a way at all to change it?
 
 there's no option.  infact, it's all hard coded as magic 1024 constants
 in about 4 places in sys/kern.  i kept meaning to fix that, but haven't
 gotten around to it.

Thanks for the confirmation,
-- 
Matt


pty(4) 1024 bytes buffer limit

2011-09-08 Thread Matthew Mondor
Hello,

I've been wondering if it was possible to change the pty(4) internal
buffer size, as I noticed that ppp tunnels cannot use a larger frame
size.  Because of this, it seems that the optimal MTU is 856, which is
so small that context switches become the bottleneck.

It would be nice to for instance be able to use an MTU of 3000 so that
there are fewer context switches, but unfortunately tracing the
processes shows that 1024 bytes are read from the pty devices at most.

I looked at the various tty(4) termios(4) and pty(4) without finding an
option to change the buffer size.  Is there a way at all to change it?

Thanks,
-- 
Matt


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-07 Thread Matthew Mondor
On Wed, 4 May 2011 19:54:37 -0700
jnem...@victoria.tc.ca (John Nemeth) wrote:

  This doesn't mean we should be doing hack jobs.  NetBSD is about
 doing things right.

Can postinstall fix/recreate specific buggy devices?  Or could it warn
that /dev/fd* might need to be recreated?  Otherwise, at least it
should be mentioned in UPDATING, and that would be all right, IMO.
-- 
Matt


Re: extent-patch and overview of what is supposed to follow

2011-04-02 Thread Matthew Mondor
On Sat, 2 Apr 2011 11:49:14 +0200
Martin Husemann mar...@duskware.de wrote:

 On Sat, Apr 02, 2011 at 11:30:16AM +0200, Manuel Bouyer wrote:
  AFAIK dtrace doesn't work on non-modular kernels ...
 
 Nor on most of our archs, and AFAICT there is not even a document 
 describing the (maybe nontrivial amount of) work needed to make it
 work there.

I don't think that we should leave the tracking for a hypothetical
future; it'd be better if the interface, or implementation, allowed
such tracking to be done.
-- 
Matt


Re: Status and future of 3rd party ABI compatibility layer

2011-04-01 Thread Matthew Mondor
On Wed, 23 Mar 2011 16:06:07 +0100
Joerg Sonnenberger jo...@britannica.bec.de wrote:

 As such, I want to propose moving the last two categories into the Attic
 for further dusting.

It makes sense to me,
-- 
Matt


Re: Status and future of 3rd party ABI compatibility layer

2011-03-03 Thread Matthew Mondor
On Wed, 2 Mar 2011 00:40:44 +
Andrew Doran a...@netbsd.org wrote:

 With modules now basically working we should either retire or move
 some of these items to pkgsrc so that the interested parties maintain them.
 An awful lot of the compat stuff is now very compartmentalised, with not
 much more work to do.

Is all compat code i386 specific?  Otherwise, do modules really work on
all architectures involved?  Can a module built from third-party code
be linked statically to a monolithic kernel without hassle, for systems
on which enabling loadable modules is not allowed?

Thanks,
-- 
Matt


Re: mpt Serious performance issues

2011-02-04 Thread Matthew Mondor
On Fri, 4 Feb 2011 09:17:01 +0100
Stephan stephan...@googlemail.com wrote:

 Now this is REALLY strange. I was wondering about why the read speed
 is sometimes high (~70MB/s) and sometimes very slow (~2MB/s). So I
 repeated the test utilizing
 
 find / -exec cat {} \; > /dev/null
 
 to read everything from the filesystem while watching the physical
 disks with my eyes and the throughput with sysstat. The findings is
 
 -that sometimes the upper disks is 100% busy while the lower disk is
 NOT being accessed at all, and the read speed is ~2MB/s
 -then sometimes the adapter switches to the lower disk while the upper
 disk isn´t utilised anymore, and the read speed increases to ~70MB/s
 -until the adapter again switches to the upper disk which leads to the
 massive decrease in speed
 
 So what do you think about that?

Just in case, none of those disks show any reallocated sectors using
atactl smart status?  I'm asking because I've seen very inconsistent
speeds on some drives whenever the remapping logic had to be turned
on.  Also, nothing in dmesg about read error retries?  As I've also
seen brand new disks with very high read error rates but otherwise
normal smart stats.  They too would crawl when reading certain areas.
Unfortunately I'm seeing this latter defect more often recently.
-- 
Matt


Re: freebsd 5.99.41 as XEN3_DOMU

2010-12-24 Thread Matthew Mondor
On Sun, 19 Dec 2010 20:54:26 +0100
Manuel Bouyer bou...@antioche.eu.org wrote:

 Well, in the current state, modules are a not enabled in the Xen kernels
 (modules should be built specifically for Xen, but the build tools do not
 allow this right now). So you have to compile all what you need in a
 monolitic kernel. But ZFS is only available as module, so unfortunably
 this means no ZFS for xen.
 One way around it is to run NetBSD in a HVM guest.

Is it common for modules not to be able to be statically linked in a
monolithic kernel?  I understand providing ZFS as a module is
convenient for licensing reasons, but it probably shouldn't be too
hard to somehow optionally link such a module to a kernel image at
build time, and call an init/load hook at boot runtime?

I tend to think that other than allowing to optionally dynamically load
code, another advantage to modules would probably be that they also can
optionally be included monolithically, with ideally no code changes...

Thanks,
-- 
Matt


Re: New apple keymap variant or keymap in /usr/share/wscons/keymaps?

2010-11-28 Thread Matthew Mondor
On Sun, 28 Nov 2010 21:04:54 +0100
Frank Wille fr...@phoenix.owl.de wrote:

 I came to the conclusion that it might be easier and less intrusive to
 create a new keymap file (e.g. called ukbd.apple.powerbook) for those
 function keys. So they can easily be added to any national keyboard layout.
 
 But I realized that wsconsctl is unable to process a mapping-line with just
 one Cmd_*, or a Cmd followed by Cmd_Function in it. When there is no good
 reason that those are rejected I will fix it in the wsconsctl-parser now.

When a while ago I posted PRs with a new keymap to be added to the
kernel, I was told that they now should ideally be added as userland
keymaps.  When later supplying a userland keymap (the FR_CA one), I
noticed that the interface wasn't as friendly or powerful as it could
have been.

In case you intend to also enhance the keymap infrastructure and
interface, I have an old pending PR (misc/26720) with a few
enhancements for it, but I never got back to update the diffs for a
recent -current or to keep enhancing it.  Those are userland changes
though, possibly tech-userlevel is a better place to continue the
thread in this direction.

But other than encoding= support, it might also be nice to be able to
have include support like include= as well, after which it would be
possible to restructure the keymaps and move common parts together; and
if such include support allowed conditionals, parts could be loaded
conditionally and automatically depending on machine model (assuming
that would become available via sysctl), etc.

What demotivated me from keeping to work on it back then was the low
interest of the developers about that PR, but most importantly that I'm
usually using X11 terminals and ssh myself, with the default EN-US
wscons keymap being fine when I'm really at the console (and that almost
exclusively occurs at installation time).

If we want to pipe-dream, for the future, now that there's Lua in base,
it's even possible to redo the whole userland keymap loading/management
part with a more powerful language than sh.  This last part being back
on topic with tech-kern, with the advent of the kernel-lua project, it
might even be possible to eventually allow user translation mechanics
in the form of Lua scripts... :)
-- 
Matt


Re: vmpage race and deadlock

2010-11-28 Thread Matthew Mondor
On Sun, 28 Nov 2010 09:30:44 +0100
Juergen Hannken-Illjes hann...@eis.cs.tu-bs.de wrote:

 Usually within hours I get a deadlock where a thread is waiting on genput
 but the page in question is neither BUSY nor WANTED.  I suppose I tracked (*1)
 it down to three places, where we change page flags without holding the
 object lock.  With this diff (*2) in place the test runs for > 48 hours.

This is a nice find, which most probably also deserves a PR, as
netbsd-5 also lacks proper synchronization there.

Thanks,
-- 
Matt


Re: mlock() issues

2010-10-22 Thread Matthew Mondor
On Fri, 22 Oct 2010 10:18:52 +0100
Sad Clouds cryintotheblue...@googlemail.com wrote:

 A pipelined request, say for 10 small files can be served with a single
 writev() system call (provided those files are cached in RAM), if you
 rely on kernel file cache, you need to issue 10 write() system calls.

Is this also true if the 10 iovecs point to mmap(2)ed files/buffers
whose pages were recently accessed?
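
To make the question concrete, here is a minimal sketch of what I have
in mind: each file is mapped read-only and all the mappings are handed
to a single writev(2).  It only shows the construction and says nothing
about how efficiently a given kernel services it.  The name
send_mapped() and the MAXFILES limit are hypothetical, chosen for the
example; empty files are treated as an error since a zero-length
mmap(2) fails, and a short write is not retried.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>

#define MAXFILES 16	/* arbitrary limit for this sketch */

/*
 * Map every file read-only and write them all to outfd using a single
 * writev(2) call.  Returns the number of bytes written, or -1 on error.
 * Hypothetical helper, for illustration only.
 */
static ssize_t
send_mapped(int outfd, const char *const paths[], int npaths)
{
	struct iovec iov[MAXFILES];
	ssize_t ret = -1;
	int i, n = 0;

	if (npaths < 1 || npaths > MAXFILES)
		return -1;
	for (i = 0; i < npaths; i++) {
		struct stat st;
		void *p;
		int fd = open(paths[i], O_RDONLY);

		if (fd == -1)
			goto out;
		if (fstat(fd, &st) == -1 || st.st_size == 0) {
			(void)close(fd);
			goto out;
		}
		p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
		    fd, 0);
		(void)close(fd);	/* the mapping survives the close */
		if (p == MAP_FAILED)
			goto out;
		iov[n].iov_base = p;
		iov[n].iov_len = (size_t)st.st_size;
		n++;
	}
	ret = writev(outfd, iov, n);
out:
	/* Unmap whatever was successfully mapped, on success or failure. */
	for (i = 0; i < n; i++)
		(void)munmap(iov[i].iov_base, iov[i].iov_len);
	return ret;
}
```

A real server would presumably keep the mappings around between
requests rather than remapping each time, which is where the page
residency question above becomes relevant.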
-- 
Matt


Re: mlock() issues

2010-10-22 Thread Matthew Mondor
On Fri, 22 Oct 2010 12:06:37 +0100
Sad Clouds cryintotheblue...@googlemail.com wrote:

 Well if you're allocating memory yourself, then you've just created your
 own application cache.

Say many files were mapped in the process's address space, the OS would
still be responsible for keeping the pages of frequently used ones active,
possibly swapping out long-unused ones, unless of course MAP_WIRED was
used.  A syscall per access would be eliminated however, i.e. read(2),
and I think that zero-copy could be used (with page loaning) if writing
64KB blocks out to a socket from a memory-mapped file.

 On the other hand if you mmap() those files
 directly, what happens if another process truncates some of those files
 while you're reading them?

I didn't do a test (it's definitely worth testing), but I think that a
SIGSEGV could occur if a previously available page disappeared, unless
MAP_COPY was used, and the file would then need to be remapped.

I could see a problem where a siginfo-provided address might need to be
easily matched with the file so that the process can efficiently know
which file to remap...  and for many files the current kqueue(2)
EVFILT_VNODE isn't very useful either to detect that a file was
recently modified, as it'd require too many open file descriptors :(

There was some discussion made years ago about a kqueue(2) filter that
could be set on a directory under which any modified file (possibly for
the whole involved filesystem for the superuser) would generate an
event with information about which file is modified by inode, but this
seems non-trivial and wasn't yet implemented.  There also are issues
with inode to file string lookup (multiple files could point to a
common destination, and a reverse name cache is needed).

Anyway, I like this kind of discussion and have nothing against NIH
personally (it fuels variety and competition, in fact), so thanks for
sharing your custom cache experiments and performance numbers.  If you
happen to achieve interesting performance along the above
lines with mmap(2) as well, I'd also like to know how it went.

Thanks,
-- 
Matt


Re: kernel module loading vs securelevel

2010-10-18 Thread Matthew Mondor
On Mon, 18 Oct 2010 14:51:03 +0200
Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

 *lurker mode off*
 IIRC, part of agc work with netpgp is to integrate signature verification
 within kernel.
 *lurker mode on*

Thanks, that's nice to know, I didn't look at netpgp yet but might
eventually check if its RSA implementation (if any) can eventually be
worked into common/lib/libc/rsa, which would be a major step forward to
allow the kernel to verify signatures.

I started writing a task list to have an idea of what needs to be done,
and it's not trivial
(http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/netbsd/signed_modules.txt?rev=1.5;content-type=text%2Fplain).
I might give an implementation a try during my next vacations, but no
timeline or guarantee (disclaimer!).  Motivation is also a factor as my
current (very simple) solution to the various MODULAR issues I've faced
(mostly maintenance related) has been so far to use monolithic kernels.
-- 
Matt


Re: kernel module loading vs securelevel

2010-10-18 Thread Matthew Mondor
On Mon, 18 Oct 2010 09:31:32 -0400
Steven Bellovin s...@cs.columbia.edu wrote:

 Signatures provide *authentication*; what is needed here is *authorization*.

While I agree, there also are situations where both can be welcome...

Another solution someone proposed which I like is hashing the modules
to then at load time rehash and match a module against the hash set,
which would be a simpler, shorter-term solution.  I think that
embedding the hash set in the kernel image would be safer than using
a file, however.  Unfortunately, this makes developing, installing or
upgrading a module less friendly as the kernel image has to be
refreshed and the system rebooted.
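
A minimal sketch of the rehash-and-match step, under loud assumptions:
the digest below is 64-bit FNV-1a, used ONLY as a short stand-in so the
example is self-contained; it is not cryptographically secure and a
real implementation would need a proper hash.  The names
module_digest() and module_is_trusted() are hypothetical, and the
trusted set is just an array, standing in for data embedded in the
kernel image.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * FNV-1a, 64-bit.  Stand-in digest for illustration only: NOT a
 * cryptographic hash, a real check would use one.
 */
static uint64_t
module_digest(const unsigned char *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/*
 * Rehash the module image at load time and look the result up in the
 * trusted set.  Returns 1 if found, 0 otherwise.
 */
static int
module_is_trusted(const unsigned char *image, size_t len,
    const uint64_t *trusted, size_t ntrusted)
{
	uint64_t h = module_digest(image, len);
	size_t i;

	for (i = 0; i < ntrusted; i++)
		if (trusted[i] == h)
			return 1;
	return 0;
}
```

The load path would refuse any image for which module_is_trusted()
returns 0, which is also why changing a single byte of a module means
regenerating the set.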
-- 
Matt


Re: kernel module loading vs securelevel

2010-10-17 Thread Matthew Mondor
On Sat, 16 Oct 2010 13:58:19 -0400
Thor Lancelot Simon t...@panix.com wrote:

   2) Finish the asymmetric operation support in cryptodev and
  actually require modules to be signed.  This is basically a
  superset of #1 above that could get about as complicated as
  one wanted it to (ugh) but might be worthwhile if kept simple.

You seem to now agree with me that this could be a solution.  It
indeed requires more work, but it also has advantages: not having to
care about module location or their mutability; allowing delegation
(multiple trusted public keys allow to verify signatures of various
trusted third parties), among others.

A developer working on a module only has to sign it without any further
trouble to test it (assuming he included his public key in the kernel
image).  No need to go change the flags of a hashes file (a plausible
point of failure anyway), update it, make it immutable again, etc.

Of course a serious problem would still exist if the kernel's database
of trusted keys could be modified.  An effort could be made so that
these cannot be modified at runtime but only at kernel image build
time, requiring a reboot, and those that care can manage to load the
kernel image from a read-only source.

To simplify things, couldn't X.509 parsing strictly be done by the
userland build infrastructure?  The list of trusted keys can be stored
in a simple binary format as part of the kernel image, and the module
signature can also be stored as a simple binary format as part of that
module.  If you want to be able to revoke an existing key at runtime,
support the use of subkeys and CAs and the like, things suddenly become
more complex, but I don't think it's necessary for this.

Even a simpler system with no trusted entities list could make use of
this: a random key pair could be generated at build time, the public
part of it the only stored key in the kernel image, and all modules
signed with the private part of that key, which then gets discarded...
Although the only advantage over veriexec-like hashes in this case
would be reduced kernel image read-only data segment (i.e. one 1024-bit
public key stored instead of n * 160-bit hashes).
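
The arithmetic behind that last remark, as a small sketch using only
the sizes mentioned above (1024 bits for the key, 160 bits per hash).
The macro and function names are made up for the example, and it
ignores that a real RSA public key also carries an exponent and that
each module would carry its own signature.

```c
#include <stddef.h>

#define PUBKEY_BITS	1024	/* one public key, as in the text */
#define HASH_BITS	160	/* one digest per module, as in the text */

/* Read-only bytes needed to store one digest per module. */
static size_t
hashes_bytes(size_t nmodules)
{
	return nmodules * (HASH_BITS / 8);
}

/* Read-only bytes needed to store the single public key. */
static size_t
pubkey_bytes(void)
{
	return PUBKEY_BITS / 8;
}

/* Smallest module count for which the hash set outgrows the key. */
static size_t
breakeven_modules(void)
{
	return pubkey_bytes() / (HASH_BITS / 8) + 1;
}
```

That is 128 bytes for the key against 20 bytes per module, so the
single key only wins on size from 7 modules onward (140 bytes of
hashes).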
-- 
Matt


Re: [ANN] Lunatik -- NetBSD kernel scripting with Lua (GSoC project results)

2010-10-11 Thread Matthew Mondor
On Sun, 10 Oct 2010 19:45:41 -0600
Samuel Greear l...@evilcode.net wrote:

 I didn't like the fact that the only option for loading a script into
 the kernel was to load the script source. I would make loading
 pre-compiled scripts the preferential method. In fact, I would
 probably tear eval out of the kernel lua implementation and only
 support loading of precompiled byte-code into the kernel.

If the tokenizer is considered heavy, or a potential source of exploit,
or if scripts are expected to frequently be loaded and a performance
bottleneck exists, I also think that loading pre-tokenized bytecode
would be a good idea.

However, there are several things to consider: some systems (i.e. Java)
do important sanity checks at tokenization time.  Is this important for
Lua?

Secondly, is the Lua bytecode using a stable, well defined instruction
set which is unlikely to change?  Otherwise as it improves and gets
updated any pre-tokenized scripts might need to be regenerated.  Of
course, that's probably not an issue if everything is part of the base
system and always gets rebuilt together.

Thanks,
-- 
Matt


audio/video capture timestamping

2010-08-12 Thread Matthew Mondor
Hello,

Since I have an old Brooktree878 card which NetBSD supports, which I
successfully used in the past with custom software using bktr(4) as
part of a security suite, I thought I'd give it a new life and try to
convert rare VHS which were rotting in a drawer to a digital format.

I tried mencoder and ffmpeg, at first encoding in real-time, and had
a/v sync problems, so I then tried simply capturing the stream to an
interleaved avi file without compression, but unfortunately the issues
still persisted.

With mencoder, the video would often skip a bit to resync with the
smooth audio, and with ffmpeg the audio would often skip to resync with
the smooth video.  Capturing from bktr(4) alone, or from audio(4)
alone, is fine.

If the source was not a tape, it would probably be possible to dump the
video and audio streams separately and multiplex them afterwards, but
this would probably be useless because of the mechanical issues causing
slight speed variations.  Encoding both streams with mencoder and the
same card worked on Linux anyway, and I was then curious as to what was
wrong on NetBSD.  I also tried with another audio chip on NetBSD,
without success.

After reading a thread
(http://www.mail-archive.com/po...@openbsd.org/msg26418.html) about it,
it seems that the main problem would be our capture devices not
supporting timestamps, which if available could be used by an encoder
to more accurately synchronize audio/video.

I'd like to know if someone already thought about those issues on
NetBSD or already started some work to allow this.  Indeed, on Linux
with the same card and software the synchronization is better, and the
timestamps are probably the reason.  ALSA appears to support querying
the timestamp for a recorded buffer, and v4l2 also seems to support
timestamps.

We do have a v4l2 partial video(4) implementation, although I didn't
yet try capture with a uvideo(4) through it, and didn't yet read enough
of its code to see if timestamps are supported.  In any case, it
probably means that it'd be possible to eventually hook bktr(4) through
video(4) as well via video(9) and provide timestamps for userland to
use...

As for audio(4), I don't know how it could safely be extended to
support timestamps without backwards compatibility issues, other than
perhaps allowing an ioctl(2) to be used to request that timestamps be
enabled, and yet another ioctl(2) to request the latest timestamp for
the latest read buffer, which might also be considered quite hackish.
Was there a WIP for another audio interface already for NetBSD, which
also supports timestamps?  Or does anyone have suggestions on how
audio(4) could provide timestamps decently?

I honestly have no idea how much time I myself could put into working
on this, but it'd already be nice to determine what is really wanted in
this area for the future, so that if I (or any other coder) have enough
interest and time, progress could be made...  so I'm asking for
opinions and ideas.

Thanks,
-- 
Matt


Re: Length of wmesg for condvar?

2010-08-09 Thread Matthew Mondor
On Sun, 8 Aug 2010 17:23:23 -0700 (PDT)
Paul Goyette p...@whooppee.com wrote:

 Should these be changed?  Are there any adverse effects from having a 
 wmesg longer than 8 characters?

It seems to me that the exporters of those use strncpy() (e.g.
kern/init_sysctl.c) and that the structures use WMESGLEN and
KI_WMESGLEN, both defined as 8.  So other than inadvertently truncated
names, it at least should not cause corruption, but I think that
truncated names could also be problematic when trying to distinguish
two strings starting with the same 8 characters (is that likely now?).
Especially when the only thing that differs between two states is some
suffix like rd and rw...  After all, those are intended for
humans :)
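
A minimal sketch of that collision, assuming an 8-byte field filled with strncpy() as described above.  The helper names and the example wait messages are made up for illustration; they are not taken from the kernel sources:

```c
#include <string.h>

#define WMESGLEN 8	/* same value as the constant discussed above */

/*
 * Copy a wait message the way a fixed-size export field would hold it:
 * at most WMESGLEN bytes, NUL-padded, not necessarily NUL-terminated.
 */
void
export_wmesg(char dst[WMESGLEN], const char *src)
{
	(void)strncpy(dst, src, WMESGLEN);
}

/* Return 1 if two wait messages are indistinguishable once exported. */
int
wmesg_collides(const char *a, const char *b)
{
	char ea[WMESGLEN], eb[WMESGLEN];

	export_wmesg(ea, a);
	export_wmesg(eb, b);
	return memcmp(ea, eb, WMESGLEN) == 0;
}
```

With this, two hypothetical names such as "fstransrd" and "fstransrw" both export as "fstransr", so wmesg_collides() reports them as identical.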
-- 
Matt


Re: Length of wmesg for condvar?

2010-08-09 Thread Matthew Mondor
On Mon, 9 Aug 2010 22:21:02 +0100
David Laight da...@l8s.co.uk wrote:

 On Mon, Aug 09, 2010 at 02:02:51PM -0700, Paul Goyette wrote:
  
  Does anyone object to my going through and coming up with shorter names 
  (<= 8 chars) for these condvars?
 
 It is worth checking whether they are displayed with a %.8s format
 (or similar) so that they don't need to be 0 terminated.
 Otherwise the names must be strictly less than 8 bytes.
 
   David
 
 -- 
 David Laight: da...@l8s.co.uk
 

That is worthy of concern, so I checked top and ps:

top uses
	char wmesg[KI_WMESGLEN + 1];
	strlcpy(wmesg, pp->p_wmesg, sizeof(wmesg));

ps uses
	strprintorsetwidth(v, l->l_wmesg, mode);
	v->width = min(v->width, KI_WMESGLEN);
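
For reference, the bounded-format approach David mentions can be sketched as follows.  This is only an illustration with a hypothetical helper name, not code from top or ps; it shows that a precision on %s keeps printf from reading past a fixed-size field that may lack a terminating NUL:

```c
#include <stdio.h>

#define KI_WMESGLEN 8	/* value quoted in this thread */

/*
 * Format a fixed-size, possibly unterminated wmesg field into out.
 * The precision stops at KI_WMESGLEN bytes even if no NUL is found.
 * Returns the number of characters written, excluding the NUL.
 */
int
format_wmesg(char *out, size_t outlen, const char field[KI_WMESGLEN])
{
	return snprintf(out, outlen, "%.*s", KI_WMESGLEN, field);
}
```

The destination needs KI_WMESGLEN + 1 bytes, the same size as the buffer top declares.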

Thanks,
-- 
Matt


Re: Using coccinelle for (quick?) syntax fixing

2010-08-08 Thread Matthew Mondor
On Sun, 08 Aug 2010 18:05:11 +0200
Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

 Opinions? Any interest in it? My intent is to put NetBSD specific
 scripts on wiki.n.o, and provide links for more generic ones.

That seems like a handy tool to save time and avoid a number of
typos, if it's used right.  Thanks for sharing, I personally didn't
know Coccinelle.  And example scripts can often be more useful than
plain documentation, especially if it's in a WIP state (I liked that
they showed in a few lines why it's better than sed :))
-- 
Matt


Re: Preserving early console output (pre-Copyright stuff)

2010-07-01 Thread Matthew Mondor
On Thu, 1 Jul 2010 06:00:41 -0700 (PDT)
Paul Goyette p...@whooppee.com wrote:

 That's what I thought I'd get for an answer!  :)
 
 There is a serial port, but I haven't figured out yet how to make it 
 work in the BIOS.  And while I do have other machines with serial ports 
 I've never used those ports and don't even have serial cables!  (The 
 last time I used a serial cable was way back in the days of modems and 
 dial-up 'net access!)

I've sometimes thought about this, as more and more hardware doesn't
ship with RS232 anymore.  Is there a relatively common BIOS interface
which would allow, even if inefficiently, using a USB port as a
serial device without too much code?  If so, perhaps a special
usb-serial bootblock could use it sometime in the future?

If there is no common BIOS interface, I can see it's a problem because
of all the driver code that'd be needed at boot time...

Thanks,
-- 
Matt


Re: why not remove AF_LOCAL sockets on last close?

2010-06-25 Thread Matthew Mondor
On Thu, 24 Jun 2010 22:55:51 -0400
Thor Simon t...@coyotepoint.com wrote:

 Can anyone tell me why, exactly, we shouldn't remove bound AF_LOCAL
 sockets from the filesystem on last close?  The following test program
 produces "second socket bind failed" on every system I've tested it on,
 and seems to cover the only possible use case for this feature...

I initially had the impression that leaving the socket around was a
feature to allow re-binding to the same file by an unprivileged process
after first creating the socket node as root (i.e. at a location where
unprivileged processes cannot create new files such as /var/run/) to
then set its permissions in a way to permit the unprivileged user or
group to bind(2) it.

However, I wrote a small test program and realized that despite
SO_REUSEADDR this doesn't work, and indeed after checking the kernel
code SO_REUSEADDR is ignored in the AF_LOCAL unp_bind() code.


#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct sockaddr_un	sun;
	int			s, opt;

	if (argc != 2)
		errx(EXIT_FAILURE, "Usage: %s path", argv[0]);

	if ((s = socket(PF_LOCAL, SOCK_DGRAM, 0)) == -1)
		err(EXIT_FAILURE, "socket()");

	/* Ask for address reuse; unp_bind() turns out to ignore it. */
	opt = 1;
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) == -1)
		err(EXIT_FAILURE, "setsockopt(SO_REUSEADDR)");

	(void)memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	sun.sun_len = sizeof(sun);
	(void)strlcpy(sun.sun_path, argv[1], sizeof(sun.sun_path));

	/* Fails with EADDRINUSE if the node from a previous run exists. */
	if (bind(s, (struct sockaddr *)&sun, sun.sun_len) != 0)
		err(EXIT_FAILURE, "bind()");

	/* Closing the socket leaves the filesystem node behind. */
	(void)close(s);

	return EXIT_SUCCESS;
}

$ cc -o test test.c
$ ./test /tmp/foo.sock
$ ./test /tmp/foo.sock
test: bind(): Address already in use


So to do what I described above, one has to create a directory
in /var/run instead, with permissions such that the unprivileged
process can create a file there.

Then I'm unsure why we leave those sockets dangling around, although
it's quite easy to explicitly unlink them at close time...
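
For illustration, the explicit cleanup could be sketched like this.  The helper names are hypothetical, and the sketch omits sun_len so that it also builds on systems without that field:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#include <string.h>
#include <unistd.h>

/*
 * Create a datagram AF_UNIX socket bound to path.  Any stale node left by
 * a previous run is removed first (note that this blindly removes whatever
 * lives at path).  Returns the descriptor, or -1 on error.
 */
int
local_bind(const char *path)
{
	struct sockaddr_un addr;
	int s;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	if ((s = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;

	(void)memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	(void)strcpy(addr.sun_path, path);	/* length checked above */

	(void)unlink(path);	/* ENOENT just means nothing to clean up */
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		(void)close(s);
		return -1;
	}
	return s;
}

/* Close the socket and remove its filesystem node. */
int
local_close(int s, const char *path)
{
	int error = close(s);

	if (unlink(path) == -1)
		error = -1;
	return error;
}
```

With such a pair, running the same program twice on the same path no longer fails with "Address already in use", at the cost of every application having to remember to do it.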
-- 
Matt


Re: why not remove AF_LOCAL sockets on last close?

2010-06-25 Thread Matthew Mondor
On Fri, 25 Jun 2010 14:51:45 +0200
Joerg Sonnenberger jo...@britannica.bec.de wrote:

 On Thu, Jun 24, 2010 at 10:55:51PM -0400, Thor Simon wrote:
  Can anyone tell me why, exactly, we shouldn't remove bound AF_LOCAL
  sockets from the filesystem on last close?
 
 If you want to do that, wouldn't it be easier to just go the Linux route
 and move them into a separate (virtual) namespace completely?

Could this not pose security risks in certain scenarios?  Or would such
a namespace also support permissions?

Thanks,
-- 
Matt


Re: why not remove AF_LOCAL sockets on last close?

2010-06-25 Thread Matthew Mondor
On Fri, 25 Jun 2010 09:19:03 -0400
Thor Simon t...@coyotepoint.com wrote:

 I think this is (always has been) a considerable blind spot on the part
 of BSD partisans.  Sure, we're happy to gripe about persistent SysV IPC
 objects every time we have to remember how to use ipcrm, but bound AF_UNIX
 sockets have the same issue, and we just ignore it.

I don't think most people have trouble with SysV IPC, considering those
persistent resources were often used by short-lived but frequently used
commands/processes, utilising both the permissions and persistent
resources features (and NetBSD allows the admin to set the limits of the
various SysV resources with accuracy); admittedly we can now do the
same using files, mmap and advisory locks, though.

But I agree that if leaving the sockets around permits no interesting
feature whatsoever (i.e. it doesn't even serve for SO_REUSEADDR), it
very well could be a design or implementation bug, even if common
software already explicitly unlinks AF_LOCAL sockets to account for
this issue...
-- 
Matt


Re: why not remove AF_LOCAL sockets on last close?

2010-06-25 Thread Matthew Mondor
On Fri, 25 Jun 2010 08:59:18 -0400
Matthew Mondor mm_li...@pulsar-zone.net wrote:

 However, I wrote a small test program and realized that despite
 SO_REUSEADDR this doesn't work, and indeed after checking the kernel
 code SO_REUSEADDR is ignored in the AF_LOCAL unp_bind() code.

Out of curiosity, I modified the test to see if immediately unlinking
the socket node after bind(2) would leave it around until it's closed, a
feature which some software expects for files on certain OS/FS
combinations.

However, the socket node is immediately deleted at unlink(2) even if
it's still open and bound, so an application also shouldn't rely on
this.
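
A minimal sketch of that modified test, written as a helper rather than the exact program I ran (the function name is hypothetical):

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>

#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a socket to path, unlink the node while the socket is still open,
 * and report whether the path is gone.  Returns 1 if the node vanished
 * immediately, 0 if it is still visible, -1 on setup failure.
 */
int
node_gone_after_unlink(const char *path)
{
	struct sockaddr_un addr;
	struct stat st;
	int s, gone;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	if ((s = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;

	(void)memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	(void)strcpy(addr.sun_path, path);	/* length checked above */

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
	    unlink(path) == -1) {
		(void)close(s);
		return -1;
	}

	/* The descriptor is still open and bound at this point. */
	gone = (stat(path, &st) == -1 && errno == ENOENT);
	(void)close(s);
	return gone;
}
```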


Re: updating COMPAT_LINUX for linux 2.6.x support (take 2)

2010-06-17 Thread Matthew Mondor
On Thu, 17 Jun 2010 10:25:59 +
Andrew Doran a...@netbsd.org wrote:

 This is mainly down the fact that we need kernel_lock to bracket legacy
 sections of code that aren't preemption safe.  I think MULTIPROCESSOR
 should be sent off to the glue factory but that's another discussion :-).

Is there any way that performance for the uniprocessor case could be
preserved, where some synchronization/preemption-safe blocks are
unnecessary, without having conditional code when MULTIPROCESSOR?

Or is it that for uniprocessor the same precautions are always required
on -current now?

Thanks,
-- 
Matt


Re: Writing to multiple descriptors with one system call

2010-03-18 Thread Matthew Mondor
On Thu, 18 Mar 2010 21:36:47 +0100
Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

 Pretty much all servers use the accept loop thing and fork/pthread right 
 after, but this was not my point.

High performance non-single-threaded servers often maintain a pool of
persistent processes or threads which accept(2) concurrently, either in
blocking mode or with polling (generally polling to allow listening to
multiple addresses/interfaces).  But indeed this doesn't change much
for this thread...

 Having 80% system time passed in write() calls is not negligeable, but 
 if you send the data byte after byte, I hardly see why it would be the 
 syscall's fault here. You will have to assess that the overhead does 
 indeed come from the context switch, and not by queuing up packets for 
 the PHY, block I/Os, or moving data around the IP stack. There is a big 
 mess behind a write(2), and the context switch is just one small part of 
 it. Instrument. You can't control what you can't measure.

Agreed
-- 
Matt


Re: Writing to multiple descriptors with one system call

2010-03-17 Thread Matthew Mondor
On Wed, 17 Mar 2010 16:22:44 +
Sad Clouds cryintotheblue...@googlemail.com wrote:

 On Wed, 17 Mar 2010 16:01:28 +
 Quentin Garnier c...@cubidou.net wrote:
 
  Do you have a real world use for that?  For instance, I wouldn't call
  a web server that sends the same data to all its clients *at the same
  time* realistic.
 
 Why? Because it never happens? I think it happens quite often. Another
 example is a server that is sending live data, i.e. audio playback,
 video stream, etc. If you can't use multicasting over a WAN, then you
 have a situation where you are streaming the same data to large number
 of clients.

In the past I wrote a custom httpd which read security camera frames
broadcast on the LAN in order to rebroadcast them to connected HTTP
clients, and since clients remain connected with keep-alive, it has to
iterate through connections to send new packets.
(http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/tests/bktr_httpd/)

However, clients which cannot cope with the sending speed are
throttled so that some packets are skipped, which makes things a
little more complex than simply using a "send this message to all
FDs" call...
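
The skipping can be sketched roughly as below.  This is not the code from the httpd linked above; the helper name is hypothetical, and it assumes datagram-like framing, so a partial send is simply counted as not delivered, where a real stream server would have to queue the remainder:

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <stddef.h>

/*
 * Try to send one frame to every descriptor in fds[] without blocking.
 * A client whose socket buffer is full makes send(2) fail (EAGAIN) and
 * simply misses this frame.  Other errors are treated the same way here;
 * a real server would drop such clients.  Returns the number of clients
 * that received the frame in full.
 */
size_t
broadcast_frame(const int *fds, size_t nfds, const void *frame, size_t len)
{
	size_t i, delivered = 0;

	for (i = 0; i < nfds; i++) {
		ssize_t n = send(fds[i], frame, len, MSG_DONTWAIT);

		if (n == (ssize_t)len)
			delivered++;
	}
	return delivered;
}
```

Even in this simple form, each descriptor has its own outcome, which is the part a single "write to all FDs" call would somehow have to report back.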

kqueue(2)/kevent(2) were used for polling, and in my case the available
bandwidth was always the bottleneck, however.

I also have a question: did your test really use non-blocking sockets
for writing, along with an efficient polling mechanism like kqueue or
libevent, disabling write polling when the sendq is empty, enabling
it again when there's data to send, and only sending data when a poll
event indicates that writing is allowed?  Otherwise, I assume that the
LWP would block on write(2).
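
To make the question concrete, here is a minimal sketch of that pattern.  It uses poll(2) event flags as a portable stand-in for kqueue/kevent, and the structure and function names are hypothetical:

```c
#include <sys/types.h>

#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

#define SENDQ_MAX 4096

/* Hypothetical per-connection state: a descriptor and pending output. */
struct conn {
	int	fd;			/* non-blocking socket */
	char	sendq[SENDQ_MAX];
	size_t	sendq_len;		/* bytes waiting to be written */
};

/* Append data to the send queue; returns -1 if it would not fit. */
int
conn_queue(struct conn *c, const void *data, size_t len)
{
	if (len > SENDQ_MAX - c->sendq_len)
		return -1;
	(void)memcpy(c->sendq + c->sendq_len, data, len);
	c->sendq_len += len;
	return 0;
}

/* Events to request: write readiness only while data is queued. */
short
conn_events(const struct conn *c)
{
	return c->sendq_len > 0 ? (POLLIN | POLLOUT) : POLLIN;
}

/*
 * Called when the poller reports that writing is allowed: write what the
 * socket accepts and keep the remainder queued.  Returns -1 on a real
 * error, 0 otherwise.
 */
int
conn_flush(struct conn *c)
{
	ssize_t n = write(c->fd, c->sendq, c->sendq_len);

	if (n == -1)
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
	(void)memmove(c->sendq, c->sendq + n, c->sendq_len - (size_t)n);
	c->sendq_len -= (size_t)n;
	return 0;
}
```

The event loop would set each pollfd's events from conn_events() before polling, and call conn_flush() only for descriptors that came back with POLLOUT.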

If a broadcast writev(2) to multiple FDs variant existed, it possibly
would have to present an interface similar to that of kevent, or be tied
as a new protocol over kqueue, because of the FD-specific
errors/events...  libevent for instance also supports transfer buffer
queues and could possibly be adapted to support such a feature too.
However I'm also unsure if this would really help or just move some
userland and syscall overhead up to kernel overhead and achieve a
similar overall performance.  A test implementation might indeed be
needed, to really know :(
-- 
Matt


Re: (Semi-random) thoughts on device tree structure and devfs

2010-03-09 Thread Matthew Mondor
On Tue, 9 Mar 2010 21:59:23 + (UTC)
chris...@astron.com (Christos Zoulas) wrote:

 In article 70f62c5e1003091104l20b98c5ex66842f01e6f17...@mail.gmail.com,
 Masao Uebayashi  uebay...@gmail.com wrote:
  Wow, that sucks.  Not being able to change permissions (and less 
  importantly,
  mv or rm the device files) would definitely be a problem.
 
 Could you show me use cases how it sucks?  I need more use cases.
 
 - I want to present a subset of devices to a chrooted devfs.
 - I want to give a different set of permissions than the default.
 - I want to be able to call a device by a different (symbolic name) without
   using symlinks.
 - I want to prevent access to the device completely by not providing a device
   node.
 - I want to preserve those changes across boots.
 - I want to be able to move all my disk devices to a subdirectory.

I have had to deal with every one of those scenarios, and never could
stand existing devfs implementations on my systems; I did however
previously participate in a thread about devfs with ideas and
suggestions for a possibly less broken pipe-dream implementation, but it
simply taught me how complex a decent implementation would have to be,
IMO.

I do however like the idea of simply having additional symlinks
automatically created to redirect unique names to the actual
existing nodes (possibly the best implementation of this would be
a virtual fs controlled by the kernel, mounted under /dev/uuid/ or
the like?).  This wouldn't affect the target device node permissions,
at least, and might solve most of the hotplug issues for users who
need automount or can't track dmesg to then manually mount a device...

Of course, if a removable device is supposed to move around a few sb*
nodes depending on when/where it's plugged, then at least the admin can
set permissions for all devices in that class, in addition to the
permissions for the fs in /etc/fstab, just as traditionally.
-- 
Matt

