Re: Making dhcpcd work on diskless clients

2015-02-09 Thread Alan Barrett

On Sun, 08 Feb 2015, Roy Marples wrote:

since this problem goes away if we make all of dhcpcd in-memory first,
possibly what happens here is that with i386 or amd64, the layout is
such that we don't ever try to fault in code during the small period
of time that the route is missing.


I don't fully understand what you are saying.


Some part of the code that you are trying to run may not be in 
memory, so you may encounter a page fault when you try to run it.  
Responding to the page fault involves reading information from 
the file system.  On a diskless client, reading from the file 
system actually involves transferring data over the network from 
the remote file server.  If you try to read from the file server 
at a time when the routing table does not contain a usable route 
to the file server, then you lose.
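
One way to make a whole process resident before the route is touched is mlockall(2).  The following is only a sketch: wire_process_memory is a hypothetical helper name, and nothing in this thread says that dhcpcd does or should do this.  The call usually needs privilege and can be refused by resource limits.

```c
#include <sys/mman.h>

/*
 * Ask the kernel to fault in and wire every page of the calling
 * process, including pages that are mapped later.
 * Returns 0 on success, or -1 with errno set if the request is refused.
 */
int
wire_process_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

If the call succeeds, later page faults on the program's own text should not need the file server; munlockall(2) undoes the effect.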


But do you have an idea of how this can be fixed then without 
dhcpcd having to learn the routing table at load time?


Do you currently use RTM_DELETE and RTM_ADD?  Can you use RTM_CHANGE
instead?

--apb (Alan Barrett)


Re: Reuse strtonum(3) and reallocarray(3) from OpenBSD

2014-12-01 Thread Alan Barrett

On Sat, 29 Nov 2014, Kamil Rytarowski wrote:
My proposition is to add a new header in src/sys/sys/overflow.h 
(/usr/include/sys/overflow.h) with the following content:


operator_XaddY_overflow()
operator_XsubY_overflow()
operator_XmulY_overflow()

X = optional s (signed)
Y = optional l,ll, etc
[* see comment]


OK, so you have told us the names of the proposed functions.  But what
are their semantics, and why would they be useful?

Last but not least please stop enforcing 
programmers' fancy to produce this kind of art: 
https://github.com/ivmai/bdwgc/commit/83231d0ab5ed60015797c3d1ad9056295ac3b2bb 
:-)


Please don't assume that people reading your email messages have
convenient internet access.  It's fine to give URLs that expand on what
you have said, but if you give the URL without any explanation then I
have no idea what you are talking about.

--apb (Alan Barrett)


Re: posix_madvise(2) should fail with ENOMEM for invalid adresses range

2014-11-24 Thread Alan Barrett

On Sun, 23 Nov 2014, Nicolas Joly wrote:
According the OpenGroup online document for posix_madvise[1], it 
should fail with ENOMEM for invalid addresses ranges : [...] But 
we currently fail with EINVAL (returned value from range_check() 
function).


In general, when POSIX doesn't make sense, NetBSD should not need 
to follow what POSIX says.  In this case, it probably doesn't 
matter much.


--apb (Alan Barrett)


Re: CTLTYPE_UINT?

2014-10-05 Thread Alan Barrett

On Sat, 04 Oct 2014, Justin Cormack wrote:
I agree about being explicit with the 32 bitness, but using S64 
and U64 as the 64 bit names to be consistent with FreeBSD might 
make sense.


The S64 and U64 names are fine.  I'd also add S32 and U32.

long types seems best avoided if possible, you can see the 
temptation to use them for memory amounts, but you could be 
running on 32 bit userspace on a 64 bit kernel.


One of the reasons that I like user/kernel interfaces to use 
types with explicit bit width is to simplify running 32-bit 
userland on 64-bit kernels.  If you use a type whose actual size 
changes between 32 bits and 64 bits, then the kernel has to have 
a compatibility layer to copy and adjust the data, and tools like 
kdump or ktruss should also translate (it's a bug that they don't 
do so today).  If you use a type that's always 64 bits, then it's 
much easier to deal with.  Occasionally, an argument for run-time 
efficiency in a 32-bit userland will outweigh this argument for 
ease of coding, and then a type whose size changes should be 
used.
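
The size argument can be illustrated with two hypothetical structures (the names are invented for this sketch, not taken from any NetBSD header).  The first has the same layout for 32-bit and 64-bit userland; the second changes size with the ABI, so a 64-bit kernel would need a translation layer for 32-bit callers.

```c
#include <stddef.h>
#include <stdint.h>

/* Same size for every ABI: nothing to translate. */
struct mem_report_fixed {
	uint64_t total_bytes;
	uint64_t free_bytes;
};

/* Size follows "unsigned long", so it differs between ILP32 and LP64. */
struct mem_report_native {
	unsigned long total_bytes;
	unsigned long free_bytes;
};

size_t
fixed_report_size(void)
{
	return sizeof(struct mem_report_fixed);
}

size_t
native_report_size(void)
{
	return sizeof(struct mem_report_native);
}
```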


--apb (Alan Barrett)


Re: CTLTYPE_UINT?

2014-10-04 Thread Alan Barrett

On Fri, 03 Oct 2014, Justin Cormack wrote:

Back in the sysctl discussion a while back, core group said:

http://mail-index.netbsd.org/tech-kern/2014/03/26/msg016779.html

a) What types are needed?  Currently, CTLTYPE_INT is a signed
  32-bit type, and CTLTYPE_QUAD is an unsigned 64-bit type.
  Perhaps all four possible combinations of signed/unsigned and
  32 bits/64 bits should be supported.


If you add new sysctl types, please use names that describe 
the size and signedness.  For example, rename CTLTYPE_INT to 
CTLTYPE_INT32, keep CTLTYPE_INT as a backward compatible alias 
for CTLTYPE_INT32, and add CTLTYPE_UINT32.  Similarly, rename 
CTLTYPE_QUAD to CTLTYPE_UINT64, keep CTLTYPE_QUAD as an alias, 
and add CTLTYPE_INT64.  Please don't add a CTLTYPE_UINT with no 
indication of its size.
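
A sketch of the naming scheme, using an EX_ prefix so that it does not clash with the real <sys/sysctl.h>.  The numeric values are arbitrary placeholders chosen for illustration, and ex_ctltype_size is a hypothetical helper.

```c
#include <stddef.h>

/* Names that state both size and signedness (placeholder values). */
#define EX_CTLTYPE_INT32	2
#define EX_CTLTYPE_UINT64	4
#define EX_CTLTYPE_UINT32	10
#define EX_CTLTYPE_INT64	11

/* Old names kept as backward compatible aliases. */
#define EX_CTLTYPE_INT		EX_CTLTYPE_INT32
#define EX_CTLTYPE_QUAD		EX_CTLTYPE_UINT64

/* Size in bytes of the value carried by each type, or 0 if unknown. */
size_t
ex_ctltype_size(int type)
{
	switch (type) {
	case EX_CTLTYPE_INT32:
	case EX_CTLTYPE_UINT32:
		return 4;
	case EX_CTLTYPE_INT64:
	case EX_CTLTYPE_UINT64:
		return 8;
	default:
		return 0;
	}
}
```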


A survey of what other OSes do would also be useful.

--apb (Alan Barrett)


Re: Unification of common date/time macros

2014-09-22 Thread Alan Barrett

On Mon, 22 Sep 2014, Robert Elz wrote:

 | My proposition is [...]
 |
 | #define SECSPERMIN  60L
 | #define MINSPERHOUR 60L
 | #define HOURSPERDAY 24L
 | #define DAYSPERWEEK 7L
 | #define DAYSPERNYEAR 365L
 | #define DAYSPERLYEAR 366L
 | #define SECSPERHOUR (SECSPERMIN * MINSPERHOUR)
 | #define SECSPERDAY  (SECSPERHOUR * HOURSPERDAY)
 | #define MONSPERYEAR 12L
 | #define EPOCH_YEAR  1970L

Why are they all to be long ? The only one that has even the 
slightest potential for that need (and which is currently 
defined as long for the userspace definitions) is SECSPERDAY, 
and that's only to cope with the possibility that int is 16 bits 
(which I don't think NetBSD supports at all, since there is no 
pdp11 port - but is kept that way for API consistency.)


kre's analysis is correct.  I'd just define them all as plain 
numbers, without any L, U, or UL suffix.  I'd probably 
also use 3600 and 86400 for SECSPERHOUR and SECSPERDAY, to avoid 
surprises in the arithmetic.


For an example of an unwanted surprise, consider (SECSPERHOUR 
* HOURSPERDAY) or ((60 * 60) * 24) on a machine with 16-bit 
ints: the desired result of 86400 is too large to represent in 16 
bits, which causes undefined behaviour.  NetBSD doesn't support 
any machines with 16-bit int, but this is the sort of code where 
it's easy to accommodate such machines, so we might as well do it.
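
A sketch of the plain-number definitions.  An unsuffixed decimal constant takes the first of int, long, long long that can hold it, so 86400 is already a long where int has 16 bits.  The helper fits_in_16bit_int is hypothetical and only simulates the range of such an int.

```c
#include <stdint.h>

#define SECSPERMIN	60
#define MINSPERHOUR	60
#define HOURSPERDAY	24
#define SECSPERHOUR	3600	/* literal, not SECSPERMIN * MINSPERHOUR */
#define SECSPERDAY	86400	/* literal, so no int multiplication occurs */

/* Would this value be representable in a 16-bit int? */
int
fits_in_16bit_int(long v)
{
	return v >= INT16_MIN && v <= INT16_MAX;
}

/* Arithmetic is done in long, which is at least 32 bits wide. */
long
days_to_seconds(long days)
{
	return days * SECSPERDAY;
}
```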


--apb (Alan Barrett)


Re: build kernel from source

2014-09-21 Thread Alan Barrett

On Sun, 21 Sep 2014, bycn82 wrote:

Command:
cd /usr/src; ./build.sh -O /usr/obj -U -j 8 tools kernel=NB6 modules 
distribution sets


I see you didn't pass -T ${TOOLDIR} option to build.sh


Result:
configure: creating ./config.status
config.status: creating host-mkdep
chmod +x host-mkdep
#   install  /tooldir.NetBSD-6.1.4-amd64/bin/nbhost-mkdep
mkdir -p /tooldir.NetBSD-6.1.4-amd64/bin
/usr/src/tools/binstall/xinstall -c  -r -m 555 host-mkdep 
/tooldir.NetBSD-6.1.4-amd64/bin/nbhost-mkdep
make: exec(/usr/src/tools/binstall/xinstall) failed (No such file or 
directory)

*** Error code 1


The install line suggests that some part of the build thinks 
that the TOOLDIR is /tooldir.NetBSD-6.1.4-amd64, which makes 
no sense, and the invocation of /usr/src/tools/binstall/xinstall 
suggests that it thinks you are not using an OBJDIR, which also 
makes no sense.


Were there any error or warning messages printed by build.sh 
before it got to === Updated makewrapper: ? What TOOLDIR path 
did build.sh print?



Try to build the npfctl command


If you can't build tools, nothing else is going to work, but ...


# cd /usr/src/usr.sbin/npf/npfctl/
# pwd
/usr/src/usr.sbin/npf/npfctl
# make
#   lex  npfctl/npf_scan.c
/usr/src/tooldir.NetBSD-6.1.4-amd64/bin/nblex -onpf_scan.c npf_scan.l
make: exec(/usr/src/tooldir.NetBSD-6.1.4-amd64/bin/nblex) failed (No 
such file or directory)

*** Error code 1


... now it seems to think that your TOOLDIR is 
/usr/src/tooldir.NetBSD-6.1.4-amd64, not the 
/tooldir.NetBSD-6.1.4-amd64 that appeared earlier.


How to change the NetBSD6.1.4? I am using the current development 
version of source!


The host platform name is embedded in the default TOOLDIR name, so 
it's fine for it to say NetBSD-6.1.4.


--apb (Alan Barrett)


Re: build kernel from source

2014-09-21 Thread Alan Barrett

On Mon, 22 Sep 2014, bycn82 wrote:

Now I am building the npfctl, and I met below
-Wsign-compare -Wformat=2 -Werror -I/usr/src/usr.sbin/npf/npfctl 
--sysroot=/  -c /usr/src/usr.sbin/npf/npfctl/npf_bpf_comp.c

In file included from /usr/src/usr.sbin/npf/npfctl/npf_bpf_comp.c:56:0:
/usr/src/usr.sbin/npf/npfctl/npf_bpf_comp.c: In function 'npfctl_bpf_table':
/usr/src/usr.sbin/npf/npfctl/npf_bpf_comp.c:610:21: error: 'BPF_COP' 
undeclared (first use in this function)

  BPF_STMT(BPF_MISC+BPF_COP, NPF_COP_TABLE),
^
/usr/src/usr.sbin/npf/npfctl/npf_bpf_comp.c:610:21: note: each 
undeclared identifier is reported only once for each function it 
appears in

*** Error code 1

Stop.
make: stopped in /usr/src/usr.sbin/npf/npfctl


I suspect that you have a corrupted source tree.  Please take this 
to current-users, not tech-kern.



Can someone told me which header has the declaration of the BPF_COP ?
I found below only.
# grep -R BPF_COP /usr/src
/usr/src/doc/CHANGES.prev:kernel: Add BPF coprocessor support 
(BPF_COP/BPF_COPX instructions).


It's in src/sys/net/bpf.h.  If grep didn't find it then you have 
an incomplete source tree.  Please fix that, and then if you still 
have build problems, ask in current-users.


--apb (Alan Barrett)


Re: Testing 7.0 Beta: FFS still very slow when creating files

2014-08-26 Thread Alan Barrett

On Tue, 26 Aug 2014, Robert Elz wrote:

 |  memcmp is only supposed to provide the correct sign, not
 |  the difference.
 | true, but that's not what memcmp(9) says.

This is a normal problem with man pages - they're written to 
document what the code actually does, then interpreted as a 
specification of what the code is required to do.  Man pages 
should be the former, the latter is the job of standards docs.


Often, there are no standards docs, and the man page has to serve 
as both a specification of the parts of the interface that users 
can depend on, and documentation of what the code actually does. 
For example, it's possible to document "returns -ve, 0, or +ve" 
in one part of the man page, as an interface specification, and 
"returns the difference" in another part of the man page, as an 
implementation note.
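
For callers, relying only on the specified part of the interface looks like the sketch below.  compare_sign is a hypothetical wrapper that reduces whatever memcmp returns to -1, 0 or +1, so the code does not depend on an implementation that happens to return a byte difference.

```c
#include <string.h>

/*
 * Compare two buffers and return exactly -1, 0 or +1.
 * Only the sign of memcmp's result is used.
 */
int
compare_sign(const void *a, const void *b, size_t len)
{
	int r = memcmp(a, b, len);

	return (r > 0) - (r < 0);
}
```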


If anything needs changing, it would be to make it more clear 
that the man pages should not be interpreted as an interface 
specification, but as a statement of what the implementations 
actually do - not to be interpreted as a promise that they will 
always do that - for what can be relied upon a reference should 
be made to the relevant standard (which can be POSIX (or IEEE 
for C, or anyone else), or POSIX (etc) as amended by NetBSD, or 
a NetBSD private standard for stuff that either isn't documented 
by anyone else's standards doc, or where NetBSD's version has 
simply decided to be different.


In cases where there really is a standard that can be referred to, 
that might work, but I like to have all the information in one 
place.  If it's easy for the NetBSD man page to say both what's 
promised, and what is actually done, then I would like it to do 
so.  I think that this helps both people using the interface and 
people changing the implementation.


--apb (Alan Barrett)


Re: fdiscard error cases

2014-07-27 Thread Alan Barrett

On Sun, 27 Jul 2014, David Holland wrote:
It was pointed out that it would be well to distinguish devices 
that don't currently support discard, but theoretically should 
(because they're disks) from devices where it makes no sense 
(e.g. ttys). This is probably a good idea.


For fdiscard, I think the following errno values are likely to be 
relevant:


ENOSYS: The operation is not supported at all.  e.g. a kernel 
module has not been loaded, or a build option was not enabled.


ENOTTY: It doesn't make sense to ask this driver layer to perform 
this operation.  e.g. disk operation on a file or non-disk device.


ENODEV: It does make sense to ask, but this device (or this driver 
layer) doesn't support this operation.  e.g. this device or file 
system doesn't implement discard.  This doesn't distinguish 
between not supported by driver and not supported by underlying 
hardware.


EINVAL: The arguments don't make sense.  e.g. null pointer, or an 
invalid combination of flags, or a length or offset out of bounds.


EPERM: You don't have permission, but perhaps a process with different
credentials might succeed.

EACCES: You don't have permission, but the problem is not in your 
credentials, the problem is in the way the object was opened. 
e.g. write on file opened with O_RDONLY.
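
The distinctions above can be sketched as a classifier that a caller might apply to the errno from a failed discard request.  The enum and the function name are hypothetical; only the errno constants are real.

```c
#include <errno.h>

enum discard_class {
	DISCARD_OK,			/* no error */
	DISCARD_UNSUPPORTED_ANYWHERE,	/* ENOSYS: module or option missing */
	DISCARD_WRONG_OBJECT,		/* ENOTTY: makes no sense here */
	DISCARD_NOT_IMPLEMENTED_HERE,	/* ENODEV: sensible but unsupported */
	DISCARD_BAD_ARGS,		/* EINVAL: arguments make no sense */
	DISCARD_NO_PERMISSION,		/* EPERM or EACCES */
	DISCARD_OTHER			/* anything else, e.g. EIO */
};

enum discard_class
classify_discard_error(int error)
{
	switch (error) {
	case 0:
		return DISCARD_OK;
	case ENOSYS:
		return DISCARD_UNSUPPORTED_ANYWHERE;
	case ENOTTY:
		return DISCARD_WRONG_OBJECT;
	case ENODEV:
		return DISCARD_NOT_IMPLEMENTED_HERE;
	case EINVAL:
		return DISCARD_BAD_ARGS;
	case EPERM:
	case EACCES:
		return DISCARD_NO_PERMISSION;
	default:
		return DISCARD_OTHER;
	}
}
```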



Another option is to add a new errno for Operation not implemented on
this object or the like, to be a bit clearer about the distinction
between not appropriate and not implemented and maintain the
distinction between not implemented at the syscall level and not
implemented on a particular backend entity. But, adding errnos is not
something to do lightly...


I think the ENOTTY/ENODEV distinction is enough.

Many existing drivers or subsystems use ENODEV where I think 
ENOTTY or EINVAL or some other error would be more appropriate, but 
if you are careful to make the distinction for your new syscalls 
then you can document it.


--apb (Alan Barrett)


More detailed build information in kernels

2014-07-23 Thread Alan Barrett
I have some private patches that append arbitrary additional 
information to the kernel version string.  Essentially, I pass 
BUILDINFO="multi-line message here" in the environment (through 
build.sh and the make wrapper), and then a modified version of 
src/sys/conf/newvers.sh appends it to the value of the version 
variable in the vers.c file that's compiled into the kernel.  I 
also add the information to /etc/release.


The additional information is exposed in sysctl kern.version, and 
in {struct utsname}.version as returned by uname(3), and in the 
output from uname(1) -v.
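
For reference, a userland program can read the same string through uname(3) roughly as follows.  copy_kernel_version is a hypothetical helper name; the sketch assumes nothing beyond POSIX <sys/utsname.h>.

```c
#include <string.h>
#include <sys/utsname.h>

/*
 * Copy the kernel version string into buf, always NUL-terminated.
 * Returns 0 on success, -1 if buflen is 0 or uname() fails.
 */
int
copy_kernel_version(char *buf, size_t buflen)
{
	struct utsname u;

	if (buflen == 0 || uname(&u) == -1)
		return -1;
	strncpy(buf, u.version, buflen - 1);
	buf[buflen - 1] = '\0';
	return 0;
}
```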


I use this feature to add information about the source tree
and build date, so I see information like this:

$ sysctl kern.version
kern.version = NetBSD 6.99.47 (APB)
fossil repository: apb-local-src.fossil
fossil tags: local
fossil commit: 449e51b700 (2014-07-19 15:41:14 UTC)
fossil comment: merge src from trunk as of 2014-07-19 00:00 UTC

I imagine that it would be useful for official builds to
include some sort of official statement here.

The multi-line BUILDINFO strings are truncated and folded to a 
single line by uname(3), which is unhelpful, so I am inclined to 
store them in a new kernel variable, exposed via a new sysctl 
node, instead of appending to the existing kernel version 
variable.  Then the new information would not be exposed by 
uname(3) or uname(1).


Comments?

--apb (Alan Barrett)


Re: Vnode API change: add global vnode cache

2014-04-08 Thread Alan Barrett

On Tue, 08 Apr 2014, Mindaugas Rasiukevicius wrote:
Nothing [in NetBSD] really uses intern.  Perhaps not a great 
naming, but other subsystems usually just use get.


Yes, that's a good argument for just using get.

--apb (Alan Barrett)


Re: Vnode API change: add global vnode cache

2014-04-07 Thread Alan Barrett

On Mon, 07 Apr 2014, Mindaugas Rasiukevicius wrote:

Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:

   What is intern?

`Intern' means `lookup, or create and insert if not there'.


The point being is that I do not find it meaningful/intuitive.  Many other
systems just use get().  If you want more accurate name, I suggest conget()
or something more meaningful.


I would find conget confusing, while finding intern clear.

Essentially, to intern a string or an external representation 
of an object, means to create an internal representation of 
the string or object, or to find an already existing internal 
representation of an identical object, and (usually) to return a 
reference to that internal representation.  The wikipedia article 
at https://en.wikipedia.org/wiki/String_interning is focused on 
strings, but other objects can also be interned.
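
A minimal string-interning sketch, to make "lookup, or create and insert if not there" concrete.  It uses a small fixed-size table with linear search and is not related to the vnode cache code under discussion; the names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define INTERN_MAX 64

static char *intern_table[INTERN_MAX];
static size_t intern_count;

/*
 * Return the unique internal copy of s, creating it on first use.
 * Returns NULL if the table is full or memory is exhausted.
 */
const char *
intern(const char *s)
{
	size_t i, len;
	char *copy;

	/* Lookup: an identical string may already be interned. */
	for (i = 0; i < intern_count; i++)
		if (strcmp(intern_table[i], s) == 0)
			return intern_table[i];

	/* Create and insert. */
	if (intern_count == INTERN_MAX)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy == NULL)
		return NULL;
	memcpy(copy, s, len);
	intern_table[intern_count++] = copy;
	return copy;
}
```

Two equal strings always intern to the same pointer, so later comparisons can be done by pointer equality.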


--apb (Alan Barrett)


Core statement on sysctl 32-bit/64-bit changes

2014-03-26 Thread Alan Barrett

The NetBSD core group has considered the sysctl changes
made by David Laight on 23 and 27 Feb 2014 (see
http://mail-index.netbsd.org/source-changes/2014/02/23/msg051946.html,
http://mail-index.netbsd.org/source-changes/2014/02/27/msg052131.html, and
http://mail-index.netbsd.org/source-changes/2014/03/01/msg052200.html),
and objections raised by Andreas Gustafsson (see
http://mail-index.netbsd.org/source-changes-d/2014/03/05/msg006587.html,
http://mail-index.netbsd.org/tech-kern/2014/03/05/msg016706.html,
and subsequent discussion in the source-changes-d and tech-kern
lists).

We note that the following sysctl nodes (for the i386 and amd64 ports)
have been changed from CTLTYPE_INT (a 32-bit signed integer) to
CTLTYPE_QUAD (a 64-bit unsigned integer):

   machdep.fpu_present
   machdep.osfxsr
   machdep.sse
   machdep.sse2
   machdep.biosbasemem
   machdep.biosextmem

Both binary and source code compatibility are important for NetBSD.
If the types of sysctl variables are changed, then we would want it to
be done in such a way that old code continues to work.  We note that
there has been an attempt to provide compatibility, by allowing 32-bit
sysctl variables to be read into 64-bit buffers, and vice versa, but
we are concerned that this mechanism was introduced without prior
discussion, and there may be cases where compatibility is lost.

Several of the affected sysctl variables appear to be essentially
boolean in nature, and there appears to be no good reason for the
variables to be exposed to userland as 64-bit values.  It's not clear
why variables that are logically boolean (such as machdep.fpu_present)
were ever defined as CTLTYPE_INT instead of CTLTYPE_BOOL, but it does
seem clear that changing them to CTLTYPE_QUAD is not an improvement.

Two of the variables, machdep.biosbasemem and machdep.biosextmem,
represent memory sizes, and it is possible that 32 bits might not
be large enough for them.  For these variables, changing the size
to 64 bits, or adding new 64-bit variables in parallel with the
old variables, may be appropriate.  In the past, new 64-bit sysctl
variables hw.physmem64 and hw.usermem64 were introduced in parallel to
the older 32-bit variables hw.physmem and hw.usermem.  We have heard
suggestions that the same should be done for machdep.biosbasemem and
machdep.biosextmem, if 32 bits is not sufficient for those variables.
We have also heard suggestions that increasing the size of the
variables would be preferable to adding new variables with different
names, provided that it was done in a compatible way.

The core group now recommends as follows:

1. In the short term, the affected sysctl variables should be changed
back to their original type, and the compatibility code should be
removed.

2. sysctl variables should not be wider than necessary.  If there is
no need for the existing variables to be made wider than 32 bits, then
they should not be made wider than 32 bits.

3. If some existing variables need to be made wider, then
consideration should be given to either:

a) adding new 64-bit sysctl variables in parallel to the existing
   32-bit variables (such as adding a new machdep.biosextmem64 in
   parallel to the existing machdep.biosextmem); or

b) adding new infrastructure for 32-bit/64-bit compatibility, and
   using that infrastructure.

4. If new infrastructure is considered, to allow reading 64-bit sysctl
variables into 32-bit buffers, then the design and implementation
should be discussed in public.  Some considerations that we would like
to see addressed are:

a) What types are needed?  Currently, CTLTYPE_INT is a signed
   32-bit type, and CTLTYPE_QUAD is an unsigned 64-bit type.
   Perhaps all four possible combinations of signed/unsigned and
   32 bits/64 bits should be supported.

b) Should the ability to read values with a different size apply
   to all sysctl variables, or only to those defined in a special
   way?  For example, there could be a new CTLFLAG_COMPAT32 flag
   that allows reading a 64-bit value into a 32-bit buffer.

c) What is the appropriate error return when a 64-bit value is too
   large to fit in a user-provided 32-bit buffer?

d) Will old code still work without change?

e) Will new userland code be able to detect the presence of wider
   sysctl variables with 32-bit compatibility?

f) Is coordination with other projects using the sysctl(3) or
   sysctl(9) interface needed?

g) Are the new interfaces adequately documented?
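
As an illustration of consideration (c), a narrowing read might look like the sketch below.  The function name is hypothetical, and the use of EOVERFLOW is only one candidate answer to the question; the statement itself leaves the choice of error open.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Store a 64-bit value into a caller-supplied 32-bit buffer.
 * Returns 0 on success.  If the value does not fit, *out is left
 * untouched and an error is returned (EOVERFLOW here, by assumption).
 */
int
ex_read_u64_as_u32(uint64_t value, uint32_t *out)
{
	if (value > UINT32_MAX)
		return EOVERFLOW;
	*out = (uint32_t)value;
	return 0;
}
```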

--
Alan Barrett,
on behalf of the NetBSD core group


Re: DIOCGDISKINFO support for vnd

2014-03-11 Thread Alan Barrett

On Tue, 11 Mar 2014, Patrick Welche wrote:

The attached trivial patch allows vnd(4) to support generic disk ioctls.
The only one in kern/subr_disk.c at the moment is DIOCGDISKINFO.

Before:
$ ./vndtest /dev/vnd0a 
vndtest: DIOCGDISKINFO: Inappropriate ioctl for device


After:
$ ./vndtest /dev/vnd0a
size of /dev/vnd0a: 524288 bytes


That's good, but ...


default:
-   return ENOTTY;
+   error = disk_ioctl(&vnd->sc_dkdev, cmd, data, flag, l);
+   if (error == EPASSTHROUGH)
+   return ENOTTY;
+   else
+   return error;


I think there's no need to translate EPASSTHROUGH to ENOTTY here; 
that translation will be done by sys_ioctl() before returning 
to userland.  Also, several other disk drivers have their ioctl 
handlers call disk_ioctl early (see fdioctl, wdioctl, sdioctl, 
dkioctl, raidioctl, among others), and it's not clear why vndioctl 
doesn't do that.
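
The layering can be sketched in userland with stand-in functions.  Everything prefixed ex_ or EX_ is hypothetical, including the value used for the pass-through code; the sketch only shows the generic handler being tried first and the translation to ENOTTY happening once, at the top level.

```c
#include <errno.h>

#define EX_EPASSTHROUGH		(-4)	/* stand-in for the kernel-only code */
#define EX_CMD_DISKINFO		1UL	/* handled by the generic disk layer */
#define EX_CMD_DRIVER_ONLY	2UL	/* handled by the driver itself */

/* Generic disk layer: handles what it knows, passes the rest on. */
int
ex_generic_disk_ioctl(unsigned long cmd)
{
	return cmd == EX_CMD_DISKINFO ? 0 : EX_EPASSTHROUGH;
}

/* Driver handler: calls the generic layer early, no translation here. */
int
ex_driver_ioctl(unsigned long cmd)
{
	int error = ex_generic_disk_ioctl(cmd);

	if (error != EX_EPASSTHROUGH)
		return error;
	switch (cmd) {
	case EX_CMD_DRIVER_ONLY:
		return 0;
	default:
		return EX_EPASSTHROUGH;
	}
}

/* Top level: the only place where pass-through becomes ENOTTY. */
int
ex_sys_ioctl(unsigned long cmd)
{
	int error = ex_driver_ioctl(cmd);

	return error == EX_EPASSTHROUGH ? ENOTTY : error;
}
```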


--apb (Alan Barrett)


Re: Closing a serial device takes one second

2014-02-07 Thread Alan Barrett

On Fri, 07 Feb 2014, Marc Balmer wrote:

Am 06.02.14 17:18, schrieb Marc Balmer:

fd = open("/dev/dty03", O_RDWR);  /* returns immediately */
close(fd); /* returns after one second */


So it is clear now that the delay is there for a specific 
application case:  A modem on a tty line that hangs up when DTR 
is gone for one second.  For probably all other use cases the 
delay is not necessary.


Yes, for many use cases, the delay is not necessary.  If you have 
one of those use cases then you can clear the termios(4) HUPCL 
flag.
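
Clearing the flag from a program can be sketched as follows; clear_hupcl is a hypothetical helper name, and fd is assumed to refer to an already open tty.

```c
#include <termios.h>

/*
 * Clear HUPCL on the tty referred to by fd, so that the last close
 * does not drop the modem control lines.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
clear_hupcl(int fd)
{
	struct termios t;

	if (tcgetattr(fd, &t) == -1)
		return -1;
	t.c_cflag &= ~(tcflag_t)HUPCL;
	return tcsetattr(fd, TCSANOW, &t);
}
```

From the shell, stty(1) with the -hupcl operand has the same effect.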


For use cases where the delay is desired, it would be better if 
the delay was neither unconditionally inserted at close time, nor 
unconditionally inserted at open time, but rather if the delay was 
inserted only when necessary, such as when a close (with HUPCL 
set) is followed very soon by an open.


In conclusion, the delay probably comes from times long 
gone when people used dial-in modems.  For modern serial 
applications they are more a nuisance.  Is it time to switch the 
default to no delay?


I think this idea should be rephrased in terms of changing the default
termios(4) flags.

--apb (Alan Barrett)


Re: amd64 kernel, i386 userland

2014-01-25 Thread Alan Barrett

On Fri, 24 Jan 2014, Alan Barrett wrote:

I have successfully used magic symlinks (see symlink(7)) to allow
i386 and amd64 to use different instances of /dev.  The basic scheme is:

   Build a kernel with options MAGICLINKS, or arrange to run
   sysctl -w vfs.generic.magiclinks=1 very early in /etc/rc.
   Putting the setting in /etc/sysctl.conf will probably be too late.

   mkdir /dev.i386
   mkdir /dev.amd64
   copy the i386 version of MAKEDEV to /dev.i386/MAKEDEV
   copy the amd64 version of MAKEDEV to /dev.amd64/MAKEDEV
   ( cd /dev.i386 && sh ./MAKEDEV all )
   ( cd /dev.amd64 && sh ./MAKEDEV all )
   mv /dev /dev.old && ln -sf dev.@machine /dev

   reboot.  If it works then rm -rf /dev.old.


Oh, I forgot to address the issue of booting without options 
MAGICLINKS in the kernel.  No matter how early in /etc/rc you try 
to put the sysctl -w vfs.generic.magiclinks=1 command, init(8) 
will want to open /dev/console earlier than that.  So you either 
have to enable magiclinks in the kernel (so it's already enabled 
before init(8) starts), or you have to arrange for /dev/console to 
work even before magiclinks are enabled via the sysctl command.


Adding a symlink from /dev.@machine to dev.i386 works for this 
(taking dev.@machine literally instead of as a magic expansion):


ln -s dev.i386 /dev.@machine

This is good enough for /dev/console and /dev/null, because the 
amd64 and i386 versions of those device nodes are identical.


--apb (Alan Barrett)


Re: amd64 kernel, i386 userland

2014-01-25 Thread Alan Barrett
 (0x801b, 0x3b)
wd3m:   device (0x801c, 0x3c)
wd3n:   device (0x801d, 0x3d)
wd3o:   device (0x801e, 0x3e)
wd3p:   device (0x801f, 0x3f)
wd4a:   device (0x20, 0x40)
wd4b:   device (0x21, 0x41)
wd4c:   device (0x22, 0x42)
wd4d:   device (0x23, 0x43)
wd4e:   device (0x24, 0x44)
wd4f:   device (0x25, 0x45)
wd4g:   device (0x26, 0x46)
wd4h:   device (0x27, 0x47)
wd4i:   device (0x8020, 0x48)
wd4j:   device (0x8021, 0x49)
wd4k:   device (0x8022, 0x4a)
wd4l:   device (0x8023, 0x4b)
wd4m:   device (0x8024, 0x4c)
wd4n:   device (0x8025, 0x4d)
wd4o:   device (0x8026, 0x4e)
wd4p:   device (0x8027, 0x4f)
wd5a:   device (0x28, 0x50)
wd5b:   device (0x29, 0x51)
wd5c:   device (0x2a, 0x52)
wd5d:   device (0x2b, 0x53)
wd5e:   device (0x2c, 0x54)
wd5f:   device (0x2d, 0x55)
wd5g:   device (0x2e, 0x56)
wd5h:   device (0x2f, 0x57)
wd5i:   device (0x8028, 0x58)
wd5j:   device (0x8029, 0x59)
wd5k:   device (0x802a, 0x5a)
wd5l:   device (0x802b, 0x5b)
wd5m:   device (0x802c, 0x5c)
wd5n:   device (0x802d, 0x5d)
wd5o:   device (0x802e, 0x5e)
wd5p:   device (0x802f, 0x5f)
wd6a:   device (0x30, 0x60)
wd6b:   device (0x31, 0x61)
wd6c:   device (0x32, 0x62)
wd6d:   device (0x33, 0x63)
wd6e:   device (0x34, 0x64)
wd6f:   device (0x35, 0x65)
wd6g:   device (0x36, 0x66)
wd6h:   device (0x37, 0x67)
wd6i:   device (0x8030, 0x68)
wd6j:   device (0x8031, 0x69)
wd6k:   device (0x8032, 0x6a)
wd6l:   device (0x8033, 0x6b)
wd6m:   device (0x8034, 0x6c)
wd6n:   device (0x8035, 0x6d)
wd6o:   device (0x8036, 0x6e)
wd6p:   device (0x8037, 0x6f)
wd7a:   device (0x38, 0x70)
wd7b:   device (0x39, 0x71)
wd7c:   device (0x3a, 0x72)
wd7d:   device (0x3b, 0x73)
wd7e:   device (0x3c, 0x74)
wd7f:   device (0x3d, 0x75)
wd7g:   device (0x3e, 0x76)
wd7h:   device (0x3f, 0x77)
wd7i:   device (0x8038, 0x78)
wd7j:   device (0x8039, 0x79)
wd7k:   device (0x803a, 0x7a)
wd7l:   device (0x803b, 0x7b)
wd7m:   device (0x803c, 0x7c)
wd7n:   device (0x803d, 0x7d)
wd7o:   device (0x803e, 0x7e)
wd7p:   device (0x803f, 0x7f)
wsfont: device (0x5100, 0x5600)
xbd0i:  device (0x80008e00, 0x8e08)
xbd0j:  device (0x80008e01, 0x8e09)
xbd0k:  device (0x80008e02, 0x8e0a)
xbd0l:  device (0x80008e03, 0x8e0b)
xbd0m:  device (0x80008e04, 0x8e0c)
xbd0n:  device (0x80008e05, 0x8e0d)
xbd0o:  device (0x80008e06, 0x8e0e)
xbd0p:  device (0x80008e07, 0x8e0f)
xbd1a:  device (0x8e08, 0x8e10)
xbd1b:  device (0x8e09, 0x8e11)
xbd1c:  device (0x8e0a, 0x8e12)
xbd1d:  device (0x8e0b, 0x8e13)
xbd1e:  device (0x8e0c, 0x8e14)
xbd1f:  device (0x8e0d, 0x8e15)
xbd1g:  device (0x8e0e, 0x8e16)
xbd1h:  device (0x8e0f, 0x8e17)
xbd1i:  device (0x80008e08, 0x8e18)
xbd1j:  device (0x80008e09, 0x8e19)
xbd1k:  device (0x80008e0a, 0x8e1a)
xbd1l:  device (0x80008e0b, 0x8e1b)
xbd1m:  device (0x80008e0c, 0x8e1c)
xbd1n:  device (0x80008e0d, 0x8e1d)
xbd1o:  device (0x80008e0e, 0x8e1e)
xbd1p:  device (0x80008e0f, 0x8e1f)
xbd2a:  device (0x8e10, 0x8e20)
xbd2b:  device (0x8e11, 0x8e21)
xbd2c:  device (0x8e12, 0x8e22)
xbd2d:  device (0x8e13, 0x8e23)
xbd2e:  device (0x8e14, 0x8e24)
xbd2f:  device (0x8e15, 0x8e25)
xbd2g:  device (0x8e16, 0x8e26)
xbd2h:  device (0x8e17, 0x8e27)
xbd2i:  device (0x80008e10, 0x8e28)
xbd2j:  device (0x80008e11, 0x8e29)
xbd2k:  device (0x80008e12, 0x8e2a)
xbd2l:  device (0x80008e13, 0x8e2b)
xbd2m:  device (0x80008e14, 0x8e2c)
xbd2n:  device (0x80008e15, 0x8e2d)
xbd2o:  device (0x80008e16, 0x8e2e)
xbd2p:  device (0x80008e17, 0x8e2f)
xbd3a:  device (0x8e18, 0x8e30)
xbd3b:  device (0x8e19, 0x8e31)
xbd3c:  device (0x8e1a, 0x8e32)
xbd3d:  device (0x8e1b, 0x8e33)
xbd3e:  device (0x8e1c, 0x8e34)
xbd3f:  device (0x8e1d, 0x8e35)
xbd3g:  device (0x8e1e, 0x8e36)
xbd3h:  device (0x8e1f, 0x8e37)
xbd3i:  device (0x80008e18, 0x8e38)
xbd3j:  device (0x80008e19, 0x8e39)
xbd3k:  device (0x80008e1a, 0x8e3a)
xbd3l:  device (0x80008e1b, 0x8e3b)
xbd3m:  device (0x80008e1c, 0x8e3c)
xbd3n:  device (0x80008e1d, 0x8e3d)
xbd3o:  device (0x80008e1e, 0x8e3e)
xbd3p:  device (0x80008e1f, 0x8e3f)

--apb (Alan Barrett)




Re: amd64 kernel, i386 userland

2014-01-25 Thread Alan Barrett

On Sat, 25 Jan 2014, Emmanuel Dreyfus wrote:

Alan Barrett a...@cequrux.com wrote:

I see the following differences from this mtree comparison:


As I said, if your only filesystem is root on raid0a and your 
swap is on sd0b/wd0b, you boot to multiuser without touching 
/dev


Perhaps you are thinking of some other scenario, but I am talking 
about the scenario that exists if you follow the steps in my first 
message, and do not follow the steps in my second message, and use 
a kernel that does not have options MAGICLINKS.  In such a case, 
any attempt to cd /dev or open /dev/console will fail, because 
/dev will be a dangling symlink.


It doesn't matter how similar the i386 and amd64 versions of /dev 
are; if /dev is a dangling symlink then nothing really works.


Try it yourself:

mkdir /dev.i386
mkdir /dev.amd64
copy the i386 version of MAKEDEV to /dev.i386/MAKEDEV
copy the amd64 version of MAKEDEV to /dev.amd64/MAKEDEV
( cd /dev.i386 && sh ./MAKEDEV all )
( cd /dev.amd64 && sh ./MAKEDEV all )
mv /dev /dev.old && ln -sf dev.@machine /dev

and then boot a kernel that does *NOT* have options MAGICLINKS.
/dev will be a dangling symlink, and /dev/console will not be found,
and init(8)'s attempt to (cd /dev && sh ./MAKEDEV init) will fail.

--apb (Alan Barrett)


Re: amd64 kernel, i386 userland

2014-01-25 Thread Alan Barrett

On Sat, 25 Jan 2014, Thor Lancelot Simon wrote:
Perhaps you are thinking of some other scenario, but I am 
talking about the scenario that exists if you follow the 
steps in my first message, and do not follow the steps in my 
second message, and use a kernel that does not have options 
MAGICLINKS.  In such a case, any attempt to cd /dev or open 
/dev/console will fail, because /dev will be a dangling 
symlink.


init should detect this (doesn't it already?) and mount a tmpfs 
/dev.


init detects the absence of /dev/console, and tries to mount a 
tmpfs /dev, but the first step in that process is chdir("/dev"), 
which fails when /dev is a dangling symlink.


--apb (Alan Barrett)


Re: amd64 kernel, i386 userland

2014-01-24 Thread Alan Barrett

On Fri, 24 Jan 2014, matthew green wrote:

i386 and amd64 do NOT have compatible /dev.  if you boot an amd64
kernel, make sure you run an amd64 MAKEDEV in /dev.


I have successfully used magic symlinks (see symlink(7)) to allow
i386 and amd64 to use different instances of /dev.  The basic scheme is:

Build a kernel with options MAGICLINKS, or arrange to run
sysctl -w vfs.generic.magiclinks=1 very early in /etc/rc.
Putting the setting in /etc/sysctl.conf will probably be too late.

mkdir /dev.i386
mkdir /dev.amd64
copy the i386 version of MAKEDEV to /dev.i386/MAKEDEV
copy the amd64 version of MAKEDEV to /dev.amd64/MAKEDEV
( cd /dev.i386 && sh ./MAKEDEV all )
( cd /dev.amd64 && sh ./MAKEDEV all )
mv /dev /dev.old && ln -sf dev.@machine /dev

reboot.  If it works then rm -rf /dev.old.

--apb (Alan Barrett)


Re: amd64 kernel, i386 userland

2014-01-21 Thread Alan Barrett

On Tue, 21 Jan 2014, Emmanuel Dreyfus wrote:
In order to have more RAM without reinstalling everything, 
using an amd64 kernel on top of an i386 userland seems 
appealing. netbsd32 emulation works fine, and the machine boots 
to multiuser without a hassle.


But there are minor problems, with binaries that use ioctl to 
talk with a kernel subsystem: I spotted ipf and raidctl.


If there are particular ioctls that don't have proper netbsd32 
compat equivalents, they can be fixed on a case by case basis. 
We'd need to start with a list of the problematic ioctls, either 
by noticing what fails, or by systematically searching for ioctls 
whose data includes fields that change size between 32-bit and 
64-bit systems.


I think that we should also avoid adding such problematic kernel 
interfaces in future, by using fixed width types wherever 
possible.
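As an illustration of the fixed-width point, here is a sketch with two hypothetical ioctl argument structures (neither exists in NetBSD). The first changes layout between i386 and amd64; the second does not. The _Static_assert lines need a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument whose layout depends on the ABI. */
struct bad_req {
    long  br_offset;    /* 4 bytes on i386, 8 bytes on amd64 */
    void *br_buf;       /* 4 bytes on i386, 8 bytes on amd64 */
};

/* The same information with fixed-width fields. */
struct good_req {
    uint64_t gr_offset;
    uint64_t gr_buf;    /* user address carried as an integer */
    uint32_t gr_len;
    uint32_t gr_pad;    /* explicit padding, so no hidden holes */
};

/* These hold for both 32-bit and 64-bit userland. */
_Static_assert(sizeof(struct good_req) == 24, "size must not vary");
_Static_assert(offsetof(struct good_req, gr_len) == 16, "offset must not vary");
```

With a structure like good_req, a netbsd32 compat layer has nothing to translate for that ioctl.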


Things would be much easier if the kernel searched 
/emul/netbsd64 before / for native binaries. Of course such 
a behavior cannot be made default because of the performance 
penalty. But a compile time option would be nice without causing 
any performance harm to people that do not want it.


I have also sometimes wished for that.

--apb (Alan Barrett)


Re: Vnode API cleanup pass 2a

2014-01-15 Thread Alan Barrett

On Wed, 15 Jan 2014, Taylor R Campbell wrote:

For that matter, why new machinery for this versioning stuff at all?
Why not just rename the vop from mkdir to mkdir_v2?  That would take
care of both struct vop_mkdir_v2_args and VOP_MKDIR_V2.  Am I missing
something?


That would make the calling code ugly.

--apb (Alan Barrett)


Re: qsort_r

2013-12-09 Thread Alan Barrett

On Mon, 09 Dec 2013, Mouse wrote:
I actually don't see anything that promises that a pointer to a 
function type may be converted to a pointer to void, nor back 
again (except, in each direction, when the original pointer is 
nil), much less promising anything about the results if it is 
done.  But I haven't read over the whole thing recently enough 
to be sure there isn't such a promise hiding somewhere.


Sorry, I did not express myself clearly enough.  C does not 
promise that function pointers can be converted to or from void* 
pointers, but I believe that all existing NetBSD implementations 
do allow such conversions.


--apb (Alan Barrett)



Re: qsort_r

2013-12-08 Thread Alan Barrett

On Sun, 08 Dec 2013, David Holland wrote:
My irritation with not being able to pass a data pointer through 
qsort() boiled over just now. Apparently Linux and/or GNU 
has a qsort_r() that supports this; so, following is a patch 
that gives us a compatible qsort_r() plus mergesort_r(), and 
heapsort_r().


Apparently FreeBSD [1] and GNU [2] have incompatible versions 
of qsort_r, passing the extra 'thunk' or 'data' argument in a 
different position.


[1]: FreeBSD qsort_r http://www.manpagez.com/man/3/qsort_r/
[2]: Linux qsort_r  http://man7.org/linux/man-pages/man3/qsort.3.html

If we have to pick one, let's pick the FreeBSD version.

I have done it by having the original, non-_r functions 
provide a thunk for the comparison function, as this is least 
invasive. If we think this is too expensive, an alternative is 
generating a union of function pointers and making tests at the 
call sites; another option is to duplicate the code (hopefully 
with cpp rather than CP) but that seems like a bad plan.


I'd probably duplicate the code via CPP, to trade time for space, 
but your way is fine.


Note that the thunks use an extra struct to hold the function 
pointer; this is to satisfy C standards pedantry about void 
pointers vs. function pointers, and if we decide not to care it 
could be simplified.


That adds more run-time overhead.  Could you make it conditional 
on whether it's really necessary?  All existing NetBSD platforms 
can convert back and forth between void * and function pointers 
without any trouble.
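For readers who have not seen the patch, the thunk arrangement under discussion looks roughly like the sketch below. It is not the patch itself: sketch_sort_r is a small insertion sort standing in for qsort_r (using the FreeBSD argument order), and every name here is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *);
typedef int (*cmp_r_fn)(void *, const void *, const void *);

/* Insertion sort standing in for the real qsort_r. */
static void
sketch_sort_r(void *base, size_t n, size_t size, void *thunk, cmp_r_fn cmp)
{
    char *a = base;
    char tmp[64];

    if (size > sizeof(tmp))
        return;     /* sketch only: elements of at most 64 bytes */
    for (size_t i = 1; i < n; i++) {
        size_t j = i;

        memcpy(tmp, a + i * size, size);
        while (j > 0 && cmp(thunk, a + (j - 1) * size, tmp) > 0) {
            memcpy(a + j * size, a + (j - 1) * size, size);
            j--;
        }
        memcpy(a + j * size, tmp, size);
    }
}

/* The extra struct: the function pointer travels inside an object. */
struct cmp_wrap {
    cmp_fn cw_cmp;
};

static int
cmp_thunk(void *cookie, const void *a, const void *b)
{
    const struct cmp_wrap *w = cookie;

    return w->cw_cmp(a, b);
}

/* Non-_r entry point implemented on top of the _r one. */
static void
sketch_sort(void *base, size_t n, size_t size, cmp_fn cmp)
{
    struct cmp_wrap w = { cmp };

    sketch_sort_r(base, n, size, &w, cmp_thunk);
}

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}
```

The overhead in question is the extra load through cw_cmp on every comparison; passing the function pointer directly as the cookie would avoid it where the platform allows that conversion.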


--apb (Alan Barrett)


Re: qsort_r

2013-12-08 Thread Alan Barrett

On Sun, 08 Dec 2013, Mouse wrote:

Is just casting the function pointers safe in C?


No.  As soon as you call through a pointer to the wrong type 
you're off in nasal demon territory.  (Loosely put; I'd have to 
look up the exact wording - there is a little wiggle room, but, 
if I've understood the subject of the discussion correctly, not 
enough.)


You can't call through a function pointer of the wrong type, but 
you can cast from one type to another.  I think that's enough, 
provided that void * is large enough to be converted to and from a 
function pointer.
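A sketch of the distinction: the function pointer is converted to void * for transport, and converted back to its original type before it is called, so the call itself never goes through a pointer of the wrong type. The conversion is not promised by ISO C, and the size check below is only a necessary condition, not a proof that the round trip works. Names are illustrative.

```c
typedef int (*cmp_fn)(const void *, const void *);

/* Necessary (not sufficient) condition for carrying cmp_fn in a void *. */
_Static_assert(sizeof(void *) >= sizeof(cmp_fn),
    "void * too small to carry a function pointer");

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}

static int
cmp_via_cookie(void *cookie, const void *a, const void *b)
{
    /* Convert back to the original type before calling. */
    cmp_fn cmp = (cmp_fn)cookie;

    return cmp(a, b);
}
```

A caller would pass (void *)cmp_int as the cookie; a compiler in pedantic mode may warn about the conversion, which is the portability gap being discussed.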


If you can find me a description of what NetBSD assumes beyond 
what C promises, I can have a stab at answering that question.


There is no such list.  That's a bug in NetBSD's documentation.

--apb (Alan Barrett)


Re: [patch] put ptrdiff_t in the kernel and create sys/stddef.h

2013-12-04 Thread Alan Barrett

On Wed, 04 Dec 2013, David Holland wrote:

(*) A complete scheme for doing it right removes all the _BSD_FOO_T_
drivel and ifdefs scattered in userland headers in favor of:
  - a single header file that defines all the needed types prefixed
with __, which can be included anywhere;
  - in userland, include-guarded header files akin to sys/null.h
that define single or common groups of the names without the
__ prefixes, e.g. types/size_t.h;
  - including these header files in the proper places, such as in
standard userland header files like stddef.h;
  - in the kernel, a single header file that defines all the types
without the __, that is or is exposed to sys/types.h but does
not affect userland.


Yes, that's one way of doing it right.

Until such time as somebody does it right, please follow the 
pattern of what's done already.


--apb (Alan Barrett)


Re: [patch] changing lua_Number to int64_t

2013-11-17 Thread Alan Barrett

On Sun, 17 Nov 2013, Mouse wrote:
sizeof returns the number of bytes used to store an object. 
This is only loosely related to the number of data bits in the 
object; the latter is no more than sizeof the object times 
CHAR_BIT, but it may be lower.


Also, using an exact-width type assumes that the 
hardware/compiler in question _has_ such a type.


Yes, that's true of C.

It's possible that lua, NetBSD, or the combination of the two is 
willing to write off portability to machines where one or both 
of those potential portability issues becomes actual.  But that 
seems to be asking for trouble to me; history is full of "but
nobody will ever want to port this to one of _those_" that come
back to bite people.


NetBSD already assumes that char is exactly 8 bits, and that 
integer types with exactly 16, 32, and 64 bits exist.  Adding more 
instances of the same assumptions doesn't seem like a big problem 
to me.  If there's ever a need to port to a machine where those 
assumptions do not hold, then we can worry about it at that time, 
but I suspect that it will be possible to change to using things
like int_least64_t (for a type with no less than 64 bits) instead 
of int64_t (for a type with exactly 64 bits).
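The assumptions can be written down so that the compiler checks them. This is a sketch using C11 _Static_assert; the helper value_bits_uintmax is invented here to show the earlier point that the number of value bits is a separate question from sizeof times CHAR_BIT.

```c
#include <limits.h>
#include <stdint.h>

/* The assumptions described above, stated so the compiler checks them. */
_Static_assert(CHAR_BIT == 8, "char is assumed to be exactly 8 bits");
_Static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");

/* Fallback type: C99 requires it even where int64_t does not exist. */
_Static_assert(sizeof(int_least64_t) * CHAR_BIT >= 64,
    "int_least64_t holds at least 64 bits");

/* Count value bits directly; this can be less than sizeof * CHAR_BIT. */
static unsigned
value_bits_uintmax(void)
{
    unsigned n = 0;

    for (uintmax_t v = UINTMAX_MAX; v != 0; v >>= 1)
        n++;
    return n;
}
```

On a port where the first two assertions fail, the build stops at the assumption rather than misbehaving later.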


--apb (Alan Barrett)


Re: pulse-per-second API status

2013-11-02 Thread Alan Barrett

On Fri, 01 Nov 2013, Greg Troxel wrote:
But if NetBSD enables PPS on ucom, there's going to be an 
expectation that it is good enough for stratum-1 timekeeping, 
like PPS on real serial ports.


I don't think there's any such expectation created.
[...]
People who expect the same as serial PPS are confused, and we 
are not responsible for that.


I think that PPS on a device with very high interrupt latency is 
sufficiently similar to PPS on a device with low interrupt latency 
that it deserves to have the same API.  I don't think it even 
needs a sysctl to enable it.


I think that it just needs careful documentation, in ucom(4) and 
wherever we document the PPS API.  Maybe the documentation for 
applications like ntpd should also warn against using PPS on USB 
interfaces.


--apb (Alan Barrett)


Re: pulse-per-second API status

2013-11-02 Thread Alan Barrett

On Fri, 01 Nov 2013, paul_kon...@dell.com wrote:
I don't know this API.  But my first reaction when I saw the 
designation PPS is to think of GPS timekeeping boxes and other 
precision frequency sources that have a PPS output.  On those 
devices, the PPS output is divided down from the main oscillator 
frequency, i.e., you can expect accuracies of 10^-9 for modest 
price crystal oscillators, 10^-10 to 10^-12 for higher end stuff 
-- and jitter in the nanosecond range or better.


It seems rather confusing to have another interface that goes by 
the same name but has specs 6 or more orders of magnitude worse. 
How about a different name that avoids this confusion?


It's exactly the same interface.  Something in the external 
timekeeping box is hooked up to one of the modem control lines on 
a serial port; the modem control line is hooked up to an interrupt 
(or something like an interrupt); the interrupt is hooked up to 
something in the kernel that records the time that the interrupt 
occurred.


The difference is only one of interrupt latency.  With plain old 
serial ports, the modem control line can be hooked up to a CPU 
interrupt pin using low-latency electronics.  With USB, if I have 
understood correctly, the interrupt is faked by some sort of 
polling interface with much higher latency and jitter.


--apb (Alan Barrett)


Re: zero-length symlinks

2013-11-02 Thread Alan Barrett

On Fri, 01 Nov 2013, David Holland wrote:
rmind@ points out that it's possible to create zero-length 
symlinks.  As zero-length symlinks aren't sensible, this 
should probably be prohibited. Does anyone see any reason they 
shouldn't be?


Symlink names should satisfy all the rules for file system object 
names, so "" should not be allowed.


Symlink targets are just strings.  They are usually used to store 
path names, but they can also be used to store arbitrary strings 
that can be read via readlink(2).  NetBSD's malloc implementation 
uses /etc/malloc.conf in this way, and I don't see a reason to 
prohibit it from using "".


POSIX says "The string pointed to by path1 shall be treated only
as a character string and shall not be validated as a pathname."
http://pubs.opengroup.org/onlinepubs/9699919799/functions/symlink.html
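A sketch of the "symlink as string storage" use: the helper below (name invented) stores an arbitrary string as a link target and reads it back with readlink(2), without the target ever being resolved as a path.

```c
#include <string.h>
#include <unistd.h>

/*
 * Store `value` as the target of a symlink at `path`, then read it back
 * into `buf` (of size `len`, which must be at least 1).
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t
store_and_fetch(const char *path, const char *value, char *buf, size_t len)
{
    ssize_t n;

    (void)unlink(path);     /* replace any previous link */
    if (symlink(value, path) == -1)
        return -1;
    n = readlink(path, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';      /* readlink does not NUL-terminate */
    return n;
}
```

Whether an empty value is accepted by symlink(2) differs between systems, which is the behaviour under discussion in this thread.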


--apb (Alan Barrett)


Re: module path message

2013-10-30 Thread Alan Barrett

On Tue, 29 Oct 2013, John Nemeth wrote:

The default path for module loading is: /stand/amd64-xen/6.99.25/modules


I suggest exposing the path via sysctl, and printing the sysctl 
mib name in the message, something like


kern.module.path=/stand/amd64-xen/6.99.25/modules

--apb (Alan Barrett)


Re: changing KASSERT()'s definition for non-diag kernels

2013-10-20 Thread Alan Barrett

On Sun, 20 Oct 2013, matthew green wrote:
as part of the GCC 4.8 preparation work, we're seeing many new 
warnings where variables are only used inside KASSERT(), but the 
non-diag kernel builds trigger errors.


my solution, rather than marking these variables with __USE(), 
is to change KASSERT() into a real function that consumes its 
arguments, but is still an empty function.


That seems sensible to me.  More generally, a lot of our existing
macros can be rewritten as static inline functions, now that we 
require a C99 compiler.


note that there is a re-direction to force the input to 
KASSERT() to be an integer type, as it is called with all sorts 
of types of input (pointers, values, boolean expressions..)


The KASSERT macro can be invoked with anything that has a 
truth value as its first argument.  Casting that to int seems 
reasonable, but perhaps using (!!(e)) to convert any type to a 
truth value would be clearer and less likely to trigger compiler 
warnings about casting non-numeric types to int.
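A sketch of the idea, with invented SKETCH_ names rather than the real KASSERT machinery: in the non-DIAGNOSTIC build the macro still passes !!(e) to an empty static inline function, so variables used only in assertions count as used.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef DIAGNOSTIC
static inline void
sketch_kassert_check(bool ok, const char *expr)
{
    if (!ok) {
        fprintf(stderr, "assertion \"%s\" failed\n", expr);
        abort();
    }
}
#define SKETCH_KASSERT(e)   sketch_kassert_check(!!(e), #e)
#else
/* Empty function: consumes its argument, so no "unused variable" warning. */
static inline void
sketch_kassert_nop(bool ok)
{
    (void)ok;
}
#define SKETCH_KASSERT(e)   sketch_kassert_nop(!!(e))
#endif

static int
first_element(const int *v, int n)
{
    int len = n;                /* referenced only inside the assertion */

    SKETCH_KASSERT(v);          /* pointer operand */
    SKETCH_KASSERT(len > 0);    /* boolean expression */
    return v[0];
}
```

One consequence of this design is that the expression is still evaluated in non-DIAGNOSTIC builds, so an assertion with side effects behaves differently from one that expands to nothing.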


--apb (Alan Barrett)


Re: Why do we need lua in-tree again? Yet another call for actual evidence, please. (was Re: Moving Lua source codes)

2013-10-19 Thread Alan Barrett

On Sat, 19 Oct 2013, Marc Balmer wrote:
The inclusion and use of Lua in base, for use in userland and 
the kernel, [...] has, last but not least, core's blessing.


Would you please either present some evidence for that claim, or 
stop making the claim.


To the best of my knowledge, userland Lua was approved by core in 
2010, but kernel Lua has never been approved by core.



Can we now please stop this useless discussion?


People will continue to ask questions until they receive some 
satisfactory answers.


--apb (Alan Barrett)


Re: Why do we need lua in-tree again? Yet another call for actual evidence, please. (was Re: Moving Lua source codes)

2013-10-19 Thread Alan Barrett

On Fri, 18 Oct 2013, Lourival Vieira Neto wrote:
I have to point out that "interesting work" is commonly used as
a sort of euphemism to refer to highly experimental work with 
unclear future.


Yes. But I'm talking about interesting *user* work. I'm not 
claiming that they should be in the kernel. I'm just saying 
that, IMHO, we should incorporate a small device driver that 
facilitates this kind of development (outside the tree).


You seem to want the lua device driver to be inside the tree, 
to facilitate experimental work outside the tree.  Other people 
have asked why the lua(4) device driver itself can't be developed 
outside the tree (with a view to importing it later, if it ever 
proves to be more than an experiment), and I have seen no good 
answer to that.


--apb (Alan Barrett)


Re: Why do we need lua in-tree again? Yet another call for actual evidence, please. (was Re: Moving Lua source codes)

2013-10-19 Thread Alan Barrett

On Sat, 19 Oct 2013, Marc Balmer wrote:

And now to give you a practical example what I personally do with lua(4)
right now:  In the past I wrote several tty line disciplines to decode
various serial formats.  Now I have a need for that again.  Doing this
in C is of course possible, but I want something more dynamic.  So I
wrote a tty line discipline that uses Lua to do all the decoding.  That
works like a charm:  Load the script, test, change the script and
reload.  Really practical.  I will release this code once I sorted out a
few remaining details.  And in the course of this work, I also found
deficiencies in slattach(8).

In previous work I used Lua to create a software gpio device, a modified
version of gpiosim(4) that uses a Lua script to mimic a real device.
Also handy.


Thank you.  Those seem like useful examples.

--apb (Alan Barrett)


Re: Moving Lua source codes

2013-10-17 Thread Alan Barrett

On Tue, 15 Oct 2013, Marc Balmer wrote:
Well, you are in contradiction to our guide, which under 
http://www.netbsd.org/releases/release-map.html#current states 
that NetBSD-current is the main development branch.


NetBSD-current is the main development branch for things that we know
we want, and that we are prepared to support for a long time, and 
that mostly work.  If any of those tests fail, then I'd say that 
the code should not be in -current, but could be in a branch or in 
pkgsrc or in some third party tree.


In the case of kernel Lua, some people are not convinced that we 
want it, and some people are not convinced that the API is stable 
enough that we should commit to long-term compatibility for it.


Although I think that developing in the main -current tree is 
acceptable (especially if users are told not to expect as much 
future compatibility as for most other parts of NetBSD), I would 
have preferred to see development in a branch.  It's certainly not 
the clear-cut "no need for a branch" situation that you seem to
think.


--apb (Alan Barrett)


Re: Sending ATA commands?

2013-08-12 Thread Alan Barrett

On Sun, 11 Aug 2013, Mouse wrote:

What does your support do?  Does it let you write over the host
protected area?  Does it let you extract what's in there?


Yes and yes.  It simply removes the protection, letting the host see
the HPA as what it really is: more space appended to the space
advertised to HPA-unaware software.


I don't really like silently appending the host protected area 
to the unprotected part of the disk.  Exposing something that is 
supposed to be hidden could have unexpected consequences.


I think I'd prefer to present the HPA as a separate device (an 
ld(4) device, as others have suggested, would be fine), and add 
some ioctls and atactl commands to query and adjust the sizes of 
the ordinary and HPA parts of the disk.


--apb (Alan Barrett)


Re: marking kern_assert(9) as __dead, and recursive panics

2013-02-10 Thread Alan Barrett

On Sun, 10 Feb 2013, Alan Barrett wrote:

* Remove the panicstr test from kern_assert() in
 sys/lib/libkern/kern_assert.c, so that KASSERT, KASSERTMSG and
 friends do not degenerate to no-ops after a panic.

 I don't know a reason for making all kernel asserts degenerate
 to no-ops, but I imagine that it might have been a workaround
 for problems with recursive panics, and I propose to address
 recursive panics directly (see below).

 I can also imagine that there are particular kernel asserts
 that need to degenerate to no-ops after a panic, and I suggest
 explicitly rewriting them in terms of (panicstr != NULL ||
 other tests).  I have not attempted to identify such asserts.


People have informed me that, when debugging a kernel after a 
panic, they often want to call functions that may hit assertion 
failures, and the particular asserts cannot reasonably be 
identified in advance, so it's useful for all kernel asserts to 
degenerate to no-ops after a panic.


I will produce a revised proposal that retains this feature which 
people obviously want.  My current ideas are to print a message 
about the fact that the assertion failure was ignored (instead 
of silently ignoring the assertion failure), and to use ifdefs to 
allow static analysers to behave as if the assertion failures are 
never ignored.
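One way the revised idea might look, as a userland sketch only: every name below is hypothetical, and abort() stands in for a real panic. After a panic has started, a failed assertion is reported and counted instead of being silently skipped, and a build with SKETCH_STATIC_ANALYSIS defined never takes that path.

```c
#include <stdio.h>
#include <stdlib.h>

static const char *sketch_panicstr; /* non-NULL once a panic has begun */
static unsigned sketch_ignored;     /* failures reported but not acted on */

static void
sketch_assert_fail(const char *expr, const char *file, int line)
{
#ifndef SKETCH_STATIC_ANALYSIS
    if (sketch_panicstr != NULL) {
        /* Already panicking: say so, then let the caller continue. */
        fprintf(stderr, "ignored failed assertion \"%s\" at %s:%d\n",
            expr, file, line);
        sketch_ignored++;
        return;
    }
#endif
    fprintf(stderr, "panic: assertion \"%s\" failed at %s:%d\n",
        expr, file, line);
    abort();
}

#define SKETCH_ASSERT(e) \
    ((e) ? (void)0 : sketch_assert_fail(#e, __FILE__, __LINE__))
```

With the ifdef active, an analyser sees a failure path that never returns, which is the property it needs to reason about the code following each assertion.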


--apb (Alan Barrett)


Re: fixing compat_12 getdents

2012-12-10 Thread Alan Barrett
also, EINVAL doesn't seem like a great error code for this 
condition.  it's not an input parameter that's causing the 
error, but rather that the required output format cannot express 
the data to be returned.  I think solaris uses EOVERFLOW for 
this kind of situation, and ERANGE doesn't seem too bad either. 
any opinions on that?


There's also E2BIG, but I don't think it fits.  ERANGE is 
documented in terms of the available space, while EOVERFLOW is 
documented in terms of a numeric result.  So perhaps EOVERFLOW 
for "integer is too large to fit in N bits", and ERANGE for
"string is too long to fit in N bytes"?  Or vice versa?
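To make the first of those two readings concrete, here is a sketch with two invented helpers. It shows one possible convention, not what compat_12 getdents does or should do.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Numeric result does not fit in the output field: EOVERFLOW. */
static int
narrow_u64(uint64_t value, uint32_t *out)
{
    if (value > UINT32_MAX)
        return EOVERFLOW;
    *out = (uint32_t)value;
    return 0;
}

/* String result does not fit in the space provided: ERANGE. */
static int
copy_name(const char *name, char *buf, size_t buflen)
{
    size_t n = strlen(name);

    if (n >= buflen)
        return ERANGE;
    memcpy(buf, name, n + 1);   /* include the terminating NUL */
    return 0;
}
```

In both cases the input is valid and only the output format is too small, which is why EINVAL reads poorly.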


Somebody(TM) should go through the errno(2) documentation and make 
the descriptions more generic, and add guidance for choosing which 
code to return.


--apb (Alan Barrett)


Re: KNF and the C preprocessor

2012-12-10 Thread Alan Barrett

On Mon, 10 Dec 2012, David Young wrote:
What do people think about setting stricter guidelines for using 
the C preprocessor than the guidelines from the past?


Maybe.


The C preprocessor MUST NOT be used for

1 In-line code: 'static inline' subroutines are virtually always better
 than macros.


I disagree with this one.  If you tone it down to "SHOULD NOT" or
"prefer static inline functions where appropriate" then I might
agree, but MUST NOT is way too strict.  Sometimes the C standard 
mandates the use of macros, and I would not want to violate that 
simply to comply with your MUST NOT requirement.


For example, the ctype(3) API must be provided by extern 
functions, and may also be provided by macros; I don't see how 
the macros could be replaced by static inline functions without 
breaking something.  The first breakage that springs to mind is 
that adding parentheses around a name like (isalpha) is supposed 
to prevent it from being interpreted as a macro, so you get 
the function instead; but there's no analogous way to prevent 
something from being interpreted as a static inline function so 
you get the external function instead.
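The parenthesis rule can be seen in a few lines. This sketch uses only standard <ctype.h>; the wrapper names are invented.

```c
#include <ctype.h>

/* May expand a function-like macro, if <ctype.h> provides one. */
static int
alpha_maybe_macro(int c)
{
    return isalpha(c) != 0;
}

/* The name is not followed by '(', so no macro expansion can happen. */
static int
alpha_function(int c)
{
    return (isalpha)(c) != 0;
}

/* Taking the address likewise needs the real external function. */
static int (*const alpha_ptr)(int) = isalpha;
```

A static inline function named isalpha would be found by all three uses, leaving no spelling that reaches the extern function.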



2 Configuration management: use the compiler & linker to a greater
 extent than the C preprocessor to configure your program for your
 execution environment, your chosen compilation options, et cetera.


Again here, MUST NOT is way too strict.  While I don't like *.c 
files littered with ifdefs, I think it's OK for *.c files to 
contain many macro invocations, and it's OK for header files to 
contain many ifdefs, but both these would be outlawed by your 
MUST NOT requirement.


--apb (Alan Barrett)


Re: core statement on fexecve, O_EXEC, and O_SEARCH

2012-12-04 Thread Alan Barrett

The fexecve function could be implemented entirely in libc,
via execve(2) on a file name of the form /proc/self/fd/N.
Any security concerns around fexecve() also apply to exec of
/proc/self/fd/N.

I gave a try to this approach. There is an unexpected issue:

The descriptor is probably already closed on exec before the syscall
tries to use it.


I believe that we should not fix that without a proper design 
of how all the parts will work together.


Some questions that I would like to see answered are: Should it 
be possible to exec a fd only if a special flag was used in the 
open(2) call?  Should the file's executability be checked at open 
time or at exec time, or both, or does it depend on open flags or 
on what happened to the fd in between open and exec?  Should the 
record of the fact that the fd may be eligible for exec be erased 
when the fd is passed from one process to another?  Always or only 
sometimes?  How can fds obtained from procfs be made to follow the 
rules?


--apb (Alan Barrett)


Re: FFS write coalescing

2012-12-03 Thread Alan Barrett

On Mon, 03 Dec 2012, Chuck Silvers wrote:

the genfs code also never writes clean pages to disk, even though for
RAID5 storage it would likely be more efficient to write clean pages
that are in the same stripe as dirty pages if that would avoid issuing
partial-stripe writes.  (which is basically another way of saying
what david said.)


Perhaps there should be a way for block devices to report at least three
block sizes:

a) smallest possible block size (512 for almost all disks)

b) smallest efficient block size and alignment (4k for modern disks,
stripe size for raid)

c) largest possible size (a device and bus-dependent variant of MAXPHYS)

Then the file system could use (b) to know when it's a good idea to
combine dirty and clean pages into the same write.
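A sketch of how a file system might use (b). The structure and function are hypothetical; nothing like them is claimed to exist in NetBSD.

```c
#include <stdint.h>

/* Hypothetical per-device report of the three sizes. */
struct blk_sizes {
    uint32_t bs_min;        /* (a) smallest possible block size */
    uint32_t bs_efficient;  /* (b) smallest efficient size and alignment */
    uint32_t bs_max;        /* (c) largest possible transfer */
};

/*
 * Widen the dirty byte range [*start, *end) outward to (b)-aligned
 * boundaries, so clean neighbours are written along with dirty pages.
 * Returns 1 on success; returns 0 and leaves the range unchanged if the
 * widened range would exceed (c).
 */
static int
widen_to_efficient(const struct blk_sizes *bs, uint64_t *start, uint64_t *end)
{
    uint64_t eff = bs->bs_efficient;
    uint64_t s = *start - *start % eff;
    uint64_t e = *end + (eff - *end % eff) % eff;

    if (e - s > bs->bs_max)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

For a RAID set with a 64 KiB stripe, a dirty range of 4096..8192 would widen to 0..65536, turning a partial-stripe write into a full-stripe one.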

--apb (Alan Barrett)


core statement on fexecve, O_EXEC, and O_SEARCH

2012-11-25 Thread Alan Barrett

The NetBSD core group has considered adding the
fexecve(2) or fexecve(3) syscall or function, and adding
new O_EXEC and O_SEARCH open(2) flags.

These new features may be useful, but their security properties 
are not well understood.  The core group is of the opinion that 
these new features should not be added to NetBSD until there is 
a design that discusses their security properties, the way they 
interact with each other and existing features, and addresses the 
security concerns.


Designs that are slightly incompatible with other operating 
systems or with POSIX need not be ruled out; for example, it may 
be reasonable to make fexecve() fail if the fd was not opened with 
certain flags, or to automatically clear certain flags when the fd 
is passed from one process to another.


The fexecve function could be implemented entirely in libc, 
via execve(2) on a file name of the form /proc/self/fd/N. 
Any security concerns around fexecve() also apply to exec of 
/proc/self/fd/N.
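A sketch of what such a libc-only implementation could look like, assuming procfs is mounted on /proc; the function names are invented and this is not a proposal for the actual libc code.

```c
#include <stdio.h>
#include <unistd.h>

/* Build "/proc/self/fd/N"; returns 0 on success, -1 if it does not fit. */
static int
fd_exec_path(int fd, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/self/fd/%d", fd);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Exec the file behind `fd` by name; returns only on failure. */
static int
sketch_fexecve(int fd, char *const argv[], char *const envp[])
{
    char path[32];

    if (fd < 0 || fd_exec_path(fd, path, sizeof(path)) == -1)
        return -1;
    return execve(path, argv, envp);
}
```

Because this is an ordinary execve(2) on a path, it is subject to whatever checks exec applies to /proc/self/fd/N, which is why the security questions are the same for both.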


If necessary, the open(2) syscall could be versioned so that 
O_RDONLY is no longer defined as zero.


--
Alan Barrett, on behalf of core


Re: [PATCH] POSIX extended API set 2

2012-11-11 Thread Alan Barrett

On Sun, 11 Nov 2012, Emmanuel Dreyfus wrote:

Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:

I know this is a bike shed, and I'm sorry to be the one to bring it
up, but can we use the names chmodat, chownat, &c., for our native
system calls, and just use libc aliases or _BLAH_SOURCE nonsense or
something for the ridiculous `f' prefix on fchmodat, fchownat, &c.?


What is the goal? You want to write userland code using chmodat()
instead of fchmodat()?


I want the names to follow a clear and easily-documented pattern.

Takes a name    Takes a fd, not a name    Takes a name and an "at" fd
                (prepend f)               (append at)
------------    ----------------------    ---------------------------
open            - (fopen is different)    openat
link            -                         linkat
unlink          -                         unlinkat
rename          -                         renameat
chdir           fchdir                    chdirat
mkdir           fmkdir                    mkdirat
mkfifo          fmkfifo                   mkfifoat
utimens         futimens                  utimensat
chmod           fchmod                    chmodat (not fchmodat)
chown           fchown                    chownat (not fchownat)
stat            fstat                     statat (not fstatat)
access          -                         accessat (not faccessat)

However, I also want the inconsistent POSIX names to be provided.

I don't know a good way of satisfying both goals.

--apb (Alan Barrett)


Re: pass-through linux ioctl for mfi(4)

2012-09-19 Thread Alan Barrett

On Wed, 19 Sep 2012, Manuel Bouyer wrote:

Here's an updated patch, which checks the size before malloc in mfifioctl(),
and I also removed a debug printf in compat_linux.
I intend to commit this next weekend.


Are these pass-through ioctl commands denied at securelevel = 1?

--apb (Alan Barrett)


Core statement on directory naming for kernel modules

2012-07-27 Thread Alan Barrett

Core statement on directory naming for kernel modules -- July 2012

The NetBSD core group has noted concerns about the name of the
directory used for kernel modules.

At present, the kernel loads modules from the
directory /stand/${MODULE_MACHINE}/${VERSION}/modules
(e.g. /stand/amd64/6.99.4/modules).

There have been several objections to this use of the /stand directory,
and several suggestions for alternatives.  On 8 July 2012, David Holland
presented this summary of the proposals, and objections to them:

   /boot is wrong because modules are not used only or even primarily at boot.

   /lib/modules is wrong because modules are not link libraries.

   /libdata/modules is wrong because modules are not data.

   /libexec/modules is wrong because modules are not programs.

   /modules is wrong because it adds a new toplevel directory.

   /stand/modules is wrong because modules are not used without the kernel.

There have also been proposals for more radical changes, including:

   Keeping both the kernel and its modules together in a directory.
   A detailed description was posted by Luke Mewburn
   http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html.

   Keeping both the kernel and its modules together in a tar archive.

   Keeping both the kernel and its modules together in an ELF executable.

The core group is of the opinion that it is too late for such major
changes to be included in NetBSD-6.  Accordingly, we think that the
existing scheme should be retained, without changes to either directory
names or more fundamental aspects, for the NetBSD-6 release.  Changes to
either the directory names, or more fundamental aspects of the scheme,
or both, may be made in the future.

The core group would also like to see the following changes in the near
future:

   Implementation of the scheme described by Luke Mewburn in
   http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html
   to allow a kernel and its modules to be kept together.

   Changes to config(1) to extend the existing notion of whether or not
   an option is built-in to the kernel, to three states: built-in, not
   built-in but loadable as a module, entirely excluded and not even
   loadable as a module.

Alan Barrett, on behalf of core


Re: raid1: unable to open device, error = 16

2012-07-27 Thread Alan Barrett

On Thu, 26 Jul 2012, matthew green wrote:

library functions like
opendisk() will look for a file-path of the given name before
trying other names.  that tends to make them use the block
dev when you want the char dev.  eg, compare the ktrace for
newfs raid1d when you are in/not in /dev.


I think that opendisk should do something like this:

   if (path contains a slash) {
       use specified path, do not search in any way
   } else {
       try /dev/[r]foo
       try /dev/[r]fooX, where X is c or d depending on kern.rawpartition
   }

If the user wants to open a file in the cwd, then let the
user pass ./foo instead of foo to opendisk.
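The proposed search order could be sketched in C as follows. This is not the libutil opendisk(3) code; the function is invented, and the raw partition letter is passed in rather than read from kern.rawpartition.

```c
#include <stdio.h>
#include <string.h>

/*
 * Fill out[] with the candidate paths in the proposed order and return
 * how many there are.  `raw` selects the "r" prefix; `rawpart` is the
 * raw partition letter ('c' or 'd').
 */
static int
disk_candidates(const char *name, int raw, char rawpart,
    char out[][64], int max)
{
    const char *r = raw ? "r" : "";
    int n = 0;

    if (strchr(name, '/') != NULL) {
        /* Explicit path: use it as given, never search. */
        if (n < max)
            snprintf(out[n++], 64, "%s", name);
        return n;
    }
    if (n < max)
        snprintf(out[n++], 64, "/dev/%s%s", r, name);
    if (n < max)
        snprintf(out[n++], 64, "/dev/%s%s%c", r, name, rawpart);
    return n;
}
```

With this order, a bare name such as raid1 never matches a file in the current directory, so the result no longer depends on whether the caller happens to be in /dev.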

--apb (Alan Barrett)


Re: File systems on 4k sector devices?

2012-06-07 Thread Alan Barrett

On Thu, 07 Jun 2012, Michael van Elst wrote:
WAPBL is an exception, the position and size of the journal is 
stored in terms of physical disk sectors.


So if I use dd(1) or equivalent to copy the data from a disk with 
512-byte sectors to a disk with 4096-byte sectors, and if the data 
includes a WAPBL file system, then it won't work?
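The arithmetic behind the concern, as a sketch: if a location is recorded as a count of device sectors, the byte offset it denotes depends on the sector size of whatever device the data now lives on. The function name is illustrative and this is not WAPBL code.

```c
#include <stdint.h>

/* Byte offset denoted by a location recorded as a count of device sectors. */
static uint64_t
journal_byte_offset(uint64_t sector, uint32_t sector_size)
{
    return sector * sector_size;
}
```

A journal recorded at sector 1000 sits at byte 512000 on a 512-byte-sector disk, but the same number read on a 4096-byte-sector disk points at byte 4096000, while a dd copy leaves the data at byte 512000.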


Can we fix this?
Can we fix it before next week, when I plan to upgrade my 
laptop's disk to one with 4kB sectors?


--apb (Alan Barrett)


Re: Time to remove COMPAT_386BSD_MBRPART

2012-03-03 Thread Alan Barrett

On Sun, 04 Mar 2012, David Holland wrote:
Now that netbsd-6 has been branched, I think it's time to 
remove COMPAT_386BSD_MBRPART. This entails both the kernel code 
and some code in disklabel(8) and sysinst. The kernel code 
has been disabled by default for years; the disklabel(8) code 
was overlooked at the time and disabled about a year ago (in 
both current and -5) after several reports of trashed FreeBSD 
partitions were traced to it.


Does anyone object? It's been a good long time since the 
partition ID changed.


No objection, but please can it be added to "Features that will be
removed in the next version" in the netbsd-6 release notes.


--apb (Alan Barrett)


Re: extattr namespaces

2012-02-10 Thread Alan Barrett

On Fri, 10 Feb 2012, YAMAMOTO Takashi wrote:

how about the following mapping?

xattr name string - ufs on-disk

system.foo  - SYSTEM foo
others.bar  - USER others.bar


Looks reasonable, but then which of the following?

a)  user.user.baz  - USER user.baz
b)  user.baz       - USER user.baz
c)  user.baz       - USER baz
d)  baz            - USER baz

(I suggest b and d)
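With (b) and (d), the mapping reduces to a single rule: strip a leading "system." and send everything else to USER unchanged. A sketch, with made-up enum and function names:

```c
#include <string.h>

enum ea_ns { EA_NS_USER, EA_NS_SYSTEM };    /* hypothetical names */

/*
 * Map an xattr name string to (namespace, on-disk name) using options
 * (b) and (d).  *diskname points into `name`; nothing is copied.
 */
static enum ea_ns
ea_map(const char *name, const char **diskname)
{
    static const char sys[] = "system.";

    if (strncmp(name, sys, sizeof(sys) - 1) == 0) {
        *diskname = name + sizeof(sys) - 1;
        return EA_NS_SYSTEM;
    }
    *diskname = name;
    return EA_NS_USER;
}
```

Under this rule "user.baz" and "baz" stay distinct on disk, which options (a) and (c) would not guarantee.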

--apb (Alan Barrett)


Re: Implementing mount_union(8) into vfs (for -o union)?

2012-01-28 Thread Alan Barrett

On Sat, 28 Jan 2012, Julian Fagir wrote:
I've just been trying to mount a tmpfs over a read-only root 
file system.  Unfortunately, this won't work just by mounting a 
tmpfs with option union over the root file system. You'd have to 
create a tmpfs, and mount that one with mount_union(8) over the 
root file system, which is again not possible.


I read your message twice and I still don't know what you mean. 
Could you give examples of the commands that you use, and the 
errors.


--apb (Alan Barrett)


Re: fifo and [acm]time

2011-12-28 Thread Alan Barrett

On Mon, 26 Dec 2011, Taylor R Campbell wrote:
Is one inode update per minute enough to be a significant 
issue?


It means the disk must continue spinning and, e.g., will 
continue to draw power from a laptop battery to do so, even when 
the system is functionally idle.


I think that's a more general problem.  It would be nice if all 
updates to atime/mtime/ctime could be buffered in memory (not 
committed to stable storage) until either the disk happens to be 
spinning anyway, or the amount of memory wasted in buffering the 
updates is too large, or the updates are forced using a mechanism 
like fsync(2) or sync(2).


I even want syslogd's writes to /var/log/messages to be buffered 
until the disk happens to be spinning anyway.


--apb (Alan Barrett)


Re: Patch: new random pseudodevice

2011-12-09 Thread Alan Barrett

On Fri, 09 Dec 2011, Thor Lancelot Simon wrote:
An attacker who can break AES might be able to predict 
the future output of _one_ instance of the generator.  An 
attacker who can break AES and recover the key and defeat the 
backtracking resistance designed into CTR_DRBG *might* be able 
to recover the prior outputs of the generator for that user. 
An attacker who can do all these things *and* recover earlier 
entropy-pool output from later entropy-pool output (that is, do 
exactly what would have had to be done to break the old design) 
can recover keys provided by the generator to other users.  If 
he happens to know when exactly they were produced (time is an 
input to the algorithm), etc.


Fair enough, but you still seem to be talking about how good a 
CSPRNG it is, whereas my concern is that it's pseudorandom, not
random.


How many different bit streams of length 2^31 can be produced by 
a generator that has a 128-bit key?  I think it's 2^128 different 
pseudorandom bit streams of length 2^31.  If they were truly 
random, then there would be 2^(2^31) of them.
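Written out, and assuming as the question does that the 128-bit key is the only thing that varies between streams (any additional inputs would raise the first bound accordingly):

```latex
\[
\#\{\text{possible output streams of length } 2^{31}\}
  \;\le\; \#\{\text{keys}\} \;=\; 2^{128}
  \;\ll\; 2^{2^{31}} \;=\; \#\{\text{bit strings of length } 2^{31}\}
\]
\[
\frac{2^{128}}{2^{2^{31}}} \;=\; 2^{\,128 - 2^{31}} \;=\; 2^{-2147483520}
\]
```

So only a vanishing fraction of the possible bit strings can ever be produced, however good the generator is as a CSPRNG.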


I still think it's not appropriate for /dev/random to output 
pseudorandom bits (even cryptographically secure pseudorandom 
bits) when it has historically output random bits (or at least 
attempted to output random bits, modulo bugs, design mistakes, 
etc.).


--apb (Alan Barrett)


Re: Patch: new random pseudodevice

2011-12-09 Thread Alan Barrett

On Fri, 09 Dec 2011, Thor Lancelot Simon wrote:
On Fri, Dec 09, 2011 at 12:14:40PM -0500, Thor Lancelot Simon 
wrote:
Let me put it this way: before, you may have thought you were 
getting some kind of true randomness.  You weren't.  Now, 
you still aren't, but at least what sits between you and the 
entropy source is a lot more clear, and a lot better analyzed.


I am not knowledgeable enough to comment on that, so I'll take 
your word for it.


However, when applications use /dev/random, we could consider a 
request to be a single read from the device.  This also has 
the appealing property that it aligns with how the underlying 
generator (CTR_DRBG) counts requests.  That way, in practice, 
each read from /dev/random would get a fresh AES key -- and most 
application reads from /dev/random, which may block, are very 
small.


I think that, in practice, that is about as close to meeting the 
expectations of the application authors as possible.


I like that idea.

--apb (Alan Barrett)


Re: Patch: new random pseudodevice

2011-12-09 Thread Alan Barrett

On Fri, 09 Dec 2011, Pawel Jakub Dawidek wrote:
You are aware of the fact that 99.99% of computers don't have 
true random number generators and the bits you claim that are 
random are not random at all? They try to be unpredictable.


I believe that there is a truly random component to air turbulence 
inside mechanical disk drives, and that some of the randomness can 
be harvested in timing measurements.  I believe that there is a 
truly random component to the relationship between two uncoupled 
oscillators, and that some of that randomness can be harvested 
in timing measurements.  I believe that there is a truly random 
component to the noise produced by an amplifier, and that some 
of that randomness can be harvested by an A/D converter.  I 
believe that most computers have hardware capable of exploiting 
some of this randomness.  I believe that this randomness is of 
thermodynamic and quantum origin, that it's difficult to estimate 
how many bits of entropy are theoretically present, and even 
more difficult to estimate how many bits of entropy are actually 
harvested.


CSPRNG have two roles: turn few almost unpredictable bits that 
your machine can gather into many cryptographically secure 
pseudo-random bits and to hide those almost unpredictable bits 
from consumers.


Yes.


Returning gathered entropy directly is very, very risky.


Yes.

--apb (Alan Barrett)


Re: Patch: new random pseudodevice

2011-12-08 Thread Alan Barrett

On Thu, 08 Dec 2011, Thor Lancelot Simon wrote:
The urandom device node will key the generator and output data 
even if the kernel entropy pool estimates that it does not 
have enough bits to provide an AES-128 key with full entropy. 
The random device node will block until sufficient bits are 
available from the pool to key the generator.


So, /dev/urandom will never block, and each opened file descriptor 
from /dev/random may block the first time you read or select from 
it, but will not block again until it is re-keyed after 2^31 bits 
(or is it bytes?) of output have been generated?


The previous /dev/random implementation would never give out 
more data than the estimated entropy in the pool, so callers 
could think that they were getting the highest quality possible. 
Callers will now get 2^31 bits of output and consume only 128 bits 
of entropy from the pool, so they may think that they are getting 
lower quality output.


I have this naive idea that trying to get out more than you put 
in is cheating, and I think it's fine for /dev/urandom to cheat, 
but I am not happy about /dev/random cheating.  Please could you 
explain where I have misunderstood.


--apb (Alan Barrett)


Re: secmodel_register(9) API

2011-12-05 Thread Alan Barrett

On Mon, 05 Dec 2011, Elad Efrat wrote:
Personally I don't care if this stays or not. All I can say is 
that I have not seen a single argument worthy of consideration 
against it and I would strongly recommend to leave it in.


When you want to introduce a new feature, you should provide 
arguments in favour of the new feature, not merely say there are 
no good arguments against it.  This is especially important in 
the case of features that have non-trivial security impact.


--apb (Alan Barrett)


Re: language bindings (fs-independent quotas)

2011-11-18 Thread Alan Barrett

On Fri, 18 Nov 2011, Manuel Bouyer wrote:
Assuming that there's no need to handle fields with embedded 
spaces, perl's split() function will DTRT.


No, it does not because there are fields that can be empty.


The common way of dealing with that is to have a placeholder like 
- for empty fields.
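
As a small illustration of why the placeholder matters, here is a sketch in Python, whose str.split() treats runs of whitespace much as perl's split ' ' does. The row layout (user, soft limit, hard limit, grace period) is made up for the example and is not the proposed quota format.

```python
def parse_row(line, placeholder="-"):
    """Split a whitespace-separated row, mapping the placeholder to None."""
    return [None if field == placeholder else field for field in line.split()]

# Hypothetical row layout: user, soft limit, hard limit, grace period.
row_with_placeholder = "alice 1000 - 7d"
row_with_gap = "alice 1000  7d"    # the empty field silently vanishes

print(parse_row(row_with_placeholder))   # ['alice', '1000', None, '7d']
print(parse_row(row_with_gap))           # ['alice', '1000', '7d']
```

Without the placeholder the later columns shift left, so the reader cannot tell which field was empty.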


By the way, I still haven't figured out how to test any of 
this quota stuff.  quotaon / followed by edquota -f / does 
nothing (no error message, and no useful result).  Using the 
device name /dev/cgd1a instead of the file system name / 
does not help.


what are you trying to do ?


I am just trying to enable quotas so that I can test some of the
quota-related commands.

quotaon won't do anything if / doesn't have the userquota or 
groupquota keyword in the fstab, and you have to run quotacheck 
before quotaon.  This is for ufs-quota1.


I don't see that in the quotaon(8) man page.

The filesystems specified must have entries in /etc/fstab and be 
mounted.


I have that.

quotaon expects each filesystem to have quota files named 
quota.user and quota.group which are located at the root of the 
associated file system.  These defaults may be overridden in 
/etc/fstab.  By default both user and group quotas are enabled.


I interpreted that as "by default, quotaon will just work".

Anyway, when I run quotacheck, it complains:

$ sudo quotacheck /
quotacheck: / not found in /etc/fstab

I do have an entry for / in /etc/fstab:

from_mount  /   ffs rw,log  1 0

This is in a chroot, and the actual device name is /dev/cgd1a,
but fstab doesn't know that.


For ufs-quota2, quotas are enabled at newfs time, or with tunefs (with
the latter this has to be done on a read-only mounted filesystem, and you
have to run fsck before mounting R/W). quotaon won't do anything
for ufs-quota2.


The quotaon(8) man page does not say that it's only for some file
system types, and does not refer to newfs, tunefs, or fsck.

--apb (Alan Barrett)


Re: language bindings (fs-independent quotas)

2011-11-17 Thread Alan Barrett

On Fri, 18 Nov 2011, David Holland wrote:
The proposed standard format for quotas is an ordinary columnar 
text file. The reason language bindings came up is that Manuel 
was complaining, somewhat oddly, that it's hard to handle these 
in Perl.


Assuming that there's no need to handle fields with embedded 
spaces, perl's split() function will DTRT.


And actually, language bindings are probably a good thing 
anyway; if you have an installation with 50,000 users and you 
want to frob their quotas from a Perl script, forking 50,000 
edquota processes is probably not the best approach.


Oh my, I missed the part of the edquota man page where it says a 
temporary file is created for each user.  Why can't it just create 
a single temporary file with a text table of all quotas?


By the way, I still haven't figured out how to test any of this 
quota stuff.  quotaon / followed by edquota -f / does nothing 
(no error message, and no useful result).  Using the device name 
/dev/cgd1a instead of the file system name / does not help.


--apb (Alan Barrett)


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Alan Barrett

Matthew Mondor wrote:

Greg Troxel g...@ir.bbn.com wrote:
So, I'm inclined to patch rdiff-backup not to fsync, since it 
seems excessive, and the backup is toast if the machine crashes 
before it is finished -- in that case rdiff-backup just rolls 
back.  Opinions?


I also wonder why fsync would be used for every file, especially 
if you consider a whole run a single transaction, even more so 
if using snapshots (although you don't mention using them).


If rdiff-backup was easily able to roll back after a crash, then 
I'd probably agree with the above.  But it's expensive to roll 
back (you have to compare the actual data in the files, without 
assuming that {same size, same mtime} implies same data).


The current state of ffs+wapbl is that, if the system crashes and 
the log is replayed, then files that had been written shortly 
before the crash end up with whatever old data happened to be 
in the underlying disk blocks, but new metadata indicating that 
the size and timestamps are all up to date.  I think that this 
violates traditional unix file system semantics, but the people 
who worked on wapbl don't seem to think it's a problem.


Anyway, the new metadata with old data tends to make rsync (and 
probably rdiff-backup) think that the file is up to date, and 
so not copy it again next time (unless you perform an expensive 
comparison of all the data, not just the metadata).


I have patched rsync to issue fdatasync(2) calls frequently, 
to mitigate this problem in my own usage.  It does slow it 
down, but nowhere near as dramatically as you report.  (I use 
NetBSD-current.)
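
The rsync patch itself is not shown here. As a rough sketch of the general technique only (write the data, then ask the kernel to commit it before moving on), something like the following Python would do; the function name is made up for the example.

```python
import os

def write_durably(path, data):
    """Write data to path and ask the kernel to commit it before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # fdatasync(2) commits the file data; fall back to fsync(2)
        # on platforms where Python does not provide os.fdatasync.
        getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)
```

Calling this once per file trades some speed for the guarantee that the data blocks reach the disk no later than the metadata that describes them.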


--apb (Alan Barrett)


Re: fs-independent quotas

2011-10-19 Thread Alan Barrett

On Wed, 19 Oct 2011, David Holland wrote:

  - the quota key is:
   the quota *class*
   the id
   the quota *type*

  - the quota value is:
   the configured hard limit
   the configured soft limit
   the configured grace period
   the current usage
   the current grace expiry time (if any)


This seems sensible.


1. A file system type can have or not have support for quotas. If
there is no support for quotas, nothing else works.

2. Any given filesystem volume may have or not have quota data on it.
This is the filesystem code's problem and irrelevant to the
FS-independent logic.

3. Any given filesystem volume may be mounted with or without quotas
enabled. If quotas are not enabled, quota information is not available
and the quota utilities will not be able to do anything.

4. Once mounted, quotas can be either on or off. As far as the
FS-independent code is concerned, quotas being off means only that
they aren't enforced; that is, with quotas off operations that
increase usage do not fail with EDQUOT. When quotas are off, quota
information can still be inspected or updated.


I don't like the names "on" and "off" at level 4.  They are too vague,
and too easily confused with "enabled" or "not enabled" at level 3.

I'd suggest these names:

1. supported or not supported by the file system format
2. present or not present in the file system backing store
3. enabled or not enabled in the mounted file system
4. enforced or not enforced for the system/user/group/file system/???

I think you might want a fs-independent API to ask the file 
system whether or not quotas are supported or present.  I suppose 
getschemaname answers the "present?" question, but I don't see 
anything that would help a user interface choose whether to 
display a message saying "quotas not supported, tough luck" or 
"quotas not enabled, would you like to enable quotas now?"
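
To make the four suggested levels concrete, here is a sketch in Python of how a user interface might pick its message once it can ask for the state. The QuotaState type and the user_message function are hypothetical, as are the last two messages; no such API exists in the proposal being discussed.

```python
from enum import IntEnum

class QuotaState(IntEnum):
    # Each level implies the ones before it.
    NOT_SUPPORTED = 0   # file system format has no quota support
    SUPPORTED = 1       # supported, but no quota data in the backing store
    PRESENT = 2         # quota data present, but not enabled in this mount
    ENABLED = 3         # enabled in the mounted file system, not enforced
    ENFORCED = 4        # limits are enforced (EDQUOT on overrun)

def user_message(state):
    """Choose the message a quota tool might show for a given state."""
    if state == QuotaState.NOT_SUPPORTED:
        return "quotas not supported, tough luck"
    if state < QuotaState.ENABLED:
        return "quotas not enabled, would you like to enable quotas now?"
    if state == QuotaState.ENABLED:
        return "quotas enabled but not enforced"
    return "quotas enforced"
```

The first two messages can only be told apart if the file system reports "supported" separately from "enabled", which is the gap described above.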


--apb (Alan Barrett)


Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

2011-10-03 Thread Alan Barrett

On Wed, 17 Aug 2011, Reinoud Zandijk wrote:
after getting stuck in the 1st implementation in the 
rump/puffs/refuse jungle i started a new version that is more in 
line with the Solaris implementation and is far less invasive.


Basically the system call forwards the requests using ioctl's just 
like Solaris and, as it turns out, also FreeBSD with their ZFS 
import. For simplicity and to reduce compat stuff i've used the 
same ioctls FreeBSD defines. FreeBSDs support is limited though; 
only ZFS handles them. The ioctl names are not documented yet.


So, if I am reverse engineering the code correctly, the design is 
like this:


  There are no new VOP calls.

  There are two new ioctls, FIOSEEKDATA and FIOSEEKHOLE.  Each
  file system may provide its own implementation.  If the
  underlying file system doesn't support them, then they fail.

  There are two new lseek 'whence' flags, SEEK_DATA and SEEK_HOLE.
  The kernel's lseek implementation forwards them to the
  underlying file system using VOP_IOCTL(FIOSEEKDATA) and
  VOP_IOCTL(FIOSEEKHOLE).  If the ioctl fails, then lseek
  implements the fallback behaviour of treating the file as a
  single data region followed by a hole after the end of file.

I think that it would be better to implement the fallback behaviour in
the vfs layer rather than in the lseek syscall.

--apb (Alan Barrett)


Re: RFC: SEEK_DATA/SEEK_HOLE implementation version 2

2011-10-03 Thread Alan Barrett

On Mon, 03 Oct 2011, Reinoud Zandijk wrote:

On Mon, Oct 03, 2011 at 08:33:06AM +0200, Alan Barrett wrote:
I think that it would be better to implement the fallback 
behaviour in the vfs layer rather than in the lseek syscall.


I tried that before and it was in my original patch. I changed 
the VOP_SEEK() to accept the other two `whence' argument values. 
VOP_SEEK()'s prototype had to be extended resulting in severe 
compatibility issues with puffs/rump/(re)fuse etc. resulting in 
a HUGE patchset.  Also, external maintained code like ZFS had to 
be changed.


Your original patch did that in VOP_SEEK, yes.  I think that was a 
bad idea, and that's not what I am suggesting.


When I suggested "implement the fallback behaviour in the vfs 
layer", I meant in the vfs layer's handling of the new FIOSEEKHOLE 
and FIOSEEKDATA ioctls.  This would mean that users of the new 
lseek flags, and users of the new ioctls, would both get the 
fallback behaviour that, if the underlying file system doesn't 
know better, a file appears to have a single data region followed 
by a hole after EOF.
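
The fallback behaviour is small enough to write down. The following Python is a sketch of the semantics only, not of the kernel code; the function name and the string values for whence are made up, and the ENXIO result for offsets at or beyond EOF is an assumption that follows the Solaris lseek behaviour.

```python
import errno
import os

def fallback_seek(file_size, offset, whence):
    """Fallback for a file system that knows nothing about holes.

    Assumes offset >= 0.  whence is "data" or "hole".
    """
    if offset >= file_size:
        # Nothing to find at or beyond EOF (assumed to fail with ENXIO).
        raise OSError(errno.ENXIO, os.strerror(errno.ENXIO))
    if whence == "data":
        return offset        # everything before EOF counts as data
    if whence == "hole":
        return file_size     # the only hole is the virtual one at EOF
    raise ValueError(whence)
```

Putting this in one place below both entry points means the lseek flags and the ioctls cannot disagree about what an unaware file system looks like.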



Does this answer your question?


Not really, but I see that my suggestion was unclear.  I hope 
it's more clear now.


--apb (Alan Barrett)


Re: A simple cpufreq(9)

2011-09-25 Thread Alan Barrett

On Sun, 25 Sep 2011, Jukka Ruohonen wrote:

So here is a quick draft for the first iteration with the cpuctl(8). If there
are issues, speak now, otherwise I'll proceed with something based on this.


You forgot to include the documentation.

--apb (Alan Barrett)


Re: core's decision on modular kernels

2011-09-22 Thread Alan Barrett

On Wed, 21 Sep 2011, Martin S. Weber wrote:

On Wed, Sep 21, 2011 at 07:55:38AM +0200, Alan Barrett wrote:

- A port's MONOLITHIC kernel should include features that
   traditionally would have been present in a non-modular GENERIC
   kernel, and it may or may not include options MODULAR, at the
   portmaster's discretion.


Huh? Would it be possible please to get a more detailed rationale
behind allowing options MODULAR in a MONOLITHIC kernel, if all
ports using modules already offer MODULAR and GENERIC?


The main difference between MODULAR and MONOLITHIC would be that
MONOLITHIC has built-in support for almost everything considered stable
and useful, whereas MODULAR might expect to load a lot of modules at run
time.  MONOLITHIC might still not have absolutely everything built-in,
and options MODULAR allows it to load additional modules at run time,
if the portmaster decides that this would be useful.

I use a MONOLITHIC kernel with options MODULAR to allow loading of a
module that contains the root file system as an md(4) image.

--apb (Alan Barrett)


core's decision on modular kernels

2011-09-20 Thread Alan Barrett

Dear NetBSD users,

The NetBSD core group has discussed the questions presented to us
about the situation with modules and modular kernels.

We understand that there are problems with modularization on all
the platforms, especially on amd64, and we have seen a lot of
breakage due to them in the past years.  As core we believe that
ultimately the ability to build modular kernels is the way to go
and that by reverting a lot of the modularization on head we limit
its testing making it harder to become mainstream.  On the other
hand, we should always provide a safe way for people to build and
release kernels.

On the positive side:

- Modules can speed up kernel development because they eliminate
   many reboots by simply loading and unloading the module during
   each development cycle.

- Modules can conserve kernel memory in memory shortage
   situations.

- Modules can be used to add/remove/replace functionality on the
   fly.

On the negative side:

- Many of our modules are half baked (don't work correctly as
   modules, don't specify the right dependencies, or cannot be
   unloaded).

- Our module separation is not good (try compiling a kernel with
   only COMPAT_30 and all the rest of the compat code as modules;
   for now all that works is the all or nothing approach).

- Modules don't work on all platforms. Some platforms don't have a
   need for them because their hardware is fixed, but modules could
   still be used for software features (compat code, emulations).

- We don't have an easy way to group a kernel and its associated
   modules together, so that it's possible to have multiple
   bootable kernels, and multiple associated sets of modules, even
   if the kernels all share the same version number.

- We don't have a stable kernel ABI so that modules are reusable
   across different kernel versions.

- We don't have a way to tell from the kernel config file whether a
   feature can be used in a module form or not. (Perhaps comments or
   additional config(1) syntax could be used for this.)

Accordingly, we propose the following policy for the immediate
future.  We expect that it will be appropriate to re-evaluate this
policy as the state of modular support changes later.

- All ports using modules should provide all three of MODULAR,
   MONOLITHIC, and GENERIC kernels.

- At the portmaster's discretion, options MODULAR may be made the
   default by adding it to the port's std.machine configuration
   file.  (A kernel without the MODULAR option cannot load any
   modules, not even through the modload(8) command.)

- A port's MONOLITHIC kernel should include features that
   traditionally would have been present in a non-modular GENERIC
   kernel, and it may or may not include options MODULAR, at the
   portmaster's discretion.

- A port's MODULAR kernel may lack many built-in features, expecting
   them to be loaded from modules at run time.  However, all features
   that are necessary for the standard MODULAR kernel to boot and
   work reasonably must be built-in.  This includes:

* common file systems, including all file systems that can
  be the root file system, and also including nullfs and tmpfs;
* disk devices that can contain the root file system;
* common network devices;
* exec support for the native ELF format, and for scripts
  (not necessarily for a.out, ECOFF, or compat formats);
* core dump support.

   Users or developers may of course comment out relevant lines
   if they want to load these items as modules.

- The GENERIC kernel should be based on either MODULAR or
   MONOLITHIC, using an include directive.  The GENERIC kernel
   should include options MODULAR, even if it is based on a
   MONOLITHIC kernel that does not include options MODULAR.

- A port may not set GENERIC = MODULAR if it lacks an easy way to
   group a kernel and its associated modules together.  Because
   no existing ports have this feature, no existing ports may set
   GENERIC = MODULAR.

Alan Barrett
On behalf of the NetBSD core group


Re: RFC: New security model secmodel_securechroot(9)

2011-07-09 Thread Alan Barrett

On Sat, 09 Jul 2011, Aleksey Cheusov wrote:

·   Adding and enabling a ppp(4) interface is not allowed.

·   Adding and enabling a sl(4) interface is not allowed.

·   Adding and enabling a strip(4) interface is not allowed.

·   Adding and enabling a tun(4) interface is not allowed.

·   Adding and enabling a bcsp(4) device is not allowed.

·   Adding and enabling a btuart(4) device is not allowed.


Can this be generalised to "adding and enabling any kind of 
network interface is not allowed"?


--apb (Alan Barrett)


Re: mutexes, locks and so on...

2010-11-16 Thread Alan Barrett
Please could somebody on the "eat your CAS whether you like it or not"
side of the fence explain why the following idea would not work:

On Sat, 13 Nov 2010, der Mouse wrote:
 Consider this hypothetical:
 
 x86 does #define ATOMIC_OPS_USE_CAS and defines a CAS(); MI code
 notices this and defines all the higher-level primitives (if that's not
 too much of an oxymoron) in terms of CAS().
 
 ppc, arm, all the arches sufficiently modern to have CAS, likewise.
 
 Arches without a sufficiently general CAS[%] do not define
 ATOMIC_OPS_USE_CAS and provides their own implementations of mutexes,
 spinlocks, whatever.
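
For readers who want to see what "higher-level primitives in terms of CAS()" looks like, here is a toy sketch in Python. It is not the NetBSD atomic_ops code: the Cell class simulates a hardware compare-and-swap with an ordinary lock, and atomic_add is a made-up name for the primitive built on top of it.

```python
import threading

class Cell:
    """A word of memory with a simulated compare-and-swap."""

    def __init__(self, value=0):
        self.value = value
        self._guard = threading.Lock()   # stands in for the hardware's atomicity

    def cas(self, expected, new):
        """Store new if the cell holds expected; return the value seen."""
        with self._guard:
            seen = self.value
            if seen == expected:
                self.value = new
            return seen

def atomic_add(cell, delta):
    """Higher-level primitive built only from cas()."""
    while True:
        old = cell.value
        # Retry if another thread changed the cell in the meantime.
        if cell.cas(old, old + delta) == old:
            return old + delta
```

An architecture without a suitable CAS would skip atomic_add as written here and supply its own implementation of the same interface.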

--apb (Alan Barrett)


Re: XIP

2010-10-26 Thread Alan Barrett
On Mon, 25 Oct 2010, Masao Uebayashi wrote:
 I think the uebayasi-xip branch is ready to be merged.
 
 This branch implements a preliminary support of eXecute-In-Place;
 execute programs directly from memory-mappable devices without
 copying files into RAM.  This benefits mainly resource restricted
 embedded systems to save RAM consumption.

Would memory disks (such as md(4)) also benefit from XIP, or do they
already do something to avoid having multiple copies of the same data?

--apb (Alan Barrett)



Re: XIP

2010-10-26 Thread Alan Barrett
On Tue, 26 Oct 2010, Alan Barrett wrote:
 Would memory disks (such as md(4)) also benefit from XIP, or do they
 already do something to avoid having multiple copies of the same data?

Never mind.  I see you discuss this in section 11.6 of the paper.

--apb (Alan Barrett)


Re: 16 year old bug [with non-contiguous netmasks]

2010-08-23 Thread Alan Barrett
On Mon, 23 Aug 2010, Christoph Egger wrote:
 [OpenBSD] commit message:
 
 Fix a 16 year old bug in the sorting routine for non-contiguous netmasks.

I suggest removing support for non-contiguous netmasks.  They are
unusable with CIDR (introduced in 1993 in RFCs 1517, 1518, and 1519).
Even RFC 950 (August 1985) recommended that subnet bits should be
contiguous.
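
For reference, testing whether an IPv4 netmask is contiguous takes only a couple of bit operations. This Python sketch is just an illustration of the definition (all 1 bits, then all 0 bits) and is not taken from any routing code; the function name is made up.

```python
import ipaddress

def is_contiguous(netmask):
    """True if the IPv4 netmask is a run of 1 bits followed by 0 bits."""
    mask = int(ipaddress.IPv4Address(netmask))
    inverted = ~mask & 0xFFFFFFFF
    # A contiguous mask inverts to 2^k - 1, so adding 1 clears every set bit.
    return (inverted & (inverted + 1)) == 0

print(is_contiguous("255.255.255.0"))   # True
print(is_contiguous("255.255.0.255"))   # False
```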

--apb (Alan Barrett)


deprecating #define'd sysctl OIDs

2010-08-15 Thread Alan Barrett
On Sun, 15 Aug 2010, Jean-Yves Migeon wrote:
  It might make sense to add comments near all existing lists of
  hard-wired sysctl OID values asking people not to add more of them.
 
 Shall it be added for all other archs then? I assume that they can all
 benefit from the dynamic sysctl(9) interface?

If we do this at all, then we should do it for all lists of sysctl
OID values.  Several of them are in sys/sysctl.h, and I am sure
there are more scattered around.  I don't see the point of doing
it only for CPU_* definitions.

All three of the sysctl(3), sysctl(7), and sysctl(9) man pages
could also be improved, to make it more clear that new code can
(should?) use dynamic allocation instead of #define'd OID values.

--apb (Alan Barrett)


Re: Adding 'i386_use_pae' variable, and expose it through sysctl

2010-08-15 Thread Alan Barrett
On Sun, 15 Aug 2010, Thor Lancelot Simon wrote:
 You can't do it for existing OIDs, that breaks binary compatibility.

Yes, obviously.  My suggestion was about adding comments and
documentation to discourage new OIDs from being added in the old way.

--apb (Alan Barrett)


Re: Forcing a serial console for the kernel

2010-03-28 Thread Alan Barrett
On Sun, 28 Mar 2010, STEPHEN JONES, W0TTY wrote:
  If your system has serial BIOS, it is probably hiding the first
  serial port from the bootblocks so they don't automatically detect
  it.  This is a change you need to make to the bootblocks -- not
  the kernel.  Try installboot (possibly with -e depending on your
  application) -o console=com0 -o ioaddr=0x3f8 -o speed=9600.  The
  ioaddr= option forces the bootblocks to detect the serial port
  even though the BIOS claims it's not there.

 Unfortunately this is a 2.0 (GOJU.RYU.COM) system which does not seem
 to have the ioaddr option for installboot.

If you installed or upgraded from a CD, then the installboot command
on the CD will have the options you need.  If you built netbsd-5
from source, then ${TOOLDIR}/bin/nbinstallboot will be a version of
installboot that runs under the existing system but supports the options
found in netbsd-5's installboot.  You'll also need new boot blocks, from
${DESTDIR}/usr/share/mdec (or from a netbsd-5 install CD).

--apb (Alan Barrett)


[no subject]

2010-03-16 Thread Alan Barrett
On Mon, 15 Mar 2010, Aleksej Saushev wrote:
 While here, can anyone enlighten us how one boots NetBSD so that it looks
 for modules in non-default directory?

You can't, and the people who want NetBSD to move to modular kernels
don't seem to care.  Until this problem is fixed, I will try to avoid
using modular kernels.

--apb (Alan Barrett)