Re: Some guidance/suggestion please

2022-01-14 Thread John Nemeth
On Jan 14,  9:11, Paul Goyette wrote:
}
} I'm looking into more modularization of the kernel, and my next
} "target" is the ALTQ stuff.  Right now, there are several network
} device drivers that are built as loadable modules, yet they still
} depend on conditional compilation.  In particular, there seems to
} be some number of code fragments similar to
} 
}   ...
}   #ifdef ALTQ
}   altq-code-part-A
}   #endif
}   (common code)
}   #ifdef ALTQ
}   altq-code-part-B
}   #endif
}   ...
} 
} The existing module_hook mechanism doesn't help us here.  We can
} make the two pieces of altq code into module hooks, but that
} doesn't handle the case where the module gets loaded or unloaded
} between the two parts of the altq code.
} 
} We have the module_hold()/module_rele() mechanism but they're not
} really appropriate here.  Both routines require us to already have
} access to the module's data in its struct module.
} 
} So, I'm pretty sure we need a new mechanism, one that prevents a
} module from being unloaded during critical sequences.  This new

 Unload is a request that goes to the module.  It is the module's
responsibility to deny the request if it can't be safely unloaded.
This has always been the case.  I don't think a new mechanism is
required here.

} mechanism needs to also prevent a module from being newly loaded
} (or at least, from installing any new hooks);  otherwise we could
} potentially execute the part-B code without having run any part-A
} code on which part-B might depend.  Therefore the mechanism cannot
} live within the module itself.

 This one is more tricky.  The module can certainly error out
in the load routine and refuse to load.  The problem is that the
calls to "part A" and "part B" are external to the module.  How
does it know that the dummy "part A" has been called, but not the
dummy "part B"?

 There are a couple of possibilities here, all bad.  The dummy
"part A" could set a flag which is cleared by the dummy "part B".
The module could then refuse to load if the flag is set.  The other
possibility is that the module could simply clear the flag and
return when "part B" is called.

 The problem here is what happens if multiple devices are
calling AltQ at the same time?  The best situation I can see is
that the module be written in such a way that calling "part B"
without "part A" having been called does not cause problems.  This
should be the case anyways to have resiliency against a broken
driver that calls "part B" without calling "part A".
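
A minimal userland sketch of that idea, assuming per-device state so
that several devices can be mid-sequence at once.  Every name here
(altq_dev_state, altq_part_a, altq_part_b, altq_try_unload) is
hypothetical and not part of ALTQ or the module framework, and the
locking a real kernel would need is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; none of these names exist in ALTQ. */
struct altq_dev_state {
	bool part_a_done;	/* set by part A, cleared by part B */
};

/* Number of devices currently between part A and part B. */
static unsigned int altq_in_flight;

/* Part A: do the first half of the work and remember that it ran. */
void
altq_part_a(struct altq_dev_state *st)
{
	if (!st->part_a_done) {
		st->part_a_done = true;
		altq_in_flight++;
	}
}

/*
 * Part B: returns true if it did its work, false if it was called
 * without a matching part A (module loaded in between, or a broken
 * driver), in which case it does nothing.
 */
bool
altq_part_b(struct altq_dev_state *st)
{
	if (!st->part_a_done)
		return false;
	assert(altq_in_flight > 0);
	st->part_a_done = false;
	altq_in_flight--;
	return true;
}

/* Unload request: the module itself refuses while a device is mid-sequence. */
int
altq_try_unload(void)
{
	return altq_in_flight != 0 ? EBUSY : 0;
}
```

With this shape, a "part B" that arrives without its "part A" is a
harmless no-op, and the unload request is denied with EBUSY until every
device has finished its sequence.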

} FWIW, I vaguely remember having one or two instances of this issue
} during my work on the compat modules (some years ago).  I think I
} addressed these by making explicit checks on the hook->hooked
} member, but that's not really correct.  (Unfortunately I can't
} remember any details on this...)
} 
}-- End of excerpt from Paul Goyette


Re: eventfd(2) and timerfd(2) APIs

2021-09-18 Thread John Nemeth
On Sep 18, 10:26, Jason Thorpe wrote:
}
} Last year, I wrote implementations of the Linux eventfd(2) and
} timerfd(2) interfaces for NetBSD, with the goal of improving our
} Linux emulation.  In order to be able to test them with ATF tests,
} I went ahead and made them native calls as well.
} 
} Here are the man pages describing the interfaces:
} 
}   https://www.netbsd.org/~thorpej/eventfd.2
}   https://www.netbsd.org/~thorpej/timerfd.2
} 
} Any objections to adding these?

 Nice.  timerfd(2) is Asterisk's preferred timing source.  This
should improve our support for Asterisk.  As you might guess, I'm
all for the addition of these.

}-- End of excerpt from Jason Thorpe


Re: Devices.

2021-05-30 Thread John Nemeth
On May 30, 14:24, Michael van Elst wrote:
} mueller6...@twc.com ("Thomas Mueller") writes:
} 
} > Where do I find the "enough dk* nodes" mode?  Would it be in
} > the kernel config?  I never saw it.
} 
} You can run devpubd. When a wedge and thus the dk* unit attaches, it
} runs the 01-makedev hook that creates the device node in /dev.

 That's not a "mode".  It's a clunky userland daemon that tries
to make up for the fact that we don't have a devfs.

}-- End of excerpt from Michael van Elst


Re: Devices.

2021-05-29 Thread John Nemeth
On May 29, 22:52, David Holland wrote:
} On Sat, May 29, 2021 at 05:41:38PM -0400, Mouse wrote:
} 
}  > > For disks, which for historical reasons live in both cdevsw and
}  > > bdevsw, both entries would point at the same disk_dev.
}  > 
}  > I would suggest getting rid of the bdev/cdev distinction.  It is, as
}  > you say, a historical artifact, and IMO it is not serving anyone at
}  > this point.
} 
} It is deeply baked into the system call API and into POSIX, so it's
} not going anywhere. It's been proposed that we should stop having
} block devices, which would have the same net effect; I have no strong
} opinion on that and it doesn't need to be part of this set of changes.

 I was thinking the same thing about getting rid of block
devices.  The only place they should ever be used is as an argument
to mount(2), and mount(2) can be adjusted to use a block device
underneath when it is handed a character device.  FreeBSD got rid
of block devices a long time ago.  Doing that as a first step is
likely to simplify things and make other changes easier.

}  > > A third question: how does this affect interfaces?
}  > 
}  > As in, network interfaces?  Good question.  I think they should be
}  > device nodes in the filesystem *somehow*.
} 
} That's probably true, but they currently aren't and the plumbing above
} them is unrelated to the VFS device plumbing, so for the time being
} it's a separate issue.
} 
} Disentangling the current situation with device special files on
} filesystems will make it easier to manifest interfaces on disk if we
} ultimately want that.

 We should really get with the times and create a devfs.  I
know that there are people that disagree with this (likely including
you), but the archaic device node system causes a lot of headaches
and it's time that we joined the 21st century.  Anything done with
devices should be done with the idea of a devfs in mind.  Yes,
devfs-like things have caused a lot of problems on other operating systems,
but I think we have enough brain power and enough real world examples
to be able to not repeat the mistakes of the past.

}-- End of excerpt from David Holland


Re: regarding the changes to kernel entropy gathering

2021-04-05 Thread John Nemeth
On Apr 4, 23:09, Taylor R Campbell wrote:
} 
} > Date: Sun, 04 Apr 2021 12:58:09 -0700
} > From: "Greg A. Woods" 
} > References: 
} > <20210404094958.692f360...@jupiter.mumble.net>
} > 
} > At Sun, 4 Apr 2021 09:49:58 +, Taylor R Campbell wrote:
} > >
} > > Your change _creates_ the lie that every bit of data entered this way
} > > is drawn from a source with independent uniform distribution.
} > 
} > No, my change _allows_ the administrator to decide which devices can be
} > used as estimating/counting entropy sources.  For example I know that
} > many of the devices on almost all of my machines (virtual or otherwise)
} > are equally good sources of entropy for their uses.
} 
} If you know this (and this is something I certainly can't confidently
} assert!), you can write 32 bytes to /dev/random, save a seed, and be
} done with it.
} 
} But users who don't go messing around with obscure rndctl settings in
} rc.conf will be proverbially shot in the foot by this change -- except
} they won't notice because there is practically guaranteed to be no
} feedback whatsoever for a security disaster until their systems turn
} up in a paper published at Usenix like .

 Or, get a repeat of the Debian weak SSH key debacle when they
screwed up their crypto.  I don't expect NetBSD to withstand an
attack by a nation state actor, but I do expect it to stand up to
a wardialing script kiddie.

}-- End of excerpt from Taylor R Campbell


Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread John Nemeth
On Apr 4,  9:49, Taylor R Campbell wrote:
} 
} What NetBSD-current is telling you on your Xen system, on a CPU
} predating RDRAND/RDSEED, is the unfortunate truth that there is no
} reliable source of entropy available in your system -- annoying, yes,
} but when you talk about `matters so important as system security and
} integrity' you might prefer to hear about this rather than have it
} swept under the rug.

 I understand the need for good random sources, and won't argue
it.  My question is, how can we tell what random sources a system
actually has, i.e. is there some flag that cpuctl identify shows
when a system has RDRAND/RDSEED?  Are there other sources that can
be positively identified as providing randomness?
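
For the x86 case specifically, a minimal sketch of reading the two
feature bits directly: RDRAND is reported in CPUID leaf 1, ECX bit 30,
and RDSEED in CPUID leaf 7 (subleaf 0), EBX bit 18.  The function names
are made up for illustration, and the use of <cpuid.h> assumes a GCC or
Clang toolchain.  What a Xen guest sees may be filtered by the
hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header; a toolchain assumption */
#endif

#define CPUID_1_ECX_RDRAND	(UINT32_C(1) << 30)	/* CPUID.01H:ECX bit 30 */
#define CPUID_7_EBX_RDSEED	(UINT32_C(1) << 18)	/* CPUID.(EAX=07H,ECX=0):EBX bit 18 */

/* Decode the RDRAND bit from the ECX value returned by CPUID leaf 1. */
bool
ecx_has_rdrand(uint32_t leaf1_ecx)
{
	return (leaf1_ecx & CPUID_1_ECX_RDRAND) != 0;
}

/* Decode the RDSEED bit from the EBX value returned by CPUID leaf 7. */
bool
ebx_has_rdseed(uint32_t leaf7_ebx)
{
	return (leaf7_ebx & CPUID_7_EBX_RDSEED) != 0;
}

/* Query the running CPU; reports false for both on non-x86 builds. */
void
probe_hw_rng(bool *rdrand, bool *rdseed)
{
	assert(rdrand != NULL && rdseed != NULL);
	*rdrand = false;
	*rdseed = false;
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		*rdrand = ecx_has_rdrand(ecx);
	if (__get_cpuid_max(0, NULL) >= 7) {
		__cpuid_count(7, 0, eax, ebx, ecx, edx);
		*rdseed = ebx_has_rdseed(ebx);
	}
#endif
}
```

This only says that the instructions are advertised; it says nothing
about other possible sources, which remains the open part of the
question.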

}-- End of excerpt from Taylor R Campbell


Re: "Boot this kernel once" functionality? (amd64)

2020-09-24 Thread John Nemeth
On Sep 24,  9:14am, Reinoud Zandijk wrote:
} Subject: Re: "Boot this kernel once" functionality? (amd64)
} On Wed, Sep 16, 2020 at 12:09:43PM +0200, Martin Husemann wrote:
} > On Wed, Sep 16, 2020 at 12:05:26PM +0200, Anthony Mallet wrote:
} > > I was also wondering if it would be possible to pass arguments to the
} > > primary or secondary bootloader via reboot(2) and the boothowto
} > > flags. But this doesn't seem doable. Right?
} > 
} > This works fine on e.g. sparc*; I can do: shutdown -b netbsd.t -r now
} > 
} > and it will pass "netbsd.t" as boot argument to the firmware, which passes
} > it on to the bootloader and then it boots /netbsd.t once.
} 
} In shutdown(8) I read that the arguments are passed to reboot(8) and that is
} mentioned in kloader(4) so I guess it's using that mechanism.
} 
} As for amd64, it would be great if I could boot a kernel once. It could
} simplify testing out a new kernel. Not that a few lines of boot.cfg can't do
} that but still.
} 
} > I don't know if there is enough of a persistent environment for UEFI boots
} > (I would guess there is), and probably no easy way for BIOS boot.
} 
} I could imagine some BIOS/UEFI wiping all DRAM on reboot for security reasons.

 UEFI has the concept of persistent variable storage (key/value
store).  See Section 8.2 "Variable Services" of the UEFI spec.

}-- End of excerpt from Reinoud Zandijk



Re: modules item #14 revisited

2019-12-07 Thread John Nemeth
On Dec 7,  4:31pm, Christos Zoulas wrote:
} On Dec 7,  8:55pm, a...@absd.org (David Brownlee) wrote:
} 
} | Very much like this - would assume that modules.tgz goes away?

 I can't say I'm a fan of this.  I would hope that it goes away
once we get serious about having a stable KABI for modules.  Modules
shouldn't be tied to a particular kernel the way they currently
are.

} This is a good question. The problem is that if every kernel in a
} distribution includes its own copy of modules, we'll end up bloating
} the distribution a lot. For example on amd64 we distribute 4 kernels:
} 
} kern-GENERIC.tgz
} kern-GENERIC_KASLR.tgz
} kern-XEN3_DOM0.tgz  
} kern-XEN3_DOMU.tgz  
} 
} Does each one get a copy of the "appropriate" (since XEN needs different

 Packing modules with the kernel kinda implies this.

} modules)? Or do we have a modules.tgz and a modules.xen3.tgz to be unpacked
} together with the kernel? And how is the unpacking done?

 This method would require sysinst to "know" what modules go with
which kernel and to set things up accordingly.

 BTW:

% tar tvzpf modules.tar.xz
drwxr-xr-x  0 root   wheel   0 Nov 10 04:16 .
drwxr-xr-x  0 root   wheel   0 Nov 10 03:01 ./etc
drwxr-xr-x  0 root   wheel   0 Nov 10 03:03 ./etc/mtree
-rw-r--r--  0 root   wheel  185780 Nov 10 04:16 ./etc/mtree/set.modules
drwxr-xr-x  0 root   wheel   0 Nov 10 02:48 ./stand
drwxr-xr-x  0 root   wheel   0 Nov 10 02:43 ./stand/amd64
drwxr-xr-x  0 root   wheel   0 Nov 10 02:43 ./stand/amd64/9.99.17
drwxr-xr-x  0 root   wheel   0 Nov 10 02:48 ./stand/amd64/9.99.17/modules
drwxr-xr-x  0 root   wheel   0 Nov 10 02:43 ./stand/amd64/9.99.17/modules/accf_dataready
-r--r--r--  0 root   wheel   19480 Nov 10 01:22 ./stand/amd64/9.99.17/modules/accf_dataready/accf_dataready.kmod
...
drwxr-xr-x  0 root   wheel   0 Nov 10 02:48 ./stand/amd64-xen
drwxr-xr-x  0 root   wheel   0 Nov 10 02:48 ./stand/amd64-xen/9.99.17
drwxr-xr-x  0 root   wheel   0 Nov 10 02:48 ./stand/amd64-xen/9.99.17/modules
drwxr-xr-x  0 root   wheel   0 Nov 10 02:48 ./stand/amd64-xen/9.99.17/modules/accf_dataready
-r--r--r--  0 root   wheel   19480 Nov 10 02:03 ./stand/amd64-xen/9.99.17/modules/accf_dataready/accf_dataready.kmod
...

Note that i386 includes an additional set of modules for Xen PAE.

}-- End of excerpt from Christos Zoulas


Re: racy acccess in kern_runq.c

2019-12-07 Thread John Nemeth
On Dec 6,  5:22pm, Don Lee wrote:
}
} Writing Kernel code *requires* knowledge of what code is generated
} sometimes.  In my experience, there have been standard techniques,
} like pragmas and insertions of assembly code to suppress this
} sort of undesirable optimization.
} 
} Don't those techniques exist any more?  My compiler friends used
} to put them in for just this purpose, and they tried to make them
} as portable as possible.  Surely GCC does this.  No?

   Pragmas and assembly code, of course, still exist, but they are
extremely unportable.  It is my understanding that Clang emulates
many of GCC's pragmas, but there is no guarantee as pragmas are very
much a compiler-dependent feature.  Writing short bits of assembly
code is going to have dependence on the compiler and the CPU.  Even
within a particular CPU architecture, there are often significantly
varying feature sets.  Both of these options are best avoided in
portable code.  Newer versions of the C standard provide ways to
annotate your code to tell the compiler what you want, which is
what one should be using.
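
As one illustration of what the newer standards offer, C11 added
<stdatomic.h>.  The sketch below is only an assumption about the kind
of annotation meant here: it uses a made-up counter (run_count), not
anything from kern_runq.c, and shows relaxed atomic accesses replacing
a plain racy read and write without any pragma or inline assembly.

```c
#include <assert.h>
#include <stdatomic.h>

/* Refuse to build where int atomics would silently fall back to a lock. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics are not always lock-free");

/* Hypothetical shared counter, standing in for a field accessed racily. */
static _Atomic unsigned int run_count;

/* Publish a new value; no ordering with other memory is implied. */
void
run_count_set(unsigned int v)
{
	atomic_store_explicit(&run_count, v, memory_order_relaxed);
}

/* Read the value as one atomic load: no tearing, no C11 data race. */
unsigned int
run_count_get(void)
{
	return atomic_load_explicit(&run_count, memory_order_relaxed);
}

/* Atomically add one and return the value seen before the increment. */
unsigned int
run_count_inc(void)
{
	return atomic_fetch_add_explicit(&run_count, 1, memory_order_relaxed);
}
```

Relaxed ordering is the weakest choice and is picked here only to keep
the example small; code that publishes data alongside the counter would
need acquire/release ordering instead.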

}-- End of excerpt from Don Lee


Re: racy acccess in kern_runq.c

2019-12-07 Thread John Nemeth
On Dec 6,  3:02pm, Jason Thorpe wrote:
} > On Dec 6, 2019, at 11:44 AM, paul.kon...@dell.com wrote:
} > 
} > For clean semantics, I like ALGOL; too bad it is no longer used
} 
} There's just too much shouting in ALGOL.

 Are you perhaps thinking of COBOL, which is traditionally all
upper case?  I could be mistaken since I've never written and likely
have never seen ALGOL, but I have written COBOL.

}-- End of excerpt from Jason Thorpe


Re: Adding an ioctl to check for disklabel existence

2019-10-03 Thread John Nemeth
On Oct 3,  2:42pm, Rhialto wrote:
} On Wed 02 Oct 2019 at 19:40:01 -0700, John Nemeth wrote:
} >  Cloning disks always presents issues.  However, gpt(8) has
} > grown a "uuid" command to generate new UUIDs.  This was primarily
} > done to help with the cloning problem.  Cloning a disk and then
} > putting two disks with the same UUIDs on the same system is an
} > operator error.
} 
} I was thinking the other day that it might be useful if gpt had a
} subcommand to spit out a script to duplicate the partitioning of a disk,
} but without the "unique" parts. The script would of course be
} hand-editable for any changes one might want to make.

 By "unique" parts, do you mean just the UUIDs, or do you mean
other parts as well?  What would the output look like?

} Such functionality would be the equivalent of using disklabel to get the
} editable text-version of a disklabel from one disk, and applying it to a
} different disk.
} 
} I use this sort of thing if I want to create a backup disk which should
} have the same layout as the original.
} 
} There is "gpt backup", but the manual page mentions "It should not be
} modified" and it contains the guids of the partition which in many cases
} should NOT be duplicated. It also isn't very readable.

 The bit about not modifying it is more of a caution.  You can
modify it if you're careful (checksums are recomputed during
restore).  Obviously if you make an error, it may not be usable
for restore, so you should work with a copy.
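
For reference, the checksums in a GPT (one over the header, one over
the partition entry array) are ordinary CRC-32 values.  The routine
below is a generic bitwise CRC-32 sketch, not the code gpt(8) uses; the
name crc32_buf is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by zlib and GPT. */
uint32_t
crc32_buf(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = UINT32_C(0xFFFFFFFF);

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		/* Shift out one bit at a time, folding in the polynomial. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (UINT32_C(0xEDB88320) & (0u - (crc & 1u)));
	}
	return crc ^ UINT32_C(0xFFFFFFFF);
}
```

Because the value depends on every byte covered, any hand edit to the
table contents changes it, which is why recomputing it at restore time
is what makes editing the backup file workable.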

}-- End of excerpt from Rhialto


Re: Adding an ioctl to check for disklabel existence

2019-10-02 Thread John Nemeth
On Oct 2,  9:47pm, Mouse wrote:
}
} >> If _that_'s what you're concerned about, then just grow the relevant
} >> fields (and, presumably, change the magic number).
} > Any change to the label format or semantics would make it be a
} > completely different object, no longer compatible with anything.
} 
} Of course.
} 
} > If we were going to invent something new that way, we may as well
} > make it lots better - and ideally compatible with other systems so we
} > can read one another's drives.
} 
} > That's what GPT is, and it is already supported.
} 
} Except for the "better" part, which is a matter of opinion.  I don't
} care for GPT.
} 
} - Partitions are huge.  128 bytes, when you actually need maybe 18 (and
}   about a quarter of even that is pure future-proofing paranoia).

 I assume that you're talking about the size of a partition
entry here.  The size might be somewhat wasteful, but on a
multi-terabyte drive it isn't a particularly large concern.
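
To put a number on it, here is a sketch of the usual 128-byte entry
layout from the UEFI specification.  The struct and field names are
chosen for this example and are not taken from any NetBSD header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On-disk GPT partition entry; field names are illustrative only. */
struct gpt_entry {
	uint8_t  type_guid[16];		/* partition type UUID */
	uint8_t  unique_guid[16];	/* per-partition UUID */
	uint64_t first_lba;		/* first block of the partition */
	uint64_t last_lba;		/* last block, inclusive */
	uint64_t attributes;		/* attribute flags */
	uint16_t name[36];		/* partition name, UTF-16LE */
};

static_assert(sizeof(struct gpt_entry) == 128, "GPT entry must be 128 bytes");

/* Bytes taken by one copy of a partition entry array of nentries slots. */
size_t
gpt_table_bytes(size_t nentries)
{
	return nentries * sizeof(struct gpt_entry);
}
```

A table with 128 slots comes to 16 KiB per copy, so the primary and
backup arrays together take 32 KiB, which is noise next to a terabyte.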

} - Partition types are UUIDs.  Why use 16 bytes when you have, in the
}   information-theoretic sense, maybe three bits of information?
}   (At least, I can't recall the last time I used a partition type
}   other than FFS, unused, swap, MSDOS, or NTFS - and NTFS only for
}   trying to figure out what's on Windows disks.  Six bits, maybe,
}   based on doubling the number of types I see in disklabel_gpt.h.  My
}   "18" above assumes 16 bits.)

 I have 38 types in my list, which I'm sure is incomplete.
That's a bit more than three bits.  When the space is small compared
to the utilisation, collisions are highly likely.  Anybody can
generate a UUID and if done properly the likelihood of collisions
is very small.  This would be the obvious reason for using a UUID.
Remember that GPT is a standard meant to be used by all operating
systems on multiple platforms.
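
The collision argument can be made concrete with the standard union
bound: for n identifiers picked independently and uniformly from a
space of 2^bits values, the chance that any two coincide is at most
n(n-1)/2 divided by 2^bits.  The helper below is a small sketch of that
arithmetic; the function name is made up, and uncoordinated pickers are
modelled as uniform random choices, which is an assumption.

```c
#include <assert.h>

/*
 * Upper bound on the chance that any two of n independently chosen
 * random identifiers of `bits` bits collide: n(n-1)/2 / 2^bits,
 * capped at 1.
 */
double
collision_upper_bound(double n, unsigned int bits)
{
	double space = 1.0;

	assert(n >= 0.0 && bits < 1024);
	/* Build 2^bits by repeated doubling to avoid needing libm. */
	for (unsigned int i = 0; i < bits; i++)
		space *= 2.0;
	double bound = n * (n - 1.0) / 2.0 / space;
	return bound > 1.0 ? 1.0 : bound;
}
```

With 38 types and a 3-bit field the bound saturates at 1, since 38
values cannot fit in 8 slots.  A random (version 4) UUID has 122 random
bits, and even a million independently generated ones give a bound
below 1e-24.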

} - Partition names(!) are character strings, rather than octet strings
}   or integers (an encoding is specified).  This mandates a lot of
}   character-handling crap that does not belong in boot blocks and
}   arguably does not belong in operating systems.  It most certainly is
}   not appropriate for the partition table format to mandate/demand the
}   presence of Unicode support in the operating system.
} 
} It does have one nice thing, a (supposedly-)unique partition
} identifier.  (Of course, nothing can ensure this actually _is_ unique;
} if nothing else, cloning an entire disk will perforce clone these
} `unique' IDs.)  But it doesn't need to be 16 bytes long!  Even adding
} 14 bytes of partition ID to the 18 bytes above still comes out to only
} a quarter of the space GPT burns on each partition.

 Cloning disks always presents issues.  However, gpt(8) has
grown a "uuid" command to generate new UUIDs.  This was primarily
done to help with the cloning problem.  Cloning a disk and then
putting two disks with the same UUIDs on the same system is an
operator error.

} > Why would we want to invent something new, just to be different?
} 
} No, not to be different.  To be better.

 Ah yes, https://xkcd.org/927/

} Except, of course, that (I assume) you don't think it _would_ be
} better.

 Better or not better is not really the point; being incompatible
for specious reasons is.

}-- End of excerpt from Mouse


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-30 Thread John Nemeth
On Sep 30,  1:06pm, Michael van Elst wrote:
} On Mon, Sep 30, 2019 at 12:37:38AM -0700, John Nemeth wrote:
} 
} > BTW, modules.conf isn't read by the kernel, it's read by
} > /etc/rc.d/modules.  Putting anything in there that would have a
} > lasting effect (i.e. parameters for autoloaded modules) would
} > require quite a bit of work.
} 
} You could just store the parameters in the kernel so that a future
} autoload will use these instead of or merged with the plist.

 You could do this using the backend code for the proposed
sysctl.  The question then is how do you know when the .plist is
changed? Would you attach some kind of a kevent to it?  If so, then
you need to track the source of the entry in the "blacklist".  If
it came from the .plist, then upon receiving notification that it
has changed, you want to delete the entry.  However, if it
came from userland via sysctl or is part of the default list, then
you don't want to delete the entry just because the .plist changed.
This is starting to get complicated with corner cases.
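
A rough sketch of the bookkeeping described above, assuming each
"blacklist" entry is tagged with where it came from.  All of the names
(na_source, na_entry, na_list, na_add, na_contains, na_plist_changed)
are hypothetical; nothing like this exists in the kernel today, and the
fixed-size array is only there to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical origin tags for a noautoload entry. */
enum na_source { NA_DEFAULT, NA_SYSCTL, NA_PLIST };

struct na_entry {
	char		name[32];	/* module name */
	enum na_source	source;		/* who added this entry */
};

#define NA_MAX	16

struct na_list {
	struct na_entry	e[NA_MAX];
	size_t		n;		/* entries in use */
};

/* Add an entry; returns 0, or -1 if the list is full or the name too long. */
int
na_add(struct na_list *l, const char *name, enum na_source src)
{
	if (l->n >= NA_MAX || strlen(name) >= sizeof(l->e[0].name))
		return -1;
	strcpy(l->e[l->n].name, name);
	l->e[l->n].source = src;
	l->n++;
	return 0;
}

/* Is the module blocked from autoloading by any source? */
int
na_contains(const struct na_list *l, const char *name)
{
	for (size_t i = 0; i < l->n; i++)
		if (strcmp(l->e[i].name, name) == 0)
			return 1;
	return 0;
}

/*
 * The .plist for `name` changed: drop only entries that came from the
 * .plist, leaving sysctl and default entries alone.  Returns the number
 * of entries removed.
 */
size_t
na_plist_changed(struct na_list *l, const char *name)
{
	size_t kept = 0, removed = 0;

	assert(l->n <= NA_MAX);
	for (size_t i = 0; i < l->n; i++) {
		if (l->e[i].source == NA_PLIST &&
		    strcmp(l->e[i].name, name) == 0) {
			removed++;
			continue;
		}
		l->e[kept++] = l->e[i];
	}
	l->n = kept;
	return removed;
}
```

After a change notification the caller would still have to re-read the
.plist and add the entry back if the flag is still set, which is one
more of the corner cases.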

}-- End of excerpt from Michael van Elst


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-30 Thread John Nemeth
On Sep 30,  7:10am, Michael van Elst wrote:
} chris...@astron.com (Christos Zoulas) writes:
} >In article <20190929090053.g...@homeworld.netbsd.org>,
} >  wrote:
} >>On Sat, Sep 28, 2019 at 01:29:39AM -, Christos Zoulas wrote:
} >>> + "compat_linux",
} >>> + "compat_linux32",
} >>
} >>As for the actual change, I'd like to see it integrated through
} >>modules.conf, not via settings of default sysctl values. I think it's
} >>bad user experience.
} 
} >modules.conf contains module names and their arguments. It is a configuration
} >file for each module. There are already sysctls in the kern.module. tree all
} >related to autoloading.
} 
} Everything currently in modules.conf is loaded permanently. One argument
} for adding autoload support would be that it allows to configure module
} parameters in a common place, as autoloaded modules cannot get parameters
} yet. It could also be used to configure policies (e.g. blacklists).

 Uh, that's not true.  If you store a .plist file (named after
the module) in the same directory with the module, it will be loaded
and passed to the module to provide parameters.  You can use
"modload -p" to create the file (see module(7)).  Also, if you put
a special flag in the .plist the module won't be autoloaded; see
this from module(9):

   The directory from which the module is loaded will be searched for
   a file with the same name as the module file, but with the suffix
   ``.plist''.  If this file is found, the prop_dictionary it contains
   will be loaded and passed to the module's modcmd() routine.  If
   this prop_dictionary contains a ``noautoload'' property which is
   set to ``true'' then the system will refuse to load the module.

BTW, modules.conf isn't read by the kernel, it's read by
/etc/rc.d/modules.  Putting anything in there that would have a
lasting effect (i.e. parameters for autoloaded modules) would
require quite a bit of work.  It could, however, be made to specify
modules not to autoload by having it use christos' kern.module.noautoload
sysctl.

}-- End of excerpt from Michael van Elst


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread John Nemeth
On Sep 26,  7:40pm, Christos Zoulas wrote:
} In article <390f4c81-bf1c-443f-f7a9-a379c46b7...@m00nbsd.net>,
} Maxime Villard   wrote:
} >I recently made a big set of changes to fix many bugs and vulnerabilities in
} >compat_linux and compat_linux32, the majority of which have a security impact
} >bigger than the Intel CPU bugs we hear about so much. These compat layers are
} >enabled by default, so everybody is affected.
} >
} >Secteam is in a state where no one is willing to pull up all the changes to
} >the stable branches, because of the effort. No one is willing to write a
} >security advisory either. When I say "no one", it includes me.
} >
} >The proposal and discussion held in this 2017 thread still hold and are
} >unchanged two years later:
} >
} > https://mail-index.netbsd.org/tech-kern/2017/07/31/msg022153.html
} >
} >The compat layers are largely untested, often broken, and are a security risk
} >for everybody. Keeping them enabled for the <1% users interested means keeping
} >vulnerabilities for the >99% who don't use these features.
} >
} >In the conversation above, we hit the problem that there was cross-dependency
} >among compat modules, and we couldn't selectively disable specific layers.
} >Today this is possible thanks to pgoyette's work. That is, it is possible to
} >comment out "options COMPAT_LINUX" from GENERIC, and have a compat_linux.kmod
} >which will modload correctly and be able to run Linux binaries out of the box.
} >Under this scheme, the feature would be only one root command away from being
} >enabled in the kernel.
} >
} >Therefore, I am making today the same proposal as Taylor in 2017, because the
} >problem is still there exactly as-is and we just hit it again; the solution
} >however is more straightforward.
} 
} I propose something very slightly different that can preserve the current
} functionality with user action:
} 
} 1. Remove them from standard kernels in architectures where modules are
}    supported. Users can add them back or just use modules.
} 2. Disable autoloading, but provide a sysctl to enable autoloading
}    (1 global sysctl for all compat modules). Users can change the default
}    in /etc/sysctl.conf (adds sysctl to the proposal)

 You mean this (first line):

i386devel: {31} sysctl kern.module
kern.module.autoload = 0
kern.module.verbose = 0
kern.module.path = /stand/amd64-xen/8.99.26/modules
kern.module.autotime = 10

Or, did you want an additional sysctl that is specific to compat
modules?  Then there is the question of whether it includes
COMPAT_NETBSDxx.  I know the discussion has focused on COMPAT_LINUX,
but there are plenty of other COMPAT_*.

 There is also my earlier message about only being able to load
modules at secure level 0.  This makes them more difficult to use
if you don't have options INSECURE.  With the new kernel side
graphics card drivers, we should be looking at removing that by default.

}-- End of excerpt from Christos Zoulas


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread John Nemeth
On Sep 26,  5:18pm, Maxime Villard wrote:
} Le 26/09/2019 à 17:15, Manuel Bouyer a écrit :
} > On Thu, Sep 26, 2019 at 05:10:01PM +0200, Maxime Villard wrote:
} >> issues for a clearly marginal use case, and given the current general
} >   ^^^
} > 
} > This is where we disagree. You guess it's marginal but there's no
} > evidence of that (and there's no evidence of the opposite either).
} 
} Can you provide evidence that it is used by the majority of the users?
} And that therefore keeping vulnerabilities for 100% of the people is
} legitimate?
} 
} Please provide clear evidence.

 You are the one making the claim, it is your responsibility
to back up the claim.  Trying to push the responsibility to disprove
your claim to the opposite side is a completely bogus way of
debating.

}-- End of excerpt from Maxime Villard


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread John Nemeth
On Sep 26,  4:40pm, Maxime Villard wrote:
} Le 26/09/2019 à 16:36, Manuel Bouyer a écrit :
} > On Thu, Sep 26, 2019 at 04:29:52PM +0200, Maxime Villard wrote:
} >> Le 26/09/2019 à 16:22, Mouse a écrit :
} >> Keeping them enabled for the <1% users interested means keeping
} >> vulnerabilities for the >99% who don't use these features.
} > Are the usage numbers really that extreme?  Where'd you get them?  I
} > didn't think there were any mechanisms in place that would allow
} > tracking compat usage.
}  No, there is no strict procedure to monitor compat usage, and there
}  never will be.  Maybe it's not <1%, but rather 1.5%; or maybe it's
}  5%, 10%, 15%.
} >>>
}  Who cares, exactly?
} >>>
} >>> The short answer is "anyone who wants NetBSD to be useful".
} >>>
} >>> If it really is only a tiny fraction - under ten people, say - then,
} >>> sure, yank it out.  If it's 90%, removing it would lose most of the
} >>> userbase, possibly provoke a fork.  15%, 40%, I don't think there is a
} >>> hard line between "pull it" and "keep it", and even if there were I'm
} >>> not sure it would matter because it appears nobody knows what the
} >>> actual use rate is anyway.
} >>
} >> What is known, however, is that 100% of the users are affected by the
} >> vulnerabilities. So, do we keep these things enabled by default just
} >> because "uh we don't know so we shouldn't do anything"? Even as it's
} >> already been clear that the majority doesn't use compat_linux?
} > 
} > Actually this is not clear. We have linux binaries in pkgsrc.
} 
} ... And? We have 22000 packages in pkgsrc.
} 
} >> Is it such a Herculean effort to type "modload compat_linux" for the
} >> people that want to use Linux binaries? In order to keep the majority
} >> safe from the bugs and vulnerabilities?
} > 
} > Maybe some of them don't even know they are using compat_linux ...
} 
} Yeah, and maybe I'm the Pope also, who knows.

 Now, you're just being obtuse.  Although it is within the
realm of possibility that you could be the pope operating under an
alias, the likelihood of that being the case is so small as to be
negligible.  The pope is an extremely well known entity whose every
action is closely monitored, thus it would be extremely difficult
for the pope to live a clandestine life as a TNF developer.  Also,
the known background of the pope does not include software
development.  If we want to throw out absurdities, it is far more
likely that you're Julian Assange.

}-- End of excerpt from Maxime Villard


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread John Nemeth
On Sep 26,  3:51pm, Maxime Villard wrote:
} Le 26/09/2019 à 15:06, Mouse a écrit :
} >> [...] compat_linux and compat_linux32 [...]
} > 
} >> Keeping them enabled for the <1% users interested means keeping
} >> vulnerabilities for the >99% who don't use these features.
} > 
} > Are the usage numbers really that extreme?  Where'd you get them?  I
} > didn't think there were any mechanisms in place that would allow
} > tracking compat usage.
} 
} No, there is no strict procedure to monitor compat usage, and there never will
} be. Maybe it's not <1%, but rather 1.5%; or maybe it's 5%, 10%, 15%.

 In other words, you're just pulling numbers out of your butt.
There's a common expression, "There are lies, damned lies, and
statistics."  Things like this contribute to that expression.  Of
course, this can't be called a statistic in any real sense as it
is just a made up number.

} Who cares, exactly?

 Anybody that wants to have a serious discussion.

}-- End of excerpt from Maxime Villard


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread John Nemeth
On Sep 26, 10:17am, Maxime Villard wrote:
}
} I recently made a big set of changes to fix many bugs and vulnerabilities in
} compat_linux and compat_linux32, the majority of which have a security impact
} bigger than the Intel CPU bugs we hear about so much. These compat layers are
} enabled by default, so everybody is affected.
} 
} Secteam is in a state where no one is willing to pull up all the changes to
} the stable branches, because of the effort. No one is willing to write a
} security advisory either. When I say "no one", it includes me.
} 
} The proposal and discussion held in this 2017 thread still hold and are
} unchanged two years later:
} 
}   https://mail-index.netbsd.org/tech-kern/2017/07/31/msg022153.html
} 
} The compat layers are largely untested, often broken, and are a security risk
} for everybody. Keeping them enabled for the <1% users interested means keeping
} vulnerabilities for the >99% who don't use these features.

 Where did you get your statistics?  I'm not saying that they
are wrong, just that I won't accept them without evidence.

} In the conversation above, we hit the problem that there was cross-dependency
} among compat modules, and we couldn't selectively disable specific layers.
} Today this is possible thanks to pgoyette's work. That is, it is possible to
} comment out "options COMPAT_LINUX" from GENERIC, and have a compat_linux.kmod
} which will modload correctly and be able to run Linux binaries out of the box.
} Under this scheme, the feature would be only one root command away from being
} enabled in the kernel.

 This is assuming that you're running with options INSECURE,
otherwise you need to add it to /etc/modules.conf to have the module
loaded at boot time and remain loaded.

} Therefore, I am making today the same proposal as Taylor in 2017, because the
} problem is still there exactly as-is and we just hit it again; the solution
} however is more straightforward.
}-- End of excerpt from Maxime Villard


Re: Regarding the ULTRIX and OSF1 compats

2019-03-15 Thread John Nemeth
On Mar 15, 10:31pm, Michael Kronsteiner wrote:
}
} i have this discussion today aswell... considering 64/32bit machines.
} if you want ultrix, install ultrix. if you want osf1/dec unix/tru64
} install that. being able to run ummm nearly 20 year old binaries...
} well. if thats what you want be prepared for a ride. i never ran
} "foreign" binaries on a BSD. and i often compile myself even on more
} "user friendly" systems.

 By any chance, have you seen our About page:
http://www.netbsd.org/about/ ?  The second paragraph reads thus:

-
One of the primary focuses of the NetBSD project has been to make
the base OS highly portable. This has resulted in NetBSD being
ported to a large number of hardware platforms. NetBSD is also
interoperable, implementing many standard APIs and network protocols,
and emulating many other systems' ABIs.
-

Emulating other systems is fundamental to what NetBSD is about.

}-- End of excerpt from Michael Kronsteiner


Re: Regarding the ULTRIX and OSF1 compats

2019-03-10 Thread John Nemeth
On Mar 10, 12:16pm, Maxime Villard wrote:
} Le 10/03/2019 à 11:25, Björn Johannesson a écrit :
} >
} > COMPAT_ULTRIX (mips) works fine which I recently discovered after shuffling
} > some disks and NetBSD8 mounted the ULTRIX disk as /
} 
} This more likely means that it was an old UFS disk that we do support by
} default in our UFS/FFS code, but I hardly see how this could be related to
} COMPAT_ULTRIX.

 I'm assuming here that he actually ran at least some of the
binaries.  Of course, it is possible that my assumption might be
mistaken.

 But, this does raise a good, albeit separate, point that it
would be a bad idea to remove support for older versions of FFS.

} Which MIPS are you talking about by the way? Pmax I guess? Because
} COMPAT_ULTRIX is disabled on the majority of our MIPSs.
} 
} > Not that I have terribly much use for it (except maybe maple) but I would
} > still like it to be kept in.
} 
} I would tend to think that a good reason needs to go a bit farther than just
} "I'd like to keep it in"...

 Maybe so, but the onus is on the person wanting to do the
deletion to provide a good reason.  If COMPAT_ULTRIX is just a thin
shim on top of COMPAT_43 as somebody else has said, then is it
really causing a problem? As for lack of maintenance, it isn't a
moving target, so it only needs maintenance to keep up with related
changes to other parts of the kernel.

} When it comes to Maple, it is already available on Linux, and we do have
} COMPAT_LINUX.

 True, but COMPAT_LINUX is a moving target, and COMPAT_LINUX
has fallen behind significantly.  It is well known that many modern
Linux binaries won't run.

} In fact, nowadays, the vast majority of proprietary binaries compiled on
} UNIX-like systems are available on Linux, and we do support Linux emulation,
} so we're covered for the most part.

 Not exactly.

}-- End of excerpt from Maxime Villard


Re: svr4, again

2019-03-09 Thread John Nemeth
On Mar 9,  6:38am, "Jonathan A. Kollasch" wrote:
} On Sat, Mar 09, 2019 at 11:28:05AM +0100, Maxime Villard wrote:
} > Re-reading this thread - which was initially about SVR4 but which diverged in
} > all directions -, I see there were talks about retiring COMPAT_ULTRIX and
} > COMPAT_OSF1, because these were of questionable utility, in addition to being
} > clear dead wood (in terms of use case, commits in these areas, and ability to
} > test changes).
} > 
} > Does anyone have anything to say?
} 
} Possibly, although I doubt they'll notice they want to say something in
} this thread with the current Subject line...  I'd suggest starting a
} new thread, possibly CCing port-pmax@ and port-alpha@ as relevant.

 Ah, don't forget about port-vax...  I even have a uVAX 1000 running
Ultrix.  Although it's been quite some time since I've powered it up and
I think it was having problems with the hard drive.

}-- End of excerpt from "Jonathan A. Kollasch"


Re: Reserve device major numbers for pkgsrc

2019-02-16 Thread John Nemeth
On Feb 16, 11:25pm, Kamil Rytarowski wrote:
} 
} We started to build and ship kernel modules through pkgsrc.

 This is a really good thing and is part of the reason why
modules exist.

} I would like to reserve 3 major numbers for the HAXM case from the base
} pool of devices and prevent potential future conflicts and compatibility
} breakage due to picking up another major number in a 3rd party software.
} 
} Where and how to reserve these major numbers?

 The ideal thing is to not reserve numbers at all and have them
allocated dynamically.  This requires the module reporting the
number that was allocated to userland somehow.  However, the only
thing that is coming to mind off the top of my head is printf(9)
which would normally land in /var/log/messages.  This is obviously
not very convenient.  Does anybody else have thoughts on this?
Maybe we need to extend modctl(MODCTL_LOAD, ...) to be able to
return information from the loaded module?

 Of course, the real ideal thing would be to get a devfs and
get rid of the concept of major numbers.  Here I go again, starting
controversies.  :->

}-- End of excerpt from Kamil Rytarowski


Re: scsipi: physio split the request

2018-12-27 Thread John Nemeth
On Dec 27,  6:49pm, Michael van Elst wrote:
} m...@netbsd.org (Emmanuel Dreyfus) writes:
} 
} >Is there a reason other than historical for NetBSD 64kB limit?
} 
} It's a compromise. Some buffers are statically sized for MAXPHYS
} and some ancient hardware cannot exceed 64k (or even less) DMA transfers.
} The buffer size is mostly a problem because we don't support
} scatter-gather transfers, so the buffers need to be contiguous in
} physical RAM (and some hardware doesn't support s-g either).
} 
} So far that's mostly a problem with software raid and modern tape I/O.

 Wouldn't hardware RAID also benefit from bigger buffers?
Although, I suppose a battery-backed cache could be used to work around
small transfer sizes.

}-- End of excerpt from Michael van Elst


Re: Support for tv_sec=-1 (one second before the epoch) timestamps?

2018-12-16 Thread John Nemeth
On Dec 16,  1:20pm, Mouse wrote:
}
} >> Not sure about that, but I agree that we should not extend the range
} >> of time_t (aka "seconds since the epoch") to negative values.
} > I'm not sure why anyone thinks that ship didn't sail years ago.
} 
} > % cal 6 1942
} 
} How is that relevant to time_t?

 Indeed, I just looked at the source for cal(1).  It uses time_t
in two places.  The first is if you use cal with no arguments, it
uses it in getting the current time.  The second is in the day_array()
function.  The comment above it says:

/*
 * day_array --
 *  Fill in an array of 42 integers with a calendar.  Assume for a moment
 *  that you took the (maximum) 6 rows in a calendar and stretched them
 *  out end to end.  You would have 42 numbers or spaces.  This routine
 *  builds that array for any month from Jan. 1 through Dec. .
 */

I haven't fully analyzed it, but I suspect it could be done in a
different way.  At no point is any math done on a time_t variable.
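
 To illustrate the point, here is a rough sketch (Python, not the
actual cal(1) source) of a day_array-style routine that builds the
42-slot array from calendar arithmetic alone.  The function name mirrors
the comment above, but the body is my own assumption of how it could
be done; it uses Python's date ordinals rather than time_t, and unlike
cal(1) it ignores the 1752 Julian/Gregorian switch.

```python
import calendar
import datetime

def day_array(year, month):
    # 42 slots = 6 rows x 7 columns, Sunday first; 0 marks an empty slot.
    days = [0] * 42
    # date.weekday() counts Monday as 0, so shift to a Sunday-first column.
    first_column = (datetime.date(year, month, 1).weekday() + 1) % 7
    days_in_month = calendar.monthrange(year, month)[1]
    for day in range(1, days_in_month + 1):
        days[first_column + day - 1] = day
    return days

# The month from the "cal 6 1942" example quoted above.
june_1942 = day_array(1942, 6)
for row in range(6):
    week = june_1942[row * 7:(row + 1) * 7]
    print(" ".join(f"{d:2d}" if d else "  " for d in week))
```

Nothing in this depends on seconds since the epoch, so a month in 1942
is no harder than one in 2018.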

}-- End of excerpt from Mouse


Re: Support for tv_sec=-1 (one second before the epoch) timestamps?

2018-12-14 Thread John Nemeth
On Dec 14,  2:38pm,  wrote:
} > On Dec 14, 2018, at 9:30 AM, Joerg Sonnenberger  wrote:
} > On Thu, Dec 13, 2018 at 02:37:06AM +0100, Kamil Rytarowski wrote:
} >> In real life it's often needed to store time_t pointing before the UNIX
} >> epoch.
} > 
} > Again, I quite disagree and believe that you are confusing two different
} > things. It makes perfect sense in certain applications to store time as
} > relative to the UNIX epoch. But that's not the same as time_t which is a
} > specific type for a *system* interface. I strongly question the
} > sensibility of trying to put dates before 1970 in the context of time_t.
} 
} I'm not sure if people care about this example, but here's one:
} if you want to archive old files with their original timestamps,
} and those files predate the epoch.

 Where are you going to get files that predate the epoch?  I would
expect those to be extremely rare.

}-- End of excerpt from 


Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-14 Thread John Nemeth
On Nov 13,  7:33am, Jason Thorpe wrote:
} > On Nov 13, 2018, at 7:15 AM, John Nemeth  wrote:
} > 
} > That's a different kind of unusable.  :-)  That puts it in
} > the same camp as strip, where there may be functioning hardware,
} > but you can't do anything with the hardware.
} 
} ...and when you can't do anything with the hardware, people don't use
} (i.e. "test by dogfooding") the drivers, which leads to bit rot and
} maintenance headaches.

 As I noted, it's not quite the same.  Assuming that our ISDN
stack was capable of acting as "network" side, you could have used
it in a back-to-back configuration.  Granted, that's probably not
very interesting except for special circumstances.  It's my
understanding, which may be incorrect, that strip required a central
node, and without that you couldn't do anything.

}-- End of excerpt from Jason Thorpe


Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-13 Thread John Nemeth
On Nov 13,  7:10am, Martin Husemann wrote:
} On Mon, Nov 12, 2018 at 05:18:41PM -0800, John Nemeth wrote:
} >  Was the ISDN code usable?  Something in the back of my mind is telling
} > me that it wasn't and thus was just clutter.
} 
} It was usable, but even here it is hard (impossible?) to get real ISDN
} land lines nowadays.

 That's a different kind of unusable.  :-)  That puts it in
the same camp as strip, where there may be functioning hardware,
but you can't do anything with the hardware.  Granted you could
set up private connections if you wish (I once put a PRI card in a
Linux box and made it network side for testing purposes).

 On a side note, ISDN land lines are very common here in the
form of PRI used for trunking purposes.  There would be some use
for that when coupled with Asterisk (or another soft PBX) to create
a phone system.  However, for that, we would need DAHDI which we
don't currently have.  DAHDI is somewhat on my radar, but given
that you can do pretty much everything by talking SIP to an external
box it isn't my highest priority.  If somebody else wishes to port
DAHDI that would be great!

}-- End of excerpt from Martin Husemann


Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-12 Thread John Nemeth
On Nov 12,  2:12pm, Jason Thorpe wrote:
} > On Nov 12, 2018, at 1:59 PM, John Nemeth  wrote:
} > } On Nov 12,  1:16pm, Jason Thorpe wrote:
} > } > On Nov 12, 2018, at 11:12 AM, John Nemeth  wrote:
} > } > 
} > } > wbsio and wt also seems to fit in that category.
} > } 
} > } Isn't "wt" an ancient PC tape drive?  We should make an effort
} > 
} > Yes.
} > 
} > } to prune more deadwood drivers.
} > 
} > How do we know that it's deadwood?  How do we know somebody
} > out there doesn't still have functioning hardware?  I will grant
} > you that in this case, it is unlikely as those things were total
} > junk.  But, still...
} 
} We managed to do it w/ the ISDN code.  I suggest we put together
} a list, post it to netbsd-announce, and note that there's always
} the Attic.

 Was the ISDN code usable?  Something in the back of my mind is telling
me that it wasn't and thus was just clutter.

}-- End of excerpt from Jason Thorpe


Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-12 Thread John Nemeth
On Nov 12,  1:16pm, Jason Thorpe wrote:
} Subject: Re: Things not referenced in kernel configs, but mentioned in fil
} > On Nov 12, 2018, at 11:12 AM, John Nemeth  wrote:
} > 
} > wbsio and wt also seems to fit in that category.
} 
} Isn't "wt" an ancient PC tape drive?  We should make an effort

 Yes.

} to prune more deadwood drivers.

 How do we know that it's deadwood?  How do we know somebody
out there doesn't still have functioning hardware?  I will grant
you that in this case, it is unlikely as those things were total
junk.  But, still...

}-- End of excerpt from Jason Thorpe


Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-12 Thread John Nemeth
On Nov 12,  3:38pm, co...@sdf.org wrote:
} On Mon, Nov 12, 2018 at 10:23:26AM -0500, Greg Troxel wrote:
} > co...@sdf.org writes:
} > 
} > > This is an automatically generated list with some hand touchups, feel
} > > free to do whatever with it. I only generated the output.
} > >
} > > ac100ic
} > > acemidi
} > > acpipmtr
} > > [snip]
} > 
} > I wonder if these are candidates to add to an ALL kernel, and if it will
} > turn out that they are mostly not x86 things.
} > 
} > I see we only have ALL for i386/amd64.  I wonder if it makes sense to
} > have one in evbarm.
} 
} The actual search was roughly (and I didn't re-test these commands)
} find src/sys -name 'files.*' | xargs grep 'attach' | awk '{print $2}' > drivers
} for i in `cat drivers`; do echo $i; grep "^$i[^a-z]" src/sys/arch/*/conf/*; done |grep -v ALL > appearances-in-configs
} grep -B 1 '[^0-9]0$' appearances-in-configs > no-appearance-in-configs
} 
} 
} And some manual removal of things that are obviously not drivers,
} removing duplicates, sorting...
} 
} So, I am excluding things that appear in ALL, and I am not checking if
} they appear as modules.
} 
} So far I had complaints about the appearance of 'lm' which cannot be
} safely included in a default kernel, for example.

 wbsio and wt also seem to fit in that category.  Maybe change
the regex to "^#?$i[^a-z]" to catch commented out things.  Of
course, the flip side is that things that are commented out are
naturally going to get less testing.
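
 As a quick illustration of what the modified pattern would catch,
here is a small Python sketch.  The helper name and the sample config
lines are made up for the example.  Note that with grep(1) the "?"
needs extended regular expressions (grep -E).

```python
import re

def config_pattern(driver):
    # Hypothetical helper: builds the suggested pattern for one driver name.
    # The optional leading '#' also matches entries that are commented out.
    return re.compile(r"^#?" + re.escape(driver) + r"[^a-z]")

# Made-up sample lines in the style of a kernel config file.
sample_lines = [
    "wt0 at isa? port 0x300 irq 5 drq 1",
    "#wt0 at isa? port 0x300 irq 5 drq 1",
    "wtfoo0 at isa?",
    "options WT_DEBUG",
]

pattern = config_pattern("wt")
matches = [line for line in sample_lines if pattern.search(line)]
for line in matches:
    print(line)
```

Only the first two sample lines match: the live entry and the
commented-out one.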

}-- End of excerpt from co...@sdf.org


Re: Missing compat_43 stuff for netbsd32?

2018-09-11 Thread John Nemeth
On Sep 11,  6:38pm, Thor Lancelot Simon wrote:
} On Tue, Sep 11, 2018 at 03:35:24PM +, Eduardo Horvath wrote:
} > 
} > It's probably only useful for running ancient SunOS 4.x binaries, maybe 
} > Ultrix, Irix or OSF-1 depending on how closely they followed BSD 4.3.
} 
} Actually, I think amd64, sparc64, and mips64 are the only platforms where
} it could even be possible to encounter netbsd32 executables that required
} system calls that had the "o" names in 4.3BSD.
} 
} On amd64, because i386 architecture SunOS 4 executables exist and I am not
} sure the SunOS 4 kernel did actually pick up all the new syscalls from

 Yeah, but that would be the Roadrunner, which I believe was
pretty rare, and a bit of a quirky system (software and hardware
wise).  I did lay eyes on one once many years ago, but I don't believe
I touched that one.  I suspect that not very many people have even
seen one.

} 4.3BSD.  Whether such executables would run at all though, I'm not sure;

 Personally, I have my doubts.

} there is probably other COMPAT_SUNOS code needed that may not work on i386.
} 
}-- End of excerpt from Thor Lancelot Simon


Re: Kernel module framework status?

2018-05-05 Thread John Nemeth
On May 5, 10:17am, m...@netbsd.org wrote:
}
} If someone wants to do this route of metadata, please consider the
} addition of a metadata property "should this be auto loaded".
} 
} Currently we have ad-hoc logic for some modules that might be auto
} loaded (compat_...) and it'd probably be cleaner to do this.

 This appears to be a complete misunderstanding.  There isn't
some magic way that modules get autoloaded.  There has to be
something that triggers the loading.  In the case of compat modules,
when the kernel tries to execute a binary and finds that it doesn't
recognise it, it then tries various modules to see if they recognise
it.  A similar thing would happen with file systems.  If you try to
mount some media and the kernel doesn't recognise the file system,
it will try to load various file system modules to see if they recognise
it.  All modules are inherently autoloadable.  There just has to
be some kind of mechanism to trigger the load.  The only thing that
additional metadata would provide is classification, such as file
system, syscall, driver, exec, etc.  However, we already have a
mechanism that can be used for this purpose.  Note that it is
possible to set a module not to be autoloadable, see module_autoload()
in module(9).

}-- End of excerpt from m...@netbsd.org


Re: Kernel module framework status?

2018-05-04 Thread John Nemeth
On May 3, 10:54pm, Mouse wrote:
}
} >  There is also the idea of having a module specify the device(s)
} > it handles by vendor:product
} 
} Isn't that rather restrictive in what buses it permits supporting?

 I suppose that other types of identifiers could be used.

} Indeed, PCI (and close relatives, like PCIe) and USB are the only
} things I can name offhand that even _have_ vendor:product.  (Of course,
} I'm sure there are lots of buses out there I've never heard of, or
} don't know enough about.)

 Only buses where the devices are identified would work.  For
buses like ISA where you have to probe the devices, it would not
be workable.

}-- End of excerpt from Mouse


Re: Kernel module framework status?

2018-05-03 Thread John Nemeth
On May 2,  9:48am, Anders Magnusson wrote:
} 
} I'm trying to find some documentation of the status of the kernel 
} modules, but only finds some scattered postings.
} What is done, what is left, are there any decision points etc...?

 Paul Goyette has been making great strides on modularising
various things.

 I have a couple of projects in the back of my mind for the
module framework, such as being able to load .plist at boot
time, and being able to control when modules are loaded during the
boot process.  The idea is that things like the pciverbose
module might want to load before autoconf whereas a disk driver
might want to load after autoconf but before findroot.

 There is also the idea of having a module specify the device(s)
it handles by vendor:product so that autoconf can simply hunt for
a module that way instead of having to know the name of the device.
Jared McNeill had some prototype code for this years ago, and I
probably have a copy of it somewhere.

 Then there is the whole thing that core put out a few years
ago about packaging kernels with the corresponding module.

 Long term I would like to see some kind of KABI so that modules
aren't so closely tied to a particular kernel.  But, this is
definitely a long term project/goal.

 Another nice thing to have would be a way to specify a schema
for a .plist file to specify the options that a particular
module takes.  Then a proper editor could be created for editing
.plist files.  Note that putting stuff in a .plist
file that the module doesn't understand isn't likely to cause any
harm as it would likely just be ignored.  However, if you misspell an
option, the module might not behave the way you expected.

}-- End of excerpt from Anders Magnusson


Re: virtual to physical memory address translation

2018-01-15 Thread John Nemeth
On Jan 15,  2:04pm, Michael van Elst wrote:
} m...@netbsd.org (Emmanuel Dreyfus) writes:
} 
} >Sorry if that has been covered ad nauseam, but I cannot find relevant
} >information about that: on NetBSD, how can I get the physical memory
} >address given a virtual memory address? This is to port the Linux
} >Meltdown PoC so that we have something to test our systems against.
} 
} pmap_extract() returns the physical address of a virtual address.
} pmap_kernel() gives you the kernel map.

 I suspect that he wants to do this from userland.

}-- End of excerpt from Michael van Elst


Re: LVM and 4K sectors

2018-01-03 Thread John Nemeth
On Jan 3,  8:59pm, Benny Siegert wrote:
}
} I am trying to set up LVM on a 4T hard drive that has 4096-byte
} sectors. However:
} 
} # gpt create sd0
} # gpt add  sd0
} /dev/rsd0: Partition 1 added: 49f48d5a-b10e-11dc-b99b-0019d1879648 6 976754635
} # newfs /dev/rdk4
} /dev/rdk4: 3815447.8MB (976754635 sectors) block size 32768, fragment size 4096
} using 5148 cylinder groups of 741.25MB, 23720 blks, 47104 inodes.
} super-block backups (for fsck_ffs -b #) at:
} 8, 189768, 379528, 569288, 759048, 948808, 1138568, 1328328, 1518088,
} 1707848, 1897608,
} ^C
} #
} # gpt type -i 1 -T linux-lvm sd0
} /dev/rsd0: Partition 1 type changed
} # lvm pvcreate -v -Z y /dev/rdk4
} Set up physical volume for "/dev/rdk4" with 976754635 available sectors
} Zeroing start of device /dev/rdk4
}   Physical volume "/dev/rdk4" successfully created
} # lvm pvs
}   PV VG   Fmt  Attr PSize   PFree
}   /dev/rdk4   lvm2 --   465.75g 465.75g
}   /dev/rsd1a vg0  lvm2 a-   931.51g 100.20g
} 
} The number of sectors is correct but it assumes that they are 512 bytes.
} 
} # lvm pvremove /dev/rdk4
}   Labels on physical volume "/dev/rdk4" successfully wiped
} # lvm pvcreate -v -Z y --setphysicalvolumesize 3726g /dev/rdk4
}   WARNING: /dev/rdk4: Overriding real size. You could lose data.
} /dev/rdk4: Pretending size is 7813988352 sectors.
} Set up physical volume for "/dev/rdk4" with 7813988352 available sectors
} Zeroing start of device /dev/rdk4
}   Physical volume "/dev/rdk4" successfully created
} # lvm pvs
}   PV VG   Fmt  Attr PSize   PFree
}   /dev/rdk4   lvm2 -- 3.64t   3.64t
}   /dev/rsd1a vg0  lvm2 a-   931.51g 100.20g
} 
} How safe will my data be on this? Is it a terrible idea to mix two
} disks in a VG if one of them has 512-byte sectors and one 4K sectors?

 Unfortunately, the LVM tools are quite old so you do need to
override the size for large disks.  I have done this a number of
times with no ill effect despite the warning.
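
 For reference, the numbers in the quoted output can be checked
with a few lines of arithmetic.  This is only a sketch in Python, using
the sector count and the 3726g override from the output above and
assuming that the "g" and "t" suffixes are binary units (2^30 and
2^40 bytes).

```python
SECTORS = 976754635          # sector count reported by gpt and pvcreate
GIB = 2 ** 30
TIB = 2 ** 40

size_if_512 = SECTORS * 512      # what the LVM tools assumed
size_if_4096 = SECTORS * 4096    # what a 4096-byte-sector drive provides

print(f"{size_if_512 / GIB:.2f}g")
print(f"{size_if_4096 / TIB:.2f}t")

# --setphysicalvolumesize 3726g, expressed in 512-byte sectors
override_sectors = 3726 * GIB // 512
print(override_sectors)

# The override must not claim more space than the drive really has.
print(override_sectors * 512 <= size_if_4096)
```

The first two values correspond to the PSize columns in the two pvs
listings, and the override works out to the "Pretending size is"
figure while staying just under the real capacity.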

 I don't see a real problem with this besides performance as
the drive or driver might have to do an RMW cycle if less than 4K of
data is written.

}-- End of excerpt from Benny Siegert


Re: modstat and kaslr

2017-12-31 Thread John Nemeth
On Dec 31,  5:11pm, Maxime Villard wrote:
}
} Here is a patch [1] that hides the addresses of the kernel modules when
} 'modstat -k' is entered by an unprivileged user. The current behavior is
} preserved for root.
} 
} The addresses currently leaked cannot be used to reconstruct the layout of
} the kernel, since the module VAs are embedded in bootspace.boot, whose location
} is independent from that of each of the remaining kernel segments.
} 
} But it's still good not to leak such information, to limit the surface for ROP
} and a few other things, and this, also in the non-kaslr case. Ok?
} 
} [1] http://m00nbsd.net/garbage/module/modstat.diff

@@ -150,10 +159,13 @@
                         strlcpy(ms->ms_required, mi->mi_required,
                             sizeof(ms->ms_required));
                 }
-                if (mod->mod_kobj != NULL) {
+                if (mod->mod_kobj != NULL && stataddr) {
                         kobj_stat(mod->mod_kobj, &addr, &size);
                         ms->ms_addr = addr;
                         ms->ms_size = size;
+                } else {
+                        ms->ms_addr = 0;
+                        ms->ms_size = 0;
                 }
                 ms->ms_class = mi->mi_class;
                 ms->ms_refcnt = -1;

 I don't see why you added the part where you set ms_addr and
ms_size to 0 given that the memory was kmem_zalloc'ed and thus we
know that it is already 0?

 Also, given the reason for preventing information leaks, I
would also make sure that the address isn't given out even for root
when secure_level has been elevated.

}-- End of excerpt from Maxime Villard


Re: Proposal to obsolete SYS_pipe

2017-12-24 Thread John Nemeth
On Dec 24,  9:37pm, Mouse wrote:
}
} > http://netbsd.org/~kamil/patch-00039-obsolete-SYS_pipe.txt
} 
} I see no pipe2(2), nor change from pipe(2) to pipe(3) (with an xref to
} pipe2(2)), both of which, it seems to me, should be part of this.

 From: http://netbsd.gw.com/cgi-bin/man-cgi?pipe2+2+NetBSD-current

HISTORY
 A pipe() function call appeared in Version 6 AT&T UNIX.  The pipe2()
 function is inspired from Linux and appeared in NetBSD 6.0.

My NetBSD 7.x systems have the manpage as well.  One might wish to
look for manpages on a system newer than 1.4T.  :->

 The big thing is that I don't see what the difference between
pipe(2) and pipe2(2) is, other than that pipe2(2) takes an extra
flags argument, i.e. I don't see how it solves the problem stated
in the original message.
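
 To make the comparison concrete, here is a small sketch in Python,
assuming a platform where Python exposes os.pipe2() (it wraps pipe2(2)
where the C library provides it).  It creates one pipe the pipe(2) way
and sets the descriptors non-blocking afterwards, then creates a
second pipe with the same state requested through the flags argument.

```python
import os

# pipe(2) style: create the pipe, then change descriptor state afterwards.
r1, w1 = os.pipe()
os.set_blocking(r1, False)
os.set_blocking(w1, False)

# pipe2(2) style: the same state is requested through the flags argument.
r2, w2 = os.pipe2(os.O_NONBLOCK)

# Record whether each descriptor is in blocking mode.
states = [os.get_blocking(fd) for fd in (r1, w1, r2, w2)]
print(states)

for fd in (r1, w1, r2, w2):
    os.close(fd)
```

Both pairs end up in the same state; the only difference visible here
is that pipe2() applies the flags at creation time.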

}-- End of excerpt from Mouse


Re: amd64: kernel aslr support

2017-10-05 Thread John Nemeth
On Oct 5,  8:30pm, Thor Lancelot Simon wrote:
}
} >  * The RNG is not really strong. Help in this area would be greatly
} >appreciated.
} 
} This is tricky mostly because once you start probing for hardware
} devices or even CPU features, you're going to find yourself wanting
} more and more of the support you'd get from the "real kernel".
} 
} For example, to probe for RDRAND support on the CPU, you need a
} whole pile of CPU feature decoding.  To probe for environmental
} sensors or an audio device you may need to know a whole pile about
} ACPI and/or PCI.  And so forth.
} 
} EFI has a RNG API, but I think it's usually just stubbed out and
} besides, you can't rely on having EFI...

 It does, but it isn't listed as a runtime service, so likely
isn't available after ExitBootServices() is called, which would be
called by efiboot.

} I think I'd suggest some combination of:
} 
}   * Just enough CPU-feature support to find/use RDRAND
} (Intel's sample code is not that big and I think it's
}  suitably-licensed)
} 
}   * Hash the contents of the "CMOS RAM" and/or EFI boot variables

 We currently have no support, of which I'm aware, for accessing
EFI boot variables.

}   * Maybe poke around for an IPMI BMC (has environmental sensors),
} or a TPM (has a RNG) on the LPC bus
} 
}   * Maybe poke around for on-die temperature/voltage sensors
} (will again require some CPU identification support).
} 
}   * Rather than just using rdtsc once, consider using rdtsc to
} "time" multiple hardware oscillators against one another;
} at the very least, you've always got the hardware clock.
} 
}   * Also, you can use rdtsc to time memory accesses.
} 
} For quick and dirty "entropy extraction", you can crunch as much of this
} data as you're able to connect together using SHA512.
} 
} I know, little or none of this is easy.
} 
} Thor
}-- End of excerpt from Thor Lancelot Simon


Re: attaching cpu via lapic

2017-08-20 Thread John Nemeth
On Aug 19,  1:31am, "Cherry G. Mathew" wrote:
} 
} I'm trying to improve the semantics around x86 lapic vs. cpu, with a
} view to wedging in the concept of "vcpu"s.
} 
} TLDR: please review this patch:
} http://ftp.netbsd.org/pub/NetBSD/misc/cherry/tmp/attach-cpu-with-lapic.diff
} 
} Essentially, the idea is that when the kernel runs under a hypervisor
} which it can detect, and this hypervisor exports a "virtual cpu" to the
} guest OS, this can be understood in the context of NetBSD's cpu device.
} 
} The current attachment is as follows:
} 
} 'attach cpu at cpubus'
} 
} which is then constrained via:
} 
} 'cpu* at mainbus'
} 
} similarly in the virtualised case:
} 'attach vcpu at xendevbus'
} 
} and,
} 
} 'vcpu* at hypervisor'
} 
} I've thought of a couple of schemes, but my current thought is as
} follows:
} 
} attach cpu at cpubus with lapic
} attach cpu at xendevbus with xvcpu
} 
} cpu* at mainbus #(via x86/(mpacpi.c|mpbios.c) as usual)
} cpu* at hypervisor #(via xen/hypervisor.c as usual)
} 
} The idea is that the attachment is mediated by the lapic/xvcpu
} respectively which can then decide how to go forward (vcpus on xen pvhvm
} don't need to fully initialise the underlying cpu, for eg: whereas
} physical cpus on xen dom0 cannot fully initialise, since xen disallows
} full access to the corresponding lapic).
} 
} The situation for XEN is as follows:
} 
} PV domU - only vcpu
} HVM domU - only cpu
} PVHVM domU - cpu:vcpu -> 1:1

 Why is this 1:1?  This seems to be a serious limitation that
will reduce the number of VMs that a given box can handle.

} PVH dom0 - cpu:vcpu -> 1:1 (IIUC)
} PV dom0 - cpu:vcpu -> vcpu can be fewer than cpu
} 
} Thus I'm trying to dissect the attach path of x86 cpu in such a way that
} it makes the least amount of 'platform' assumptions. The above patch is
} a first cut at taking out 'lapic' related assumptions in the native cpu
} attach path.
} 
}-- End of excerpt from "Cherry G. Mathew"


Re: Proposal: Disable autoload of compat_xyz modules

2017-08-03 Thread John Nemeth
On Aug 3, 11:35am, m...@netbsd.org wrote:
} On Thu, Aug 03, 2017 at 01:23:17AM +0200, Emmanuel Dreyfus wrote:
} > Taylor R Campbell  wrote:
} > 
} > Once every compatibility module would not be loaded by default, perhaps the
} > compat_xxx module could be loaded automatically if /emul/xxx/ exists?
} > 
} > The presence of that hierarchy means the system administrator really
} > meant to use compat_xxx, and it would avoid breaking existing system at
} > upgrade time.
} 
} Sounds good.
} 
} By the way, isn't that what happens in practice anyway? the only way to
} reach the COMPAT_OTHEROS code is to first exec a binary, which looks for
} an interpreter in /emul/otheros. If one doesn't exist, exec will fail.

 Not if the binary is statically linked.  I suspect that would
be the common case for at least some of the emulations.  Some of
them might not even support dynamic linking.

} I would feel more assured if COMPAT_SVR4 didn't exist in my kernels, but
} I suspect the vulnerability doesn't affect me.
} 
}-- End of excerpt from m...@netbsd.org


Re: Proposal: Disable autoload of compat_xyz modules

2017-08-03 Thread John Nemeth
On Aug 3,  4:09pm, Emmanuel Dreyfus wrote:
} Subject: Re: Proposal: Disable autoload of compat_xyz modules
}  wrote:
} 
} > By the way, isn't that what happens in practice anyway? the only way to
} > reach the COMPAT_OTHEROS code is to first exec a binary, which looks for
} > an interpreter in /emul/otheros. If one doesn't exist, exec will fail.
} 
} Joerg mentioned the statically linked binary. Even for dynamic
} binaries, there may also be some code executed in the compat module to
} check if it can run the binary.

 Also, strictly speaking, as I understand it, the interpreter
doesn't have to be /emul/otheros.  It's just that /emul/otheros is
searched first and if not there, then a second check is made without
that prefix.  Certainly this is the way it used to be.  I plopped
a NetBSD kernel on an otherwise stock SunOS system once, and apart
from KVM grovellers, it worked perfectly.

}-- End of excerpt from Emmanuel Dreyfus


Re: Proposal: Disable autoload of compat_xyz modules

2017-08-03 Thread John Nemeth
On Aug 3, 10:07am, Maxime Villard wrote:
} Le 02/08/2017 à 23:08, Joerg Sonnenberger a écrit :
} > On Wed, Aug 02, 2017 at 08:52:15PM +0200, Maxime Villard wrote:
} >> I disagree. The cost of doing a modload is low enough compared to the
} >> configuration needed to use compat_linux. Just like the command you quoted.
} > 
} > If I wanted OpenBSD, I know where to get it. There is a balance between
} > pissing off people and providing security.
} 
} In your opinion, what is pissing people off the most: having to do a modload,
} or being automatically vulnerable because some guys want to be able to do
} "make install opera etc" without typing one more command?

 What is pissing off people the most is one random developer,
who is not even a portmaster or member of core, making major
decisions about the project of their own accord, and basically
behaving like a petty little dictator.  Even if it is the correct
thing to do, which is debatable, it is not a decision that should
be made by a single random developer.  This is NetBSD, not MaxBSD.

} Strange understanding of pissing off people.
} 
} > If you want to minimize the
} > attack surface at all cost of *your* system, you are free to do so.
} 
} Forgive me for feeling a little sorry for the users that are
} regularly affected by vulnerabilities in compat_linux*.

 Who are these users?  Where are the complaints?

} > Otherwise it has to be balanced.
} 
} Certainly. It does not seem to me that moving compat_linux* into modules is in
} any way illegitimate or unbalanced. That's the opinion I was stating.

 YOU were not talking about turning them into modules.  YOU
were talking about deleting them.  I noted that you already deleted
the i386 version and I can't find any public discussion about that.

} > So far modules have primarily created
} > problems for a lot of people without any gain.
} 
} And so have compat_linux and compat_linux32.

 Huh?!?

} > Disabling rarely used
} > code is one thing, disabling commonly used code is something else. Stop
} > pushing for "security" as a single goal above else. It doesn't make you
} > more credible, it just makes people shoot down sensible proposals as a knee
} > jerk reaction because they are waiting for the insane follow-up.
} 
} Getting credibility and recognition from someone like you, Joerg, is not
} something I particularly care about. We're not in the jungle, we're here to
} talk; people are giving their opinion, I'm giving mine. I fixed 11 of the 11

 YOU are giving a lot more than just opinion.  YOU are threatening
to single handedly take action if you don't get the response you want.

} vulnerabilities that affected our compat options these last ten years, so I do
} have my word to say when it comes to security and compatibility, just like
} everyone else.

 "Say" is one thing, action is another thing entirely.

}-- End of excerpt from Maxime Villard


Re: Proposal: Disable autoload of compat_xyz modules

2017-08-02 Thread John Nemeth
On Aug 2,  5:02pm, Martin Husemann wrote:
} On Wed, Aug 02, 2017 at 07:56:50AM -0700, Brian Buhrow wrote:
} > Hello.  My feeling is that the cost of requiring a modload to use
} > compat_linux and compat_linux32 is fine.  My concern is that by taking it
} > out of the GENERIC kernel configuration, we lose the regular testing, such
} > as it is, with the daily builds.  Sure, the module gets built, but it could
} > be a while before it gets loaded and run by the test harness.  Today, with
} > these modules in GENERIC, the modules get loaded as a matter of course.
} > Is there a way to rig our test harness so that you can take the modules out
} > of the GENERIC kernel configuration and still do more than compile-time
} > test them?
} 
} The tests exercise quite a few modules, but currently testing compat stuff
} is tricky (due to the extra setup needed on the test machine to
} create the compat runtime environment).
} 
} Just doing a few modctl and load some of them is simple, but what does that
} actually buy us?

 Originally, it was my thought that compiling it as a module
and not using it is the same as compiling it into the kernel and
not using it.  However, it is possible to create a module that
fails to load due to run time linking issues.  So, having a test
that does modload ensures that the module can still be linked into
the kernel.

}-- End of excerpt from Martin Husemann


Re: nanosleep() for shorted than schedule slice

2017-07-02 Thread John Nemeth
On Jul 2,  8:04pm, David Holland wrote:
} Subject: Re: nanosleep() for shorted than schedule slice
} On Sun, Jul 02, 2017 at 12:54:52PM +0200, Joerg Sonnenberger wrote:
}  > > I wonder if it would make sense for nanosleep(2) to check that requested
}  > > sleeping time is shorter than a schedule slice, and if it is, spin the
}  > > CPU instead of scheduling another process. Any opinion on this?
}  > 
}  > No, that's wrong. It's also been discussed before.
} 
} How is that wrong? It was always more or less the point of nanosleep.

 If you start spinning right after the start of a timeslice,
you could spin for close to an entire timeslice.  On a modern
multi-GHz CPU that's a tremendous number of wasted cycles (also
doesn't help power consumption).
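
 To put a rough number on that, here is a minimal sketch.  The
figures in the usage note (a 3 GHz CPU and a 10 ms timeslice) are
assumptions for illustration only; nobody in the thread gave exact
values, and wasted_cycles() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Upper bound on the cycles burned if nanosleep() were to spin for
 * a whole timeslice.  Both parameters are illustrative assumptions.
 */
static uint64_t
wasted_cycles(uint64_t cpu_hz, uint64_t slice_us)
{
	/* cycles = (cycles per microsecond) * (microseconds spinning) */
	return cpu_hz / 1000000u * slice_us;
}
```

 With those assumed figures, wasted_cycles(3000000000, 10000)
comes to 30 million cycles for a single worst-case spin.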

}-- End of excerpt from David Holland


Re: nanosleep() for shorted than schedule slice

2017-07-02 Thread John Nemeth
On Jul 2,  8:41pm, m...@netbsd.org wrote:
} On Sun, Jul 02, 2017 at 08:38:24PM +, m...@netbsd.org wrote:
} > On Sun, Jul 02, 2017 at 01:16:15PM +, Christos Zoulas wrote:
} > > The solution is to implement "tickless kernel". It is not that difficult.
} > 
} > It looks like we are always descheduling the thread, not just because we
} > got a clock tick. even a tickless kernel won't help stupid.
} 
} to clarify, I mean, "it won't help, because we are being stupid", not
} as an insult.

 Saying, "we are being stupid," could most definitely be an
insult to the people that wrote the code in question.  I seriously
doubt that we are being stupid in this case.

}-- End of excerpt from m...@netbsd.org


Re: nanosleep() for shorted than schedule slice

2017-07-02 Thread John Nemeth
On Jul 2,  1:16pm, Christos Zoulas wrote:
} In article <1n8j63y.1pcs0owrn6gcem%m...@netbsd.org>,
} Emmanuel Dreyfus  wrote:
} >
} >I just encountered a situation where PHP performance on NetBSD is rather
} >weak compared to Linux or MacOS X.
} >
} >The code calls PHP's uniqid() many times. uniqid() creates a unique
} >id based on the clock. In order to avoid giving the same value for two
} >consecutive calls, PHP's uniqid() calls usleep(1) to make sure
} >the current microsecond has changed.
} >
} >On NetBSD this turns into a 16 ms sleep, which is 16000 times what was
} >requested. This happens because the kernel scheduled another process,
} >which is the behavior documented in the man page. However the result is
} >that a PHP script full of uniqid() is ridiculously slow. 
} >
} >I worked around the problem by reimplementing PHP uniqid() using
} >uuidgen(), but that kind of performance problem could exist in a lot of
} >other software.
} >
} >I wonder if it would make sense for nanosleep(2) to check that requested
} >sleeping time is shorter than a schedule slice, and if it is, spin the
} >CPU instead of scheduling another process. Any opinion on this?
} 
} The solution is to implement "tickless kernel". It is not that difficult.

 The other option would be to tell PHP not to be so dumb.  What
happens on other OSes?  I find it hard to believe that we're the
only ones that aren't tickless.
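
 As a sketch of what "not so dumb" could look like: a clock-based
id does not need to sleep at all if it simply steps past the last
value it handed out.  This is not PHP's code; next_unique_id() is a
hypothetical name, and the version below is not thread-safe.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical clock-based unique id.  Instead of sleeping until
 * the microsecond changes, bump the value when the clock has not
 * advanced.  Not thread-safe; a real version would need a lock or
 * an atomic.
 */
static uint64_t
next_unique_id(void)
{
	static uint64_t last;	/* last id handed out */
	struct timespec ts;
	uint64_t now_us;

	/* C11 wall-clock read; no sleeping involved. */
	timespec_get(&ts, TIME_UTC);
	now_us = (uint64_t)ts.tv_sec * 1000000u +
	    (uint64_t)ts.tv_nsec / 1000u;

	/* Same (or earlier) microsecond as the last call: step past it. */
	if (now_us <= last)
		now_us = last + 1;
	last = now_us;
	return now_us;
}
```

 Two back-to-back calls always return different, increasing values,
which is the only property the usleep(1) was buying.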

}-- End of excerpt from Christos Zoulas


Re: kernel aslr: someone interested?

2017-03-26 Thread John Nemeth
On Mar 25, 10:17pm, Mouse wrote:
}
} > [ASLR] is just one more check mark in the exploit building tool.
} 
} Yes and no.
} 
} It increases the work required to exploit any putative bugs.  It does
} not make exploitation impossible, but that does not mean it's not worth
} making it harder.  "You don't have to run faster than the bear; you
} just have to run faster than someone else."  That is, you don't have to
} be impossible to exploit; you just have to be enough harder to make
} them go after someone else instead.

 True enough.  Sometimes the simplest things are effective at
keeping the script kiddies away.  I use a different port for SSH
on my gateway box and I never hear from the script kiddies even
though a simple port scan would quickly find my SSH server.  On
the other hand, machines with SSH on the normal port are constantly
being hammered by the script kiddies.

}-- End of excerpt from Mouse


re: "Wire" definitions and __packed

2016-10-06 Thread John Nemeth
On Oct 6,  3:01pm, matthew green wrote:
}
} >  X86 doesn't have alignment restrictions.  The platform
} > practically lets you get away with murder, and thus is not useful
} > as a test platform.
} 
} FWIW, this hasn't been true since at least 1999 (SSE.)  also,

 That only counts if somebody is using SSE, and I highly doubt
that dhcpcd does.

} while no one uses them, x86 has "alignment checking" options.

 I am aware of the flag, but as you noted nobody uses it, thus
it might as well not be there.
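
 For illustration, the kind of access that x86 forgives is a cast
such as *(uint32_t *)(buf + 1) on a wire buffer.  A portable read
goes byte by byte instead.  This is a minimal sketch, not code from
dhcpcd; load_be32() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Read a big-endian (network order) 32-bit value from a buffer
 * that may not be 4-byte aligned.  Only byte loads are used, so
 * there is nothing for a strict-alignment CPU to trap on.
 */
static uint32_t
load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

 The result is the same on any host, whatever its byte order or
alignment rules.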

}-- End of excerpt from matthew green


Re: "Wire" definitions and __packed

2016-10-05 Thread John Nemeth
On Oct 5, 10:15pm, Roy Marples wrote:
} On Wednesday 05 October 2016 17:10:28 Eduardo Horvath wrote:
} > On Wed, 5 Oct 2016, Roy Marples wrote:
} > > On 04/10/2016 23:06, Joerg Sonnenberger wrote:
} > > > I'd like to address this by cutting down on the first set. For this
} > > > purpose, I want to replace many of the __packed attributes in the
} > > > current network headers with CTASSERT of the proper size, especially for
} > > > those structs that are clearly not wire definitions by themselves.
} > > 
} > > I tested the following structs without packed with the latest dhcpcd
} > > trunk (not yet in NetBSD).
} > > 
} > > ip
} > > udphdr
} > > arphdr
} > > in_addr
} > > nd_router_advert
} > > nd_opt_hdr
} > > nd_opt_prefix_info
} > > nd_opt_mtu
} > > nd_opt_rdnss
} > > nd_opt_dnssl
} > > 
} > > Works fine so far.
} > 
} > What platforms did you test it on?
} > 
} > I recommend trying it on sparc64.  That's one of the worst cases, being
} > big-endian 64-bit with alignment constraints.  And I recall some ABI (was
} > it ARM?) has strange alignment restrictions on byte values.
} 
} i386/amd64 only right now.

 X86 doesn't have alignment restrictions.  The platform
practically lets you get away with murder, and thus is not useful
as a test platform.
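
 The compile-time half of the proposal at least does not depend
on how forgiving the test platform is.  Below is a minimal sketch of
the idea, using C11 _Static_assert as a stand-in for the kernel's
CTASSERT so that it builds outside the kernel.  struct wire_udp_hdr
is a hypothetical struct with the UDP header layout, not the real
struct udphdr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header with the UDP layout; no __packed. */
struct wire_udp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t length;
	uint16_t checksum;
};

/* The build fails if the compiler inserted padding anywhere. */
_Static_assert(sizeof(struct wire_udp_hdr) == 8, "wire_udp_hdr size");
_Static_assert(offsetof(struct wire_udp_hdr, length) == 4,
    "length offset");
_Static_assert(offsetof(struct wire_udp_hdr, checksum) == 6,
    "checksum offset");
```

 Note that this only checks the layout.  It says nothing about
whether a pointer to such a struct inside a received packet is
suitably aligned, which is the part that still needs testing on a
strict-alignment machine.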

} I'll test on mips64-eb tomorrow.
} Sadly my sparc64 is dead, the network card reports an unspecified hardware 
} address.

 Traditionally, sparc boxes got their network MAC address
programmed by a value specified in CMOS RAM.  This likely means
that the CMOS battery is dead.

}-- End of excerpt from Roy Marples


Re: Changing the return value of xxx_attach() from void to int.

2016-07-10 Thread John Nemeth
On Jul 10,  9:37pm, David Holland wrote:
} On Sat, Jul 09, 2016 at 08:45:15PM -0700, John Nemeth wrote:
}  > } The substance of that reservation is that there's not much point doing
}  > } it without also taking the time to correct the behavior, i.e., back
}  > } out properly if something fails. And that requires attention, not just
}  > } mechanical changes.
}  > 
}  >  Sure, but that's something that can be done over time, driver
}  > by driver.  The first step is the infrastructure support (changing
}  > the return type, having autoconf respond intelligently, etc.).
}  > The very first step of changing the return type is a purely mechanical
}  > change.
} 
} Well, yes, but if you change the return type mechanically first then
} you end up with a thousand or two attach functions that *look* like
} they handle errors but actually don't.

 Thanks for the reminder.  I meant to add to my list of steps
that the xxx_attach() function needs to be flagged somehow (possibly
with a standardised comment) to show that it still needs to be
audited.  The flag is something that needs to be easily found
mechanically so that lists can be made.

 Also, I expect that some drivers will never be audited/tested
since there are drivers for ancient hardware that very few people
now own/use.  Of course, that might be a hint that the driver should
be retired (or, at least commented out in GENERIC).

}-- End of excerpt from David Holland


Re: Changing the return value of xxx_attach() from void to int.

2016-07-09 Thread John Nemeth
On Jul 10,  2:39am, David Holland wrote:
} On Sat, Jul 09, 2016 at 04:57:20PM -0700, John Nemeth wrote:
}  >  A number of people have expressed reservations (bringing up memories
}  > of device_t and how long that took to settle out) indicating that
}  > this should be done on a branch or something.  Personally, I don't
}  > see the need to do so.  The issue with the device_t change was that
}  > it involved actual code changes.  This does not, it is simply search
}  > and replace, which is a much less dangerous thing to do.  Some have
}  > suggested doing other changes at the same time.  That would certainly
}  > increase the risk.
} 
} The substance of that reservation is that there's not much point doing
} it without also taking the time to correct the behavior, i.e., back
} out properly if something fails. And that requires attention, not just
} mechanical changes.

 Sure, but that's something that can be done over time, driver
by driver.  The first step is the infrastructure support (changing
the return type, having autoconf respond intelligently, etc.).
The very first step of changing the return type is a purely mechanical
change.

}-- End of excerpt from David Holland


Re: Changing the return value of xxx_attach() from void to int.

2016-07-09 Thread John Nemeth
On Jun 23,  7:40pm, Masanobu SAITOH wrote:
} 
}   As you know, the return value of device driver's attach function is void.
} I've thought that we should change it to int for many years. I believe I'm
} not the only person.

 I've been meaning to get back to this one for some time...

}   xxx_attach() may fail the following cases:
} 
}   got unexpected behavior.
} 
}   resource allocation error.
} 
}   driver is really broken.
} 
}   some others.

 I believe the interesting cases are:  device is broken, resource
allocation failure, can't find firmware, etc. (i.e. no chance of
attaching device); driver decides that it can't handle device even
though it matched (i.e. try another driver); and success.

}   xxx_attach() is void, so the caller can't know about the failure. This
} causes some problems:
} 
}   a) The OS may touch the broken device while the OS is running.
}      It may cause a panic.
} 
}   b) The shutdown sequence calls xxx_shutdown() even if
}      it's not really attached. It may cause a panic. We have met
}      this bug many times.
} 
}   c) To prevent b), we have added extra code into xxx_shutdown().
}      It wastes our time and the code becomes big and complex.
} 
}   d) Resource leak. For example, a device_t structure is allocated
}      by the caller side.
} 
}   If we change the return value to int or any other type's value, we can
} do the following things.
} 
}   a) Don't register failed device to avoid problem.
} 
}   b) If we don't do a), we can add code to not call xxx_shutdown()
}      for broken devices instead.
} 
}   c) We can add the code to fallback to lower priority device driver
} like ukphy(4), ugen(4) or others.
} 
}   d) (add some other good features?)
} 
}   What do you think about this change? If you're OK, I'll change _ALL_ device
} drivers' attach function first by the following way:
} 
}   0) Change return value to int.
} 
}   1) Change "return;" to "return 0;" or "return -1;"
}  (or bool value or Exxx. See below)
} 
}   2) I won't modify the caller side for the time being.

3) Add "return 0;" to end of attach routine to catch any
   that just "fall off" the end of the function (perfectly
   legal for void functions).

} In this way, it won't break anything because the caller doesn't check the
} value. Even if I mistakenly modified the return value and we add code to
} check the return value, it won't be a big problem because almost all drivers
} don't go into the failure paths. So, I think it's not required to use new
} CFATTACH_() because it makes both the driver and the caller side more
} complex.
} 
}   Now I'm wondering if I should return "-1" or Exxx. FreeBSD returns Exxx
} (e.g. ENOMEM, EIO, ENXIO, etc.). I don't know which one is better.
} 
}   Changing to -1 is easier than Exxx because it's not required
}   to wonder what Exxx I should choose.
} 
}   Changing to Exxx may be good if we check the error code and
}   do something depending on the value though I can't imagine anything now.
} 
}   -1 and Exxx can be mixed because both are int. We can detect
}   error with "if (rv != 0)" even if it's mixed.

 I would just have a sequence of numeric values.  I don't think
we need formal Exxx values.  I would recommend 0 for success, 1
for impossible, 2 for try another driver, etc.  There is only one
caller, autoconf, so it isn't like we have to prepare for a bunch
of random callers.
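
 A minimal sketch of how that could look from the caller's side.
Everything named here is hypothetical: the enum names, attach_first()
and the three sample drivers are only for illustration and are not
existing autoconf interfaces.

```c
#include <stddef.h>

/* Hypothetical names for the 0/1/2 values suggested above. */
enum attach_result {
	ATTACH_OK = 0,		/* device attached */
	ATTACH_IMPOSSIBLE = 1,	/* broken device, no resources, ... */
	ATTACH_TRY_NEXT = 2	/* matched, but driver declines */
};

typedef int (*attach_fn)(void *dev);

/* Sample drivers, standing in for real xxx_attach() functions. */
static int picky_attach(void *dev) { (void)dev; return ATTACH_TRY_NEXT; }
static int broken_attach(void *dev) { (void)dev; return ATTACH_IMPOSSIBLE; }
static int generic_attach(void *dev) { (void)dev; return ATTACH_OK; }

/*
 * Sketch of the single caller: walk the candidate drivers in
 * priority order.  Returns the index of the driver that attached,
 * or -1 if the device could not be attached.
 */
static int
attach_first(const attach_fn *drivers, size_t ndrivers, void *dev)
{
	for (size_t i = 0; i < ndrivers; i++) {
		switch (drivers[i](dev)) {
		case ATTACH_OK:
			return (int)i;
		case ATTACH_TRY_NEXT:
			continue;	/* fall back to the next driver */
		default:
			return -1;	/* impossible or unknown: give up */
		}
	}
	return -1;
}
```

 With the candidates { picky_attach, generic_attach } the second
driver ends up attaching; with { broken_attach, generic_attach } the
device is left alone, since "impossible" means no driver should
touch it.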

}   Any objection, advice or idea?

 I like the idea.  On the first pass, you could simply have
all drivers return "success" unconditionally (this would maintain
status quo).  Then, figure out what autoconf should do with the
various errors, then modify drivers one at a time.

 A number of people have expressed reservations (bringing up memories
of device_t and how long that took to settle out) indicating that
this should be done on a branch or something.  Personally, I don't
see the need to do so.  The issue with the device_t change was that
it involved actual code changes.  This does not, it is simply search
and replace, which is a much less dangerous thing to do.  Some have
suggested doing other changes at the same time.  That would certainly
increase the risk.

}-- End of excerpt from Masanobu SAITOH


Re: DTrace on Xen?

2016-05-24 Thread John Nemeth
On Apr 9,  1:50am, Christos Zoulas wrote:
} In article <20160409012248.ga27...@panix.com>,
} Thor Lancelot Simon   wrote:
} >Next try: is DTrace (particularly FBT) expected to work on NetBSD/xen?
} >
} >I'm struggling to get some grasp on why I/O to SCSI disks uses *25X* more
} >CPU in "interrupt" time on the same system under NetBSD/xen than under
} >NetBSD/amd64.  Kernel profiling clearly does not work, so I'm hoping to
} >get somewhere with DTrace even if I have to do much of what gprof would
} >do for me by hand.
} >
} >This is -- among other things -- crippling the Foundation's pkgsrc build
} >server.  So I could use some help, if anyone's got it to give.
} 
} Why doesn't kernel profiling work?
} Does the xen kernel have KDTRACE_HOOKS, and is it built with symbols?

 Xen kernels are supposedly built with symbols.  However, when
trying to do MODULAR Xen kernels, I found that I could only load
very simple modules.  If I try to load a more complex one, such as a
filesystem, it fails due to symbols not being found.

}-- End of excerpt from Christos Zoulas


Re: Simplify bridge(4)

2016-02-12 Thread John Nemeth
On Feb 12, 10:33am, Roy Marples wrote:
} On 12/02/2016 08:34, Ryota Ozaki wrote:
} > On Thu, Feb 11, 2016 at 3:17 AM, Mouse  wrote:
} >>> [J]ust wondering if we are going to see vether(4) anytime soon.
} >>
} >> How would this vether differ from the existing tap?  Presumably I'm
} >> just missing something
} > 
} > dhcpcd didn't work well with bridge(4) and tap(4) didn't help that.
} > vether(4) would help that. We may be able to address the issue by
} > fixing bridge or tap but I have no idea for now.
} 
} It's not actually dhcpcd itself - it's the kernel BPF implementation.
} There was also an issue where some DHCPv6 messages were not following
} across the bridge properly either.
} 
} If vether solves that then great, but does that mean we could drop the
} tap interface entirely or just swap it in place?
} From my perspective (a user), there is no difference between tap and vether?

 tap(4) is a direct interface between userland and the network.
vether(4) would not be (although you could use BPF, etc.).  It
would be an ethernet device that represents the host.  If you know
how to configure Cisco devices, think BVI.

 The problem with bridge(4) is that you put addresses on one
of the interfaces included in the bridge.  The addresses belong to
the host as a whole, not to the particular part represented by an
interface to part of the outside world.  vether(4) would represent
the host.  "bridge" is a synonym for "switch".  A bridge is really
network infrastructure, not part of a host.

}-- End of excerpt from Roy Marples


Re: Simplify bridge(4)

2016-02-10 Thread John Nemeth
On Feb 10,  6:56pm, Ryota Ozaki wrote:
} 
} Thanks to introducing softint-based if_input,
} we can simplify bridge(4).
} 
} - Remove spin mutexes
}   - They were needed because some code of bridge could run in
} hardware interrupt context
}   - We now need only an adaptive mutex for each shared object
} (a member list and a forwarding table)
} - Remove pktqueue
}   - bridge_input is already in softint, using another softint
} (for bridge_forward) is useless
}   - Packet distribution should be done at device drivers
} 
} As expected, forwarding performance improves slightly
} because of stopping using the second softint.
} 
} Here is a patch:
} http://www.netbsd.org/~ozaki-r/simplify-bridge.diff
} 
} Any comments or suggestions?

 Not a direct comment, but just wondering if we are going to
see vether(4) anytime soon.  This would simplify the administration
and possibly the code of bridge(4) in that bridge(4) would become
strictly transit and not have to worry about acting as both transit
and endpoint.

}-- End of excerpt from Ryota Ozaki


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 18, 12:52am, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 17:52:16 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160117165216.ga4...@asim.lip6.fr>
} 
}   | I don't understand that. If you run in /, you get the busy/free devices
}   | in /dev, if you run in /chroot you get the busy/free devices in
}   | /chroot/dev.
} 
} But that makes no sense at all, there are only one set of devices, just
} multiple different sets of names for them.   This is why vnconfig should
} not be looking in /dev (as it never did before NetBSD 7)
} 
}   | yes, but that's not how one would use it.
} 
} It might be.  Particularly with something like xen, qemu, virtualbox
} (as a host) it might make sense to have vnd device files with known
} fixed names (root, usr, home ...) in each config directory (accessing
} different kernel vndN's of course), and then have the startup scripts
} not need configuration, or to go hunting for a suitable vnd.

 Currently, there is no simple way of doing that, at least not
for Xen.

}   | One would use vnconfig -l to find a usable device in /dev,
} 
} No, one wouldn't, as is evidenced by the fact that that is not the
} method that the xen config script uses.  It does it the right way
} (which includes looking at what is available in /dev, though if needed,

 The only place the Xen script does look is in /dev.  It seems
kind of strange to be arguing that it is correct for the Xen script
to look in /dev, but that isn't correct for vnconfig -l to do so.
After all, they are essentially doing the same thing in order to
find out what vnds exist.

} making a new set of vndN's there is trivial - if I used xen enough
} (that is, making new clients frequently) I think I'd have the script just
} MAKEDEV a new one if all the existing entries in /dev were in use.)

 Hrmm...  automagic...

}   | so you need the /dev entry.
} 
} You need a (set of) special files to create and access the device.
} They do not need to be in /dev.

 True, but if you put them elsewhere, you're on your own to
manage them, and I don't see a problem with that.

}   | I say that what's in /dev/ is now relevant because this is what limits
}   | the number of vnd you can use
} 
} No it doesn't.   Not in any way at all.

 Now, you're just splitting hairs.

}   | Older vnconfig -l listing devices without checking that a /dev/ entry
}   | exists may also be seen as a bug.
} 
} No, it wasn't, it told which vnds were in use, and which were free,
} regardless of which path names might be available to access the free ones,
} or which had been used to config the busy ones.   That's useful.
} It is still useful now, except now there are too many free vnd's to
} list, so the rational approach is to deduce which are free given
} knowledge of which are busy.   The information is still there, just
} in a different form.
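
 The "deduce which are free" idea quoted above can be sketched as
follows.  This is only an illustration: first_free_unit() is a
hypothetical helper, and the list of busy units is assumed to have
been obtained from the kernel by some other means.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Given the unit numbers that are currently configured, return
 * the lowest unit number that is not among them.  At most
 * nbusy + 1 candidates need to be examined.
 */
static int
first_free_unit(const int *busy, size_t nbusy)
{
	for (int unit = 0; ; unit++) {
		bool in_use = false;

		for (size_t i = 0; i < nbusy; i++) {
			if (busy[i] == unit) {
				in_use = true;
				break;
			}
		}
		if (!in_use)
			return unit;
	}
}
```

 With vnd0, vnd1 and vnd3 busy, this picks unit 2, without ever
needing a list of every free unit.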
} 
}   | The only limit is what's in /dev/ so listing what's in /dev is fine.
} 
} But that isn't a limit, and isn't material in any case.
} 
}   | > It did, or should have.   The code that looked in /dev was ripped out.
}   | > If a pullup of that didn't happen, it should have.
}   | 
}   | You can check that. But a pullup that removes functionality that
}   | has been there for at least 2 releases should be rejected.
} 
} I have now, and it looks like the pullup did not happen.   Christos put
} a comment in the commit on head
}   XXX: pullup to 7 together with the kernel change.
} but it appears as if that did not happen.
} 
} It needs to.   The vnconfig (vndconfig now) and kernel changes are a set,
} one without the other is definitely broken.
} 
} Christos: can you request that please?
} 
}   | You didn't look at the code I guess.
} 
} That's because you're still running the 7.0 release vnconfig (because
} that is what is still on the netbsd-7 branch).  That's simply broken, and
} it is not surprising that you are seeing problems with vnconfig -l.
} 
} Note that all of this occurred because NetBSD 7 got a broken hack to
} vnconfig to work around the change to being a cloning (which ignored
} backwards compat) rather than fixing it in a rational way, which has
} now been done ... just not yet (fully) pulled up to -7 and -7-0
} 
} And this (irrelevant side issue) still doesn't explain what is happening
} with the xen startup, which doesn't use vn{d}config -l   You wouldn't
} even be thinking about it if you had not noticed (largely because of
} the mismatch of vnconfig & kernel) the change while looking for whatever
} the real issue is here.   You also didn't object when the original
} problem, and this solution, were being discussed early last November.
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17,  5:52pm, Manuel Bouyer wrote:
} On Sun, Jan 17, 2016 at 11:04:23PM +0700, Robert Elz wrote:
} > Date:Sun, 17 Jan 2016 15:52:38 +0100
} > From:Manuel Bouyer 
} > Message-ID:  <20160117145238.ga3...@asim.lip6.fr>
} > 
} >   | unless you run vnconfig in the chroot.
} > 
} > And /dev in the chroot has the same vnds in it that /dev has
} 
} I don't understand that. If you run in /, you get the busy/free devices
} in /dev, if you run in /chroot you get the busy/free devices in /chroot/dev.
} I can't see a problem with that.
} 
} > 
} >   | listing what is available in /dev makes sense to me, as, unless you
} >   | have a very special setup, you'll use what's in /dev/ anyway.
} > 
} > Usually, yes, but "usually works" isn't really good enough.
} 
} As long as the limitations are known and documented I don't have a
} problem with that. If we remove all software that only "usually works"
} we can just throw computers away
} 
} > 
} >   | You could use an option to list other devices in other directories.
} > 
} > You'd also need an option to give their names.   Consider
} > 
} > mknod mydir/foo-pt1 c 14 0
} > mknod mydir/bar-pt2 c 14 1
} > mknod mydir/xxx-pt3 c 14 2
} > mknod mydir/vnd-raw c 14 4
} > vnconfig $(pwd)/mydir/vnd-raw /some/image/file
} > mknod other/foo-pt1 c 14 16
} > mknod other/bar-pt2 c 14 17
} > mknod other/xxx-pt3 c 14 18
} > mknod other/vnd-raw c 14 19
} > rm -f /dev/vnd*
} > 
} > What would you like vnconfig -l to list, and how would you expect to
} > achieve it?
} > 
} >   | or just list what's in /dev/
} > 
} > That's not backward compat with any NetBSD prior to NetBSD7.
} > Take your netbsd 5 that you used for the previous example, remove
} > all the /dev/vnd* (or move them somewhere) and try vnconfig -l
} > again.   I think you'll see the same output as you did before.
} 
} yes, but that's not how one would use it. One would use vnconfig -l
} to find a usable device in /dev, so you need the /dev entry.
} 
} > Similarly if you MAKEDEV vnd{5,6,7} it will still just list vnd 0..3
} 
} yes, and that's fine because others are not usable even if they exist.
} But now that this limitation is gone I don't have a problem with
} listing all /dev entries.
} 
} > What is in /dev was always irrelevant.   NetBSD 7 is just broken in
} > this area.
} 
} I say that what's in /dev/ is now relevant because this is what limits
} the number of vnd you can use (and this limit can easily be raised if needed).
} Older vnconfig -l listing devices without checking that a /dev/ entry
} exists may also be seen as a bug.
} 
} > 
} >   | True, that's why I insist on vnconfig -l to list free devices as it used
} >   | to (although I don't use it myself).
} > 
} > If you can work out what that really means (not looking at /dev) in a
} > way that makes sense, that would be fine.  I cannot (other than listing
} > all 4 billion.)
} 
} The only limit is what's in /dev/ so listing what's in /dev is fine.
} 
} > 
} >   | I'm talking about vnconfig -l not listing free devices, not about
} >   | vnconfig getting spurious ENXIO
} > 
} > I know, and I still doubt that it matters.
} > 
} >   | it is a kernel and an userland from netbsd-7, not HEAD.
} > 
} > I understand.
} > 
} >   | Anyway vnconfig didn't change in netbsd-7 since 7.0.
} > 
} > It did, or should have.   The code that looked in /dev was ripped out.
} > If a pullup of that didn't happen, it should have.
} 
} You can check that. But a pullup that removes functionality that
} has been there for at least 2 releases should be rejected.
} 
} > 
} >   | And even if it did, I would expect vnconfig from 7.0_RELEASE to work
} >   | with a netbsd-7 kernel
} > 
} > Normally I would do, but ...
} > 
} >   | (for backward compat it's more important than a netbsd-6 vnconfig with
} >   | a netbsd-7 kernel)
} > 
} > I disagree.   The number of people upgrading 7.0 to -7 (and doing
} > it by only upgrading the kernel) is going to be far fewer than the
} > number upgrading from 6 (and earlier.)
} 
} of course not. I guess it's common to run userland from a release and
} kernel from the corresponding stable branch. Running a kernel from a
} different stable branch than userland is much less common (because you
} expect things to break, e.g. ipf).

 Actually, ipf has backwards compat these days.  There is very
little left that doesn't have backwards compat.

} > If a 7.0.1 had already
} > been released it wouldn't even be an issue.
} 
} That wouldn't change the problem at all.
} 
} > 
} >   | > All 4 billion of them?
} >   | No, what's in /dev/ as it used to do in 7.0-RELEASE
} > 
} > I bet it isn't.   MAKEDEV a few more vnds in /dev and try
} > again, changing nothing else.  If it appears to be listing
} > all that is in /dev, that is just co-incidence.
} 
} You didn't look at the code I guess.
} xen1:/root#uname -a
} NetBSD xen1.soc.lip6.fr 7.0_STABLE NetBSD 7.0_STABLE 

Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17, 11:04pm, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 15:52:38 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160117145238.ga3...@asim.lip6.fr>
} 
}   | unless you run vnconfig in the chroot.
} 
} And /dev in the chroot has the same vnds in it that /dev has
} 
}   | listing what is available in /dev makes sense to me, as, unless you have a
}   | very special setup, you'll use what's in /dev/ anyway.
} 
} Usually, yes, but "usually works" isn't really good enough.
} 
}   | You could use an option to list other devices in other directories.
} 
} You'd also need an option to give their names.   Consider
} 
}   mknod mydir/foo-pt1 c 14 0
}   mknod mydir/bar-pt2 c 14 1
}   mknod mydir/xxx-pt3 c 14 2
}   mknod mydir/vnd-raw c 14 4
}   vnconfig $(pwd)/mydir/vnd-raw /some/image/file
}   mknod other/foo-pt1 c 14 16
}   mknod other/bar-pt2 c 14 17
}   mknod other/xxx-pt3 c 14 18
}   mknod other/vnd-raw c 14 19
}   rm -f /dev/vnd*
} 
} What would you like vnconfig -l to list, and how would you expect to
} achieve it?

 If you're going to do bonkers things, then you should expect
the system to behave in bonkers ways.  It is unreasonable to expect
the system to handle every corner case that a sysadmin on crack
can create.

}   | or just list what's in /dev/
} 
} That's not backward compat with any NetBSD prior to NetBSD7.
} Take your netbsd 5 that you used for the previous example, remove
} all the /dev/vnd* (or move them somewhere) and try vnconfig -l
} again.   I think you'll see the same output as you did before.
} Similarly if you MAKEDEV vnd{5,6,7} it will still just list vnd 0..3
} What is in /dev was always irrelevant.   NetBSD 7 is just broken in
} this area.
} 
}   | True, that's why I insist on vnconfig -l to list free devices as it used
}   | to (although I don't use it myself).
} 
} If you can work out what that really means (not looking at /dev) in a
} way that makes sense, that would be fine.  I cannot (other than listing
} all 4 billion.)
} 
}   | I'm talking about vnconfig -l not listing free devices, not about
}   | vnconfig getting spurious ENXIO
} 
} I know, and I still doubt that it matters.
} 
}   | it is a kernel and an userland from netbsd-7, not HEAD.
} 
} I understand.
} 
}   | Anyway vnconfig didn't change in netbsd-7 since 7.0.
} 
} It did, or should have.   The code that looked in /dev was ripped out.
} If a pullup of that didn't happen, it should have.
} 
}   | And even if it did, I would expect vnconfig from 7.0_RELEASE to work
}   | with a netbsd-7 kernel
} 
} Normally I would do, but ...
} 
}   | (for backward compat it's more important than a netbsd-6 vnconfig with
}   | a netbsd-7 kernel)
} 
} I disagree.   The number of people upgrading 7.0 to -7 (and doing
} it by only upgrading the kernel) is going to be far fewer than the
} number upgrading from 6 (and earlier.)   If a 7.0.1 had already

 That isn't necessarily true.  It is certainly feasible in many
cases and possibly even desirable in some cases to run with a 7.0
kernel on a 6.X userland for some time to make sure things are
going to work out okay.  It is much easier to change the kernel
than it is to downgrade userland, especially since there is no officially
supported method for doing the latter.

} been released it wouldn't even be an issue.

 This also isn't necessarily true.  7.0.1 won't see all pullups
that netbsd-7 does (7.0.1 will come from the netbsd-7-0 branch).
7.0.1 will only see security and critical bug fixes, whereas 7.1
will have general bug fixes, updated/new device drivers, etc.

}   | > All 4 billion of them?
}   | No, what's in /dev/ as it used to do in 7.0-RELEASE
} 
} I bet it isn't.   MAKEDEV a few more vnds in /dev and try
} again, changing nothing else.  If it appears to be listing
} all that is in /dev, that is just co-incidence.
} 
}   | When the problem did show up, only vnd0 and vnd1 were in use.
}   | vnconfig -l did show on vnd0 and failed with ENXIO on vnd1 (although the
}   | device was configured because it was, and is still, in use by a domU).
} 
} That would be a bug, that we need to find and fix.   If vnd1 is in use,
} it should be listed.
} 
} It may be the same bug that is causing the xen startup problem, or it
} might be a different one.   Was it a (bare) "vnconfig -l" that failed?
} If you (or anyone else) sees this again, also try "vnconfig -l vnd1"
} (or whichever one vnconfig -l fails on and is known to be in use.)
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17,  9:37pm, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 14:49:23 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160117134923.ga2...@asim.lip6.fr>
} 
}   | I mean, vnconfig -l (without other arguments) has been showing available
}   | devices for a long time:
} 
} Yes, I know, and agree, it has ... but that is only possible if it
} is possible to rationally enumerate the available devices.   When there
} were a fixed (small) number, it made sense.  That is no longer the case.
} 
} Do you really want it to list 4 billion free vnds ?

 Obviously not, unless somebody was silly enough to create 4
billion /dev entries, which is likely to cause other problems.

} Using what is in /dev is incorrect (always was) as /dev is just a
} convention (and particularly is not reliable when chroots are in use).

 It may be "just a convention", but it is also the best
approximation.

}   | this is a major behavior change, which may well break existing setups.
} 
} True, but there is little alternative, unless you'd like to return to
} the pre cloning days.   It can stay as it is now, listing free devices
} up to the highest used (but that really is hard to explain and makes
} little sense, and as you have observed, is not very reliable) or I guess
} we could just add a 
} 
}   for (n = highest_found; ++n < highest_found + 4; )
}   printf("vnd%d: not in use\n", n);
} 
} after it finishes printing, just to list a few more free ones.
} 
}   | You remove existing and working functionality to fix a marginal backward
}   | compatibility issue ?
} 
} Not marginal at all, and backwards compat has always been one of NetBSD's
} prime objectives.
} 
}   | But removing this functionality is breaking
}   | backward compat, in a much more important way.
} 
} Actually, I doubt it.  I suspect some other issue is the problem here,
} and the change to vnconfig -l is just confusing the issue.

 Possibly.

}   | we *are* already running an up to date vnconfig, dammit !
} 
} Ah, OK, I misread your description (I thought you meant one from 7.0)
} 
}   | not until this problem is fixed. Breaking XEN3_DOM0 support is a real
}   | problem.
} 
} Agreed, we need to work out what is causing that vnconfig to fail.
} 
}   | Unfortunately it's transient.
} 
} That does make it difficult to debug.
} 
}   | After a few vnconfig manipulations the
}   | problem is gone for me (and vnconfig -l again shows all devices,
}   | used or free).
} 
} All 4 billion of them?
} 
}   | cd_ndevs is now at 8 (checked with gdb against /dev/mem)
} 
} Then at some stage you had vnd7 configured.
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 18,  5:58am, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 12:42:32 -0800
} From:John Nemeth <jnem...@cue.bc.ca>
} 
} And from a later message
} (<201601172101.u0hl11cv023...@server.cornerstoneservice.ca>) ...
} 
}   |  The only place the Xen script does look is in /dev.  It seems kind of
}   | strange to be arguing that it is correct for the Xen script to look in
}   | /dev, but that isn't correct for vnconfig -l to do so. After all, they are
}   | essentially doing the same thing in order to find out what vnds exist. 
} 
} Not at all.  They're doing different things for different purposes.  The
} xen script needs to find a special file that it can configure, for that
} looking in /dev (the normal place to find such files) is entirely reasonable.
} [Aside: a config option to specify which vnd device, which could allow the
} special file to be anywhere, would be nice, and probably already exists.]

 Actually, there isn't.  Keep in mind that the Xen stuff is
host OS independent.  Most of the processing is done in code that
is OS independent.  The block script is an interface between the
OS independent code and NetBSD.

} On the other hand, vnconfig is just revealing internal kernel status to
} the user.   The names it prints (vnd0: ...) aren't used for anything else.
} There isn't even any guarantee that /dev/vnd0[a-p] and the kernel vnd0
} are related in any way at all.  They usually will be, but need not.

 True.  But, if you change that, you're just going to be creating
a big headache for yourself.

} That wouldn't bother other users.   No-one should really care which internal
} kernel unit is accessed by a particular vndN[a-z] set of special files.
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 18,  6:37am, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 23:26:35 +0100
} From:Michael van Elst 
} Message-ID:  <20160117222634.ga5...@serpens.de>
} 
}   | I'd rather have something that lists existing devices, allocates
}   | a fresh one and tells me the name and works for all such pseudo disks.
} 
} I use the following script.   You will probably want to remove the
} "echo cgd0" bit - I do that to permanently reserve cgd0 from this kind
} of use (all my systems use that one for one particular purpose.)
} 
} I call it next_avail - usage is something like
} 
}   VND=$( next_avail vnd )
}   CGD=$( next_avail cgd )
}   
} (and whatever else is similar, I know there is at least one more, but I
} have forgotten which it is .. oh yes, raid of course...)

 Don't forget ccd(4).

}-- End of excerpt from Robert Elz
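
The next_avail script described above does not appear in the quoted
text.  As a rough idea of what such a helper might look like, here is
a minimal sh sketch.  It is not Robert Elz's script: the function body
and the convention of passing the in-use names as arguments are
assumptions made purely for illustration.  On a live system the in-use
list could come from sysctl hw.disknames.

```shell
# next_avail PREFIX [IN_USE_NAME...]   (hypothetical helper)
# Print the lowest-numbered PREFIXn that is not among the in-use names.
next_avail() {
    prefix=$1
    shift
    n=0
    while :; do
        candidate="$prefix$n"
        in_use=no
        for name in "$@"; do
            if [ "$name" = "$candidate" ]; then
                in_use=yes
                break
            fi
        done
        if [ "$in_use" = no ]; then
            echo "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
}

# A fixed sample list is used here so the sketch runs anywhere.
next_avail vnd wd0 vnd0 vnd1 cgd0    # prints vnd2
```

The same function works for cgd, raid or ccd, since it only compares
names and knows nothing about the device type.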


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17,  1:01pm, Robert Elz wrote:
}
} Date:Sat, 16 Jan 2016 23:27:51 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160116222751.ga2...@asim.lip6.fr>
} 
}   | Also, you don't address the problem that, as I understand it and if
}   | the code works properly, vnconfig -l won't show free devices if the
}   | first 4 are in use.
} 
} Arguably it shouldn't show any free devices at all, otherwise, where
} should it stop?   The correct answer to "which vnd is free?" is "any
} vnd that is not in use."   Attempting to enumerate them all is folly.
} 
} The current scheme (I believe) lists a vnd as free (not in use) if
} some higher vnd is (or has been) used, and stops when the highest one
} ever used is reached.   Or at least that's the intent.   But removing
} all of the output for unused vnds would probably be a good idea.
} 
} If you want to know what is configured in /dev, then "ls /dev/vnd*d"
} will show you that, but there is no particular reason that vnd's
} (or any devices) need to exist in /dev (consider in a chroot partition,
} which might have /dev/vnd23[a-p] only)
} 
} The original problem was caused by the way vn{d}config was hacked
} to handle -l when vnd was made cloning (that lost backward compat to
} netbsd 6, which was the bug reported which the fixes in question were
} handling).  But there was no way to fix vnd and vn{d}config that would
} retain 100% backward compat in all cases.  Since NetBSD 7 was so new,
} some compat was lost just for it, you really do not want to run vnd
} related stuff from netbsd 7 release except with everything from its
} own version - upgrade to what is now on the relevant branch, or what
} is in current, but do both vnd.c and vndconfig at the same time.
} 
} But if you have vndconfig & a kernel built from the same set of sources,
} it should work.   But various mismatches have different sets of problems.
} Which particular problem depends upon just which version of vn{d}config
} and which version of vnd.c happen to be in use.
} 
} jnem...@cue.bc.ca said:
}   | It would appear that the call to vnconfig is failing.
}   | The question is, why?
} 
} Yes, good question.   What is $xparams in [t]he script fragment quoted ?

 It's the path to the file to be used as backing store (confirmed
to exist and be a regular file by an earlier call to stat(1)).
Its original source is the config file for the domU.

} Currently, it is possible to configure any unused vnd (so if $xparams
} is doing that it should work) (it is also possible to vndconfig -l

 As shown in the script fragment, $xparams has nothing to do with
the choice of which vnd to use.

} any device) but other uses are likely to return an error when used on
} an unconfig'd vnd
} 
}   | What happens if you have 9 or fewer /dev/vnds?
} 
} Should be irrelevant.
} 
}   | My thought here is about sort order where vnd10 would come before vnd2
} 
} No, the script extracts just the N part of the vnd names, and uses sort -n
} so the sort will produce 0 1 2 3 ... 10 11 ...

 Oops, right.
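
The difference between the two orderings can be seen with a small
sketch.  The three device names below are made-up sample input
standing in for the output of ls -1 /dev/vnd[0-9]*d; the sed
expression is the one from the block script.

```shell
# Sample device node names (assumed input, not read from /dev).
names='/dev/vnd10d
/dev/vnd2d
/dev/vnd0d'

# Plain sort compares the names as strings, so vnd10d lands before vnd2d.
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort | paste -s -d ' ' -)

# The block script's approach: strip each name down to its unit number,
# then sort numerically.
numeric=$(printf '%s\n' "$names" | sed 's,/dev/vnd,,;s,d,,' | sort -n | paste -s -d ' ' -)

echo "lexical: $lexical"
echo "numeric: $numeric"
```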

} But in any case:
} 
}   | and what happens if you try to configure them out of order.
} 
} Nothing very interesting, vnds (or any similar cloning device) can be
} configured in any order you like.
} 
} Incidentally, possibly depending on just what $xparams is, that script
} fragment looks like it should work fine to me - it uses safe methods
} to work out which vnd is available from what I can see (the script
} wants to use /dev/vnd* so it looks to see what is there, it cannot
} use anything which isn't) and then it removes from consideration any
} vnd which is in use (for which it uses $( sysctl hw.disknames )
} which is the safest way to see what is actually in use.
} 
} It isn't using vnconfig -l, which is the only thing that was (or should
} have been) affected by the vnd.c (and related vndconfig) changes.  That
} is, unless it is attempting to set a geometry with a sector size that is
} not a power of 2 - another of the changes in the set causes that to error

 There is only one call to vnconfig to configure a vnd (there is a
vnconfig -u elsewhere in the script), and as you saw it is nothing
more than:

vnconfig /dev/vndd 

} out, whereas previously it would have been accepted (and who knows what
} would have happened had it been actually used that way - these days much of
} the kernel assumes only power of two sector sizes, shifting is used to
} adjust units.)
} 
}-- End of excerpt from Robert Elz
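
The power-of-two point can be illustrated with a short sh sketch.
This is not code from vnd.c; the helper name is_pow2 and the sample
values are assumptions chosen for the example.  It shows the usual
bitwise test and why a shift can stand in for a division only when
the sector size is a power of two.

```shell
# is_pow2 N: succeed if N is a positive power of two (hypothetical helper).
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# With a power-of-two sector size, a byte offset converts to a sector
# number with a shift instead of a division.
secsize=512
shift_bits=9            # 1 << 9 == 512
offset=1048576          # 1 MiB

if is_pow2 "$secsize"; then
    echo "sector by shift:    $(( offset >> shift_bits ))"
    echo "sector by division: $(( offset / secsize ))"
fi
```

For a size such as 520 there is no shift count that gives the same
answer as the division, which is why code written around shifts
cannot handle it.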


Re: vnd.c 1.254

2016-01-16 Thread John Nemeth
On Jan 16,  7:21pm, Manuel Bouyer wrote:
}
} what problem are you trying to solve with this commit to sys/dev/vnd.c ?
} revision 1.251
} date: 2015/11/09 17:41:24;  author: christos;  state: Exp;  lines: +3 -5
} Return ENXIO if the get ioctl exceeds the number of configured devices.
} XXX: pullup-7

 The issue was that under some conditions, vnconfig -l would
loop forever, displaying:

vnd: not in use
vnd: not in use
vnd: not in use
...

I don't recall the exact trigger condition, but I have seen it happen.

} This broke vnconfig -l (and so Xen block-device scripts):
} xen1:/tmp#vnconfig -l
} vnd0: /domains (/dev/wd0f) inode 3
} vnconfig: VNDIOCGET: Device not configured

 It stops an older vnconfig with a newer kernel from looping
forever.  Exactly how old vnconfig has to be and how new the kernel
has to be is left as an exercise for the reader.  :->

} There are 7 more vnd devices in /dev/ waiting to be configured on this system.
} 
} This has been pulled up to netbsd-7 and netbsd-7-0 as part of
} ticket 1038, so vnconfig (and Xen dom0) is broken here too,
} as reported in PR 50659

 When trying to locate a free vnd(4), xl (technically, it calls
out to a script that) does this:

-
# Store the list of available vnd(4) devices in
# ``available_disks'', and mark them as ``free''.
list=`ls -1 /dev/vnd[0-9]*d | sed "s,/dev/vnd,,;s,d,," | sort -n`
for i in $list; do
    disk="vnd$i"
    available_disks="$available_disks $disk"
    eval $disk=free
done
# Mark the used vnd(4) devices as ``used''.
for disk in `sysctl hw.disknames`; do
    case $disk in
    vnd[0-9]*) eval $disk=used ;;
    esac
done
# Configure the first free vnd(4) device.
for disk in $available_disks; do
    eval status=\$$disk
    if [ "$status" = "free" ] && \
        vnconfig /dev/${disk}d $xparams >/dev/null; then
        device=/dev/${disk}d
        break
    fi
done
if [ x$device = x ] ; then
    error "no available vnd device"
fi
-

It would appear that the call to vnconfig is failing.  The question
is, why?  What happens if you have 9 or fewer /dev/vnds?  My thought
here is about sort order where vnd10 would come before vnd2 and
what happens if you try to configure them out of order.

}-- End of excerpt from Manuel Bouyer
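
For anyone wanting to follow the free/used bookkeeping in the script
fragment without touching a real system, here is a dry-run sketch of
the same logic.  The unit numbers and the disk name list are made-up
stand-ins for what the script reads from /dev and from sysctl
hw.disknames, and the vnconfig call is left out, so this only shows
which device the script would try first.

```shell
# Stand-in for the unit numbers found in /dev.
available_disks=
for i in 0 1 2; do
    disk="vnd$i"
    available_disks="$available_disks $disk"
    eval "$disk=free"          # creates shell variables vnd0, vnd1, vnd2
done

# Stand-in for the words printed by `sysctl hw.disknames`.
for disk in wd0 vnd0 vnd1; do
    case $disk in
    vnd[0-9]*) eval "$disk=used" ;;
    esac
done

# Pick the first device still marked free (no vnconfig call here).
first_free=
for disk in $available_disks; do
    eval "status=\$$disk"
    if [ "$status" = free ]; then
        first_free=$disk
        break
    fi
done
echo "first free: ${first_free:-none}"
```

With this sample input vnd0 and vnd1 are marked used, so the loop
settles on vnd2.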


Re: In-kernel units for block numbers, etc ...

2015-11-29 Thread John Nemeth
On Nov 29, 10:38am, Michael van Elst wrote:
} Subject: Re: In-kernel units for block numbers, etc ...
} jnem...@cue.bc.ca (John Nemeth) writes:
} 
} > On a side note, if the backend is just a file, why doesn't
} >vnd(4) work with NFS?
} 
} A quick test shows that it works with a NFS file. I don't know
} how stable that is.

 It's documented as not working, and I know from experience
that it doesn't work, unless something has changed recently.  My
test case, at least the most recent one from memory, has to do with
Xen.  I keep ISO images on a NAS.  I often want to feed an ISO
image to Xen when setting up a new domU or upgrading one.  When
Xen is told to use a file for backing store, a script sets up a
VND and then uses that as Xen really wants a device.  It doesn't
work when the ISO image is on a NAS; I have to copy it to the
dom0.  BTW, the dom0 is running 6.1.5.  I was just poking at it,
and may need to poke at it some more to try a couple of things.

}-- End of excerpt from Michael van Elst


Re: In-kernel units for block numbers, etc ...

2015-11-28 Thread John Nemeth
On Nov 29, 12:05am, Michael van Elst wrote:
} k...@munnari.oz.au (Robert Elz) writes:
} 
} >I haven't looked carefully yet, but does vnd have the RMW behaviour to
} >allow an emulated small sector drive to exist on a big sector underlying.
} 
} It doesn't need to, the backend is a file and you can access arbitrary
} byte positions. The "RMW behaviour" is what the underlying filesystem
} automatically provides.

 On a side note, if the backend is just a file, why doesn't
vnd(4) work with NFS?

}-- End of excerpt from Michael van Elst


Re: In-kernel units for block numbers, etc ...

2015-11-26 Thread John Nemeth
On Nov 27,  6:00am, Robert Elz wrote:
}
} Date:Fri, 27 Nov 2015 07:12:50 +1100
} From:matthew green 
} Message-ID:  <18094.1448568...@splode.eterna.com.au>
} 
}   | FWIW, i "fixed" raidframe on 4K disks a few years back.
} 
} Do we allow mirroring where one drive is 512 byte sectors, and the
} other is 4K ?
} 
} If so (and I'd hope the answer is yes) what happens if the 4K drive
} dies and is replaced by a 512 byte sector drive?

 I would hope the answer is no, considering how much that would
complicate things, not to mention the slowdown (i.e. doing a single
sector write on one drive would require an RMW cycle on the other).

}-- End of excerpt from Robert Elz
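
To make the RMW cost concrete, here is a small arithmetic sketch in
sh.  It is not RAIDframe code; the sector number 13 is an arbitrary
example.  It works out which 4K sector a single 512-byte write lands
in, and which bytes of that sector would have to be replaced after
reading it.

```shell
small=512       # sector size of one mirror component
big=4096        # sector size of the other component
lba=13          # hypothetical 512-byte sector being written

ratio=$(( big / small ))          # 512-byte sectors per 4K sector
big_lba=$(( lba / ratio ))        # 4K sector that contains it
slot=$(( lba % ratio ))           # position inside that 4K sector
byte_off=$(( slot * small ))      # byte offset inside the 4K sector
byte_end=$(( byte_off + small - 1 ))

echo "write of 512-byte sector $lba touches 4K sector $big_lba"
echo "RMW: read 4K sector $big_lba, replace bytes $byte_off..$byte_end, write it back"
```

Every sub-4K write on the 512-byte side turns into a read plus a
write on the 4K side, which is the slowdown referred to above.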


Re: POSIX.1 semaphores vs message queues

2015-11-13 Thread John Nemeth
On Nov 13,  6:34pm, Masao Uebayashi wrote:
} On Mon, Nov 9, 2015 at 7:13 PM, John Nemeth <jnem...@cue.bc.ca> wrote:
} > On Nov 9, 11:15am, Masao Uebayashi wrote:
} > } On Mon, Nov 9, 2015 at 9:21 AM, Joerg Sonnenberger
} > } <jo...@britannica.bec.de> wrote:
} > } > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} > } >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} > } >> optional.
} > } >
} > } > See part about modularity masturbation. Making things optional for the
} > } > sake of making them optional is just as wrong.
} > } >
} > } >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} > } >> these two optional/removeable modules together add up to just about
} > } >> the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} > } >> in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} > } >> at 5186 bytes).  There are numerous other modules which are similar in
} > } >> size to the SEMAPHORE module.
} > } >
} > } > Add in the page alignment and the cost becomes even larger. There is
} > } > nothing to be gained.
} > }
} > } Please don't (intentionally) confuse module in general and dynamic
} > } loading.
} > }
} > } For built-in modules, the extra size is code added by #ifdef _MODULE.
} > } In the long run, xxx_modcmd() functions are merged into kctors.  If
} >
} >  Uh, I don't think so.  Not unless you have one heck of a good
} > reason.
} 
} If you need only one reason: dynamically loadable modules help
} development and debugging.

 What does this have to do with xxx_modcmd()?  It also isn't
necessarily a good enough reason to turn everything and its dog
into a module.

} > xxx_modcmd() does more than just initialize the module.
} 
} I know I know...  That sentence should have been read as: *part of*
} xxx_modcmd() *might be* merged into kctors.

 That doesn't answer the concern that module init routines take
a parameter and return a result code.  If you yank the module init
routine out of xxx_modcmd(), you remove significant functionality.

} > Spreading that stuff all over the place would not be nice.  Also,
} > we need to be able to pass parameters to the initialization routine
} > and check the return code.  These are NOT fire and forget routines.
} >
} >  There is a reason that planned major changes are supposed to
} > be discussed.  It is so that people know what is happening and to
} > give people a chance to point out things you might not have thought
} > of.  "By the way, this is what's going to happen," is not how you
} > start a discussion.
} 
} I have tried to explain the need of kctors, instead of hardcoded
} sequence of xxx_init() functions in init_main.c:main(), generated by
} dependency.

 This is truly lame.  It's not like you have to make a gazillion
calls from init_main() to each module.  One call to a module routine
causes all modules to be inited.

 Also, I don't think I've seen any discussion here.  I've seen
people asking you to tell us what your intentions are, without any
kind of real response from you.

} > } other metadata consume more than expected, it will be addressed and
} > } reconsidered.  But that goes away in !MODULAR kernels.  So virtually
} > } you lose nothing.
} > }-- End of excerpt from Masao Uebayashi
}-- End of excerpt from Masao Uebayashi


Re: POSIX.1 semaphores vs message queues

2015-11-13 Thread John Nemeth
On Nov 13,  7:46pm, Masao Uebayashi wrote:
} On Fri, Nov 13, 2015 at 8:05 PM, John Nemeth <jnem...@cue.bc.ca> wrote:
} > On Nov 13,  6:34pm, Masao Uebayashi wrote:
} > } On Mon, Nov 9, 2015 at 7:13 PM, John Nemeth <jnem...@cue.bc.ca> wrote:
} > } > On Nov 9, 11:15am, Masao Uebayashi wrote:
} > } > } On Mon, Nov 9, 2015 at 9:21 AM, Joerg Sonnenberger
} > } > } <jo...@britannica.bec.de> wrote:
} > } > } > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} > } > } >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} > } > } >> optional.
} > } > } >
} > } > } > See part about modularity masturbation. Making things optional for
} > } > } > the sake of making them optional is just as wrong.
} > } > } >
} > } > } >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem
} > } > } >> code; these two optional/removeable modules together add up to just
} > } > } >> about the size of a SEMAPHORE module.  (On amd64 we have exec_script
} > } > } >> weighing in at 1285 bytes and coredump at 3895 bytes, while ksem
} > } > } >> tips the scales at 5186 bytes).  There are numerous other modules
} > } > } >> which are similar in size to the SEMAPHORE module.
} > } > } >
} > } > } > Add in the page alignment and the cost becomes even larger. There is
} > } > } > nothing to be gained.
} > } > }
} > } > } Please don't (intentionally) confuse module in general and dynamic
} > } > } loading.
} > } > }
} > } > } For built-in modules, the extra size is code added by #ifdef _MODULE.
} > } > } In the long run, xxx_modcmd() functions are merged into kctors.  If
} > } >
} > } >  Uh, I don't think so.  Not unless you have one heck of a good
} > } > reason.
} > }
} > } If you need only one reason: dynamically loadable modules help
} > } development and debugging.
} >
} >  What does this have to do with xxx_modcmd()?  It also isn't
} > necessarily a good enough reason to turn everything and its dog
} > into a module.
} >
} > } > xxx_modcmd() does more than just initialize the module.
} > }
} > } I know I know...  That sentence should have been read as: *part of*
} > } xxx_modcmd() *might be* merged into kctors.
} >
} >  That doesn't answer the concern that module init routines take
} > a parameter and return a result code.  If you yank the module init
} > routine out of xxx_modcmd(), you remove significant functionality.
} >
} > } > Spreading that stuff all over the place would not be nice.  Also,
} > } > we need to be able to pass parameters to the initialization routine
} > } > and check the return code.  These are NOT fire and forget routines.
} > } >
} > } >  There is a reason that planned major changes are supposed to
} > } > be discussed.  It is so that people know what is happening and to
} > } > give people a chance to point out things you might not have thought
} > } > of.  "By the way, this is what's going to happen," is not how you
} > } > start a discussion.
} > }
} > } I have tried to explain the need of kctors, instead of hardcoded
} > } sequence of xxx_init() functions in init_main.c:main(), generated by
} > } dependency.
} >
} >  This is truly lame.  It's not like you have to make a gazillion
} > calls from init_main() to each module.  One call to a module routine
} > causes all modules to be inited.
} 
} Are you proposing to make everything a module and always use module
} init routine?

 I am most certainly not proposing to make everything a module.
I started out in this thread by objecting to the idea that basic
functionality should be modularised.

} >  Also, I don't think I've seen any discussion here.  I've seen
} > people asking you to tell us what your intentions are, without any
} > kind of real response from you.
} >
} > } > } other metadata consume more than expected, it will be addressed and
} > } > } reconsidered.  But that goes away in !MODULAR kernels.  So virtually
} > } > } you lose nothing.
} > } > }-- End of excerpt from Masao Uebayashi
} > }-- End of excerpt from Masao Uebayashi
}-- End of excerpt from Masao Uebayashi


Re: POSIX.1 semaphores vs message queues

2015-11-09 Thread John Nemeth
On Nov 9, 11:15am, Masao Uebayashi wrote:
} On Mon, Nov 9, 2015 at 9:21 AM, Joerg Sonnenberger
} <jo...@britannica.bec.de> wrote:
} > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} >> optional.
} >
} > See part about modularity masturbation. Making things optional for the
} > sake of making them optional is just as wrong.
} >
} >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} >> these two optional/removeable modules together add up to just about
} >> the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} >> in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} >> at 5186 bytes).  There are numerous other modules which are similar in
} >> size to the SEMAPHORE module.
} >
} > Add in the page alignment and the cost becomes even larger. There is
} > nothing to be gained.
} 
} Please don't (intentionally) confuse module in general and dynamic loading.
} 
} For built-in modules, the extra size is code added by #ifdef _MODULE.
} In the long run, xxx_modcmd() functions are merged into kctors.  If

 Uh, I don't think so.  Not unless you have one heck of a good
reason.  xxx_modcmd() does more than just initialize the module.
Spreading that stuff all over the place would not be nice.  Also,
we need to be able to pass parameters to the initialization routine
and check the return code.  These are NOT fire and forget routines.

 There is a reason that planned major changes are supposed to
be discussed.  It is so that people know what is happening and to
give people a chance to point out things you might not have thought
of.  "By the way, this is what's going to happen," is not how you
start a discussion.

} other metadata consume more than expected, it will be addressed and
} reconsidered.  But that goes away in !MODULAR kernels.  So virtually
} you lose nothing.
}-- End of excerpt from Masao Uebayashi


Re: POSIX.1 semaphores vs message queues

2015-11-07 Thread John Nemeth
On Nov 8,  7:22am, Paul Goyette wrote:
} On Sat, 7 Nov 2015, Joerg Sonnenberger wrote:
} > On Sun, Nov 08, 2015 at 06:35:36AM +0800, Paul Goyette wrote:
} >> On Sat, 7 Nov 2015, Joerg Sonnenberger wrote:
} >>> On Sat, Nov 07, 2015 at 10:55:49AM +0800, Paul Goyette wrote:
} >>>> I'd like to understand the rationale that makes POSIX semaphores a
} >>>> non-optional component of the kernel, while POSIX message queues are
} >>>> still optional.  Both seem to be related specifically to use in the
} >>>> librt real-time library.
} >>>
} >>> Semaphores are used quite a lot and not only required by librt, but
} >>> also by libpthread. I'm not sure what is using message queues.
} >>
} >> Hmmm, sounds like a great reason to include the semaphore code in
} >> every kernel by default.  But it doesn't sound sufficiently critical
} >> to _prevent_ it from being removed from custom kernels if explicitly
} >> requested by the user.
} >>
} >> I'd like to suggest that this code once again become an option.  Rather
} >> than adding an option to every kernel configuration file, however, we
} >> can simply add it to src/sys/conf/std where it will get included by
} >> default, in the same manner as MQUEUE.  (I also propose use of "option
} >> SEMAPHORE" rather than P1003_1B_SEMAPHORE, similar to MQUEUE.)
} >
} > I don't see the point in having options for every single system call or
} > the like. At best, it is a form of modularity masturbation and at worst,
} > it is asking for difficult to analyze bugs when someone actually insists
} > on removing them.
} 
} I do understand your position.  And I'm well aware of how difficult it
} can be to analyze any bugs that get introduced.  (Refer my recent issues
} that resulted from fixing the module dependencies for compat_netbsd32,
} or the issue with SYSVSEM, which took a couple of weeks to locate and
} fix.)
} 
} This isn't a request to modularize a single syscall, it's a complete set
} of ten syscalls for a self-contained set of functionality on which there
} are no other kernel or modular dependencies.  There is no functional
} impact on anyone who uses standard kernels.  It only impacts those who
} explicitly request the exclusion of this code from their kernels, and in
} the exact same manner as requesting the exclusion of MQUEUE or AIO.
} (And yes, I run with both of those removed from my kernels, loading the
} modules on-demand.)
} 
} Based on the (lack of) commentary I received in my recent bug-hunts, it
} seems that very few people would care about re-modularizing ksem.  I'm
} willing to do all the work (actually, it's already done, except for
} testing and fixing any bugs I find).
} 
} I'd really appreciate comments from others

 In general, I like the idea of modules.  However, in this
case, I pretty much agree with Joerg and have to ask, what is the
point of modularising basic functionality?  Is having it in the
kernel all the time causing some kind of issue?

}-- End of excerpt from Paul Goyette


Re: Choice of SAS controller

2015-07-16 Thread John Nemeth
On Jul 16,  3:27pm, Edgar Fuß wrote:
}
}  I have been using Areca RAID controllers for several years now and 
}  I have been pretty happy with them.
} Can you drop me a part number?
} The intersection between devices supported by NetBSD and those actually 
} still available on the market seems to be approximately empty.
} 
} Of course, it may be as trivial as adding PCI IDs to sys/dev/pci/arcmsr.c
} to add support for newer controllers.

 I'm using this one:

arcmsr0: Areca ARC-1680 Host Adapter RAID controller
arcmsr0: 8 ports, 2048MB SDRAM, firmware V1.49 2010-12-02

}-- End of excerpt from Edgar Fuß


Re: mount_checkdirs

2015-07-08 Thread John Nemeth
On Jul 9, 12:27am, Rhialto wrote:
} On Mon 06 Jul 2015 at 09:58:59 +, David Holland wrote:
} 
}  Also it's occasionally useful to mount over things and leave a process
}  underneath, which this logic seems to complicate.
} 
} If I read the code correctly, it looks for processes that have a current
} working directory or root directory exactly at the mount point. But the
} mount point directory does not need to be empty. A process could have a
} cwd or root in any directory inside it. So as-is, the code is
} insufficient for its intended purpose anyway.
} 
} Furthermore, the process can have open files from that directory tree.
} If its cwd or root gets changed (and into what exactly, if it isn't the
} exact mount point?) it has files open that it can't find anymore with
} another call to open(2). That seems like an inconsistency that we may
} want to avoid due to the POLA.

 The same process or another process could unlink the open
file.  There is no guarantee of being able to open(2) a file twice.

}-- End of excerpt from Rhialto


Re: Specification of BTINFO_CONSOLE value in bootinfo.h

2015-06-03 Thread John Nemeth
On Jun 3,  9:36am, deco33...@yandex.com wrote:
} 
} I was reading the boot code to make netbsd multiboot compliant.

 Uh, it already is, or should be.  See sys/arch/i386/i386/multiboot.c.
I'm not certain if that is used for amd64.  But, if not, it would
probably be the place to start.

} What defines those values in arch/x86/include/bootinfo.h, e.g.
} BTINFO_CONSOLE,BTINFO_BOOTDISK.. is it the MBR ? the netbsd
} bootloader ?  I mean, the value of 6 for BTINFO_CONSOLE can be
} found in which specification ? Could not find out.

 None.  The definition is in bootinfo.h.  Anyway, I don't believe
it's relevant to multiboot, as that passes a string, not
a struct.

}-- End of excerpt from deco33...@yandex.com


Re: Specification of BTINFO_CONSOLE value in bootinfo.h

2015-06-03 Thread John Nemeth
On Jun 3,  9:57am, deco33...@yandex.com wrote:
} Thanks but,
} 
} lookup_bootinfo(BTINFO_CONSOLE); - initiate the console.

 This stuff is related to the NetBSD native bootloader and has
absolutely nothing to do with multiboot.

}-- End of excerpt from deco33...@yandex.com


Re: disk driver interface

2014-12-30 Thread John Nemeth
On Dec 29,  9:28pm, Christos Zoulas wrote:
} On Dec 29,  4:11pm, jnem...@cue.bc.ca (John Nemeth) wrote:
} 
} |  A semi-quick look around shows that pretty much everything
} | that would support the drvctl(4) method would also support the
} | DIOCGDISKINFO method.  Both methods return the same proplib dictionary
} | for disk geometry info.  So perhaps the DIOCGDISKINFO method should
} | always be used in preference to the drvctl(4) method.
} 
} I think that using it directly makes sense. If you want you can
} delete the drvctl and partutil code in gpt. Now that we have both
} the ioctls and the DIOCGDISKINFO code, doing the same thing 3
} different ways does not make a lot of sense, except to demonstrate
} we (like perl) have many different ways of doing the same thing but
} with varying complexity and possibility of error.

 I want to pullup gpt(8) to all branches, so now I have to
figure out what to do with it.  I'm thinking I might just pullup
everything before the recent change.  Given that, I can just blow
away the drvctl(4) stuff.

} |  As far as I know, the only drivers that don't support drvctl(4)
} | and DIOCGDISKINFO are ccd(4) and cgd(4).  They should just be fixed.
} | Then DIOCGDISKINFO can be used always with everything else relegated
} | to compat.  Also src/sbin/fsck/partutil.* should probably be moved
} | to libutil as they appear to be of general utility, instead of
} | having random utilities pulling in parts of fsck.
} 
} Michael fixed cgd and I fixed ccd. I am not sure about getdiskinfo(),

 I saw that you added a call to disk_ioctl() to ccd.  I'm just
not sure what you expected it to do, given that the struct disk_geom
wasn't filled in.  I just fixed that problem.

} the API is clumsy. It is what I found useful when converting the
} individual fsck and dump utilities to wedges. It should and could
} be improved.  getdisksize() on the other hand can be abstracted to
} the two new ioctls() + opendisk() now...
} 
} | } I think we should decide on a single API/interface to get general
} | } information about disk devices. If a big DIOCGDINFO is that,
} | } fine.  But we decided it was not providing enough information a
} | } while ago and so we got DIOCGDISKINFO. Providing a big DIOCGDINFO
} | } would allow us to have compatibility with OpenBSD and bring a 70's
} | } technology to the 21st century.
} | 
} |  It's a dead technology.  Besides, for real OpenBSD compatibility
} | we would have to deal with their on-disk changes as well.
} 
} Right, this is probably too much work for too little gain.
} 
} | } in sbin/fsck/). Perhaps adding a DIOCGDISKGEOM that returns just
} | } disk_geom would be nice to have and can replace DIOCGDINFO.
} | 
} |  DIOCGDISKGEOM could easily be added to
} | src/sys/kern/disk_subr.c:disk_ioctl(), then all drivers that support
} | DIOCGDISKINFO would automatically support DIOCGDISKGEOM.
} 
} Yes, then we don't need all the plist crap in partutil.c, since the
} only thing that partutil uses from the plist is geometry. I think that
} we should add this ioctl and not need to go through the hoops of
} extracting the geometry from the plist now.
} 
} | } - import FreeBSD DIOCGMEDIASIZE (and DIOCGSECTORSIZE) ioctls.
} | } 
} | } I would do that anyway, since it is simple and most things just
} | } need those two numbers.
} | 
} |  These ioctls could probably also be added to
} | src/sys/kern/disk_subr.c:disk_ioctl().  Any disk drivers that don't
} | call that function should be fixed.
} 
} Michael did that already.
} 
}-- End of excerpt from Christos Zoulas


Re: disk driver interface

2014-12-29 Thread John Nemeth
On Dec 29,  3:00am, Michael van Elst wrote:
} 
} Currently NetBSD has three programming interfaces to determine
} disk geometry from userland.
} 
} - ioctl DIOCGDINFO. The traditional interface, limited to 32bit
}   numbers or disks < 2TB because its data structure corresponds
}   to the binary on-disk structure.
} 
} - the get-properties command to the drvctl(4) driver. drvctl(4)
}   is missing on some ports and some disk drivers don't make
}   geometry properties available.
} 
} - ioctl DIOCGWEDGEINFO. Works only for wedges but not for the
}   disk drivers themselves. This is fine for operations on
}   data blocks of a wedge but doesn't help e.g. partitioning
}   tools. It also does not provide the sector size.
} 
} To solve this, we could
} 
} - create a new DIOCGDINFO version that uses larger numbers. AFAIK
}   that is about what OpenBSD does. The on-disk structure could be
}   translated but writing a label might be incompatible if partitions
}   are defined beyond the 2TB limit.
} 
} - make drvctl(4) mandatory and make all disk drivers provide
}   geometry properties.

 I would tend to go with this since it is used for a lot more
than just getting the geometry of a drive.

} - make DIOCGWEDGEINFO available for the disk drivers and
}   ignore wedge-related information.
} 
} - import FreeBSD DIOCGMEDIASIZE (and DIOCGSECTORSIZE) ioctls.
} 
} 
} Comments?

 I really don't care about this silly little issue.  But, as
a side note, I will note that gpt(8) (which originated this thread)
came from FreeBSD so it already has support for the FreeBSD ioctls
and would use them in preference to the drvctl(4) method if they existed.

}-- End of excerpt from Michael van Elst


Re: disk driver interface

2014-12-29 Thread John Nemeth
On Dec 29,  4:46pm, Christos Zoulas wrote:
} In article m7qg4d$3kt$1...@serpens.de,
} Michael van Elst mlel...@serpens.de wrote:
} 
} Currently NetBSD has three programming interfaces to determine
} disk geometry from userland.
} 
} - ioctl DIOCGDINFO. The traditional interface, limited to 32bit
}   numbers or disks < 2TB because its data structure corresponds
}   to the binary on-disk structure.
} 
} - the get-properties command to the drvctl(4) driver. drvctl(4)
}   is missing on some ports and some disk drivers don't make
}   geometry properties available.
} 
} - ioctl DIOCGWEDGEINFO. Works only for wedges but not for the
}   disk drivers themselves. This is fine for operations on
}   data blocks of a wedge but doesn't help e.g. partitioning
}   tools. It also does not provide the sector size.
} 
} Actually there is also:
}  - ioctl DIOCGDISKINFO. This is supposed to work for all kinds of
}    disks but it returns a plist, and it is a pain to use.

 A semi-quick look around shows that pretty much everything
that would support the drvctl(4) method would also support the
DIOCGDISKINFO method.  Both methods return the same proplib dictionary
for disk geometry info.  So perhaps the DIOCGDISKINFO method should
always be used in preference to the drvctl(4) method.

 As far as I know, the only drivers that don't support drvctl(4)
and DIOCGDISKINFO are ccd(4) and cgd(4).  They should just be fixed.
Then DIOCGDISKINFO can be used always with everything else relegated
to compat.  Also src/sbin/fsck/partutil.* should probably be moved
to libutil as they appear to be of general utility, instead of
having random utilities pulling in parts of fsck.

} To solve this, we could
} 
} - create a new DIOCGDINFO version that uses larger numbers. AFAIK
}   that is about what OpenBSD does. The on-disk structure could be
}   translated but writing a label might be incompatible if partitions
}   are defined beyond the 2TB limit.
} 
} I think we should decide on a single API/interface to get general
} information about disk devices. If a big DIOCGDINFO is that,
} fine.  But we decided it was not providing enough information a
} while ago and so we got DIOCGDISKINFO. Providing a big DIOCGDINFO
} would allow us to have compatibility with OpenBSD and bring a 70's
} technology to the 21st century.

 It's a dead technology.  Besides, for real OpenBSD compatibility
we would have to deal with their on-disk changes as well.

} - make drvctl(4) mandatory and make all disk drivers provide
}   geometry properties.
} 
} Well, I don't particularly like to have to go through an auxiliary
} driver to get information that should be readily available from
} the direct driver, but we could consider making drvctl mandatory.
} The only problem would be small kernels.
} 
} - make DIOCGWEDGEINFO available for the disk drivers and
}   ignore wedge-related information.
} 
} Well, we have DIOCGDISKINFO... which provides the kitchensink, but
} it is hard to use. I think it is a demonstration on how a fully
} generalized API that provides everything loses because of programming
} complexity. Having said that, for the most part (getting struct
} disk_geom out of it), it works once abstracted (see partutil.[ch]
} in sbin/fsck/). Perhaps adding a DIOCGDISKGEOM that returns just
} disk_geom would be nice to have and can replace DIOCGDINFO.

 DIOCGDISKGEOM could easily be added to
src/sys/kern/disk_subr.c:disk_ioctl(), then all drivers that support
DIOCGDISKINFO would automatically support DIOCGDISKGEOM.

} - import FreeBSD DIOCGMEDIASIZE (and DIOCGSECTORSIZE) ioctls.
} 
} I would do that anyway, since it is simple and most things just
} need those two numbers.

 These ioctls could probably also be added to
src/sys/kern/disk_subr.c:disk_ioctl().  Any disk drivers that don't
call that function should be fixed.

}-- End of excerpt from Christos Zoulas


Re: disk driver interface

2014-12-29 Thread John Nemeth
On Dec 30,  6:42am, David Holland wrote:
} On Tue, Dec 30, 2014 at 02:50:14AM +, Christos Zoulas wrote:
}   In article 20141229233211.ga10...@netbsd.org,
}   David Holland  dholland-t...@netbsd.org wrote:
}   
}   It might be a good idea to do this for our own use, but probably it
}   shouldn't be a 3rd-party interface. (Unless we decide we like the look of
}   it, I guess.)
}   
}   Although I'm not real thrilled about multiplying uses of proplib...
}   
}   This is why I said let's add DIOCGDISKGEOM to avoid proplib and
}   DIOCGDISKINFO.
} 
} Because it's better to multiply ioctl entities? :-)

 Before we go adding ioctl entities all over the place, we
should probably find out what other OSes are doing.  We've already
added a couple from FreeBSD.  The question is, what else is out
there that may satisfy our needs?

} (I suppose it in fact is...)

 I'm not sure I agree.  But, then I don't have the same hate-on
for proplib that others seem to have.

}-- End of excerpt from David Holland


Re: kernel constructor

2014-11-11 Thread John Nemeth
On Nov 12,  1:46am, Masao Uebayashi wrote:
} On Wed, Nov 12, 2014 at 1:15 AM, Kamil Rytarowski n...@gmx.com wrote:
}  From David Holland
}  Please don't do that. Nothing good can come of it - you are asking for
}  a thousand weird problems where undisclosed ordering dependencies
}  silently manifest as strange bugs.
} 
} Everyone is aware of that.  Code conversion must be done extremely
} carefully.  Order must be preserved.
} 
}  Furthermore, the compiler can and probably will assume that
}  constructor functions get called before all non-constructor code, and
}  owing to unavoidable issues in early initialization this will not be
}  the case in some contexts. (I first hit this problem back in about
}  1995ish when some more gung-ho colleagues were trying to use C++
}  global constructors in a C++ kernel, and we eventually had to declare
}  a moratorium on all global constructors.)
} 
} Thanks, but irrelevant for kernel...
} 
}  init_main.c could use some tidying, but there's nothing fundamentally
}  wrong with it that will be improved by adding a lot of implicit magic
}  that doesn't do what the average passerby expects.
} 
} Function pointers are not magic.
} 
} (snip)
}  And last but not least... what's wrong with init_main.c? It must be
}  clear for a developer adding a new platform or debugging hardware
}  bring-up. It gives me the big picture of what's going on
}  step-by-step, even when I was lurking into assembly of our kernel...
}  call it, call that, call this.. making it all clear.
} 
} Those functions are hardcoded and ordered even without dependencies
} among them, that's a big problem.

 Without dependencies?!?  The ordering gives the dependencies.
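
 To make that concrete, here is a small sketch with three made-up
subsystems (alloc_init, sched_init and fs_init are hypothetical
names, not anything in init_main.c).  Each one checks that what it
needs is already up.  Called in the hardcoded order everything
works; swap two calls and the dependency that the order was
silently expressing shows up as a failure.

```c
#include <assert.h>
#include <stdbool.h>

/* State set by the hypothetical subsystems as they come up. */
static bool alloc_ready, sched_ready;

static int
alloc_init(void)
{
	alloc_ready = true;		/* needs nothing */
	return 0;
}

static int
sched_init(void)
{
	if (!alloc_ready)		/* needs the allocator */
		return -1;
	sched_ready = true;
	return 0;
}

static int
fs_init(void)
{
	/* needs both the allocator and the scheduler */
	return (alloc_ready && sched_ready) ? 0 : -1;
}

static void
reset_state(void)
{
	alloc_ready = sched_ready = false;
}

/* The calls appear in dependency order, as in a hardcoded init path. */
int
boot_in_order(void)
{
	reset_state();
	if (alloc_init() != 0)
		return -1;
	if (sched_init() != 0)
		return -1;
	return fs_init();
}

/* Same calls, but sched_init() runs before alloc_init(). */
int
boot_reordered(void)
{
	reset_state();
	if (sched_init() != 0)		/* fails: allocator not up yet */
		return -1;
	if (alloc_init() != 0)
		return -1;
	return fs_init();
}
```

 boot_in_order() returns 0 and boot_reordered() returns -1.  In
a real kernel the failure would rarely be this polite.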

} The biggest problem of constructors (and indirect function call in
} general), I am aware of, is, static code analysis (code reading, tag
} jump, ...) becomes difficult (or impossible).

 Considering that we're talking about the kernel, this is an
extremely huge flaw!  As in, DON'T do it!

}-- End of excerpt from Masao Uebayashi


Re: MI linker script

2014-11-09 Thread John Nemeth
On Nov 9, 11:52am, Masao Uebayashi wrote:
} On Sun, Nov 9, 2014 at 11:22 AM, John Nemeth jnem...@cue.bc.ca wrote:
}   The question wasn't simply about ld -r stuff.  It was about
}  the entire program of config(1) changes, linking changes, module(9)
}  changes, etc.  There's an awful lot of stuff happening to major
}  parts of the system without any discussion.
} 
} The entire program of config(1) is a bit too exaggerated.  I'm
} rather hunting low-hanging fruits.

 By program I didn't mean config(1), I meant what you're
doing.  And, what you are doing appears to be a lot more than just
hunting low-hanging fruits.

}-- End of excerpt from Masao Uebayashi


Re: MI linker script

2014-11-08 Thread John Nemeth
On Nov 9,  1:25am, Masao Uebayashi wrote:
} On Sat, Nov 8, 2014 at 11:53 PM, Christos Zoulas chris...@astron.com wrote:
}  depending on ld -r to work properly
} 
} I know none of you trust me, but you don't trust ld -r?

 It has nothing to do with trust.  It's more like wanting to
know what the heck is going on.  Normally major work like this
would start with a discussion or at least an announcement of the
plan.  Instead all that happened is suddenly we see a major overhaul
of a critical item with no clue as to why.

 So, what is the plan?  Why are you doing this?  What are your
goals (i.e. what is the expected end result)?  What are you doing
with modules?

}-- End of excerpt from Masao Uebayashi


Re: MI linker script

2014-11-08 Thread John Nemeth
On Nov 9, 10:35am, Masao Uebayashi wrote:
} On Sun, Nov 9, 2014 at 5:07 AM, John Nemeth jnem...@cue.bc.ca wrote:
}  On Nov 9,  1:25am, Masao Uebayashi wrote:
}  } On Sat, Nov 8, 2014 at 11:53 PM, Christos Zoulas chris...@astron.com wrote:
}  }  depending on ld -r to work properly
}  }
}  } I know none of you trust me, but you don't trust ld -r?
} 
}   It has nothing to do with trust.  It's more like wanting to
}  know what the heck is going on.  Normally major work like this
}  would start with a discussion or at least an announcement of the
}  plan.  Instead all that happened is suddenly we see a major overhaul
}  of a critical item with no clue as to why.
} 
}   So, what is the plan?  Why are you doing this?  What are your
}  goals (i.e. what is the expected end result)?  What are you doing
}  with modules?
} 
} Something like this:
} https://mail-index.netbsd.org/tech-kern/2012/05/28/msg013235.html
} 
} In short: making kernel build better by sharing *.o

 The question wasn't simply about ld -r stuff.  It was about
the entire program of config(1) changes, linking changes, module(9)
changes, etc.  There's an awful lot of stuff happening to major
parts of the system without any discussion.

}-- End of excerpt from Masao Uebayashi


Re: asymmetric smp

2014-04-02 Thread John Nemeth
On Apr 2,  1:55pm, Johnny Billquist wrote:
} On 2014-04-01 23:04, Warner Losh wrote:
}  On Apr 1, 2014, at 5:49 AM, Johnny Billquist b...@softjar.se wrote:
} 
}  Good points.
}  Is this the right time to ask why booting NetBSD on a VAX (a 3500) now
}  takes more than 15 minutes? What is the system doing all that time???
} 
}  FreeBSD used to take forever to boot on certain low-end ARM CPUs with
}  /etc/rc.d after it was imported from NetBSD. This was due to crappy
}  root-device performance (100kB/s is enough for anybody, right?) and
}  crappy, at the time, pmap code that caused excess page traffic in the
}  /etc/rc.d environment. Perhaps those areas would be fruitful to
}  profile? Also, there were some inefficiencies that were either the
}  result of a botched port, or were basic to the system that got fixed.
}  Between fixing all these things, the boot time went from 10 minutes
}  down to ~20s.
} 
} Always nice with some ideas. The problem here is that this used to be 
} way faster in the past, but has slowed down recently.
} 
} The time between entering a username and getting the password prompt in 
} the same 3500 with the latest release is something like 30 seconds.
} 
} This is on an otherwise idle system, where boot has completed. 30 
} seconds (approximately, I should time it) just from pressing enter after 
} the username, until I just get the Password: prompt seems incredible 
} to me.
} 
} The root fs is on nfs, as I'm running the machine diskless. Disk is 
} served from a -current NetBSD/alpha system sitting right next to it. And 
} I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k 
} block size for NFS. Login is obviously already running, since that is 
} what also prompts for the username, and doing it twice should even put 
} some stuff in local cache.

 Uh, actually getty does the initial prompt for username on
the console.  After collecting the username, getty execs login.

}-- End of excerpt from Johnny Billquist


Re: Closing a serial device takes one second

2014-02-06 Thread John Nemeth
On Feb 6,  1:22pm, Dennis Ferguson wrote:
} On 6 Feb, 2014, at 12:18 , Marc Balmer m...@msys.ch wrote:
}  Actually the one second delay is wrong.  If you want to de-assert DTR
}  for a modem to hangup, then do it in the application.
} 
} You've clearly not run a bank of dial-in/out modems on a multiuser

  That's why Telebit Netblazers and Livingston Portmasters were
invented.

} I'm personally undisturbed by removing it, rather than fixing it, only
} because I don't know anyone who still uses dialup modems like that and
} I only remember this because I am old.  For the things I do use serial

 Does mentioning equipment from long dead companies make me old?

}-- End of excerpt from Dennis Ferguson


Re: The lamentation of proplib(3)

2014-01-28 Thread John Nemeth
On Jan 28,  7:40pm, Christian Koch wrote:
} On Tue, Jan 28, 2014 at 06:44:57PM +, Mindaugas Rasiukevicius wrote:
}  and my own dissatisfaction has reached the point where I decided to raise
}  the question.  The question of replacing proplib(3) with a better library.
}  There were ideas by some developers to write a new library from scratch.
}  The FreeBSD project has recently developed a general purpose key-value pair
}  library, which is quite similar to nvpair library in Solaris.
} 
} Isn't proplib(3) quite heavily used throughout the system, both
} kernel space and user space?  It won't be a trivial task to fully

 It is.

} make this change, is all I'm saying.

 Definitely.  Also, nvlist doesn't address one of the significant
uses of proplib.

} I say don't get rid of proplib(3) entirely, how about moving it
} to pkgsrc at least?

 Something that is heavily used throughout the system can not
be moved to pkgsrc.  Pkgsrc is an addon, not part of the base
system.  Thus nothing in the base system can be dependent upon
pkgsrc to function.

}-- End of excerpt from Christian Koch


Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-24 Thread John Nemeth
On Nov 24,  5:25am, Mouse wrote:
} 
} Well, mrg wrote, when starting the thread,
} 
}  while preparing to update to GCC 4.8 i discovered that our
}  sys/queue.h CIRCLEQ macros violate C aliasing rules, ultimately
}  leading to the compiler eliding comparisons it declared as always
}  false.
} 
} which sure looks to me as though it's not just theoretical.  (I don't
} know personally; mrg's mail implies this was with gcc 4.8, which I
} don't run.)

 The work has now changed to GCC 4.8.2.  It is being prepped
for import.  The compiler work is basically done.  At this point,
it is mostly making sure that NetBSD builds and runs with it.
Since I'm not doing the work, I don't have a timeline, but it
shouldn't be too much longer (FSVO much longer).  This means that
sometime in the not too distant future anybody running -current,
or anybody that runs NetBSD 7.0 when it is released will be using
GCC 4.8.2 or later.

}-- End of excerpt from Mouse


Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-23 Thread John Nemeth
On Nov 23,  2:16pm, Dennis Ferguson wrote:
} On 22 Nov, 2013, at 21:40 , David Holland dholland-t...@netbsd.org wrote:
}  So ... looking at this code ... it seems like the core problem is that
}  TAILQ_HEAD and TAILQ_ENTRY are two different types (even though they
}  literally have the same structure layout).  So if TAILQ_HEAD and TAILQ_ENTRY
}  were the same structure, it wouldn't be an issue.  It doesn't quite leap
}  out to me how that would be possible without changing the API a bit.
}  
}  I think it can be done by sticking an anonymous union into TAILQ_HEAD,
}  but of course anonymous unions aren't supported until C11.
} 
} It isn't perfectly clear to me that this code has an aliasing problem
} the way it is, though.  The only thing that matters in the standard are
} the types of the lvalue expressions used to access objects in storage.  The
} lvalue expression types used to access the objects in storage in this
} case are 'type **', 'type **' and 'type *', which are the types those

 type ** and type * are not the same types.

} objects were stored with and the types that would be used for other
} accesses to the same locations.  The structure type used to arrive there
} should only matter if it is the type of an lvalue expression itself,
} e.g. *(struct foo *)ptr(?).
} 
} I would be interested in knowing an actual example of the comparison
} problem with the CIRCLEQ macro, if the concern isn't theoretical.  Since

 Uh, do you really think people would be doing all this work
for something that was theoretical?  The problem is that gcc 4.8
optimises out the comparison as being always false due to the
strict-aliasing rule.
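
 For illustration only, here is one way to build a circular list
where the end-of-list test never compares pointers to two different
structure types: the head and every element embed the same link
structure.  The names (struct link, struct item, ring_*) are made
up for this sketch, and I'm not claiming this is what sys/queue.h
does or should do.

```c
#include <assert.h>
#include <stddef.h>

/* One link type shared by the list head and every element. */
struct link {
	struct link *next;
	struct link *prev;
};

struct item {
	struct link link;	/* first member of the element */
	int value;
};

/* An empty ring is a head that points at itself. */
static void
ring_init(struct link *head)
{
	head->next = head->prev = head;
}

static void
ring_insert_tail(struct link *head, struct link *elm)
{
	elm->next = head;
	elm->prev = head->prev;
	head->prev->next = elm;
	head->prev = elm;
}

/*
 * Sum the values.  The loop's end test compares two struct link
 * pointers, so both sides have the same type.  The cast back to
 * struct item is fine because every non-head link is the first
 * member of an item.
 */
static int
ring_sum(struct link *head)
{
	int sum = 0;

	for (struct link *l = head->next; l != head; l = l->next)
		sum += ((struct item *)l)->value;
	return sum;
}

/* Build a three-element ring and add up its values. */
int
ring_demo(void)
{
	struct link head;
	struct item a = { .value = 1 };
	struct item b = { .value = 2 };
	struct item c = { .value = 3 };

	ring_init(&head);
	ring_insert_tail(&head, &a.link);
	ring_insert_tail(&head, &b.link);
	ring_insert_tail(&head, &c.link);
	return ring_sum(&head);
}
```

 The cost is that the head is no longer its own type, which is
an API change for existing users of the macros.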

} the C standard explicitly allows a pointer to a structure type to be
} converted to the type of its first member and back, to another structure
} type and back, or to char * or void * and back, the fact that the two

 I rather doubt that you can convert to a different structure type
and back.  Those would definitely be different objects.

} pointers point at different structure types is by itself insufficient to
} prove that they would not compare equal when suitably converted.  It seems
} like that conclusion would minimally need to depend on proving that there
} was no possible use of the structure pointers which wouldn't violate the
} aliasing requirements, i.e. that there are no structure members at the same
} offsets which have compatible types.  That's a rather aggressive optimization,
} and is kind of like throwing you in jail for a crime you haven't actually
} committed yet (though I guess that happens too).
} 
}-- End of excerpt from Dennis Ferguson


Re: A Library for Converting Data to and from C Structs for Lua

2013-11-17 Thread John Nemeth
On Nov 17, 11:02pm, Marc Balmer wrote:
} Am 17.11.13 20:40, schrieb Lourival Vieira Neto:
}  On Sun, Nov 17, 2013 at 4:39 PM, David Holland dholland-t...@netbsd.org wrote:
}  On Sun, Nov 17, 2013 at 01:32:03PM +0100, Hubert Feyrer wrote:
}I plan to import it and to make it available to both lua(1) and lua(4)
}   
}I wonder if we really need to get all this into NetBSD,
}instead of moving it to pkgsrc somehow.
} 
}  This...
}  
}  I think it would be nice to have Lua kernel modules in pkgsrc, if
}  possible.
} 
} No, I don't think so.  They interact too much with the system, they need
} to be part of the system.

 Uh, no.  The whole idea behind modules clearly means being
able to use third party code.  We should be able to have modules
in pkgsrc.  There are no modules in pkgsrc yet, but that's just a
matter of figuring out the best way to do it.  There is no reason
why all modules must be included with the system.

}-- End of excerpt from Marc Balmer


Re: zero-length symlinks

2013-11-03 Thread John Nemeth
On Nov 3,  2:57pm, Sverre Froyen wrote:
} On 2013-11-03, at 11:47, Hubert Feyrer hub...@feyrer.de wrote:
}  On Sat, 2 Nov 2013, David Holland wrote:
}   I think "not sensible" is not a good enough reason to prohibit
}   something.
}  
}  Yeah yeah, but still nowadays we don't allow adding hard links to
}  directories. So while that's a valid premise, it's not universal.
}  
}  FWIW, the idea not allowing hard links to directories is that
}  .. wouldn't be unique any more. I don't see such a thing with
}  a symlink pointing to .
} 
} On Unix System V, the link command would allow hard-linking
} directories when used as root. A quick test shows that NetBSD
} does not allow this. Was the feature removed from NetBSD (or BSD)
} at some point or was it an addition to Bell Labs Unix after
} Berkeley received the Bell Labs sources? Perhaps a feature unique
} to the v7 file system.

 It has to do with the fact that historically mkdir(2) was
actually mkdir(3): it wasn't an atomic syscall but a sequence
of operations performed by a library routine.  The library routine
called link(2) to hook the new directory into the directory tree.
Once mkdir(2) was created and the kernel became responsible for
everything, link(2) lost the ability to create hard links to
directories.  The reason is that hard links to directories mean
that the tree of directories is no longer a DAG, and that causes
serious problems for the tree traversing code.
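
 The current behaviour is easy to check from userland.  This is
a small sketch using only POSIX calls; the scratch path and the
function name are made up for the example.  It creates a directory,
asks link(2) for a second name for it, and reports the errno.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Try to hard-link a freshly created directory.  Returns the errno
 * from link(2), 0 if the link was unexpectedly allowed, or -1 if
 * the scratch directory could not be created.
 */
int
link_to_directory(void)
{
	char dir[] = "/tmp/dirlink-XXXXXX";	/* hypothetical scratch name */
	char alias[64];
	int rc, err;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(alias, sizeof(alias), "%s.alias", dir);

	/* Systems that refuse this typically report EPERM. */
	rc = link(dir, alias);
	err = (rc == -1) ? errno : 0;

	/* Clean up whatever was created. */
	if (rc == 0)
		unlink(alias);
	rmdir(dir);
	return err;
}
```

 On a system that prohibits directory hard links the function
returns a non-zero errno, with the directory tree left untouched.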

 I don't know at what point this happened in BSD, but certainly,
it was long before NetBSD came on the scene.  BTW, I doubt that
modern System V, i.e. SVR4 would allow you to make hard links to
directories (that capability probably went away somewhat before
SVR4 came about).

}-- End of excerpt from Sverre Froyen


module path message

2013-10-30 Thread John Nemeth
 I've made a patch to the module subsystem to print the default
module load path during initialisation.  The reason for doing this
is that certain arch/machine combos don't work with the standard
modules for their archs and require custom built modules.  This is
the case for several evbppc variants and xen.  The evbppc variants
are already working.  I'm working on getting xen modules working.
I have the modules building and the kernel finds the correct modules,
but has problems loading them.  Anyways, here's a sample of the
dmesg showing the new message.  Let the flamewar about the message
and when it should be displayed begin...

NetBSD 6.99.25 (XEN3_DOMU) #3: Tue Oct 29 19:07:29 PDT 2013

jnemeth@P4-3679GHz:/usr/local/NetBSD-current/amd64-xenmod-objdir/sys/arch/amd64/compile/XEN3_DOMU
total memory = 512 MB
avail memory = 486 MB
The default path for module loading is: /stand/amd64-xen/6.99.25/modules
mainbus0 (root)
hypervisor0 at mainbus0: Xen version 4.2.3
...


Re: module path message

2013-10-30 Thread John Nemeth
On Oct 30, 11:00am, Alan Barrett wrote:
} On Tue, 29 Oct 2013, John Nemeth wrote:
} The default path for module loading is: /stand/amd64-xen/6.99.25/modules
} 
} I suggest exposing the path via sysctl, and printing the sysctl 
} mib name in the message, something like
} 
}   kern.module.path=/stand/amd64-xen/6.99.25/modules

 Good idea, then it's easily accessible at run time.  Of course,
with that, it doesn't have to be printed.

}-- End of excerpt from Alan Barrett


Re: module path message

2013-10-30 Thread John Nemeth
On Oct 30, 12:40pm, Marc Balmer wrote:
} Am 30.10.13 10:00, schrieb Alan Barrett:
}  On Tue, 29 Oct 2013, John Nemeth wrote:
}  The default path for module loading is: /stand/amd64-xen/6.99.25/modules
}  
}  I suggest exposing the path via sysctl, and printing the sysctl mib name
}  in the message, something like
}  
}  kern.module.path=/stand/amd64-xen/6.99.25/modules
} 
} If that variable is to be writable, it has to be somehow integrated with
} kauth, so that it can not be changed when the kauth equivalent of a
} raised securelevel is in place.

 It will be read only for now.

}-- End of excerpt from Marc Balmer


Re: How to hot swap an SCA SCSI disk with NetBSD

2013-10-26 Thread John Nemeth
On Oct 25,  2:20pm, Mouse wrote:
}
}  Generally speaking, SCA SCSI drives are hot-swap capable.
} 
} Sure...but the drive bays aren't necessarily.  For example, the drive
} bay in a SS20 probably isn't; you can't even get to it without removing
} the lid, so there'd've been little reason for Sun to spend the money
} for the signal switching hardware to make it hotswap.
} 
}  I'm not interested in fiddling with 50-pin or 68-pin with a paused machine 
-$
} 
} Actually, with a _paused_ machine, IME - I M limited E - it's fine.
} It's doing so on an active SCSI bus, one with transfers going on, that
} I was saying was a recipe for trouble.

 With SCA, or anything else that is designed for hotswap, the
ground pins are longer than the other pins.  This means that ground
disconnects last and connects first.  This prevents spikes.
Hotswapping with connectors that aren't designed for it can cause
physical damage to equipment, and thus is not generally recommended.

}  The key thing in documentation is not just how, but why.
} 
}  For example, why scsictl dev detach?  Why not just stop and
}  remove?
} 
} Personally?  The reasons which occur to me offhand:

 SCA is just a type of connector.  As far as I know, there are
no extra signals (in particular there is no way to signal the OS
that the device was removed).

} Because doing that doesn't get the teardown and rebuild I mentioned
} upthread.  Because not all the scsictl versions I have in use support
} stop.  Because I'm not always replacing it with an identical drive (or,
} sometimes, at all).
} 
}  The idea here is to document a procedure generally. Odds are good lots of 
it$
} 
} Yeah - everything but the physical-layer stuff, I'd guess.
} 
} (SAS, gh)
} 
}-- End of excerpt from Mouse


Re: Lua in-kernel (lbuf library)

2013-10-18 Thread John Nemeth
On Oct 18, 11:03am, Marc Balmer wrote:
} Am 18.10.13 10:43, schrieb Artem Falcon:
}  Marc Balmer marc at msys.ch writes:
}  Justin Cormack justin at specialbusservice.com writes:
}  I have been using the luajit ffi and luaffi, which let you directly
}  use C structs (with bitfields) in Lua to do this. It makes it easier
}  to reuse stuff that is already defined in C. (luaffi is not in its
}  current state portable but my plan is to strip out the non portable
}  bits, which are the function call support).
} 
}  Justin
}  
}  I had successfully used a more lightweight solution called Lua AutoC [1]
}  with Marc's lua(4).
}  Pros: light in comparison to other FFI libs, joy in use, easy to adapt to be
}  used in kernel, does the things in runtime, which gives the flexibility.
}  Cons: not widely tested, again does the things in runtime, which on other
}  side may give performance penalty.
}  
} 
}  I never used luaffi. It sounds very interesting and I think it could
}  be very useful to bind already defined C structs, but my purpose is to
}  dynamically define data layouts using Lua syntax (without parsing C
}  code).
} 
}  FFI in the kernel can be dangerous.  Pure Lua is a perfect confinement
}  for code, but with an FFI a Lua script can access almost anything in the
}  kernel.  One has to think twice if one wants that.
} 
}  Well, assuming it would be module, so I would not have to load it if I
}  don't want to.
}  
}  It's desirable if you're writing a device driver in Lua, as you can do
}  most of work from Lua code (e.g. call C methods of NetBSD driver API
}  and feed them with C structs and pointers).
}  States and explicit exports of a certain foreign functions makes things
}  a bit less dangerous.
}  But in general you're right, one should do this with care.
} 
} lua(4) has a mechanism for Lua's 'require' statement.  Normally, when
} you require 'foo', it looks up whether a kernel module named luafoo exists
} and loads it.  This automatic loading of modules can be turned off; to
} make a module available to a state, it then has to be specifically assigned.
} So when you turn autoloading off, a script cannot simply call an ffi
} module by requiring it.
} 
} Maybe Lua kernel modules should carry a flag whether they should allow
} autoloading or not?  This way, an ffi module would still be loaded into
} the kernel when Lua code requires it, but lua(4) would detect the don't
} autoload flag and would then _not_ assign the module to the Lua state.

 There is already a mechanism for this; see module_autoload(9).
You should always be using module_autoload() to load a module from
inside the kernel.  If the noautoload flag is set, then the call
will fail.  Thus, there is no need for lua(4) to try managing this
itself.  It should just attempt to load the module.  If successful,
great.  If not, then the feature being requested isn't available.
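
The pattern described above (the caller holds no policy of its own and simply attempts the load) can be sketched as a small userland model. This is not kernel code: fake_module_autoload() only stands in for module_autoload(9), and the module table, the module names, the errno values and lua_require_model() are assumptions made up for illustration.

```c
/*
 * Userland model only.  fake_module_autoload() stands in for the
 * kernel's module_autoload(9); every name here is hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct fake_module {
	const char *name;
	bool noautoload;	/* models the module's noautoload flag */
	bool loaded;
};

static struct fake_module fake_modules[] = {
	{ "luafoo", false, false },
	{ "luaffi", true,  false },	/* hypothetical FFI module, opted out */
};

/* Returns 0 on success, or an errno value picked for this model. */
static int
fake_module_autoload(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(fake_modules) / sizeof(fake_modules[0]); i++) {
		struct fake_module *m = &fake_modules[i];

		if (strcmp(m->name, name) != 0)
			continue;
		if (m->noautoload)
			return EPERM;	/* the module refuses autoloading */
		m->loaded = true;
		return 0;
	}
	return ENOENT;			/* no module of that name */
}

/* Model of a require-style caller: it just tries and reports the result. */
static bool
lua_require_model(const char *name)
{
	return fake_module_autoload(name) == 0;
}
```

In this model a failed call means only that the requested feature is unavailable; the caller does not need to know which check refused the load.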

}  [1] https://github.com/orangeduck/LuaAutoC
} 
}-- End of excerpt from Marc Balmer


Re: Lua in-kernel (lbuf library)

2013-10-18 Thread John Nemeth
On Oct 19, 12:13am, Artem Falcon wrote:
} 18.10.2013, at 21:03, John Nemeth jnem...@cue.bc.ca wrote:
}  On Oct 18, 11:03am, Marc Balmer wrote:
}  } On 18.10.13 10:43, Artem Falcon wrote:
}  }  Marc Balmer marc at msys.ch writes:
}  }  Justin Cormack justin at specialbusservice.com writes:
}  }  I have been using the luajit ffi and luaffi, which let you directly
}  }  use C structs (with bitfields) in Lua to do this. It makes it easier
}  }  to reuse stuff that is already defined in C. (luaffi is not in its
}  }  current state portable but my plan is to strip out the non portable
}  }  bits, which are the function call support).
}  } 
}  }  Justin
}  }  
}  }  I had successfully used a more lightweight solution called Lua AutoC [1]
}  }  with Marc's lua(4).
}  }  Pros: light in comparison to other FFI libs, a joy to use, easy to adapt
}  }  for use in the kernel, does things at runtime, which gives flexibility.
}  }  Cons: not widely tested, and again does things at runtime, which on the
}  }  other hand may incur a performance penalty.
}  }  
}  } 
}  }  I never used luaffi. It sounds very interesting and I think it could
}  }  be very useful to bind already defined C structs, but my purpose is to
}  }  dynamically define data layouts using Lua syntax (without parsing C
}  }  code).
}  } 
}  }  FFI in the kernel can be dangerous.  Pure Lua is a perfect confinement
}  }  for code, but with an FFI a Lua script can access almost anything in the
}  }  kernel.  One has to think twice if one wants that.
}  } 
}  }  Well, assuming it would be module, so I would not have to load it if I
}  }  don't want to.
}  }  
}  }  It's desirable if you're writing a device driver in Lua, as you can do
}  }  most of work from Lua code (e.g. call C methods of NetBSD driver API
}  }  and feed them with C structs and pointers).
}  }  States and explicit exports of a certain foreign functions makes things
}  }  a bit less dangerous.
}  }  But in general you're right, one should do this with care.
}  } 
}  } lua(4) has a mechanism for Lua's 'require' statement.  Normally, when
}  } you require 'foo', it looks up whether a kernel module named luafoo exists
}  } and loads it.  This automatic loading of modules can be turned off; to
}  } make a module available to a state, it then has to be specifically assigned.
}  } So when you turn autoloading off, a script cannot simply call an ffi
}  } module by requiring it.
}  } 
}  } Maybe Lua kernel modules should carry a flag whether they should allow
}  } autoloading or not?  This way, an ffi module would still be loaded into
}  } the kernel when Lua code requires it, but lua(4) would detect the don't
}  } autoload flag and would then _not_ assign the module to the Lua state.
} 
} Probably. It should be named 'auto assign' for clarity, as module loading
} occurs anyway.
} 
}  There is already a mechanism for this, see module_autoload(9).
}  You should always be using module_autoload() to load a module from
}  inside the kernel.  If the noautoload flag is set, then the call
}  will fail.  
} 
} This is exactly what lua(4) does on 'requiring'.
} 
}  Thus, there is no need for lua(4) to try managing this
}  itself.  It should just attempt to load the module.  If successful,
}  great.  If not, then the feature being requested isn't available.
} 
} kern.lua.autoload is a safety barrier. One may wish not to allow any Lua kernel
} script to load any given Lua kernel module.

 The lua(4) implementers can certainly do this if they want.
However, module_autoload() won't be looking at this flag and will
continue to refuse to autoload any module that has the noautoload
flag set.  Also, there is the kern.module.autoload sysctl that can
prevent any module from autoloading.
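
The layering discussed in this exchange can be sketched as a userland model of three independent gates: the lua(4) sysctl, the module framework's sysctl, and the per-module noautoload flag. The enum, the struct and the function name are hypothetical, and the order in which the last two gates are checked is an assumption of the model rather than a statement about the kernel's implementation.

```c
/* Userland model only; all identifiers are invented for this sketch. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum gate {
	GATE_NONE = 0,		/* nothing objected; the load proceeds */
	GATE_LUA_SYSCTL,	/* kern.lua.autoload is off */
	GATE_MODULE_SYSCTL,	/* kern.module.autoload is off */
	GATE_NOAUTOLOAD_FLAG	/* the module itself carries noautoload */
};

struct autoload_policy {
	bool lua_autoload;	/* models kern.lua.autoload */
	bool module_autoload;	/* models kern.module.autoload */
};

/* Report the first gate that would refuse an autoload request. */
static enum gate
first_denying_gate(const struct autoload_policy *p, bool module_noautoload)
{
	assert(p != NULL);
	if (!p->lua_autoload)
		return GATE_LUA_SYSCTL;		/* lua(4) never asks */
	if (!p->module_autoload)
		return GATE_MODULE_SYSCTL;	/* framework-wide switch */
	if (module_noautoload)
		return GATE_NOAUTOLOAD_FLAG;	/* per-module opt-out */
	return GATE_NONE;
}
```

The point the model makes is that turning kern.lua.autoload on does not override the other two gates: a module flagged noautoload is still refused.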

}-- End of excerpt from Artem Falcon


Re: POSIX Semaphores

2013-02-23 Thread John Nemeth
On Jun 9,  7:19pm, Paul Goyette wrote:
}
} According to the man page sem(4), one needs to include options 
} P1003_1B_SEMAPHORE in the kernel config file in order to support this 
} feature.  Yet, the file kern/uipc_sem.c is included unconditionally in 
} all kernels, and there appears to be nothing in NetBSD anywhere that 
} depends on P1003_1B_SEMAPHORE.
} 
} Most of the MODULAR-ization work has already been done (with only the 
} actual building of the loadable module left), so I would like to propose 
} that this feature be made conditional, as described in sem(4).

 The feature was conditional in the past, but was made unconditional
on the basis that it is essential.  The man page is out of date.  The
appropriate action would be to delete that statement from the man page.

}-- End of excerpt from Paul Goyette


Re: MI boot args revamp?

2012-12-30 Thread John Nemeth
On May 22,  9:38am, Jean-Yves Migeon wrote:
} On 29/12/12 22:23, Jeff Rizzo wrote:
}  On 12/29/12 1:12 PM, Greg Troxel wrote:
} I would like to have a way to pass a string composed of the same flags
} (we can continue to use our existing -a, -s and other flags) in a
} consistent manner from one platform to another, to be able to adjust
} driver options, kernel options, whatever, and to be able to expect it
} to be similar whether I'm on amd64, macppc, evbppc, evbarm, or
} whatever.
} 
}  Are you talking about the UI of how the strings are written and what
}  they mean or how the bootloader stage that interacts with the user/prom
}  communicates this to the kernel?  For platforms with existing
}  conventions, I don't see how we can interact with native bootloaders
}  without meeting their interface.
} 
}  There are always going to be exceptions;  certain platforms (especially
}  older ones) are not flexible enough to do everything we want the way we
}  want it.  What I _would_ like to get to is "this is the recommended goal
}  to shoot for."
} 
} That really depends on the capabilities of the MD component. I have a 
} good example with Xen though.
} 
} The Xen port parses a command line whose syntax is very close to the
} one used by Linux (key=value) [1].  Having a command line close to
} this syntax offers potential for code reuse, or even for turning it into
} an MI/MD interface.

 Xen uses multiboot.  Yet another thing on my todo list is to
handle boot time module loading in the multiboot case.

} As we have a decent module framework too, I would look at what module(7)
} offers when we pass arguments to modules. I would expect modules and the
} kernel to share the same code when parsing args; this makes sense somehow.
} A typical example is (again) Xen with a DOM0 kernel, where the kernel is
} loaded as a module.

 module(7) arguments are passed as a plist.  Take a look at
sys/modules/example/example.c.  That is the simplest module.

}-- End of excerpt from Jean-Yves Migeon


Re: lua(4), non-invasive and invasive parts

2012-12-29 Thread John Nemeth
On May 21,  6:10am, Marc Balmer wrote:
}
}  this is going to upset dyoung i'm sure :) but it seems to me that
}  if these invasive changes to individual subsystems are needed like
}  this, and we want them to be optional, then imo they should be on
}  a per-subsystem basis, not global.  eg something like:
}  
}  options LINEDISC_LUA
}  options GPIOSIM_LUA
}  
}  etc.  the ugliness could/should be largely hidden in header files.
} 
} The problem remains that modules know nothing about kernel options.  Maybe
} - in an ideal world - there should be no kernel options at all, but only
} modules... ;)

 Which is fine for gpiosim, as it can just depend on the lua
module.  For LINEDISC_LUA, there would have to be some kind of hook to
which the lua module would attach when loaded, so that the kernel would
still function even without the module loaded.
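
One way to picture the hook idea is a function pointer that stays NULL until the module attaches to it. The sketch below is a userland model under that assumption: the hook type, the attach/detach functions and linedisc_input() are hypothetical names, and the synchronisation a real kernel would need against a module loading or unloading during a call is deliberately left out.

```c
/* Userland model only; every identifier is invented for this sketch. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: lets the Lua side process one input character. */
typedef int (*linedisc_lua_hook_t)(int c);

/* NULL while no module is attached. */
static linedisc_lua_hook_t linedisc_lua_hook;

/* Would be called from the module's load path. */
static void
linedisc_lua_attach(linedisc_lua_hook_t fn)
{
	assert(fn != NULL);
	linedisc_lua_hook = fn;
}

/* Would be called from the module's unload path. */
static void
linedisc_lua_detach(void)
{
	linedisc_lua_hook = NULL;
}

/* Kernel-side call site: works with or without the module. */
static int
linedisc_input(int c)
{
	if (linedisc_lua_hook != NULL)
		return linedisc_lua_hook(c);
	return c;		/* default behaviour, no module loaded */
}

/* Example hook a module might install: upper-case ASCII letters. */
static int
example_lua_hook(int c)
{
	return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}
```

Without the module attached, linedisc_input() falls through to its default path, which is the property the paragraph above asks for.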

}-- End of excerpt from Marc Balmer

