Re: Locking in disk(9) api

2009-12-30 Thread John Nemeth
On May 22,  3:19am, Michael van Elst wrote:
} On Tue, Dec 29, 2009 at 03:53:58PM -0800, John Nemeth wrote:
} 
} >  In the case in question the real benefit is avoiding a panic when
} > dk_stats->io_busy goes negative.
} 
} The producer side is protected by the corresponding locks in each
} driver (now also in the dm driver). No panic there.

 tls was suggesting that we don't use locks on the producer side
which would lead to corrupt stats.

} > The system would have to be changed
} > to ignore obviously corrupt stats.  At which point, you might as well
} > not bother keeping them since they will be completely unreliable.
} 
} Reading the stats from userland may yield corrupted stats, temporarily.
} 
} In reality however, the 'corruption' only means slightly inconsistent
} data (e.g. old rxfer, new rbytes within a sample). The chance of
} actually seeing a corrupted 64bit number due to non-atomic updates
} is pretty low.

 True, however...

 Reading subr_disk.c shows that the disk_* routines simply call the
equivalent iostat_* routines in subr_iostat.c.  subr_iostat.c has a
lock that protects the addition and deletion of disks to the iostat
chain:

krwlock_t iostatlist_lock;

This lock is used by iostat_alloc() and iostat_free().  It is also used
to protect against changes to the chain while getting the info for the
hw.disknames, hw.iostatnames, and hw.iostats sysctls.  Would it not
make sense to use the same lock to protect the stats themselves from
being updated while they are being read (especially now that sysctl is
used to read them instead of kvm)?

}-- End of excerpt from Michael van Elst


Re: tmpfs module and pool allocator

2009-12-30 Thread John Nemeth
On May 22,  3:58pm, Bernd Ernesti wrote:
} On Wed, Dec 30, 2009 at 01:03:50PM -0500, Elad Efrat wrote:
} > 
} > >I've "fixed it" by explicitly calling mutex_destroy() in
} > >tmpfs_modcmd():MODULE_CMD_FINI, but I think the right fix would be to
} > >set a flag indicating that the pool allocator used is not a "system
} > >allocator" (or that it is a "custom allocator", or whatever) and
} > >should be destroyed as well, or at least its mutex...
} > 
} > Matt@ suggested we use a reference count on the pool allocator instead
} > of the PA_INITIALIZED flag, and when it drops to zero, pool_destroy can
} > destroy the mutex as well.
} > 
} > I've attached a diff that supposedly implements it... But I don't know
} > if it's correct. It fixes the problem for me, on a GENERIC kernel built
} > with DIAGNOSTIC and LOCKDEBUG.
} > 
} > Please have a look and let me know if it's okay to commit.
} 
} I can't comment if it is okay or not, but see below.
} 
} > Index: sys/pool.h
} > ===
} > RCS file: /cvsroot/src/sys/sys/pool.h,v
} > retrieving revision 1.67
} > diff -u -p -r1.67 pool.h
} > --- sys/pool.h  15 Oct 2009 20:50:12 -  1.67
} > +++ sys/pool.h  30 Dec 2009 17:58:17 -
} > @@ -64,7 +64,7 @@ struct pool_allocator {
} > kmutex_t pa_lock;
} > TAILQ_HEAD(, pool) pa_list; /* list of pools using this allocator */
} > int pa_flags;
} > -#define PA_INITIALIZED  0x01
} > +   uint32_t pa_refcnt;  /* number of pools using this allocator */
} > int pa_pagemask;
} > int pa_pageshift;
} > struct vm_map *pa_backingmap;
} 
} You are adding a new field in the middle of a struct.
} Is that really ok?

 Since it isn't a struct that userland will see, there is no compat
issue.  If it is at all possible that a module will see it, then the
kernel version needs to be bumped (it would need to be bumped
regardless of where the field is added).
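
The reference-count idea attributed to Matt@ above can be sketched in userland C.  This is only an illustration, not the attached diff: struct allocator, allocator_ref() and allocator_unref() are made-up names, and a boolean stands in for the state of pa_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pool_allocator; not the real layout. */
struct allocator {
	bool     lock_live;	/* models whether pa_lock is initialised */
	uint32_t refcnt;	/* number of pools using this allocator */
};

/* Called when a pool starts using the allocator. */
void
allocator_ref(struct allocator *pa)
{
	if (pa->refcnt++ == 0) {
		/* First user: this is where mutex_init() would go. */
		assert(!pa->lock_live);
		pa->lock_live = true;
	}
}

/* Called when a pool stops using the allocator. */
void
allocator_unref(struct allocator *pa)
{
	assert(pa->refcnt > 0);
	if (--pa->refcnt == 0) {
		/* Last user gone: this is where mutex_destroy() would go. */
		pa->lock_live = false;
	}
}
```

With a count instead of a one-shot flag, the lock is torn down when the last pool goes away and set up again if the allocator is reused, which is the case a module unload and reload would hit.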

}-- End of excerpt from Bernd Ernesti


Re: blocksizes

2010-01-24 Thread John Nemeth
On Jun 16,  9:13am, Michael van Elst wrote:
} On Sun, Jan 24, 2010 at 11:16:05PM +1030, Brett Lymn wrote:
} > On Fri, Jan 22, 2010 at 01:09:10PM +0100, Michael van Elst wrote:
} > > 
} > > Keeping DEV_BSIZE at 512 bytes avoids lots of changes.

 A quote, often attributed to Einstein, is, "Everything should be
made as simple as possible, but no simpler."  I can't help but feel
that this is making things simpler than they should be.  This may be a
good first implementation, but I keep getting the feeling that
DEV_BSIZE should go away, and we should be using the real blocksize.
However, since I'm not overly familiar with this area of the system
and I'm unlikely to be doing the work, I can't tell the people who
are how to do it.

} > Won't that mean there is a chance there will be a lot of
} > read/modify/write going on if the driver is pretending to have 512byte
} > sectors?
} 
} No, the driver will not support writes of single 512byte sectors
} if the underlying hardware does not provide 512byte sectors.

 How do you communicate the real blocksize up the stack?  If you're
doing writes from userland through the raw device, how do you find out
the real blocksize?

} We are only talking about the API and what units are used to
} specify disk addresses and block counts. So on a disk with
} 1K sectors you will address blocks 0,2,4,6,... and you can
} only transfer an even number of blocks.

 Other than simplifying things possibly beyond the point they
should be, what is the point of keeping DEV_BSIZE when you are going to
force everything to use the real blocksize?
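
The addressing rule quoted above (on a disk with 1K sectors you address blocks 0,2,4,6,... and transfer an even number of blocks) amounts to a small alignment check.  The sketch below is an assumption about how such a check could look; xfer_is_aligned() and blk_to_sector() are hypothetical helpers, not functions from any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512	/* API unit for block addresses and counts */
#endif

/*
 * Check that a transfer given in DEV_BSIZE units starts and ends on a
 * physical sector boundary.  secsize is the real sector size in bytes
 * and is assumed to be a non-zero multiple of DEV_BSIZE.
 */
bool
xfer_is_aligned(uint64_t blkno, uint64_t nblks, uint32_t secsize)
{
	uint32_t ratio;

	assert(secsize >= DEV_BSIZE && secsize % DEV_BSIZE == 0);
	ratio = secsize / DEV_BSIZE;	/* DEV_BSIZE blocks per sector */
	return blkno % ratio == 0 && nblks % ratio == 0;
}

/* Convert an aligned DEV_BSIZE block number to a physical sector number. */
uint64_t
blk_to_sector(uint64_t blkno, uint32_t secsize)
{
	return blkno / (secsize / DEV_BSIZE);
}
```

For a 1024-byte sector the ratio is 2, so block 4 with a count of 6 is accepted and maps to sector 2, while block 1 or a count of 3 is rejected.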

} N.B. So far I have MSDOSFS and FFS running on a disk with 1K sectors
} and I learned that the block size translation is already done
} in our block drivers, so there is no need to funnel I/O through dk.

 It is certainly good to have a proof of concept.

}-- End of excerpt from Michael van Elst


Re: xxxVERBOSE module?

2010-05-21 Thread John Nemeth
On Sep 4,  9:00am, Paul Goyette wrote:
}
} From the comments in the GENERIC config files, the primary reason for 
} omitting the various xxxVERBOSE options is to avoid including large text 
} tables in the resulting kernel.  And I vaguely recall some spirited 
} discussion back when the change was made to exclude these options by 
} default.
} 
} Now that we have MODULAR kernels (at least on some architectures), I've 
} been wondering if it might make sense to create a mod_verbose that could 

 There's been quite a bit of discussion about doing just this.

} be loaded during start-up time and then unloaded after the machine is up 

 It would have to be loaded by the boot loader then.  As far as I
know, only the x86 boot loader is capable of loading modules.

} and running.  (For plug-and-play situations, such as USB, the module 
} could be reloaded and unloaded whenever a new device is added.)
} 
} Is this something that would be useful?

 Yes, I think it would be very useful.  I was thinking of looking
at this myself.  Thanks for taking on this task.  It's one thing I can
cross off my TODO list.  :-)

}-- End of excerpt from Paul Goyette


Re: xxxVERBOSE module?

2010-05-21 Thread John Nemeth
On Sep 6,  9:11am, Paul Goyette wrote:
} On Fri, 21 May 2010, John Nemeth wrote:
} 
} > } be loaded during start-up time and then unloaded after the machine is up
} >
} > It would have to be loaded by the boot loader then.  As far as I
} > know, only the x86 boot loader is capable of loading modules.
} 
} I'm almost finished with the PCIVERBOSE stuff...
} 
} My current approach is to load the module right before the first pcibus 
} is enumerated, and unload when finished.  So we can use the in-kernel 

File systems aren't initialised during autoconf when the system is
being cold booted, thus it isn't possible for the kernel to load a
module at that point in time.  Also consider that on most platforms
with PCI, prior to the first pcibus being enumerated, the kernel
doesn't know anything about any disk drives that may be attached to the
system.

} loader/linker for whichever platforms it supports.  For other platforms 
} it will still be possible to set 'options PCIVERBOSE' to generate a 
} built-in module.

 Sure, this is the way MODULEs are supposed to work.

} The fun part is making sure that the shared code still plays nicely with 
} src/lib/libpci :)

 I don't know anything about libpci, but have fun with that.

} > Yes, I think it would be very useful.  I was thinking of looking
} > at this myself.  Thanks for taking on this task.  It's one thing I can
} > cross off my TODO list.  :-)
} 
} No worries!  When I finish up with PCI, I'll start in on USB.

 I imagine that will be mostly copy/paste.

}-- End of excerpt from Paul Goyette


Re: xxxVERBOSE module?

2010-05-22 Thread John Nemeth
On Sep 6, 10:07am, Paul Goyette wrote:
} On Fri, 21 May 2010, John Nemeth wrote:
} 
} > } My current approach is to load the module right before the first pcibus
} > } is enumerated, and unload when finished.  So we can use the in-kernel
} >
} > File systems aren't initialised during autoconf when the system is
} > being cold booted, thus it isn't possible for the kernel to load a
} > module at that point in time.  Also consider that on most platforms
} > with PCI, prior to the first pcibus being enumerated, the kernel
} > doesn't know anything about any disk drives that may be attached to the
} > system.
} 
} Ah, yeah!  Ooppss!  :)
} 
} I guess I can probably remove those changes.  So we'll have to rely on 

 Although you can't test them, consider keeping them.  We do want to
support hotswap PCI at some point.  Also, dyoung@ is in the process of
merging cardbus into PCI.  I don't know if your changes would cause
mod_verbose to be autoloaded for cardbus, which uses PCIVERBOSE, but
cardbus insertion should be handled at some point.  And, of course,
there is PCIe and expresscard on the horizon.  expresscard is basically
hotswap PCIe along with USB (i.e.  an expresscard slot brings out
signals for both and the card determines which one it is going to
use).

} > } The fun part is making sure that the shared code still plays nicely with
} > } src/lib/libpci :)
} >
} > I don't know anything about libpci, but have fun with that.
} 
} The only thing I find that uses libpci is /usr/sbin/pcictl and that 
} seems to be working fine.

 What about X?  What does it use for scanning the PCI bus?

}-- End of excerpt from Paul Goyette


Re: Kernel modules - documentation?

2010-05-22 Thread John Nemeth
On Sep 5,  6:12pm, Paul Goyette wrote:
}
} Is there any documentation on the modules interface or API?  There does 
} not seem to be anything in the man pages...

 I'm not aware of any.  Probably the best documentation for the
moment is src/sys/modules/example/example.c.  This is an example of a
very simple module, but it has all the pieces.  You might also want to
look at modctl(2), which is the syscall that modload/modstat/modunload
call.  I should probably add some cross references.

} My specific questions:
} 
} What actually triggers an autoload of a module?  (There seem to be very 
} few places where module_autoload() is called.)

 This depends on the type of module, i.e. filesystem modules are
handled differently from device driver modules.  Basically the part of
the kernel that needs functionality that might be provided by a module
needs to check to see if the functionality is currently available and
if it isn't, then it needs to call module_autoload() for the
appropriate module.

} What is the semantic difference between module_autoload() and a "normal" 
} module_load()?

 From the module's perspective, there is none.

 More generally, the kern.module.autoload sysctl will be checked to
see if module autoloading is allowed, there are restrictions on the
path passed to module_autoload(), and a timer will be kicked off to try
to automatically unload the module.
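
The extra checks on the autoload path can be sketched roughly as follows.  This is a userland illustration only: autoload_enabled and autoload_check() are made-up names, the only path rule modelled is "no '/' in the name", returning EPERM for both failures is an assumption, and the unload timer is not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Models the kern.module.autoload sysctl; the variable name is made up. */
bool autoload_enabled = true;

/*
 * Pre-flight checks for an autoload request, returning 0 or an errno.
 * The choice of EPERM for both failures is an assumption.
 */
int
autoload_check(const char *name)
{
	if (!autoload_enabled)
		return EPERM;	/* administrator turned autoload off */
	if (strchr(name, '/') != NULL)
		return EPERM;	/* only bare names, never paths */
	return 0;
}
```

A manual load would skip both tests, which is why the two entry points behave the same from the module's point of view but differently from the caller's.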

} Does the code which calls either of these routines need to be concerned 
} with whether the module has been previously loaded?  Is it OK to load a 
} module that has already been loaded?

 You will get EEXIST if the module is already loaded.  Also, if it
is a built-in module then autoload will fail.  It can only be reloaded
by doing 'modload -f '.

} Given that there is a kernel thread that runs around and attempts to 
} unload any unreferenced modules that have been loaded "for a while", is 
} it ever necessary or desirable to explicitly unload a module?

 Code that autoloads a module doesn't need to worry about this.

} What happens if a global symbol referenced by a module doesn't exist? 
} Does the module get loaded anyway, leaving the reference unresolved?

 It will fail to link and thus will fail to load.  I don't know the
exact error that you will get off hand.  It was ENOENT, but there was
talk about changing it to ENOEXEC.  I will check on this, and if it
hasn't been done yet, I'll probably make the change.

}-- End of excerpt from Paul Goyette


Re: Kernel modules - documentation?

2010-05-23 Thread John Nemeth
On Sep 6,  5:36am, Paul Goyette wrote:
} On Sat, 22 May 2010, Adam Hamsik wrote:
} 
} >> My specific questions:
} >>
} >> What actually triggers an autoload of a module?  (There seem to be 
} >> very few places where module_autoload() is called.)
} >
} > Device module_autoloading is done in specfs_open.
} 
} OK.  So I guess there's no generic "Gee, the kernel just tried to 
} reference something that's not here, so maybe we can find a module to 
} resolve that reference."

 How do you envision something like this working?  Off the top of
my head, I can't think of any way it could work.

} >> What is the semantic difference between module_autoload() and a 
} >> "normal" module_load()?
} >
} > module_autoload happens automatically when you try to open a device
} > node which doesn't have a device driver in the kernel.
} 
} My question was more along the lines of "what is the _difference_ 
} between module_load() vs module_autoload()?"  Or even simpler, "Why do 
} both routines exist?"  :)

 module_load() is for use by the modctl() syscall.
module_autoload() is for when the kernel is autoloading something.  As
I mentioned previously module_autoload() checks the
kern.module.autoload sysctl for permission and there are some path
restrictions.

} >> Does the code which calls either of these routines need to be 
} >> concerned with whether the module has been previously loaded?  Is it 
} >> OK to load a module that has already been loaded?
} >>
} >
} > I think that this is not possible and these routines return an error 
} > when you try to do that.
} 
} That's OK, I'd expect the error.  I can ignore that.  I just need to be 
} sure that the previously-loaded module doesn't get "screwed up" from the 
} attempt to load it the second time.

 No, the "scan" to see if the module is already loaded happens
before anything is done with the module to be loaded.
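
A toy model of that ordering, assuming nothing about the real data structures: struct registry and registry_load() are hypothetical, and a fixed array of names stands in for the kernel's module list.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_MODULES 8

/* Toy registry of loaded module names; not the kernel's data structure. */
struct registry {
	const char *names[MAX_MODULES];
	int         count;
};

/*
 * Load a module by name.  The duplicate scan runs first, so a second
 * load attempt returns EEXIST without touching the registry.
 */
int
registry_load(struct registry *r, const char *name)
{
	for (int i = 0; i < r->count; i++) {
		if (strcmp(r->names[i], name) == 0)
			return EEXIST;
	}
	if (r->count == MAX_MODULES)
		return ENOMEM;
	r->names[r->count++] = name;
	return 0;
}
```

Because the scan returns before any state is modified, a caller can treat EEXIST as "already there" and carry on.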

} >> Given that there is a kernel thread that runs around and attempts to 
} >> unload any unreferenced modules that have been loaded "for a while", 
} >> is it ever necessary or desirable to explicitly unload a module?
} >
} > It doesn't work this way. There is a thread (workqueue) which tries to
} > unload a module in the first 300 seconds or milliseconds, I can't
} > remember now. But this thread, or whatever it is, doesn't unload
} > modules older than the set limit, AFAIK.
} 
} Hmmm, I must have misunderstood this code - time to go look again.

 At least some versions of the kernel will make multiple attempts.
This can be demonstrated.  Let the kernel autoload something like
exec_elf32 (which will happen if you use a standard i386 GENERIC), set
kern.module.verbose=1, then check /var/log/messages.  However, I
believe there was some talk about changing the way this works.  I'm not
sure how it works at this exact moment, or how it might work in the
future.

}-- End of excerpt from Paul Goyette


Re: Modules loading modules?

2010-08-01 Thread John Nemeth
On Nov 17,  5:24am, Paul Goyette wrote:
} On Sun, 1 Aug 2010, Antti Kantee wrote:
} 
} > I'm not sure if it's a good idea to change the size of kmutex_t.  I
} > guess plenty of data structures have carefully been adjusted by hand
} > to its size and I don't know of any automatic way to recalculate that
} > stuff.
} >
} > Even if not, since this is the only user and we probably won't have
} > that many of them even in the future, why not just define a new type
} > ``rmutex'' which contains a kmutex, an owner and the counter?  It 
} > feels wrong to punish all the normal kmutex users for just one use.
} > It'll also make the implementation a lot simpler to test, since it's
} > purely MI.
} >
} > "separate normal case and worst case"
} 
} Round two!  Taking pooka's suggestion, this version is built on top of 
} (rather than beside) the existing non-recursive mutex.  As such, it does 
} not affect any MD code.
} 
} Attached is a set of diffs that
} 
} 1. Adds sys/sys/rmutex.h and sys/kern/kern_rmutex.c to implement
} recursive adaptive mutexes.  (Conspicuously missing is an rmutex(9)
} man page...  It will happen before this gets committed.)
} 
} 2. Converts the existing module_lock from a normal kmutex_t to an
} rmutex_t
} 
} 3. Updates all of the (surprisingly many) places where module_lock
} is acquired.

 I'm thinking the acquisition of module_lock should be pushed into
module_autoload() instead of having the caller acquire it for this very
reason.  It makes it hard to change the way locking works in the
MODULAR code if you expect the caller to acquire the lock.  I don't
know why it was done this way originally, or what the consequences (if
any) would be for making the change.  Andrew, any thoughts on this?
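
For reference, the construction pooka suggested in the quoted text (a plain mutex plus an owner and a counter) can be sketched in portable C11.  This is not the code in the attached diffs: the names are made up, an atomic_flag spin lock stands in for kmutex_t, and callers identify themselves with a non-zero integer instead of an LWP pointer.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER 0	/* caller ids must be non-zero */

/* Hypothetical recursive lock: a plain lock plus an owner and a depth. */
struct rmutex {
	atomic_flag base;	/* stands in for the underlying kmutex_t */
	atomic_int  owner;	/* id of the holder, or NO_OWNER */
	unsigned    depth;	/* recursion count, touched only by the owner */
};

void
rmutex_init(struct rmutex *rm)
{
	atomic_flag_clear(&rm->base);
	atomic_init(&rm->owner, NO_OWNER);
	rm->depth = 0;
}

void
rmutex_enter(struct rmutex *rm, int self)
{
	if (atomic_load(&rm->owner) == self) {
		rm->depth++;	/* already ours: just go deeper */
		return;
	}
	while (atomic_flag_test_and_set(&rm->base))
		;		/* spin; a real mutex would block */
	atomic_store(&rm->owner, self);
	rm->depth = 1;
}

void
rmutex_exit(struct rmutex *rm, int self)
{
	assert(atomic_load(&rm->owner) == self && rm->depth > 0);
	if (--rm->depth == 0) {
		atomic_store(&rm->owner, NO_OWNER);
		atomic_flag_clear(&rm->base);
	}
}
```

The owner test is safe without holding the base lock because only the current holder can ever see its own id there; everyone else falls through to the ordinary acquire.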

} Compile-tested on port-amd64 (including rumptest).  Since there are no 
} MD-changes in this version, there "shouldn't be" any issues with 
} building on other ports.
} 
} As previously noted, there is only one known use case for this so far: 
} modules loading other modules from within their xxx_modcmd() routine. 
} The specific use case we have involves loading the acpicpu driver/module 
} which eventually results in an attempt to load acpiverbose.
} 
} It would be really nice if the community could
} 
} A. Compile-test on additional architectures
} B. Test to see that existing mutex operations still work correctly
} C. Exercise the known use case if possible
} D. Identify additional use cases
} 
}-- End of excerpt from Paul Goyette


re: kicking everybody out of the softc

2010-08-15 Thread John Nemeth
On Jan 6,  1:19am, matthew green wrote:
} 
} would device_lookup() and device_lookup_private() take a reference
} on this count automatically?  or maybe some new API that does it,
} to avoid the need to audit every driver at once.

 What would release the reference in that case?  Or, would the
count just keep incrementing thus preventing the driver from detaching
until it is audited?

}-- End of excerpt from matthew green


Re: kernel module loading vs securelevel

2010-10-16 Thread John Nemeth
On Jan 31,  5:14pm, Paul Goyette wrote:
} On Sat, 16 Oct 2010, Izumi Tsutsui wrote:
} 
} >>> Hmm, what do you think about this feature?
} >>> Only available in INSECURE environment?
} >
} >> We trust modules at the time when they're installed into the trusted
} >> place, same as kernel itself.  I think prohibiting module load  at
} >> run-time is rather pointless.
} >
} > Well I think the point is whether we should require INSECURE or not
} > to use module autoload/autounload after multiuser.
} >
} > If we should I'll enable options INSECURE by default on ports
} > that require options MODULAR (to save kernel file size).
} 
} autoload/autounload does NOT perform any authorization checks - please 
} look at the code!  No checking of securelevel occurs, as far as I can 

 I just did and autoload most certainly does do authorization
checks.

} see.  For autoload, the module name must not contain a '/', so if the 
} module is being loaded from the file system it must be loaded from the 
} "blessed" /stand/${ARCH}/${VERSION}/modules directory.  Including the 
} INSECURE option will have no effect on autoloading of modules.
} 
} Manual loading and unloading of modules does involve calls to 
} kauth_authorize_system() which will check securelevel.

 sys/kern/kern_module.c:module_autoload() makes almost the exact
same call to kauth_authorize_system as does module_load().  The
difference is that the second last parm is (void *)(uintptr_t)1.  What
difference this makes is going to be buried in the bowels of kauth, and
I'm not going to dig through that at this moment.

}-- End of excerpt from Paul Goyette


Re: kernel module loading vs securelevel

2010-10-16 Thread John Nemeth
On Mar 8,  9:44am, Thor Lancelot Simon wrote:
} On Sun, Oct 17, 2010 at 03:51:52AM +0900, Izumi Tsutsui wrote:
} > 
} > I'm just asking if "options INSECURE is mandatory to use autoloading,"
} > not module/autoloading is secure/silly/boo or not.
} 
} No.  As far as I can tell, there's a bug in the relevant kauth listener,
} at least in terms of the original intent of the author of the autoloading
} code; the system scope kauth listener should return DEFER, not DENY.

 module_listener_cb() was added to kern_module.c in revision 1.51
by elad.  The kauth_authorize_system() calls were added to
kern_module.c by ad, but the respective commit log messages don't say
anything about them, so the original intent of the author of the
autoloading code (ad) is unclear.
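
Why DEFER versus DENY matters can be shown with a simplified model of how listener results are combined.  This assumes the usual kauth(9) rule that a single denial is final and that otherwise one allow is enough; the enum values, combine(), and the treatment of "everyone deferred" as a denial are simplifications for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Result names mirror kauth(9); the values here are arbitrary. */
enum result { RESULT_ALLOW, RESULT_DENY, RESULT_DEFER };

/*
 * Combine listener results: any DENY is final, otherwise one ALLOW is
 * enough.  Treating "everyone deferred" as a denial is a simplification.
 */
enum result
combine(const enum result *votes, size_t n)
{
	int allow = 0;

	for (size_t i = 0; i < n; i++) {
		if (votes[i] == RESULT_DENY)
			return RESULT_DENY;
		if (votes[i] == RESULT_ALLOW)
			allow = 1;
	}
	return allow ? RESULT_ALLOW : RESULT_DENY;
}
```

Under this model a listener that answers DENY overrides any other listener that would have allowed the autoload, whereas one that answers DEFER leaves the decision to the others.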

} However, I think it's a troublesome question whether this is really
} the right policy to apply.  Unless the directory from which modules are
} loaded is required to be immutable (flags schg) at boot time, this really
} does introduce a major security regression: now it is possible to override
} the whole security policy by placing a new kernel module in the existing
} directory, when the system is running at securelevel > 0.
} 
} I really only see two ways to keep the convenient behavior you and I both
} seem to want (autoload of modules when filesystems, syscalls, etc. are
} used) and the safe behavior I and others building (for example) embedded
} systems with tight security policies want: either we need to rely on
} the existing securelevel machinery and require that the directory from
} which autoload occurs is immutable at kernel boot time (elsewise disabling
} autoload), or we need to use something like veriexec, when we're still at
} securelevel < 0, to ensure that the modules placed there don't change in
} any way.

 I would have to agree.  Having modules loaded at securelevel > 0
when you can't be absolutely sure of what's in them, completely defeats
the purpose of running at securelevel > 0.

}-- End of excerpt from Thor Lancelot Simon


Re: kernel module loading vs securelevel

2010-10-16 Thread John Nemeth
On Feb 1,  1:25am, Paul Goyette wrote:
} On Sat, 16 Oct 2010, David Holland wrote:
} 
} > > And also make the "blessed" directory itself immutable?  :)
} >
} > As I recall the semantics of immutable are such that this isn't
} > necessary to protect modules that are present at boot time (that is,
} > they can't be unlinked/renamed/etc.), and if there are autoloadable
} > modules whose names aren't present at boot time, they'll fail the
} > check.
} 
} I've already misread the code here once, but...
} 
} As far as I can tell, each time a module_autoload call is made, if the 
} module is neither built-in nor passed in by the boot loader, the code 
} will attempt to load it via a call to kobj_load_vfs() which has path as 
} an argument.  It doesn't appear to me that there is any pre-approved 
} list of acceptable objects that can be loaded from the file system.

 No, there isn't.  If the module is in the appropriate directory,
it can be loaded.

}-- End of excerpt from Paul Goyette


.prop rename

2010-11-19 Thread John Nemeth
 The module subsystem has a feature where it can automatically load
a .prop file along with the module.  The purpose of the file is
to store persistent args for the module and/or args for use when the
module is autoloaded.  It is also being used to store information about
the module for use by the module subsystem.

 .prop is short for proplib or property list.  It also made
coding easier as s/.kmod/.prop/ doesn't change the length of the path.
Anyways, I have received several private requests to change the name to
.plist.  After thinking about it, I believe this to be a good
idea.  It seems that .plist is a somewhat standard extension for a file
that contains a property list (see Wikipedia).
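
The point about path length can be illustrated with a small userland sketch.  kmod_to_prop() and the sample paths are made up for illustration; this is not the code in the module subsystem.

```c
#include <assert.h>
#include <string.h>

/*
 * Rewrite a trailing ".kmod" to ".prop" in place.  Both suffixes are
 * five bytes, so the path never grows.  Returns 0 on success, -1 if
 * the path does not end in ".kmod".
 */
int
kmod_to_prop(char *path)
{
	static const char from[] = ".kmod", to[] = ".prop";
	size_t len = strlen(path), slen = sizeof(from) - 1;

	if (len < slen || strcmp(path + len - slen, from) != 0)
		return -1;
	memcpy(path + len - slen, to, slen);
	return 0;
}
```

A ".plist" suffix is one byte longer, so the same in-place overwrite would need a buffer with room to grow.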

 .prop isn't in any release and to the best of my knowledge
there are no significant uses of it yet.  That means now is a pretty
good time to change it.  If there are no serious complaints, I'll start
working on the changes in about a week.


Re: .prop rename

2010-11-20 Thread John Nemeth
On Apr 12,  5:11pm, David Holland wrote:
} On Fri, Nov 19, 2010 at 12:33:00AM -0800, John Nemeth wrote:
}  >  .prop is short for proplib or property list.  It also made
}  > coding easier as s/.kmod/.prop/ doesn't change the length of the path.
}  > Anyways, I have received several private requests to change the name to
}  > .plist.  After thinking about it, I believe this to be a good
}  > idea.  It seems that .plist is a somewhat standard extension for a file
}  > that contains a property list (see Wikipedia).
} 
} I don't care one way or the other about the name but I have a
} different suggestion: since, nominally, serialized proplib files
} aren't supposed to be hand-edited, wouldn't it make more sense to

 That doesn't mean that people can't hand-edit them.  Also, they
can be generated using 'modload -p'.

} embed the property info in the module file itself?

 That may or may not make more sense, but it would require a lot
more work (i.e. inventing a tool to extract them, edit them, and insert
them; and modifying the module loading code to extract them).  I have
very little interest in doing that work at this time.

}-- End of excerpt from David Holland


Re: NetBSD kernel modules.

2010-12-07 Thread John Nemeth
On Apr 28, 10:22am, Piotr Adamus wrote:
} 
} I have one simple question: is it possible to compile these drivers
} into modules only: sdhc, ubt, uaudio? At this moment I don't have
} NetBSD installed. These drivers don't have suspend support enabled.

 There is a module for uaudio, but there are no modules for sdhc
and ubt.  Kernel modules are a work in progress and driver modules even
more so.  Your best bet, if you don't need them, is to remove them
from your kernel.

}-- End of excerpt from Piotr Adamus


Re: NetBSD kernel modules.

2010-12-07 Thread John Nemeth
On Apr 29, 11:03am, Piotr Adamus wrote:
} On Tue, Dec 7, 2010 at 11:33 AM, John Nemeth wrote:
} > On Apr 28, 10:22am, Piotr Adamus wrote:
} > }
} > } I have one simple question: is it possible to compile these drivers
} > } into modules only: sdhc, ubt, uaudio? At this moment I don't have
} > } NetBSD installed. These drivers don't have suspend support enabled.
} >
} >  There is a module for uaudio, but there are no modules for sdhc
} > and ubt.  Kernel modules are a work in progress and driver modules even
} > more so.  Your best bet, if you don't need them, is to remove them
} > from your kernel.
} 
} thank you. That's a pity but at least sdhc is needed for me :) Do you
} know when this will be finished or is planned in the future?

 No idea.  Kernel modules are really a long term project, and there
aren't very many people actively working on them.  As for sdhc, you can
try detaching it before suspending and reattaching it afterwards.  See
'man drvctl'.  You might also try filing a PR against it if there isn't
one already.

}-- End of excerpt from Piotr Adamus


Re: NetBSD kernel modules.

2010-12-07 Thread John Nemeth
On Mar 25,  7:15am, Iain Hibbert wrote:
} On Tue, 7 Dec 2010, John Nemeth wrote:
} > On Apr 28, 10:22am, Piotr Adamus wrote:
} > }
} > } I have one simple question: is it possible to compile these drivers
} > } into modules only: sdhc, ubt, uaudio? At this moment I don't have
} > } NetBSD installed. These drivers don't have suspend support enabled.
} >
} >  There is a module for uaudio, but there are no modules for sdhc
} > and ubt.  Kernel modules are a work in progress and driver modules even
} > more so.  You're best bet if you don't need them, is to remove them
} > from your kernel.
} 
} I am not sure but in the old days (when I wrote ubt), there was not really
} any need for specific suspend support in ubt because USB devices were just
} detached upon suspend events. Is the USB stack any different now? I
} confess, I don't use suspend.

 I don't know the USB stack.  However, I just looked at ubt.c and I
see that Christos added NULL suspend and resume handlers in rev. 1.30,
which is between NetBSD 4.x and NetBSD 5.x.

 Piotr, what version of NetBSD are you running?  Also, can you
execute this command, please:  "ident /netbsd | grep ubt".

} In any case, there is plenty of state about the current Bluetooth
} connections that is held inside the controller and would be lost if the
} device was powered down with no way that I know of to reinstate it, not to

 In that case, you should probably create suspend and resume
routines to save and restore the necessary data.

} mention that devices would likely be out of range after awakening, so I
} don't really know how much code would be useful there anyway.

 Maybe, maybe not.  Equipment doesn't necessarily move while being
suspended (or, if it does move, the Bluetooth device might move with
it, i.e. a mouse or a keyboard).  And, with Bluetooth, devices can
move in and out of range at random times, so this needs to be handled
anyways.

}-- End of excerpt from Iain Hibbert


Re: NetBSD kernel modules.

2010-12-10 Thread John Nemeth
On May 1,  4:24pm, Piotr Adamus wrote:
} 
} I tried drvctl: it works with ubt0 and uaudio0 but not with sdhc0, where
} it returns (from memory) "Operations not supported" or similar. Anyway,
} sdhc0 doesn't work with drvctl.

 I would suggest filing a PR against sdhc.  It's a relatively new
device; new devices shouldn't be added without detach/suspend/resume
routines.  Right now your only options are to 'boot -c' and 'disable
sdhc0', or compile a kernel without it.

}-- End of excerpt from Piotr Adamus


Re: prop_*_internalize and copyin/out for syscall ?

2011-01-17 Thread John Nemeth
On Jun 9, 11:09am, Manuel Bouyer wrote:
} 
} so I'm evaluating how to use proplib for the new quotactl(2) I'm working on.
} I see there is already provision of function to pass property list between
} kernel and userland using ioctl, but there is no equivalent for syscalls.
} Should there be ?

 The way that modload(8) does it is that it calls
prop_dictionary_externalize() to put the dictionary in a string.  It
then fills in a structure that contains amongst other things the length
of the string and a pointer to it (see src/sbin/modload/main.c), and
calls modctl(2).  In the kernel, modctl(2) allocates memory based on
the size passed in, calls copyinstr() to get the string, and then calls
prop_dictionary_internalize() (see src/sys/kern/sys_module.c).  Whether
or not there should be dedicated functions to do this is another
question.
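
The length-plus-pointer handoff described above can be sketched in userland C.  This is a rough model only: struct load_req and fetch_props() are hypothetical names, a byte-by-byte copy stands in for copyinstr(), and the proplib calls on either side are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request layout: a string pointer plus its length. */
struct load_req {
	const char *props;	/* externalized dictionary text */
	size_t      propslen;	/* strlen(props), excluding the NUL */
};

/*
 * "Kernel" side: allocate from the caller-supplied length and copy at
 * most that many bytes plus the NUL.  The bounded loop stands in for
 * copyinstr().  On success the caller frees *out.
 */
int
fetch_props(const struct load_req *req, char **out)
{
	size_t size = req->propslen + 1;	/* room for the NUL */
	char *buf = malloc(size);

	if (buf == NULL)
		return ENOMEM;
	for (size_t i = 0; i < size; i++) {
		buf[i] = req->props[i];
		if (buf[i] == '\0') {
			*out = buf;
			return 0;
		}
	}
	free(buf);
	return ENAMETOOLONG;	/* no NUL within the advertised length */
}
```

The receiving side never trusts the string to be terminated; it sizes the buffer from the advertised length and fails if the terminator does not show up inside it.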

} For kernel land, prop_{array,dictionary}_copy{in,out} would do it
} (prop_{array,dictionary}_copyout is documented but not implemented,
} this is easy).
} For userland, we probably need a prop_{array,dictionary}_recv_syscall(),
} which takes as parameter the pref we got from kernel, and internalize it.
} Parameters would be the pref, and a pointer to the prop_array_t or
} prop_dictionary_t that will get the result. What this would do in
} addition to call prop_{array,dictionary}_internalize_from_pref() is
} to unmap the buffer the kernel mmaped for us. I don't think these details
} of the kernel/userland communication should be exposed outside of the
} proplib code.

 modctl(2) doesn't copyout any dictionaries.  Off the top of my head,
I'm not aware of any precedent for that.

} For symmetry, we probably want a prop_{array,dictionary}_send_syscall()
} which is just an alias to prop_{array,dictionary}_externalize_to_pref()
} 
}-- End of excerpt from Manuel Bouyer


Re: Loading modules during startup

2011-02-20 Thread John Nemeth
On Jun 6,  8:40pm, Paul Goyette wrote:
}
} One thing still puzzles me a bit WRT device-driver modules.
} 
} We have a number of devices whose drivers have been modularized.  For 
} example, acpicpu(4).  There appears to be no attempt to auto-load 
} drivers when their associated devices exist; this could be due, in part, 

 Not entirely true, but it is rather ad hoc at the moment.

} to having the xxx_match() within the driver itself.  If there's nothing 
} to identify the device, what mechanism should be used to load the 
} driver?

 There is the code provided by jmcneill@.  Taking a closer look at
it is on my list of things to do after I finish my current MODULAR
project, which is loading .plist at boot time (/boot code is
finished).

} My solution has been to manually load the appropriate modules in my 
} /etc/rc.local file.  But it would seem to me that there should be a 
} "better way" (tm) to do this.
} 
} I don't think that a single /etc/rc.d/modules script (with some sort of 
} configuration file) would necessarily fill the bill, since there might 
} be different dependencies (some modules might need to be loaded early, 
} some later).
} 
} Any thoughts or suggestions?
} 
}-- End of excerpt from Paul Goyette


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-04 Thread John Nemeth
On Sep 24,  3:25pm, Izumi Tsutsui wrote:
}
} The problem is that there might be some ports whose MAXPARTITIONS is still 8
} and such ports can't use type 8.

 Given that floppies don't have disklabels (and don't support
them), what does MAXPARTITIONS have to do with anything?

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-04 Thread John Nemeth
On Sep 24,  6:14pm, Izumi Tsutsui wrote:
} > On Wed, May 04, 2011 at 08:50:10PM +0900, Izumi Tsutsui wrote:
} > > The problem is that there might be some ports whose MAXPARTITIONS is
} > > still 8 and such ports can't use type 8.
} > 
} > Why not? It is not used as a partition of fd*.
} > MAKEDEV is already wrong for those ports, the fd nodes probably should have
} > special case handling.
} 
} On i386:
} ---
} % ls -l fd1*
} brw-r-  1 root  operator  2,  8 May  7  2003 fd1a
} [snip]
} brw-r-  1 root  operator  2, 15 May  7  2003 fd1h
} brw-r-  1 root  operator  2, 524296 May  7  2003 fd1i
} [snip]
} brw-r-  1 root  operator  2, 524303 May  7  2003 fd1p
} ---
} 
} on amd64:
} ---
} # ls -l fd1*
} brw-r-  1 root  operator  2, 16 May  4 23:31 fd1a
} [snip]
} brw-r-  1 root  operator  2, 31 May  4 23:31 fd1p
} # 
} ---
} 
} So current isa/fd.c can't handle the second drive
} on ports where (MAXPARTITIONS != 8 && !__HAVE_OLD_DISKLABEL).
} 
} For compatibility with longstanding inconsistent MAKEDEV(8),
} it might be better to use DISKUNIT() and DISKPART() for
} FDUNIT() and FDTYPE() as other disks, so that we don't have
} to have special device minor handling for each MD fd device in
} MI MAKEDEV.tmpl script.

 So, instead of fixing the very broken MAKEDEV script, you want to
mangle multiple floppy drivers?  At the end of the day, MAKEDEV is
broken, it should not be treating floppy drives like hard drives.  The
unit letters don't have the same meaning and never have.

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-04 Thread John Nemeth
On Sep 24,  8:03pm, Izumi Tsutsui wrote:
}
} >  So, instead of fixing the very broken MAKEDEV script, you want to
} > mangle multiple floppy drivers?  At the end of the day, MAKEDEV is
} > broken, it should not be treating floppy drives like hard drives.  The
} > unit letters don't have the same meaning and never have.
} 
} There are two options, fixing kernels, or

 The kernels aren't broken and don't require fixing.

} fixing /dev nodes on existing disks (not only MAKEDEV script).

 As things sit now, you can't use the second floppy drive on an
amd64 machine.  Although, there are going to be very few amd64 machines
with two floppy drives (heck, modern PC motherboards don't have a
floppy controller any more), that should be fixed.

} I'm afraid few developers will maintain MAKEDEV script properly,

 Then they shouldn't be messing with it.

} and few users will rerun /dev/MAKEDEV on upgrade.

 It will be automatically run by sysinst.  If somebody manually
does an upgrade and doesn't do it properly that's their problem.
Besides, failing to run MAKEDEV will bite you in other ways, such as
missing dev nodes for new devices.

} Nowadays floppy is almost dead, so we don't have to care about
} compatibility, though...

 This doesn't mean we should be doing hack jobs.  NetBSD is about
doing things right.

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-04 Thread John Nemeth
On Sep 25,  6:43am, Izumi Tsutsui wrote:
}
} >  The kernels aren't broken and don't require fixing.
} 
} The topic is how to add 8th type and currently fd.c uses hardcoded '8'.

 Actually, the topic is asking what the purpose of FDUNIT and
FDTYPE is.  That question has been answered.

} If we can simply change it to 16, why did we introduce complicated
} __HAVE_OLD_DISKLABEL for harddisks?

 Again, disklabels have nothing to do with floppies, or conversely,
the letter part of the "floppy unit" has nothing to do with partitions.

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-04 Thread John Nemeth
On Sep 25,  6:51am, Izumi Tsutsui wrote:

 You seem to have defective e-mail software; it doesn't quote
properly.

} > I have a really strange collection of
} > old machines, but pretty sure none of it has more than one floppy drive,
} > actually most of them have only broken drives).
} 
} BTW all X680x0 machines have two 5.25" floppy drives,
} though it uses sys/arch/x68k/dev/fd.c and it already supports

 I'm not sure what this has to do with anything, given that you're
now talking about a different driver.

} 1024bytes/sector format by default.

 It has a modified fd_types array.

} It was based on isa/fd.c and also uses:
} >> #define FDUNIT(dev)(minor(dev) / 8)
} >> #define FDTYPE(dev)(minor(dev) % 8)

 Still don't know what this has to do with anything, other than
demonstrating that it treats minor numbers the same way that
sys/dev/isa/fd.c does.

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-05 Thread John Nemeth
On Sep 25,  6:14am, Edgar Fuß wrote:
} Subject: Re: sys/dev/isa/fd.c FDUNIT/FDTYPE
} IT> The topic is how to add 8th type and currently fd.c uses hardcoded '8'.
} JN> Actually, the topic is asking what the purpose of FDUNIT and
} JN> FDTYPE is.  That question has been answered.
} As I started this discussion in the first place, I should probably clarify
} what my intentions on asking this question were.
} 
} 1. I observed that, at least on amd64, MAKEDEV adds 16 to the minor
} for fd1 while fd.c treats the minor mod/div 8. So (either I was
} wrong, which doesn't seem to be the case or) MAKEDEV or fd.c had to
} be fixed in order for a second floppy (fd1, unit 1) to work.

 Yes, good catch.  I'm guessing that there aren't many amd64 based
machines with two floppy drives (I don't think I've seen any).
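
 The mismatch can be worked through with a few lines of C.  The two
macros are the ones quoted from the driver elsewhere in this thread,
rewritten here to take a bare minor number; makedev_minor() is a
made-up helper that models MAKEDEV spacing units 16 minors apart, as
observed on amd64.

```c
/* Driver view of a minor number, per the macros quoted from fd.c. */
#define FD_STRIDE	8
#define FDUNIT(m)	((m) / FD_STRIDE)
#define FDTYPE(m)	((m) % FD_STRIDE)

/* Hypothetical model of the node numbering MAKEDEV produces on amd64:
 * 16 minors per unit, letter 'a' at offset 0. */
static int
makedev_minor(int unit, char letter)
{
	return unit * 16 + (letter - 'a');
}
```

 With these definitions fd1a gets minor 16, which the driver decodes
as unit 2, type 0, so no fd1 node ever reaches unit 1.  Conversely,
fd0i (minor 8) decodes as unit 1, type 0.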

} 2. As I originally observed that discrepancy in the course of adding
} a ninth floppy type, I expressed that my personal choice would have
} been to adjust the kernel to MAKEDEV (e.g. div/mod 16) and not the
} other way round (e.g. div/mod 8).

 My preference, as stated, would be to fix MAKEDEV.  And, possibly
modify the floppy driver to be able to modify one of the table entries
so that you don't need to go through the gyrations that you mentioned.

} Since I seem to be about the only person using floppies in NetBSD

 You're not the only one.  If I didn't care, I wouldn't comment.
I'm probably the last person to have created/added a floppy driver to
NetBSD (I did the driver for SBus based sparc64 machines).  I still
have plans for creating MI floppy driver(s) and doing various
cleanups.  At this point these plans pretty much fall under the
category of "exercise for the student" (the driver I created was my
first significant foray into kernel land).  This means that although I
still intend to do it, it is below various other priorities, such as
fixing if_cas and NAT-T, since it has limited utility.  But at the same
time, I don't want one single driver to get messed up as that makes the
grand unification much more difficult.

} land (and my current intent is only to have the content of some four
} shoeboxes full of (mostly ten-sector) Atari floppies more readily

 These are actually readable on a standard PC style floppy drive?
On a side note, I did have an Atari 800 at one time, but I never did
get the floppy drive for it.

} available), the question is probably irrelevant. I have modified my
} local copy of fd.c to have ten sectors at type 8; so I modified it to
} also treat the minor div/mod 16. I could also hijack one of the other
} six types I don't need and stay div/mod 8.

 I would have hijacked one of the last two types.  That's the
easiest modification.  Also, I have never seen those in the field.

}-- End of excerpt from Edgar Fuß


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-05 Thread John Nemeth
On Sep 25,  3:36pm, Izumi Tsutsui wrote:
}
} > } The topic is how to add 8th type and currently fd.c uses hardcoded '8'.
} > 
} >  Actually, the topic is asking what the purpose of FDUNIT and
} > FDTYPE is.  That question has been answered.
} 
} The original question is:
} 
} http://mail-index.NetBSD.org/tech-kern/2011/05/03/msg010454.html

 Yes, it is asking for clarification of FDUNIT and FDTYPE.

} >> sys/dev/isa/fd.c defines FDUNIT and FDTYPE as DIV/MOD 8.
} >> etc/MAKEDEV uses makedisk_p16 for fd*.
} >> 
} >> Who's right?
} >> As I'm just adding a ninth (ten-sector) fd_type, I prefer the 16 version.
} 
} I said changing DIV/MOD number in fd.c required all users to update
} existing fd device nodes under /dev.
} (note i386 uses makedisk_p16high that handles OLDMAXPARTITIONS)

 Which is clearly very wrong.

} > } If we can simply change it to 16, why did we introduce complicated
} > } __HAVE_OLD_DISKLABEL for harddisks?
} > 
} >  Again, disklabels have nothing to do with floppies, or conversely,
} > the letter part of the "floppy unit" has nothing to do with partitions.
} 
} FDUNIT and FDTYPE are calculated from device minor using the
} hardcoded DIV/MOD number, as MI DISKUNIT and DISKPART for harddrives.
} __HAVE_OLD_DISKLABEL was introduced to add magics in MD DISKUNIT()
} and DISKPART() macro to avoid renumbering existing old device minors
} on MAXPARTITIONS bump.
} Current MI MAKEDEV.tmpl treats floppy device minors as harddrives.

 Again, I don't see your point.  You're talking about a major bug
in MAKEDEV where it treats floppy drives the same as hard drives.  They
aren't the same and the letters in the "unit number" don't have the
same meaning.

} If someone[TM] will maintain MAKEDEV scripts and
} write doc/UPDATING properly, no problem for me.

 This is the only correct solution.  I might consider working on
it, but my development system is down with a bad hard drive at the
moment.  Getting it running again is a somewhat higher priority.

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-05 Thread John Nemeth
On Sep 25, 11:29pm, Izumi Tsutsui wrote:
}
} > } FDUNIT and FDTYPE are calculated from device minor using the
} > } hardcoded DIV/MOD number, as MI DISKUNIT and DISKPART for harddrives.
} > } __HAVE_OLD_DISKLABEL was introduced to add magics in MD DISKUNIT()
} > } and DISKPART() macro to avoid renumbering existing old device minors
} > } on MAXPARTITIONS bump.
} > } Current MI MAKEDEV.tmpl treats floppy device minors as harddrives.
} > 
} >  Again, I don't see your point.  You're talking about a major bug
} > in MAKEDEV where it treats floppy drives the same as hard drives.  They
} > aren't the same and the letters in the "unit number" don't have the
} > same meaning.
} 
} If we "fix" kernels to use DISKUNIT() and DISKPART() macro

 The kernels aren't broken and don't need fixing.

} for FDUNIT() and FDTYPE(), we can bump a number of fd types
} to MAXPARTITIONS with no further changes.

 Floppies don't use partitions, so I don't see what MAXPARTITIONS
has to do with anything.

} Nothing needs to be done by users in that case.

 Nothing needs to be done by most users if we fix MAKEDEV.

} I thought it was acceptable workaround because paying extra costs

 It's a hack to work around a broken MAKEDEV.  It's a rather bad
practice to introduce hacks to work around other broken code, when the
other broken code could just as easily be fixed.

} against correctness of such obsolete device was worthless and
} we had much more important problems on modern devices.

 Just because some people might consider a device to be obsolete is
no reason to not strive for correctness.

}-- End of excerpt from Izumi Tsutsui


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-05 Thread John Nemeth
On Sep 25,  4:47pm, Martin Husemann wrote:
} On Fri, May 06, 2011 at 04:53:41AM +0900, Izumi Tsutsui wrote:
} > If we "fix" kernels to use DISKUNIT() and DISKPART() macro
} > for FDUNIT() and FDTYPE(), we can bump a number of fd types
} > to MAXPARTITIONS with no further changes.
} > Nothing needs to be done by users in that case.
} 
} Oh, now I see what you meant - didn't get it with the previous explanation -
} nice trick.
} 
} Still, if postinstall can warn users, I don't see a big deal in fixing it

 I don't think postinstall needs to provide any warning.  The extra
floppy letters either incorrectly refer to a different drive, or won't
do anything.  Also, sysinst will automatically run MAKEDEV when
upgrading a system.  Anybody that manually updates should be running
MAKEDEV as part of the process.  Failure to do so means they will be
missing new device nodes.

} in other ways (though I can live with both, and Edgar can do his changes
} with your solution too, as he seems to use amd64).

 Hacking up just one floppy driver is an extremely bad idea.  It
just makes maintenance harder.  And, I don't see any real benefit to
this hack.

}-- End of excerpt from Martin Husemann


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-05 Thread John Nemeth
On Sep 25,  5:05pm, Hauke Fath wrote:
} At 10:53 Uhr -0700 5.5.2011, John Nemeth wrote:
} >} land (and my current intent is only to have the content of some four
} >} shoeboxes full of (mostly ten-sector) Atari floppies more readily
} >
} > These are actually readable on a standard PC style floppy drive?
} 
} That's "Atari ST" (68k) standard MFM, as opposed to the 6502 based
} predecessors, which IIRC were closer to Apple II and Commodore formats. The
} ST shipped with a 68k MSDOS 2.0 clone.

 Ah, okay.

} Because of the WD1772 controller's properties, and on certain floppy
} drives, people managed to squeeze up to 12 sectors on a track with
} custom-made formatters that reduced the inter-sector gaps.

 The sparc{,64} floppy controller chip is basically a PC floppy
controller chip.  The main difference is that SBus based machines don't
support PC style DMA, so the chips are configured for pseudo-DMA where
you read/write data one byte at a time inside an interrupt handler.

} (who wrote another "late NetBSD floppy driver" for the Macintosh IWM)

 That must have been fun.  As I recall, this was basically the
Apple ][ disk controller squished into one chip. It was hard to call
that thing a disk controller since it wasn't much more than a TTL
driver and a shift register.

 My first experience with "operating systems" was basically taking
a printout of a disassembly of Apple DOS 3.3, figuring out how it
works, and documenting the entire thing, then mangling it to do various
tricks.  I wonder where that printout is?  I probably still have it
somewhere.

}-- End of excerpt from Hauke Fath


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-05 Thread John Nemeth
On Sep 25,  9:46am, Tom Spindler wrote:
}
} Increasingly offtopic, but...
} 
} >  That must have been fun.  As I recall, this was basically the
} > Apple ][ disk controller squished into one chip. It was hard to call
} > that thing a disk controller since it wasn't much more than a TTL
} > driver and a shift register.
} 
} And a state machine as provided by one of the PROMs. A terribly complete
} analysis can be found in "Understanding the Apple II" by James F Sather.

 The PROMs on the Apple ][ only handled booting.  They didn't have
anything to do with the drives after booting.  The one other thing I
didn't mention was some kind of data discriminator to separate clock
pulses from data pulses.  The hardware controller didn't even do the
GCR encoding/decoding, that was done in software.

 The code in the PROMs couldn't do much more than "recalibrate" the
head (basically move the head towards track 0 and slam it against the
stop a whole bunch of times due to lack of a track 0 detect switch),
read some sectors, and jump to the beginning.  One of the tricks I did
was to hack the first sector to make it ask for a password.  Combining
this with "copy protection schemes" that made the disk unreadable by
unmodified DOS 3.3 made for a fairly effective way to protect data.
Wouldn't stand up to anybody that truly knew what they were doing, but
beat 90% of people.

} >  My first experience with "operating systems" was basically taking
} > a printout of a disassembly of Apple DOS 3.3, figuring out how it
} > works, and documenting the entire thing, then mangling it to do various
} > tricks.  I wonder where that printout is?  I probably still have it
} > somewhere.
} 
} "Beneath Apple DOS", by Worth and Lechner, is probably a good alternative.

 Yes, I have that.  But, I learned a lot more by doing it myself
first.  Reading a book that gives you all the answers doesn't teach you
nearly as much.

} Both of the books I mentioned appear to be up on scribd these days.

 URL?

}-- End of excerpt from Tom Spindler


Re: sys/dev/isa/fd.c FDUNIT/FDTYPE

2011-05-13 Thread John Nemeth
On Oct 3,  5:43pm, David Laight wrote:
} On Wed, May 04, 2011 at 01:40:06PM +0100, David Brownlee wrote:
} > On 4 May 2011 12:50, Izumi Tsutsui  wrote:
} > > The problem is that there might be some ports whose MAXPARTITIONS is
} > > still 8 and such ports can't use type 8.
} > 
} > Maybe this is time to update those remaining ports to 16... :)
} 
} Last I looked that was related to the offset of the disklabel into
} the disk sector.
} In some cases there isn't room for 16 partition records.

 Floppies don't have labels.  The final character in fd0a has
nothing to do with a partition.  It is used to indicate the type of
floppy in the drive.

}-- End of excerpt from David Laight


Re: kernel v. userland #includes for standard types

2011-06-04 Thread John Nemeth
On Oct 25,  2:49pm, David Holland wrote:
} On Sat, Jun 04, 2011 at 01:09:19PM -0500, David Young wrote:
}  > Is there a good reason that we are not using  and 
}  > in the kernel, instead of  and ?  It seems
}  > to me that we could cut simplify by using the same headers for the same
}  > definitions everywhere.
} 
} We should, but the source tree organization doesn't really allow it.
} (I assume you agree that creating any src/sys/*.h files would be a
} mistake.)
} 
} In the long run we should add src/sys/include, but in general I think
} major structural cleanup like this should wait until we manage to do a
} version control migration.

 The files in src/sys/sys get installed into /usr/include/sys .

}-- End of excerpt from David Holland


Re: kernel v. userland #includes for standard types

2011-06-04 Thread John Nemeth
On Oct 25,  4:49pm, David Holland wrote:
} On Sat, Jun 04, 2011 at 02:27:59PM -0700, John Nemeth wrote:
}  > }  > Is there a good reason that we are not using  and 
}  > }  > in the kernel, instead of  and ?  It seems
}  > }  > to me that we could cut simplify by using the same headers for the
}  > }  > same definitions everywhere.
}  > } 
}  > } We should, but the source tree organization doesn't really allow it.
}  > } (I assume you agree that creating any src/sys/*.h files would be a
}  > } mistake.)
}  > } 
}  > } In the long run we should add src/sys/include, but in general I think
}  > } major structural cleanup like this should wait until we manage to do a
}  > } version control migration.
}  > 
}  >  The files in src/sys/sys get installed into /usr/include/sys .
} 
} Yes, I'm aware of that. Your point being?

 Then what purpose would src/sys/include serve?

}-- End of excerpt from David Holland


Re: RFC: New security model secmodel_securechroot(9)

2011-07-09 Thread John Nemeth
On Nov 29,  7:06am, Joerg Sonnenberger wrote:
} On Sat, Jul 09, 2011 at 12:03:50PM +0300, Aleksey Cheusov wrote:
} > DESCRIPTION
} >  The securechroot security model is intended to protect the system
} >  against destructive modifications by chroot-ed processes.  If
} >  enabled, secmodel_securechroot applies the following restrictions
} >  to chroot-ed processes.
} 
} >  ·   Module requests are not allowed.
} 
} Does this include automatic loading of modules as side effect of actions
} or not?

 This should be fine.  When autoloading, it will only use the
system path and doesn't follow chroot.

} >  ·   Firewall-related operations such as modification of packet
} >  filtering rules or modification of NAT rules are not allowed.
} 
} Table manipulation is a valid use case of a chroot, especially a
} restricted chroot. Consider FTP proxies as example.

 Manipulating global state is a pretty major exception considering
the rest of the stuff here.  If you want that, then don't use this
module.

}-- End of excerpt from Joerg Sonnenberger


Re: Don't load kernel modules from the current directory, second diff

2011-08-04 Thread John Nemeth
On Dec 25,  7:20am, Marc Balmer wrote:
} Subject: Re: Don't load kernel modules from the current directory, second 
} 
} Thanks to all that replied to my initial diff.  This second version is
} better, it allows to load a module from the filesystem with either an
} absolute path starting with '/' or a relative path starting with '.'.
} So you can still load a module from the CWD using
} 
} modload ./mymodule.kmod
} 
} module_load_vfs() is changed in two ways:  When a module is loaded from
} the path given to modload, it must start with either '.' or '/'.  If a
} path is constructed to load the module from the system module area, it
} must not start with '.' or '/'.

 If you really want to beef up the security of loading from the
system module area, you should make sure there is no / anywhere in
name.  Granted, with name being added to path twice, it will be very
difficult to come up with something that will escape the system module
area and load some random module (even without your change to that
part).

} kobj_load_vfs() will only load an object with a path starting with
} either '/' or '.'
} 
}-- End of excerpt from Marc Balmer


Re: modload_03.diff, was: Don't load kernel modules from the current directory

2011-08-05 Thread John Nemeth
On Nov 20,  8:34pm, Iain Hibbert wrote:
} On Fri, 5 Aug 2011, Marc Balmer wrote:
} 
} > This is the third iteration of the patch to make kernel module loading
} > more secure.  The only change to the previous patch is that the code,
} > when loading a module from /stand/... now checks that the module name
} > does not contain a path separator character.
} >
} > modload  still works, but  must be available in the system
} > module area under /stand/...
} >
} > To load from any other location, either an absolute path or a relative
} > path starting with a '.' is needed.
} 
} strchr() is available in kernel I think

 I was wondering about this...

} also, is this complication of '.' really needed?  What I mean is, if you
} are checking for the path separator, why limit to current directory?
} 
}   if (strchr(name, '/') == NULL)
}   path = //.kmod
}   else
}   path = 

 I think I like this better for the first part.  Then the second
part is just for autoload and it can be left alone (or turned into an
else clause) since autoload can only be done from inside the kernel.
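
 The strchr() test can be sketched in portable C.  This is an
illustration only, not the module_load_vfs() code: module_path() is a
made-up helper, the base directory is whatever the caller passes in,
and the base/name/name.kmod layout is an assumption based on name
being added to the path twice.

```c
#include <stdio.h>
#include <string.h>

/* Build the path to load from.  A name without '/' is looked up in
 * the module area as base/name/name.kmod; anything containing '/' is
 * used verbatim.  Returns 0 on success, -1 if the result won't fit. */
static int
module_path(const char *base, const char *name, char *buf, size_t len)
{
	int n;

	if (strchr(name, '/') == NULL)
		n = snprintf(buf, len, "%s/%s/%s.kmod", base, name, name);
	else
		n = snprintf(buf, len, "%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

 A bare name can therefore never leave the module area, and loading
from the current directory needs an explicit ./ prefix.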

} which is the same semantics used by many other 'automatic file path'
} operations, requiring explicit current-dir to avoid accidents..
} 
} (as noted, you didn't exclude ./sub/dir/module anyway)
} 
}-- End of excerpt from Iain Hibbert


Re: RFC: lseek() extensions: SEEK_HOLE / SEEK_DATA with patch

2011-08-09 Thread John Nemeth
On Dec 28,  1:27pm, Reinoud Zandijk wrote:
} On Sun, Aug 07, 2011 at 02:14:50PM +, David Holland wrote:
} > On Sun, Aug 07, 2011 at 09:52:11AM +0200, Reinoud Zandijk wrote:
} >  > i've implemented the SEEK_HOLE / SEEK_DATA additions to lseek() as
} >  > introduced by Solaris for ZFS.
} > 
} > What does this operation have to do with seeking? And why involve the
} > seek pointer, especially at a time when new calls are being added left
} > and right to cope with files accessed from more than one thread at a
} > time in the same process?
} 
} One `seeks' the next hole or data block in the file from a given pos? Sounds
} logical to me.

 Sounds like rationalising to me.  David is right, this is a truly
brain dead interface.  Having said that, I understand that sometimes we
get stuck with brain dead interfaces simply because they get into
common use.  The question is, do we have time to come up with a better
interface that we can hopefully encourage others to use?  Or, is this
one already in common use across multiple OSes and applications?

}-- End of excerpt from Reinoud Zandijk


Re: DADHI drivers for Asterisk?

2012-01-09 Thread John Nemeth
On Jun 1,  3:02am, Marc Balmer wrote:
} Am 09.01.2012 03:06, schrieb Emmanuel Dreyfus:
} > Hello everybody
} >
} > PCI boards for Asterisk require kernel drivers, which used to be
} > provided by the now obsolete and retired zaptel package. We now need
} > DADHI drivers, which have a FreeBSD port here:
} > http://svn.digium.com/svn/dahdi/freebsd/trunk
} >
} > Anyone started working on porting that to NetBSD?
} 
} Have you realized that these drivers are apparently GPL/LGPL and thus 
} not suitable for NetBSD kernel inclusion?

 There is no problem with creating a module and sticking it in
pkgsrc.

}-- End of excerpt from Marc Balmer


Re: Areca SAS controllers (was: NetBSD on current AMD motherboards)

2012-01-18 Thread John Nemeth
On Jun 10,  9:36am, Edgar Fuß wrote:
}
} > Areca is one decent choice.
} Thanks. But which Areca controllers are supported by NetBSD?
} I can find lots of their PCI IDs in dev/pci/arcmsr.c, but I fail to map
} product names (e.g. ARC-1320-8i) to PCI IDs.

 I'm using:

arcmsr0 at pci1 dev 0 function 0
arcmsr0: interrupting at ioapic1 pin 0, event channel 5
arcmsr0: Areca ARC-1680 Host Adapter RAID controller
arcmsr0: 8 ports, 2048MB SDRAM, firmware 

004:00:0: Areca ARC-1680 (RAID mass storage)
004:00:0: 0x168017d3 (0x0104)
 (VendID = 17d3, ProdID = 1680)

It works fine for accessing disks, but management and monitoring (i.e.
bioctl) doesn't work.

}-- End of excerpt from Edgar Fuß


Re: Module name - recommendations/opinions sought

2012-04-25 Thread John Nemeth
On Sep 16,  7:17am, Masao Uebayashi wrote:
}
} I thought module names always match what's "define"'ed in config(1) files...

 At this time, config(1) has absolutely nothing to do with
modules.  There is a movement afoot to change that.  I'll just say that
I don't agree with changing that, but I don't feel like getting into
yet another flamewar and certainly don't have time to do so, so I won't
say anything else on the subject.

}-- End of excerpt from Masao Uebayashi


Re: Modularizing net80211 (was: link_set info needed)

2012-04-29 Thread John Nemeth
On Sep 18, 11:45am, David Laight wrote:
} Subject: Re: Modularizing net80211 (was: link_set info needed)
} > 
} > This mechanism only works for modules that are "separate" from the 
} > kernel (loaded via "boot" or from "filesys").  "builtin" modules still 
} > need to use the link_set mechanism.
} 
} Shouldn't be that hard to put the constructor list address into a
} link_set - that would make it easy to get them called for 'built in'
} modules.
} 
} Thinks a bit further...
} Have each module define a small structure that contains (say):
} - its name
} - the address of a list of modules it depends on
} - the address of its constructors
} - the address of its destructors

 You mean something like this, which is what modules do today:

/* Module header structure. */
typedef struct modinfo {
 u_int   mi_version;
 modclass_t  mi_class;
 int (*mi_modcmd)(modcmd_t, void *);
 const char  *mi_name;
 const char  *mi_required;
} const modinfo_t;

mi_required is a comma separated list of dependencies.  This structure
could be expanded.
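
 Walking such a comma separated list takes only a few lines.  The
following is a sketch, not the kernel's own parser: foreach_required()
and its callback signature are made up for illustration, and treating
a NULL string as "no dependencies" is this sketch's choice.

```c
#include <stddef.h>
#include <string.h>

/* Call fn once per entry in a comma separated list such as the
 * mi_required string; returns the number of entries visited.
 * Empty entries are skipped; fn may be NULL to just count. */
static size_t
foreach_required(const char *required,
    void (*fn)(const char *name, size_t len, void *arg), void *arg)
{
	size_t count = 0;
	const char *p = required;

	if (p == NULL)
		return 0;
	while (*p != '\0') {
		size_t len = strcspn(p, ",");

		if (len > 0) {
			if (fn != NULL)
				fn(p, len, arg);
			count++;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return count;
}
```

 The callback receives a pointer and a length rather than a
NUL-terminated copy, so the list itself is never modified.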

}-- End of excerpt from David Laight


Re: Should kqueue descriptors work outsid of the creating process?

2012-06-02 Thread John Nemeth
On Oct 21,  9:16am, Matthew Mondor wrote:
} On Thu, 31 May 2012 10:38:38 -0400 (EDT)
} Mouse  wrote:
} 
} > > Recently we found out (PR kern/46463) that kqueue() file descriptors,
} > > which originally were designed to be "local process only" objects,
} > > could be passed with SCM_RIGHTS messages to other processes.  [...]
} > 
} > > I propose to not allow sending kqueue file descriptors [...]
} > 
} > > Or are there any legit uses for "foreign" kqueue()s?
} > 
} > It seems to me, for what it may be worth, that this is asking the
} > wrong question.  Rather, I would ask whether there are illegitimate
} > uses for `foreign' kqueue descriptors, and, if not, fix them to be
} > passable like any other descriptors.
} 
} It's true that it's normally the parent's responsibility to decide which
} FDs to close or set close-on-exec before fork(2)... Was there a design
} decision not to inherit kqueue descriptors for security or complexity
} reasons?
} 
} Since signals, signal mask, signal stack and restart/interrupt flags
} are also inherited according to sigaction(2), probably that an
} EVFILT_SIGNAL filter would still be valid...
} 
} But how about EVFILT_TIMER?  timer_create(2) timers are not inherited,
} setitimer(2) doesn't specify, but it also uses the same ptimers pool
} timer_create(2) uses.  EVFILT_TIMER appears to use its own system though.
} 
} For EVFILT_PROC, it appears to be for the specified process, so I guess
} it might still work if inherited?
} 
} And there is also EVFILT_VNODE... who knows what other filters might be
} added in the future?
} 
} What I can see is that the implications of inheriting this special
} descriptor are quite more complex than for normal FDs...  Which makes
} me think that it very well could be a design decision not to inherit
} these, in which case I don't object to also prevent passing it via
} SCM_RIGHTS ancillary message.

 Although I don't know much about kqueue, I disagree with this.
Passing kqueue FDs via fork could be considered an oversight,
especially since they don't normally get inherited.  However, passing
them via SCM_RIGHTS is a very deliberate action.  We should assume that
any application doing this knows what it is doing.

}-- End of excerpt from Matthew Mondor


Re: disklabel problems on 3TB disc

2012-07-21 Thread John Nemeth
On Dec 11,  6:39pm, Edgar Fuß wrote:
}
} > or put the raid on the raw disk itself.
} I'm not sure what you mean by that. If the disc is sd2, what string do I put
} in the RAIDframe ``START disks'' section? You can't mean ``rsd0c'' because

 rsd0d (assuming SCSI, or rwd0d for non-SCSI) on x86.

} a) I can't have raw partitions as RAIDframe components, can I? and

 According to this thread, yes you can.

} b) the kernel mis-computes rsd0c's size, doesn't it?

 The kernel knows the size of the disk.  It's just that the format
of disklabel(5) doesn't allow for disks larger than 2TB.  Of course,
when disklabel(5) was created, disks were sub-100M, and a disk larger
than 2TB was unimaginably large.
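
 The 2TB figure follows from simple arithmetic, assuming 32-bit
sector counts in the label and 512-byte sectors (both are assumptions
here, and label_overflow_bytes() is a made-up helper for the worked
example):

```c
#include <stdint.h>

/* Smallest disk size, in bytes, whose sector count no longer fits in
 * 32 bits for the given sector size. */
static uint64_t
label_overflow_bytes(uint32_t sector_size)
{
	return ((uint64_t)UINT32_MAX + 1) * sector_size;
}
```

 For 512-byte sectors that is 2^41 bytes, a bit under 2.2 * 10^12, so
a 3TB disk is well past what the label can describe.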

} And how do I limit the number of sectors RAIDframe uses (so I can replace
} a failed disc by a slightly smaller one)?

 Not sure off the top of my head.

}-- End of excerpt from Edgar Fuß


Re: Core statement on directory naming for kernel modules

2012-07-26 Thread John Nemeth
On Dec 17,  1:54am, David Laight wrote:
} On Fri, Jul 27, 2012 at 08:00:27AM +0200, Alan Barrett wrote:
} > 
} >Changes to config(1) to extend the existing notion of whether or not
} >an option is built-in to the kernel, to three states: built-in, not
} >built-in but loadable as a module, entirely excluded and not even
} >loadable as a module.
} 
} Add built-in and marked as 'loaded' in the module list.

 Have you typed "modstat" lately?  Built-in modules do show up in
the module list.  A 6.99.7 i386 kernel from May 13th is showing 115
built-in modules.

} It would also allow, on some architectures at least [1], a netbsd.o
} be generated with 'ld -r' and modules added (with ld -r) later
} prior to a final link.

 This would probably be a good way of doing it, since it might even
allow for the unloading of built-in modules in such a way that they
could be replaced.  Right now, built-in modules can't be unloaded as
they are part of a monolithic kernel and there is no way of untangling
them.  Being able to properly unload a built-in module would be a nice
feature.

}-- End of excerpt from David Laight


Re: Core statement on directory naming for kernel modules

2012-07-27 Thread John Nemeth
On Dec 17,  1:58pm, Matthew Mondor wrote:
} On Fri, 27 Jul 2012 13:57:52 + (UTC)
} Geoff Wing  wrote:
} > John Nemeth  typed:
} > : .. Being able to properly unload a built-in module would be a nice
} > : feature.
} > 
} > This sounds a bit like a possible security problem, though
} > presumably/hopefully limited by the current security level and AAA.
} 
} Do you mean in the case an external module could then be loaded instead
} of a built-in one?  Probably that someone who wants to prevent the
} kernel from loading external modules would use a kernel without
} MODULAR, or change the runlevel.

 True enough.

} This reminds me though: why/how does sysctl/kern.module.autoload
} default to 1 for non-MODULAR kernels (at least on netbsd-6)?  Or an
} alternative question: are these sysctl knobs useful at all with
} non-MODULAR kernels, or are they then artifacts?

 Good question.  Non-MODULAR kernels still have parts of the MODULAR
subsystem in order to initialise built-in modules.  However, the linking
code isn't there, so it would be impossible to load a module.  I'll make
a note to trim some of the excess stuff in non-MODULAR kernels.

}-- End of excerpt from Matthew Mondor


Re: Can't load solaris and zfs module in Xen DomU

2012-11-18 Thread John Nemeth
On Apr 10,  1:26pm, Lukas Laukamp wrote:
} 
} I have a problem to get ZFS running in a NetBSD 6.0 Xen DomU.

 Modules aren't supported with Xen at this time.

} I installed the DomU over the normal way and recompiled the NetBSD DomU 
} kernel with MODULAR option so that it can load modules. I copied the 
} kernel image to Dom0 (Debian Squeeze Kernel 2.6.32/Xen 4.0) and booted 
} up the DomU which works fine. But when I try to load the modules I get 
} the following error as well as for solaris and zfs module:

 How did you compile these modules?  One of the issues with Xen and
modules is that Xen changes a bunch of kernel ABIs.  That means you
need to compile modules with -DXEN (which won't happen by default) to
even try loading modules.  If you don't do this, and you "successfully"
load a module, then there is a very good chance of the system going
"boom!"

} So does somebody know how I could fix this problem and is it possible to 
} compile ZFS statically into the kernel?

 I haven't been following ZFS that closely, as currently it isn't
even close to being in production ready state, but something in the
back of my mind says it isn't possible, possibly due to licensing
issues.

}-- End of excerpt from Lukas Laukamp


Re: Can't load solaris and zfs module in Xen DomU

2012-11-18 Thread John Nemeth
On Apr 10,  2:27pm, Lukas Laukamp wrote:
} Am 18.11.2012 19:26, schrieb John Nemeth:
} > On Apr 10,  1:26pm, Lukas Laukamp wrote:
} > }
} > } I have a problem to get ZFS running in a NetBSD 6.0 Xen DomU.
} >
} >   Modules aren't supported with Xen at this time.
} >
} > } I installed the DomU over the normal way and recompiled the NetBSD DomU
} > } kernel with MODULAR option so that it can load modules. I copied the
} > } kernel image to Dom0 (Debian Squeeze Kernel 2.6.32/Xen 4.0) and booted
} > } up the DomU which works fine. But when I try to load the modules I get
} > } the following error as well as for solaris and zfs module:
} >
} >   How did you compile these modules?  One of the issues with Xen and
} > modules is that Xen changes a bunch of kernel ABIs.  That means you
} > need to compile modules with -DXEN (which won't happen by default) to
} > even try loading modules.  If you don't do this, and you "successfully"
} > load a module, then there is a very good chance of the system going
} > "boom!"
} >
} > } So does somebody know how I could fix this problem and is it possible to
} > } compile ZFS statically into the kernel?
} >
} >   I haven't been following ZFS that closely, as currently it isn't
} > even close to being in production ready state, but something in the
} > back of my mind says it isn't possible, possibly due to licensing
} > issues.
} 
} I didn't recompile the kernel modules, I simply tried the modules which 

 That definitely won't work.

} were shipped as binaries with the base system. So I will try to recompile the 
} modules and hope that there will be a way to get it working. Or is there 
} an alternative filesystem which has a modern design and would be usable on 
} many platforms? So for ZFS there is support in FreeBSD, Linux and I 
} think read only support on Windows. So I thought it would be a good choice.

 FAT works on all platforms, but is hardly modern.  I don't think
there is a modern file system that works on all platforms.  You're
going to have to be more specific.  Also, take a look in
pkgsrc/filesystems.  There are things in there that might be useful,
such as fuse-ntfs-3g.

}-- End of excerpt from Lukas Laukamp


Re: fexecve, round 2

2012-11-19 Thread John Nemeth
On Apr 11,  9:48am, Emmanuel Dreyfus wrote:
} On Mon, Nov 19, 2012 at 02:39:36PM +, Julian Yon wrote:
} > No, Emmanuel is right: "[...] use the O_EXEC flag when opening fd. In
} > this case, the application will not be able to perform a checksum test
} > since it will not be able to read the contents of the file." You can
} > open with --x but (correctly) you can't read from the file.
} 
} And it means the standard mandates that one can execute without
} read access. Weird.

 Not weird at all.  This is the way Unix systems have been behaving
for as long as I can remember, and I've been working with Unix systems
for approximately 20 years.

}-- End of excerpt from Emmanuel Dreyfus


Re: very bad behavior on overquota writes

2012-11-22 Thread John Nemeth
On Apr 14,  7:25am, Manuel Bouyer wrote:
} On Thu, Nov 22, 2012 at 12:46:54PM +0100, Manuel Bouyer wrote:
} > @@ -521,6 +527,16 @@ out:
} >                 (void) UFS_TRUNCATE(vp, osize, ioflag & IO_SYNC, ap->a_cred);
} >                 uio->uio_offset -= resid - uio->uio_resid;
} >                 uio->uio_resid = resid;
} > +               if (error == EDQUOT || error == ENOSPC) {
} > +                       /* if the process keeps writing (e.g. nfsd),
} > +                        * UFS_TRUNCATE() may be very expensive as it
} > +                        * walks the page list. As a workaround flush and
} > +                        * free all pages associated with this vnode
} > +                        */
} > +                       (void)VOP_PUTPAGES(vp, 0, 0,
} > +                           PGO_ALLPAGES |PGO_CLEANIT | PGO_FREE | PGO_SYNCIO |
} > +                           PGO_JOURNALLOCKED);
} > +               }
} >         } else if (resid > uio->uio_resid && (ioflag & IO_SYNC) == IO_SYNC)
} >                 error = UFS_UPDATE(vp, NULL, NULL, UPDATE_WAIT);
} >         else
} 
} And we should probably do this on any error, not only space-related errors.
} I've adjusted my local tree.

 Would that prevent recovering in the case where the user
disconnects a device (typical example is a thumb drive) and later
reconnects it (once we have the ability to handle this situation)?  I
guess that depends whether requests are held at the file system layer
or the device layer?  At this point, dropping the blocks should
probably only be done when the error is known to be permanent.

}-- End of excerpt from Manuel Bouyer


Re: Broadcast traffic on vlans leaks into the parent interface on NetBSD-5.1

2012-12-04 Thread John Nemeth
On Apr 22,  5:50pm, Robert Elz wrote:
}
} Date:Thu, 29 Nov 2012 22:54:24 -0500 (EST)
} From:Mouse 
} Message-ID:  <201211300354.waa22...@sparkle.rodents-montreal.org>
} 
} On the general VLAN topic, I agree with all Dennis said - leave the VLAN tags
} alone and just deal with them.
} 
}   | > I believe every use of BPF by an application to send and receive
}   | > protocol traffic is a signal that something is missing
} 
} I think "was missing" might be a better characterisation.
} 
}   | ...in general, I agree, but in the case of DHCP, I'm not so sure.  It
}   | needs to send and receive packets to and from unusual IPs (0.0.0.0, I
}   | think it is), if nothing else.
} 
} But that's not it, the DHCP server has no real issue with 0 addresses,
} that's the client (the server might need to receive from 0.0.0.0 but
} there's no reason at all for the stack to object to that - sending to
} 0.0.0.0 would be a truly silly desire for any software, including DHCP
} servers).

 Obtaining an address via DHCP is a four step process, and the
client can't legitimately use the new address until the fourth step is
completed.  To what address would you like the DHCP server to send its
responses?  I suppose the DHCP server could send responses to the
broadcast address, but I couldn't guarantee that every client would be
listening for them there (it's been a while since I looked at the
details).

} The missing part used to be (I believe we now have APIs that solve this
} problem) that the DHCP server needs to know which interface the packet
} arrived on - that's vital.  The original BSD API had no way to convey that
} information to the application, other than via BPF, or via binding a socket
} to the specific address of each interface, and inferring the interface
} from the socket the packet arrived on.  The latter is used by some UDP apps
} (including BIND) but is useless for DHCP, as the packets being received
} aren't sent to the server's address, but are broadcast (or multicast for v6).
} 
} As the DHCP server needed to get the interface information, it had to
} go the BPF route.  Once that's written, and works, there's no real reason
} to change it, even given that a better API (or at least "an API", by
} definition it is better than the nothing that existed before, even though
} it isn't really a great API) now exists.  Retaining use of the BPF code allows
} dhcpd to work on older systems, and newer ones, without needing config
} options to control which way it works, and duplicate code paths to maintain.

 We use ISC's DHCP server.  As third party software, it is designed
to be portable to many systems.  BPF is a fairly portable interface,
thus a reasonable interface for it to use.

}-- End of excerpt from Robert Elz


Re: Broadcast traffic on vlans leaks into the parent interface on NetBSD-5.1

2012-12-06 Thread John Nemeth
On Apr 27,  3:15am, David Laight wrote:
} On Tue, Dec 04, 2012 at 10:17:23PM -0800, John Nemeth wrote:
} > 
} >  We use ISC's DHCP server.  As third party software, it is designed
} > to be portable to many systems.  BPF is a fairly portable interface,
} > thus a reasonable interface for it to use.
} 
} One thing I discovered long ago, in an operating system far ... well
} not NetBSD is that dhcp's use of the bpf (equivalent) caused a data
} copy for every received ethernet frame - at considerable cost.
} I've NFI whether this happens with the current code.

 Given that DHCP is very low traffic, I'm not sure that this really
matters.

} Although DHCP has to do strange things in order to acquire the
} original lease, renewing it should really only require packets
} with the current IP address.

 True.  Renewing a lease takes two packets, a RENEW request, and an
ACK.  Those packets are sent using assigned addresses for both the
destination and the source.

}-- End of excerpt from David Laight


Re: Broadcast traffic on vlans leaks into the parent interface on NetBSD-5.1

2012-12-07 Thread John Nemeth
On Apr 29,  4:38pm, Robert Elz wrote:
}
} Date:Tue, 4 Dec 2012 22:17:23 -0800
} From:jnem...@victoria.tc.ca (John Nemeth)
} Message-ID:  <201212050617.qb56hncf018...@vtn1.victoria.tc.ca>
} 
}   | Obtaining an address via DHCP is a four step process, and the
}   | client can't legitimately use the new address until the fourth step is
}   | completed.
} 
} Agreed.
} 
}   | To what address would you like the DHCP server to send its responses?
} 
} You should really "look at the details", but if you're suggesting that
} 0.0.0.0 might be an appropriate place to send it, then you really do
} should consider how that could possibly work.

 If you look at ISC's changelog for dhclient, you'll find my name.
However, it's been quite a few years since I've dug into the
protocol/code at that level, and I don't have time to do it at this
moment.

}   | I suppose the DHCP server could send responses to the broadcast address,
} 
} That is one of its options, and is the one most commonly used I think.
} 
} The other option is to send to the assigned address.  The client can't
} claim it yet, but that doesn't mean that the server cannot make use of it.

 No, it can't.  Obtaining an address is a four step process.  The
first step, the client broadcasts a DISCOVER request.  The second step,
all servers that receive the message respond with an OFFER.  After
that, the client will make a REQUEST to one of the servers, and finally
the server will ACK.  Typically, there is only one server for a given
network segment, but you can't assume that.  And, there is no assigned
address until the third step is completed.
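
The four steps can be sketched as a tiny client-side state machine.  This
is a simplification under stated assumptions: it ignores NAK and DECLINE,
multiple competing offers, timeouts and renewals, and the state and
function names are invented for the example rather than taken from any
DHCP implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum dhcp_msg { MSG_DISCOVER, MSG_OFFER, MSG_REQUEST, MSG_ACK };

enum dhcp_state {
    ST_INIT,            /* nothing sent yet */
    ST_DISCOVER_SENT,   /* step 1 done: client broadcast DISCOVER */
    ST_OFFER_RECEIVED,  /* step 2 done: a server answered with OFFER */
    ST_REQUEST_SENT,    /* step 3 done: client sent REQUEST to one server */
    ST_BOUND,           /* step 4 done: server sent ACK */
    ST_ERROR            /* message arrived out of order */
};

/* Advance the exchange by one message; anything out of order is an error. */
static enum dhcp_state dhcp_step(enum dhcp_state s, enum dhcp_msg m)
{
    switch (s) {
    case ST_INIT:
        return m == MSG_DISCOVER ? ST_DISCOVER_SENT : ST_ERROR;
    case ST_DISCOVER_SENT:
        return m == MSG_OFFER ? ST_OFFER_RECEIVED : ST_ERROR;
    case ST_OFFER_RECEIVED:
        return m == MSG_REQUEST ? ST_REQUEST_SENT : ST_ERROR;
    case ST_REQUEST_SENT:
        return m == MSG_ACK ? ST_BOUND : ST_ERROR;
    default:
        return ST_ERROR;
    }
}

/* The client may only use the address once the fourth step has completed. */
static bool address_usable(enum dhcp_state s)
{
    return s == ST_BOUND;
}
```

In this model address_usable() stays false through the first three steps,
so every reply the client receives before the ACK has to reach a host that
cannot yet claim the address.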

} Naturally, not all clients can do this, so DHCP (actually, originally,
} BOOTP) has a bit in the request that the client can set to instruct
} the server to use a broadcast reply.  Even when the client could handle
} it, there's no guarantee that the server can make this method work.

 Yes, I'm aware of the broadcast bit.  The local cable company's
DHCP server would fail to respond if the bit was set.  I don't know if
it still has that problem, and I have other things to do besides
probing it.
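
For reference, the broadcast bit is the most significant bit of the 16-bit
flags field in the BOOTP/DHCP header.  A minimal sketch of setting and
testing it follows; the macro and function names are assumptions made for
the example, and the value is taken to be already in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Most significant bit of the 16-bit 'flags' field (hypothetical name). */
#define BOOTP_FLAG_BROADCAST 0x8000u

/* Server side: did the client ask for a broadcast reply? */
static bool wants_broadcast_reply(uint16_t flags_host_order)
{
    return (flags_host_order & BOOTP_FLAG_BROADCAST) != 0;
}

/* Client side: set the bit, leaving the other flag bits untouched. */
static uint16_t request_broadcast_reply(uint16_t flags_host_order)
{
    return (uint16_t)(flags_host_order | BOOTP_FLAG_BROADCAST);
}
```

A server that mishandles this bit would presumably fail at the point where
wants_broadcast_reply() returns true, which matches the symptom described
above.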

} The point from this for us, is that it isn't necessarily because of a
} current defect in NetBSD's IP stack that the server is using BPF for its
} purposes, however unfortunate that it might be that it (once) needed to
} resort to that extreme.

 I don't think this is a problem, since we provide the BPF
interface anyways, and the DHCP software might not be the only software
using it.

}-- End of excerpt from Robert Elz


Re: lua(4), non-invasive and invasive parts

2012-12-28 Thread John Nemeth
On May 20,  6:34am, Marc Balmer wrote:
} Am 28.12.2012 um 11:43 schrieb Jukka Ruohonen :
} > On Fri, Dec 28, 2012 at 11:35:05AM +0100, Marc Balmer wrote:
} >>> What does this mean? Also the kernel modules using lua(4) will be
} >>> conditionally compiled? I think this is fairly strongly against the design
} >>> principles of module(7).
} >> 
} >> This means that gpiosim(4) can be compiled with Lua support, if 'options
} >> LUA' is defined in the kernel configuration.  As Lua in the kernel is
} >> experimental, such a guard makes sense.
} > 
} > I think this has been discussed previously, the conclusion being that kernel
} > modules should not diverge upon changes in the kernel configuration options.
} > 
} > The practical case is: what happens when I try to load gpiosim(4) after
} > having compiled a kernel with a LUA option but not having updated
} > userland/modules (or vice versa)?
} 
} No harm done.
} 
} A gpiosim(4) module compiled without lua(4) support will just work (if
} gpio(4) support is present), a gpiosim(4) module compiled *with* lua(4)
} support will try to load the gpio(4) module and the lua(4) module, and
} if either fails it will itself not load.

 The point is that module compilation is in no way dependent on the
kernel config file (at the moment).  This means that putting
"options LUA" in a kernel config file will not change how the gpiosim
module is compiled.  As a result the gpiosim module should always be
compiled as if "options LUA" was defined (once the lua module exists,
of course).

}-- End of excerpt from Marc Balmer


Re: lua(4), non-invasive and invasive parts

2012-12-29 Thread John Nemeth
On May 21,  6:10am, Marc Balmer wrote:
}
} > this is going to upset dyoung i'm sure :) but it seems to me that
} > if these invasive changes to individual subsystems are needed like
} > this, and we want them to be optional, then imo they should be on
} > a per-subsystem basis, not global.  eg something like:
} > 
} > options LINEDISC_LUA
} > options GPIOSIM_LUA
} > 
} > etc.  the ugliness could/should be largely hidden in header files.
} 
} The problem remains that modules know nothing about kernel options.  Maybe
} - in an ideal world - there should be no kernel options at all, but only
} modules... ;)

 Which is fine for gpiosim, as it can just depend on the lua
module.  For LINEDISC_LUA, there would have to be some kind of hook to
which the lua module would attach when loaded, so that the kernel would
still function even without the module loaded.
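
One way to picture such a hook is a function pointer that stays NULL until
the module attaches to it.  This is a userland sketch only: every name in
it is hypothetical, and a real kernel version would also need locking
around attach and detach.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of the optional filter a module may install. */
typedef int (*linedisc_hook_t)(int ch);

/* NULL means "no module loaded"; the base code must cope with that. */
static linedisc_hook_t linedisc_lua_hook = NULL;

/* Called by the module when it loads. */
static void linedisc_hook_attach(linedisc_hook_t fn)
{
    assert(fn != NULL);
    linedisc_lua_hook = fn;
}

/* Called by the module when it unloads. */
static void linedisc_hook_detach(void)
{
    linedisc_lua_hook = NULL;
}

/* Base code path: use the hook if present, otherwise pass through. */
static int linedisc_input(int ch)
{
    if (linedisc_lua_hook != NULL)
        return linedisc_lua_hook(ch);
    return ch;
}

/* Stand-in for whatever a loaded lua module might install. */
static int example_lua_filter(int ch)
{
    return (ch >= 'a' && ch <= 'z') ? ch - 'a' + 'A' : ch;
}
```

The base path never depends on the module being present, which is the
property wanted for the LINEDISC_LUA case.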

}-- End of excerpt from Marc Balmer


Re: MI boot args revamp?

2012-12-30 Thread John Nemeth
On May 22,  9:38am, Jean-Yves Migeon wrote:
} Le 29/12/12 22:23, Jeff Rizzo a écrit :
} > On 12/29/12 1:12 PM, Greg Troxel wrote:
} >>I would like to have a way to pass a string composed of the same flags
} >>(we can continue to use our existing "-a", "-s" and other flags) in a
} >>consistent manner from one platform to another, to be able to adjust
} >>driver options, kernel options, whatever, and to be able to expect it
} >>to be similar whether I'm on amd64, macppc, evbppc, evbarm, or
} >>whatever.
} >>
} >> Are you talking about the UI of how the strings are written and what
} >> they mean or how the bootloader stage that interacts with the user/prom
} >> communicates this to the kernel?  For platforms with existing
} >> conventions, I don't see how we can interact with native bootloaders
} >> without meeting their interface.
} >
} > There are always going to be exceptions;  certain platforms (especially
} > older ones) are not flexible enough to do everything we want the way we
} > want it.  What I _would_ like to get to is "this is the recommended goal
} > to shoot for".
} 
} That really depends on the capabilities of the MD component. I have a 
} good example with Xen though.
} 
} Xen port parses a command line for which the syntax is very close to the 
} one used by Linux (key=value) syntax [1]. Having a command line close to 
} this syntax has a potential for code reuse, or even turn it into an 
} MI/MD interface.

 Xen uses multiboot.  Yet another thing on my todo list is to
handle boot time module loading in the multiboot case.

} As we have a decent module framework too, I would look at what module(7) 
} offers when we pass arguments to them. I would expect modules and kernel 
} share the same code when parsing args, this makes sense somehow. Typical 
} example is (again) Xen with a DOM0 kernel, where the kernel is loaded as 
} a module.

 module(7) arguments are passed as a plist.  Take a look at
sys/modules/example/example.c.  That is the simplest module.

}-- End of excerpt from Jean-Yves Migeon


Re: MI boot args revamp?

2012-12-31 Thread John Nemeth
On May 22,  6:25pm, Jean-Yves Migeon wrote:
} Le 30/12/12 22:40, John Nemeth a écrit :
} > On May 22,  9:38am, Jean-Yves Migeon wrote:
} >
} > } As we have a decent module framework too, I would look at what module(7)
} > } offers when we pass arguments to them. I would expect modules and kernel
} > } share the same code when parsing args, this makes sense somehow. Typical
} > } example is (again) Xen with a DOM0 kernel, where the kernel is loaded as
} > } a module.
} >
} >   module(7) arguments are passed as a plist.  Take a look at
} > sys/modules/example/example.c.  That is the simplest module.
} 
} Interesting -- what kind of runtime requirements are there when plist 
} are used? I suppose that building up a plist from low level code is not 
} quite practical (bootloaders and/or early kernel boot).

 You need to link with libprop.  I already have plans to link
libprop to the x86 /boot for various reasons, although I haven't
finalised how I'm going to do everything.

} modload(8) has a custom implementation to internalize passed arguments 
} (-b, -i ,...) into a plist, however the binary can rely on a rich 
} runtime when executed, which is not necessarily the case when parsing 
} cmdline strings.

 Either the keyword can be used to imply the argument type, or you
would have to analyze the value to figure out what it is.  Since you've
looked at the source for modload, you will have seen that there are
different proplib functions to add different types of objects, so you
will need to figure out the type of object.
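
The "analyze the value" option could look roughly like the sketch below,
which sorts a command-line value into boolean, integer or string before
anything is added to a plist.  The enum, the function name and the choice
of three types are assumptions for illustration.  No proplib calls are
made here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum arg_type { ARG_BOOL, ARG_INT, ARG_STRING };

/* Guess the type of the value part of a "key=value" argument. */
static enum arg_type classify_value(const char *value)
{
    char *end;

    assert(value != NULL);

    if (strcmp(value, "true") == 0 || strcmp(value, "false") == 0)
        return ARG_BOOL;

    /* Base 0 accepts decimal, octal (leading 0) and hex (leading 0x). */
    errno = 0;
    (void)strtoll(value, &end, 0);
    if (errno == 0 && end != value && *end == '\0')
        return ARG_INT;

    /* Anything that is not cleanly a boolean or an integer. */
    return ARG_STRING;
}
```

The weakness of guessing is visible right away: a string argument that
happens to look like a number would be classified as an integer, which is
one argument for letting the keyword imply the type instead.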

}-- End of excerpt from Jean-Yves Migeon


Re: POSIX Semaphores

2013-02-23 Thread John Nemeth
On Jun 9,  7:19pm, Paul Goyette wrote:
}
} According to the man page sem(4), one needs to include "options 
} P1003_1B_SEMAPHORE" in the kernel config file in order to support this 
} feature.  Yet, the file kern/uipc_sem.c is included unconditionally in 
} all kernels, and there appears to be nothing in NetBSD anywhere that 
} depends on P1003_1B_SEMAPHORE.
} 
} Most of the MODULAR-ization work has already been done (with only the 
} actual building of the loadable module left), so I would like to propose 
} that this feature be made conditional, as described in sem(4).

 The feature was conditional in the past, but made unconditional
on the basis that it is essential.  The man page is out of date.  The
appropriate action would be to delete that statement from the manpage.

}-- End of excerpt from Paul Goyette


Re: Lua in-kernel (lbuf library)

2013-10-18 Thread John Nemeth
On Oct 18, 11:03am, Marc Balmer wrote:
} Am 18.10.13 10:43, schrieb Artem Falcon:
} > Marc Balmer  msys.ch> writes:
} >>>> Justin Cormack  specialbusservice.com> writes:
} >>>> I have been using the luajit ffi and luaffi, which let you directly
} >>>> use C structs (with bitfields) in Lua to do this. It makes it easier
} >>>> to reuse stuff that is already defined in C. (luaffi is not in its
} >>>> current state portable but my plan is to strip out the non portable
} >>>> bits, which are the function call support).
} >>>>
} >>>> Justin
} > 
} > I had successfully used more lightweight solution called "Lua AutoC" [1] with
} > Marc's lua(4).
} > Pros: light in comparison to other FFI libs, joy in use, easy to adopt to be
} > used in kernel, does the things in runtime, which gives the flexibility.
} > Cons: not widely tested, again does the things in runtime, which on other
} > side may give performance penalty.
} > 
} >>>
} >>> I never used luaffi. It sounds very interesting and I think it could
} >>> be very useful to bind already defined C structs, but my purpose is to
} >>> dynamically define data layouts using Lua syntax (without parsing C
} >>> code).
} >>
} >> FFI in the kernel can be dangerous.  Pure Lua is a perfect confinement
} >> for code, but with an FFI a Lua script can access almost anything in the
} >> kernel.  One has to think twice if one wants that.
} >>
} >> Well, assuming it would be module, so I would not have to load it if I
} >> don't want to.
} > 
} > It's desirable if you're writing a device driver in Lua, as you can do
} > most of work from Lua code (e.g. call C methods of NetBSD driver API
} > and feed them with C structs and pointers).
} > States and explicit exports of a certain foreign functions makes things
} > a bit less dangerous.
} > But in general you're right, one should do this with care.
} 
} lua(4) has a mechanism for Lua's 'require' statement.  Normally, when
} you require 'foo', it looks up whether a kernel module named luafoo exists
} and loads it.  This automatic loading of modules can be turned off, to
} make a module available to a state, it has to be specifically assigned.
}  So when you turn autoloading off, a script could not simply call a ffi
} module by requiring it.
} 
} Maybe Lua kernel modules should carry a flag whether they should allow
} autoloading or not?  This way, an ffi module would still be loaded into
} the kernel when Lua code requires it, but lua(4) would detect the "don't
} autoload" flag and would then _not_ assign the module to the Lua state.

 There is already a mechanism for this, see module_autoload(9).
You should always be using module_autoload() to load a module from
inside the kernel.  If the noautoload flag is set, then the call
will fail.  Thus, there is no need for lua(4) to try managing this
itself.  It should just attempt to load the module.  If successful,
great.  If not, then the feature being requested isn't available.

} > [1] https://github.com/orangeduck/LuaAutoC
} 
}-- End of excerpt from Marc Balmer


Re: Lua in-kernel (lbuf library)

2013-10-18 Thread John Nemeth
On Oct 19, 12:13am, Artem Falcon wrote:
} 18.10.2013, at 21:03, John Nemeth  wrote:
} > On Oct 18, 11:03am, Marc Balmer wrote:
} > } Am 18.10.13 10:43, schrieb Artem Falcon:
} > } > Marc Balmer  msys.ch> writes:
} > } >>>> Justin Cormack  specialbusservice.com> writes:
} > } >>>> I have been using the luajit ffi and luaffi, which let you directly
} > } >>>> use C structs (with bitfields) in Lua to do this. It makes it easier
} > } >>>> to reuse stuff that is already defined in C. (luaffi is not in its
} > } >>>> current state portable but my plan is to strip out the non portable
} > } >>>> bits, which are the function call support).
} > } >>>>
} > } >>>> Justin
} > } > 
} > } > I had successfully used more lightweight solution called "Lua AutoC" [1] with
} > } > Marc's lua(4).
} > } > Pros: light in comparison to other FFI libs, joy in use, easy to adopt to be
} > } > used in kernel, does the things in runtime, which gives the flexibility.
} > } > Cons: not widely tested, again does the things in runtime, which on other
} > } > side may give performance penalty.
} > } > 
} > } >>>
} > } >>> I never used luaffi. It sounds very interesting and I think it could
} > } >>> be very useful to bind already defined C structs, but my purpose is to
} > } >>> dynamically define data layouts using Lua syntax (without parsing C
} > } >>> code).
} > } >>
} > } >> FFI in the kernel can be dangerous.  Pure Lua is a perfect confinement
} > } >> for code, but with an FFI a Lua script can access almost anything in the
} > } >> kernel.  One has to think twice if one wants that.
} > } >>
} > } >> Well, assuming it would be module, so I would not have to load it if I
} > } >> don't want to.
} > } > 
} > } > It's desirable if you're writing a device driver in Lua, as you can do
} > } > most of work from Lua code (e.g. call C methods of NetBSD driver API
} > } > and feed them with C structs and pointers).
} > } > States and explicit exports of a certain foreign functions makes things
} > } > a bit less dangerous.
} > } > But in general you're right, one should do this with care.
} > } 
} > } lua(4) has a mechanism for Lua's 'require' statement.  Normally, when
} > } you require 'foo', it looks up whether a kernel module named luafoo exists
} > } and loads it.  This automatic loading of modules can be turned off, to
} > } make a module available to a state, it has to be specifically assigned.
} > }  So when you turn autoloading off, a script could not simply call a ffi
} > } module by requiring it.
} > } 
} > } Maybe Lua kernel modules should carry a flag whether they should allow
} > } autoloading or not?  This way, an ffi module would still be loaded into
} > } the kernel when Lua code requires it, but lua(4) would detect the "don't
} > } autoload" flag and would then _not_ assign the module to the Lua state.
} 
} Probably. It should be named as 'auto assign' for clarity, as module loading
} occurs anyway.
} 
} > There is already a mechanism for this, see module_autoload(9).
} > You should always be using module_autoload() to load a module from
} > inside the kernel.  If the noautoload flag is set, then the call
} > will fail.  
} 
} This is exactly what lua(4) does on 'requiring'.
} 
} > Thus, there is no need for lua(4) to try managing this
} > itself.  It should just attempt to load the module.  If successful,
} > great.  If not, then the feature being requested isn't available.
} 
} kern.lua.autoload is a safety barrier. One may wish not allow any lua kernel
} script to load any given lua kernel module.

 The lua(4) implementers can certainly do this if they want.
However, module_autoload() won't be looking at this flag and will
continue to refuse to autoload any module that has the noautoload
flag set.  Also, there is the kern.module.autoload sysctl that can
prevent any module from autoloading.
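
The layering described here can be modelled in a few lines.  This is a
userland model of the decision only, not the kernel code.  The struct, the
function names and the way the three switches combine are assumptions made
to illustrate the argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a module and its per-module flag. */
struct module_model {
    const char *name;
    bool noautoload;        /* models the module's noautoload flag */
};

/*
 * Models the generic autoload check: the global switch (standing in for
 * kern.module.autoload) and the per-module flag can each veto the load.
 */
static bool autoload_allowed(const struct module_model *m, bool sysctl_autoload)
{
    assert(m != NULL);
    if (!sysctl_autoload)
        return false;
    if (m->noautoload)
        return false;
    return true;
}

/*
 * Models an extra lua(4)-level switch (standing in for kern.lua.autoload)
 * layered on top: it can only restrict further, never override the above.
 */
static bool lua_require_allowed(const struct module_model *m,
    bool sysctl_autoload, bool lua_autoload)
{
    return lua_autoload && autoload_allowed(m, sysctl_autoload);
}
```

In this model a lua-specific switch can add a restriction, but it cannot
re-enable a module that the generic check has already refused.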

}-- End of excerpt from Artem Falcon


Re: How to hot swap an SCA SCSI disk with NetBSD

2013-10-26 Thread John Nemeth
On Oct 25,  2:20pm, Mouse wrote:
}
} > Generally speaking, SCA SCSI drives are hot-swap capable.
} 
} Sure...but the drive bays aren't necessarily.  For example, the drive
} bay in a SS20 probably isn't; you can't even get to it without removing
} the lid, so there'd've been little reason for Sun to spend the money
} for the signal switching hardware to make it hotswap.
} 
} > I'm not interested in fiddling with 50-pin or 68-pin with a paused machine -$
} 
} Actually, with a _paused_ machine, IME - I M limited E - it's fine.
} It's doing so on an active SCSI bus, one with transfers going on, that
} I was saying was a recipe for trouble.

 With SCA, or anything else that is designed for hotswap, the
ground pins are longer than the other pins.  This means that ground
disconnects last and connects first.  This prevents spikes.
Hotswapping with connectors that aren't designed for it can cause
physical damage to equipment, and thus is not generally recommended.

} > The key thing in documentation is not just how, but why.
} 
} > For example, why "scsictl  detach"?  Why not just "stop" and
} > remove?
} 
} Personally?  The reasons which occur to me offhand:

 SCA is just a type of connector.  As far as I know, there are
no extra signals (in particular there is no way to signal the OS
that the device was removed).

} Because doing that doesn't get the teardown and rebuild I mentioned
} upthread.  Because not all the scsictl versions I have in use support
} stop.  Because I'm not always replacing it with an identical drive (or,
} sometimes, at all).
} 
} > The idea here is to document a procedure generally. Odds are good lots of it$
} 
} Yeah - everything but the physical-layer stuff, I'd guess.
} 
} (SAS, gh)
} 
}-- End of excerpt from Mouse


module path message

2013-10-30 Thread John Nemeth
 I've made a patch to the module subsystem to print the default
module load path during initialisation.  The reason for doing this
is that certain arch/machine combos don't work with the standard
modules for their archs and require custom built modules.  This is
the case for several evbppc variants and xen.  The evbppc variants
are already working.  I'm working on getting xen modules working.
I have the modules building and the kernel finds the correct modules,
but has problems loading them.  Anyways, here's a sample of the
dmesg showing the new message.  Let the flamewar about the message
and when it should be displayed begin...

NetBSD 6.99.25 (XEN3_DOMU) #3: Tue Oct 29 19:07:29 PDT 2013

jnemeth@P4-3679GHz:/usr/local/NetBSD-current/amd64-xenmod-objdir/sys/arch/amd64/compile/XEN3_DOMU
total memory = 512 MB
avail memory = 486 MB
The default path for module loading is: /stand/amd64-xen/6.99.25/modules
mainbus0 (root)
hypervisor0 at mainbus0: Xen version 4.2.3
...
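
 For anybody wondering where a path like that comes from, here is a
rough sketch, assuming it is simply composed from a machine string and
the OS release.  The helper name module_default_path() and its
parameters are made up for illustration and are not taken from the
patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): build the default module
 * directory from a machine string and an OS release string.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
module_default_path(char *buf, size_t len, const char *machine,
    const char *osrelease)
{
	int n = snprintf(buf, len, "/stand/%s/%s/modules", machine, osrelease);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

 With "amd64-xen" and "6.99.25" as inputs this gives the path shown in
the dmesg sample, which is why a xen kernel ends up looking in a
different directory than a plain amd64 kernel.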


Re: module path message

2013-10-30 Thread John Nemeth
On Oct 30, 11:00am, Alan Barrett wrote:
} On Tue, 29 Oct 2013, John Nemeth wrote:
} >The default path for module loading is: /stand/amd64-xen/6.99.25/modules
} 
} I suggest exposing the path via sysctl, and printing the sysctl 
} mib name in the message, something like
} 
}   kern.module.path=/stand/amd64-xen/6.99.25/modules

 Good idea, then it's easily accessible at run time.  Of course,
with that, it doesn't have to be printed.

}-- End of excerpt from Alan Barrett


Re: module path message

2013-10-30 Thread John Nemeth
On Oct 30, 12:40pm, Marc Balmer wrote:
} Am 30.10.13 10:00, schrieb Alan Barrett:
} > On Tue, 29 Oct 2013, John Nemeth wrote:
} >> The default path for module loading is: /stand/amd64-xen/6.99.25/modules
} > 
} > I suggest exposing the path via sysctl, and printing the sysctl mib name
} > in the message, something like
} > 
} > kern.module.path=/stand/amd64-xen/6.99.25/modules
} 
} If that variable is to be writable, it has to be somehow integrated with
} kauth, so that it can not be changed when the kauth equivalent of a
} raised securelevel is in place.

 It will be read only for now.

}-- End of excerpt from Marc Balmer


Re: zero-length symlinks

2013-11-03 Thread John Nemeth
On Nov 3,  2:57pm, Sverre Froyen wrote:
} On 2013-11-03, at 11:47, Hubert Feyrer  wrote:
} > On Sat, 2 Nov 2013, David Holland wrote:
} >> > I think "not sensible" is not a good enough reason to prohibit
} >> > something.
} >> 
} >> Yeah yeah, but still nowadays we don't allow adding hard links to
} >> directories. So while that's a valid premise, it's not universal.
} > 
} > FWIW, the idea not allowing hard links to directories is that
} > ".." wouldn't be unique any more. I don't see such a thing with
} > a symlink pointing to ".
} 
} On Unix System V, the link command would allow hard-linking
} directories when used as root. A quick test shows that NetBSD
} does not allow this. Was the feature removed from NetBSD (or BSD)
} at some point or was it an addition to Bell Labs Unix after
} Berkeley received the Bell Labs sources? Perhaps a feature unique
} to the v7 file system.

 It has to do with the fact that historically mkdir(2) was
actually mkdir(3); it wasn't an atomic syscall but a sequence
of operations performed by a library routine.  The library routine
called link(2) to hook the new directory into the directory tree.
Once mkdir(2) was created and the kernel became responsible for
everything, link(2) lost the ability to create hard links to
directories.  The reason being that hard links to directories mean
that the tree of directories is no longer a DAG and that causes
serious problems for the tree traversing code.
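
 To illustrate the traversal problem, here is a toy sketch using an
in-memory model rather than a real file system.  The struct dirnode
type, count_dirs() and the "budget" cap are all invented for the
example; the cap only exists so that a cycle shows up as an error
instead of endless recursion.

```c
#include <stddef.h>

#define MAXKIDS 4

/* Toy in-memory directory node; purely illustrative. */
struct dirnode {
	struct dirnode *kids[MAXKIDS];
	size_t nkids;
};

/*
 * Naive recursive walk that counts directories.  Returns -1 once more
 * than the initial "budget" of visits has been needed, which is what
 * happens when the structure contains a cycle.
 */
static long
count_dirs(const struct dirnode *d, long *budget)
{
	long total = 1;

	if ((*budget)-- <= 0)
		return -1;
	for (size_t i = 0; i < d->nkids; i++) {
		long n = count_dirs(d->kids[i], budget);

		if (n < 0)
			return -1;
		total += n;
	}
	return total;
}
```

 On a proper tree the walk finishes with a count.  Add one link from
a descendant back to an ancestor and the same walk never finishes on
its own.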

 I don't know at what point this happened in BSD, but certainly,
it was long before NetBSD came on the scene.  BTW, I doubt that
modern System V, i.e. SVR4 would allow you to make hard links to
directories (that capability probably went away somewhat before
SVR4 came about).

}-- End of excerpt from Sverre Froyen


Re: A Library for Converting Data to and from C Structs for Lua

2013-11-17 Thread John Nemeth
On Nov 17, 11:02pm, Marc Balmer wrote:
} Am 17.11.13 20:40, schrieb Lourival Vieira Neto:
} > On Sun, Nov 17, 2013 at 4:39 PM, David Holland wrote:
} >> On Sun, Nov 17, 2013 at 01:32:03PM +0100, Hubert Feyrer wrote:
} >>  > >I plan to import it and to make it available to both lua(1) and lua(4)
} >>  >
} >>  > I wonder if we really need to get all this into NetBSD,
} >>  > instead of moving it to pkgsrc somehow.
} >>
} >> This...
} > 
} > I think that would be nice to have Lua kernel modules in pkgsrc, if possible.
} 
} No, I don't think so.  They interact too much with the system, they need
} to be part of the system.

 Uh, no.  The whole idea behind modules clearly means being
able to use third party code.  We should be able to have modules
in pkgsrc.  There are no modules in pkgsrc yet, but that's just a
matter of figuring out the best way to do it.  There is no reason
why all modules must be included with the system.

}-- End of excerpt from Marc Balmer


Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-23 Thread John Nemeth
On Nov 23,  2:16pm, Dennis Ferguson wrote:
} On 22 Nov, 2013, at 21:40 , David Holland  wrote:
} >> So ... looking at this code ... it seems like the core problem is that
} >> TAILQ_HEAD and TAILQ_ENTRY are two different types (even though they
} >> literally have the same structure layout).  So if TAILQ_HEAD and TAILQ_ENTRY
} >> were the same structure, it wouldn't be an issue.  It doesn't quite leap
} >> out to me how that would be possible without changing the API a bit.
} > 
} > I think it can be done by sticking an anonymous union into TAILQ_HEAD,
} > but of course anonymous unions aren't supported until C11.
} 
} It isn't perfectly clear to me that this code has an aliasing problem
} the way it is, though.  The only thing that matters in the standard are
} the types of the lvalue expressions used to access objects in storage.  The
} lvalue expression types used to access the objects in storage in this
} case are 'type **', 'type **' and 'type *', which are the types those

 "type **" and "type *" are not the same types.

} objects were stored with and the types that would be used for other
} accesses to the same locations.  The structure type used to arrive there
} should only matter if it is the type of an lvalue expression itself,
} e.g. *(struct foo *)ptr(?).
} 
} I would be interested in knowing an actual example of the comparison
} problem with the CIRCLEQ macro, if the concern isn't theoretical.  Since

 Uh, do you really think people would be doing all this work
for something that was theoretical?  The problem is that gcc 4.8
optimises out the comparison as being always false due to the
C aliasing rules.
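
 For those who haven't looked at the macros, here is a cut-down
sketch of the shape of the problem: the head and the elements are
different struct types, and "end of list" is detected by comparing an
element pointer against the head.  The names are invented, and doing
the end test on uintptr_t values is just one way to avoid a
cross-type pointer comparison; I'm not claiming it is what the actual
sys/queue.h hack does.

```c
#include <stddef.h>
#include <stdint.h>

struct item;

/* Head and element are distinct struct types with the same leading layout. */
struct item_head {
	struct item *first;
	struct item *last;
};

struct item {
	struct item *next;
	struct item *prev;
	int value;
};

/* End test done on integer values, so no cross-type pointer comparison. */
static int
is_end(const struct item_head *h, const struct item *p)
{
	return (uintptr_t)p == (uintptr_t)h;
}

static void
head_init(struct item_head *h)
{
	/* An empty circular queue points back at its own head. */
	h->first = (struct item *)(void *)h;
	h->last = (struct item *)(void *)h;
}

static void
insert_tail(struct item_head *h, struct item *e)
{
	e->next = (struct item *)(void *)h;
	e->prev = h->last;
	if (is_end(h, h->last))
		h->first = e;
	else
		h->last->next = e;
	h->last = e;
}

static int
sum_values(const struct item_head *h)
{
	int sum = 0;

	for (const struct item *p = h->first; !is_end(h, p); p = p->next)
		sum += p->value;
	return sum;
}
```

 If the end test gets folded to "always false", a loop like the one
in sum_values() walks off the end of the list and into the head.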

} the C standard explicitly allows a pointer to a structure type to be
} converted to the type of its first member and back, to another structure
} type and back, or to char * or void * and back, the fact that the two

 I rather doubt that you can convert to a different structure type
and back.  Those would definitely be different objects.

} pointers point at different structure types is by itself insufficient to
} prove that they would not compare equal when suitably converted.  It seems
} like that conclusion would minimally need to depend on proving that there
} was no possible use of the structure pointers which wouldn't violate the
} aliasing requirements, i.e. that there are no structure members at the same
} offsets which have compatible types.  That's a rather aggressive optimization,
} and is kind of like throwing you in jail for a crime you haven't actually
} committed yet (though I guess that happens too).
} 
}-- End of excerpt from Dennis Ferguson


Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-24 Thread John Nemeth
On Nov 24,  5:25am, Mouse wrote:
} 
} Well, mrg wrote, when starting the thread,
} 
} < while preparing to update to GCC 4.8 i discovered that our
} < sys/queue.h CIRCLEQ macros violate C aliasing rules, ultimately
} < leading to the compiler eliding comparisons it declared as always
} < false.
} 
} which sure looks to me as though it's not just theoretical.  (I don't
} know personally; mrg's mail implies this was with gcc 4.8, which I
} don't run.)

 The work has now changed to GCC 4.8.2.  It is being prepped
for import.  The compiler work is basically done.  At this point,
it is mostly making sure that NetBSD builds and runs with it.
Since I'm not doing the work, I don't have a timeline, but it
shouldn't be too much longer (FSVO much longer).  This means that
sometime in the not too distant future anybody running -current,
or anybody that runs NetBSD 7.0 when it is released will be using
GCC 4.8.2 or later.

}-- End of excerpt from Mouse


Re: The lamentation of proplib(3)

2014-01-28 Thread John Nemeth
On Jan 28,  7:40pm, Christian Koch wrote:
} On Tue, Jan 28, 2014 at 06:44:57PM +, Mindaugas Rasiukevicius wrote:
} > and my own dissatisfaction has reached the point where I decided to raise
} > the question.  The question of replacing proplib(3) with a better library.
} > There were ideas by some developers to write a new library from scratch.
} > The FreeBSD project has recently developed a general purpose key-value pair
} > library, which is quite similar to nvpair library in Solaris.
} 
} Isn't proplib(3) quite heavily used throughout the system, both
} kernel space and user space?  It won't be a trivial task to fully

 It is.

} make this change, is all I'm saying.

 Definitely.  Also, nvlist doesn't address one of the significant
uses of proplib.

} I say don't get rid of proplib(3) entirely, how about moving it
} to pkgsrc at least?

 Something that is heavily used throughout the system can not
be moved to pkgsrc.  Pkgsrc is an addon, not part of the base
system.  Thus nothing in the base system can be dependent upon
pkgsrc to function.

}-- End of excerpt from Christian Koch


Re: Closing a serial device takes one second

2014-02-06 Thread John Nemeth
On Feb 6,  1:22pm, Dennis Ferguson wrote:
} On 6 Feb, 2014, at 12:18 , Marc Balmer  wrote:
} > Actually the one second delay is wrong.  If you want to de-assert DTR
} > for a modem to hangup, then do it in the application.
} 
} You've clearly not run a bank of dial-in/out modems on a multiuser

  That's why Telebit Netblazers and Livingston Portmasters were
invented.

} I'm personally undisturbed by removing it, rather than fixing it, only
} because I don't know anyone who still uses dialup modems like that and
} I only remember this because I am old.  For the things I do use serial

 Does mentioning equipment from long dead companies make me old?

}-- End of excerpt from Dennis Ferguson


Re: asymmetric smp

2014-04-02 Thread John Nemeth
On Apr 2,  1:55pm, Johnny Billquist wrote:
} On 2014-04-01 23:04, Warner Losh wrote:
} > On Apr 1, 2014, at 5:49 AM, Johnny Billquist  wrote:
} >
} >> Good points.
} >> Is this the right time to ask why booting NetBSD on a VAX (a 3500) now takes more than 15 minutes? What is the system doing all that time???
} >
} > FreeBSD used to take forever to boot on certain low-end ARM CPUs with /etc/rc.d after it was imported from NetBSD. This was due to crappy root-device performance (100kB/s is enough for anybody, right?) and crappy, at the time, pmap code that caused excess page traffic in the /etc/rc.d environment. Perhaps those areas would be fruitful to profile? Also, there were some inefficiencies that were either the result of a botched port, or were basic to the system that got fixed. Between fixing all these things, the boot time went from 10 minutes down to ~20s.
} 
} Always nice with some ideas. The problem here is that this used to be 
} way faster in the past, but has slowed down recently.
} 
} The time between entering a username and getting the password prompt in 
} the same 3500 with the latest release is something like 30 seconds.
} 
} This is on an otherwise idle system, where boot has completed. 30 
} seconds (approximately, I should time it) just from pressing enter after 
} the username, until I just get the "Password:" prompt seems incredible 
} to me.
} 
} The root fs is on nfs, as I'm running the machine diskless. Disk is 
} served from a -current NetBSD/alpha system sitting right next to it. And 
} I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k 
} block size for NFS. Login is obviously already running, since that is 
} what also prompts for the username, and doing it twice should even put 
} some stuff in local cache.

 Uh, actually getty does the initial prompt for username on
the console.  After collecting the username, getty execs login.

}-- End of excerpt from Johnny Billquist


Re: MI linker script

2014-11-08 Thread John Nemeth
On Nov 9,  1:25am, Masao Uebayashi wrote:
} On Sat, Nov 8, 2014 at 11:53 PM, Christos Zoulas  wrote:
} > depending on ld -r to work properly
} 
} I know none of you trust me, but you don't trust ld -r?

 It has nothing to do with trust.  It's more like wanting to
know what the heck is going on.  Normally major work like this
would start with a discussion or at least an announcement of the
plan.  Instead all that happened is suddenly we see a major overhaul
of a critical item with no clue as to why.

 So, what is the plan?  Why are you doing this?  What are your
goals (i.e. what is the expected end result)?  What are you doing
with modules?

}-- End of excerpt from Masao Uebayashi


Re: MI linker script

2014-11-08 Thread John Nemeth
On Nov 9, 10:35am, Masao Uebayashi wrote:
} On Sun, Nov 9, 2014 at 5:07 AM, John Nemeth  wrote:
} > On Nov 9,  1:25am, Masao Uebayashi wrote:
} > } On Sat, Nov 8, 2014 at 11:53 PM, Christos Zoulas wrote:
} > } > depending on ld -r to work properly
} > }
} > } I know none of you trust me, but you don't trust ld -r?
} >
} >  It has nothing to do with trust.  It's more like wanting to
} > know what the heck is going on.  Normally major work like this
} > would start with a discussion or at least an announcement of the
} > plan.  Instead all that happened is suddenly we see a major overhaul
} > of a critical item with no clue as to why.
} >
} >  So, what is the plan?  Why are you doing this?  What are your
} > goals (i.e. what is the expected end result)?  What are you doing
} > with modules?
} 
} Something like this:
} https://mail-index.netbsd.org/tech-kern/2012/05/28/msg013235.html
} 
} In short: making kernel build better by sharing *.o

 The question wasn't simply about "ld -r" stuff.  It was about
the entire program of config(1) changes, linking changes, module(9)
changes, etc.  There's an awful lot of stuff happening to major
parts of the system without any discussion.

}-- End of excerpt from Masao Uebayashi


Re: MI linker script

2014-11-09 Thread John Nemeth
On Nov 9, 11:52am, Masao Uebayashi wrote:
} On Sun, Nov 9, 2014 at 11:22 AM, John Nemeth  wrote:
} >  The question wasn't simply about "ld -r" stuff.  It was about
} > the entire program of config(1) changes, linking changes, module(9)
} > changes, etc.  There's an awful lot of stuff happening to major
} > parts of the system without any discussion.
} 
} "The entire program of config(1)" is a bit too exaggerated.  I'm
} rather hunting low-hanging fruits.

 By "program" I didn't mean config(1), I meant what you're
doing.  And, what you are doing appears to be a lot more than just
"hunting low-hanging fruits."

}-- End of excerpt from Masao Uebayashi


Re: kernel constructor

2014-11-11 Thread John Nemeth
On Nov 12,  1:46am, Masao Uebayashi wrote:
} On Wed, Nov 12, 2014 at 1:15 AM, Kamil Rytarowski  wrote:
} > From David Holland
} >> Please don't do that. Nothing good can come of it - you are asking for
} >> a thousand weird problems where undisclosed ordering dependencies
} >> silently manifest as strange bugs.
} 
} Everyone is aware of that.  Code conversion must be done extremely
} carefully.  Order must be preserved.
} 
} >> Furthermore, the compiler can and probably will assume that
} >> constructor functions get called before all non-constructor code, and
} >> owing to unavoidable issues in early initialization this will not be
} >> the case in some contexts. (I first hit this problem back in about
} >> 1995ish when some more gung-ho colleagues were trying to use C++
} >> global constructors in a C++ kernel, and we eventually had to declare
} >> a moratorium on all global constructors.)
} 
} Thanks, but irrelevant for kernel...
} 
} >> init_main.c could use some tidying, but there's nothing fundamentally
} >> wrong with it that will be improved by adding a lot of implicit magic
} >> that doesn't do what the average passerby expects.
} 
} Function pointers are not magic.
} 
} (snip)
} > And last but not least... what's wrong with init_main.c? It must be clear for a developer adding a new platform or debugging hardware bring-up. It gives me big picture on that what's going on step-by-step, even when I was lurking into assembly of our kernel... call it, call that, call this.. making it all clear.
} 
} Those functions are hardcoded and ordered even without dependencies
} among them, that's a big problem.

 Without dependencies?!?  The ordering gives the dependencies.

} The biggest problem of constructors (and indirect function call in
} general), I am aware of, is, static code analysis (code reading, tag
} jump, ...) becomes difficult (or impossible).

 Considering that we're talking about the kernel, this is an
extremely huge flaw!  As in, DON'T do it!
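
 A toy sketch of the point about ordering, with invented names
(mem_init(), sched_init(), boot_explicit(), boot_table()); none of
this is taken from init_main.c.  In the explicit version the
dependency is visible in the order of the calls.  In the table-driven
version the same dependency still exists, but it is now hidden in the
order of a table that may be assembled somewhere else.

```c
#include <stddef.h>

/* Hypothetical subsystem state; names are made up for the example. */
static int mem_ready;

static int
mem_init(void)
{
	mem_ready = 1;
	return 0;
}

/* Depends on mem_init() having run first. */
static int
sched_init(void)
{
	return mem_ready ? 0 : -1;
}

/* Explicit sequence: the dependency is visible in the order of the calls. */
static int
boot_explicit(void)
{
	mem_ready = 0;
	if (mem_init() != 0)
		return -1;
	return sched_init();
}

/* Table-driven variant: correctness now depends on the table's order. */
static int
boot_table(int (*const tbl[])(void), size_t n)
{
	mem_ready = 0;
	for (size_t i = 0; i < n; i++)
		if (tbl[i]() != 0)
			return -1;
	return 0;
}
```

 Hand boot_table() the two functions in the wrong order and it fails,
and nothing at the call site tells the reader why.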

}-- End of excerpt from Masao Uebayashi


Re: disk driver interface

2014-12-29 Thread John Nemeth
On Dec 29,  3:00am, Michael van Elst wrote:
} 
} Currently NetBSD has three programming interfaces to determine
} disk geometry from userland.
} 
} - ioctl DIOCGDINFO. The traditional interface, limited to 32bit
}   numbers or disks < 2TB because its data structure corresponds
}   to the binary on-disk structure.
} 
} - the "get-properties" command to the drvctl(4) driver. drvctl(4)
}   is missing on some ports and some disk drivers don't make
}   geometry properties available.
} 
} - ioctl DIOCGWEDGEINFO. Works only for wedges but not for the
}   disk drivers themselves. This is fine for operations on
}   data blocks of a wedge but doesn't help e.g. partitioning
}   tools. It also does not provide the sector size.
} 
} To solve this, we could
} 
} - create a new DIOCGDINFO version that uses larger numbers. AFAIK
}   that is about what OpenBSD does. The on-disk structure could be
}   translated but writing a label might be incompatible if partitions
}   are defined beyond the 2TB limit.
} 
} - make drvctl(4) mandatory and make all disk drivers provide
}   geometry properties.

 I would tend to go with this since it is used for a lot more
than just getting the geometry of a drive.

} - make DIOCGWEDGEINFO available for the disk drivers and
}   ignore wedge-related information.
} 
} - import FreeBSD DIOCGMEDIASIZE (and DIOCGSECTORSIZE) ioctls.
} 
} 
} Comments?

 I really don't care about this silly little issue.  But, as
a side note, I will note that gpt(8) (which originated this thread)
came from FreeBSD so it already has support for the FreeBSD ioctls
and would use them in preference to the drvctl(4) method if they existed.
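
 As an aside, the "< 2TB" figure quoted above for DIOCGDINFO is what
you get from a 32-bit sector count with 512-byte sectors, assuming
that is indeed where the limit comes from.  A small worked sketch
(the function name is made up):

```c
#include <stdint.h>

/*
 * Largest capacity addressable with a 32-bit sector count, in bytes:
 * 2^32 sectors times the sector size.
 */
static uint64_t
max_bytes_32bit_sectors(uint32_t sector_size)
{
	return ((uint64_t)UINT32_MAX + 1) * sector_size;
}
```

 With 512-byte sectors that is 2^41 bytes, i.e. 2TB.  Larger sectors
push the limit out, but don't remove it.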

}-- End of excerpt from Michael van Elst


Re: disk driver interface

2014-12-29 Thread John Nemeth
On Dec 29,  4:46pm, Christos Zoulas wrote:
} In article ,
} Michael van Elst  wrote:
} >
} >Currently NetBSD has three programming interfaces to determine
} >disk geometry from userland.
} >
} >- ioctl DIOCGDINFO. The traditional interface, limited to 32bit
} >  numbers or disks < 2TB because its data structure corresponds
} >  to the binary on-disk structure.
} >
} >- the "get-properties" command to the drvctl(4) driver. drvctl(4)
} >  is missing on some ports and some disk drivers don't make
} >  geometry properties available.
} >
} >- ioctl DIOCGWEDGEINFO. Works only for wedges but not for the
} >  disk drivers themselves. This is fine for operations on
} >  data blocks of a wedge but doesn't help e.g. partitioning
} >  tools. It also does not provide the sector size.
} 
} Actually there is also:
}  - ioctl DIOCGDISKINFO. This is supposed to work for all kinds of
}    disks but it returns a plist, and it is a pain to use.

 A semi-quick look around shows that pretty much everything
that would support the drvctl(4) method would also support the
DIOCGDISKINFO method.  Both methods return the same proplib dictionary
for disk geometry info.  So perhaps the DIOCGDISKINFO method should
always be used in preference to the drvctl(4) method.

 As far as I know, the only drivers that don't support drvctl(4)
and DIOCGDISKINFO are ccd(4) and cgd(4).  They should just be fixed.
Then DIOCGDISKINFO can be used always with everything else relegated
to compat.  Also src/sbin/fsck/partutil.* should probably be moved
to libutil as they appear to be of general utility, instead of
having random utilities pulling in parts of fsck.

} >To solve this, we could
} >
} >- create a new DIOCGDINFO version that uses larger numbers. AFAIK
} >  that is about what OpenBSD does. The on-disk structure could be
} >  translated but writing a label might be incompatible if partitions
} >  are defined beyond the 2TB limit.
} 
} I think we should decide on a single API/interface to get general
} information about disk devices. If a "big" DIOCGDINFO is that,
} fine.  But we decided it was not providing enough information a
} while ago and so we got DIOCGDISKINFO. Providing a "big" DIOCGDINFO
} would allow us to have compatibility with OpenBSD and bring a 70's
} technology to the 21st century.

 It's a dead technology.  Besides, for real OpenBSD compatibility
we would have to deal with their on-disk changes as well.

} >- make drvctl(4) mandatory and make all disk drivers provide
} >  geometry properties.
} 
} Well, I don't particularly like to have to go through an auxiliary
} driver to get information that should be readily available from
} the direct driver, but we could consider making drvctl mandatory.
} The only problem would be "small" kernels.
} 
} >- make DIOCGWEDGEINFO available for the disk drivers and
} >  ignore wedge-related information.
} 
} Well, we have DIOCGDISKINFO... which provides the kitchensink, but
} it is hard to use. I think it is a demonstration on how a fully
} generalized API that provides everything loses because of programming
} complexity. Having said that, for the most part (getting struct
} disk_geom out of it), it works once abstracted (see partutil.[ch]
} in sbin/fsck/). Perhaps adding a DIOCGDISKGEOM that returns just
} disk_geom would be nice to have and can replace DIOCGDINFO.

 DIOCGDISKGEOM could easily be added to
src/sys/kern/subr_disk.c:disk_ioctl(), then all drivers that support
DIOCGDISKINFO would automatically support DIOCGDISKGEOM.
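
 Roughly what I have in mind, as a simplified sketch.  The struct
geom and struct disk types, the CMD_GETGEOM value and
common_disk_ioctl() are stand-ins invented for the example; the real
structures and command numbers differ.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; the real structures and command numbers differ. */
struct geom {
	uint64_t secperunit;	/* total sectors */
	uint32_t secsize;	/* bytes per sector */
};

struct disk {
	struct geom geom;	/* filled in by the driver at attach time */
};

enum { CMD_GETGEOM = 1 };	/* hypothetical command value */

/*
 * Shared handler that drivers would call for commands they do not
 * handle themselves.  Returns 0 on success or an errno value.
 */
static int
common_disk_ioctl(const struct disk *dk, int cmd, void *data)
{
	switch (cmd) {
	case CMD_GETGEOM:
		if (dk->geom.secsize == 0)
			return ENXIO;	/* driver never filled in the geometry */
		memcpy(data, &dk->geom, sizeof(dk->geom));
		return 0;
	default:
		return ENOTTY;		/* not handled here */
	}
}
```

 The catch is that this only works for drivers that actually fill in
the geometry before anybody asks for it.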

} >- import FreeBSD DIOCGMEDIASIZE (and DIOCGSECTORSIZE) ioctls.
} 
} I would do that anyway, since it is simple and most things just
} need those two numbers.

 These ioctls could probably also be added to
src/sys/kern/subr_disk.c:disk_ioctl().  Any disk drivers that don't
call that function should be fixed.
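
 Both numbers fall out of the geometry, which is why a common
implementation looks feasible.  A sketch, assuming the media size is
simply sector count times sector size; struct geom_info and
media_size() are illustrative names only.

```c
#include <stdint.h>

/* Hypothetical geometry record; field names are illustrative. */
struct geom_info {
	uint64_t secperunit;	/* total sectors */
	uint32_t secsize;	/* bytes per sector */
};

/* Media size in bytes; returns 0 and leaves *out untouched on overflow. */
static int
media_size(const struct geom_info *g, uint64_t *out)
{
	if (g->secsize != 0 && g->secperunit > UINT64_MAX / g->secsize)
		return 0;
	*out = g->secperunit * g->secsize;
	return 1;
}
```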

}-- End of excerpt from Christos Zoulas


Re: disk driver interface

2014-12-29 Thread John Nemeth
On Dec 30,  6:42am, David Holland wrote:
} On Tue, Dec 30, 2014 at 02:50:14AM +, Christos Zoulas wrote:
}  > In article <20141229233211.ga10...@netbsd.org>,
}  > David Holland   wrote:
}  > >
}  > >It might be a good idea to do this for our own use, but probably it
}  > >shouldn't be a 3rd-party interface. (Unless we decide we like the look of
}  > >it, I guess.)
}  > >
}  > >Although I'm not real thrilled about multiplying uses of proplib...
}  > 
}  > This is why I said let's add DIOCGDISKGEOM to avoid proplib and DIOCGDISKINFO.
} 
} Because it's better to multiply ioctl entities? :-)

 Before we go adding ioctl entities all over the place, we
should probably find out what other OSes are doing.  We've already
added a couple from FreeBSD.  The question is, what else is out
there that may satisfy our needs?

} (I suppose it in fact is...)

 I'm not sure I agree.  But, then I don't have the same hate-on
for proplib that others seem to have.

}-- End of excerpt from David Holland


Re: disk driver interface

2014-12-30 Thread John Nemeth
On Dec 29,  9:28pm, Christos Zoulas wrote:
} On Dec 29,  4:11pm, jnem...@cue.bc.ca (John Nemeth) wrote:
} 
} |  A semi-quick look around shows that pretty much everything
} | that would support the drvctl(4) method would also support the
} | DIOCGDISKINFO method.  Both methods return the same proplib dictionary
} | for disk geometry info.  So perhaps the DIOCGDISKINFO method should
} | always be used in preference to the drvctl(4) method.
} 
} I think that using it directly makes sense. If you want you can
} delete the drvctl and partutil code in gpt. Now that we have both
} the ioctls and the DIOCGDISKINFO code, doing the same thing 3
} different ways does not make a lot of sense, except to demonstrate
} we (like perl) have many different ways of doing the same thing but
} with varying complexity and possibility of error.

 I want to pullup gpt(8) to all branches, so now I have to
figure out what to do with it.  I'm thinking I might just pullup
everything before the recent change.  Given that, I can just blow
away the drvctl(4) stuff.

} |  As far as I know, the only drivers that don't support drvctl(4)
} | and DIOCGDISKINFO are ccd(4) and cgd(4).  They should just be fixed.
} | Then DIOCGDISKINFO can be used always with everything else relegated
} | to compat.  Also src/sbin/fsck/partutil.* should probably be moved
} | to libutil as they appear to be of general utility, instead of
} | having random utilities pulling in parts of fsck.
} 
} Michael fixed cgd and I fixed ccd. I am not sure about getdiskinfo(),

 I saw that you added a call to disk_ioctl() to ccd.  I'm just
not sure what you expected it to do, given that the struct disk_geom
wasn't filled in.  I just fixed that problem.

} the API is clumsy. It is what I found useful when converting the
} individual fsck and dump utilities to wedges. It should and could
} be improved.  getdisksize() on the other hand can be abstracted to
} the two new ioctls() + opendisk() now...
} 
} | } I think we should decide on a single API/interface to get general
} | } information about disk devices. If a "big" DIOCGDINFO is that,
} | } fine.  But we decided it was not providing enough information a
} | } while ago and so we got DIOCGDISKINFO. Providing a "big" DIOCGDINFO
} | } would allow us to have compatibility with OpenBSD and bring a 70's
} | } technology to the 21st century.
} | 
} |  It's a dead technology.  Besides, for real OpenBSD compatibility
} | we would have to deal with their on-disk changes as well.
} 
} Right, this is probably too much work for too little gain.
} 
} | } in sbin/fsck/). Perhaps adding a DIOCGDISKGEOM that returns just
} | } disk_geom would be nice to have and can replace DIOCGDINFO.
} | 
} |  DIOCGDISKGEOM could easily be added to
} | src/sys/kern/subr_disk.c:disk_ioctl(), then all drivers that support
} | DIOCGDISKINFO would automatically support DIOCGDISKGEOM.
} 
} Yes, then we don't need all the plist crap in partutil.c, since the
} only thing that partutil uses from the plist is geometry. I think that
} we should add this ioctl and not need to go through the hoops of
} extracting the geometry from the plist now.
} 
} | } >- import FreeBSD DIOCGMEDIASIZE (and DIOCGSECTORSIZE) ioctls.
} | } 
} | } I would do that anyway, since it is simple and most things just
} | } need those two numbers.
} | 
} |  These ioctls could probably also be added to
} | src/sys/kern/subr_disk.c:disk_ioctl().  Any disk drivers that don't
} | call that function should be fixed.
} 
} Michael did that already.
} 
}-- End of excerpt from Christos Zoulas


Re: jit code and securelevel

2015-01-01 Thread John Nemeth
On Jan 1,  8:34pm, Alexander Nasonov wrote:
} Subject: Re: jit code and securelevel
} Christos Zoulas wrote:
} > On Jan 1,  6:21pm, al...@yandex.ru (Alexander Nasonov) wrote:
} > | They might spot use-after-free bug and reuse freed memory for bpf_d
} > | object which has a pointer to jit code.
} > 
} > The exploit takes advantage of being able to insert particular code
} > sequences that have different meanings at different code offsets (which
} > can happen naturally too -- there is a paper that describes such attacks),
} > and depends on other kernel bugs to be functional.
} 
} A hypothetical use-after-free bug alone wouldn't let you jump to
} a different offset, but those guys are very creative. If they ever
} succeed in exploiting a system with the help of bpfjit code, I'd be very
} interested in details ;-)
} 
} > At the same time killing
} > jit at securelevel 1 is not really fatal, with the exception of npf.
} > 
} > Perhaps having a sysctl to enable/disable it that can only be enabled
} > at a low securelevel can let people choose the behavior they want.
} 
} I implemented it, see below, but I feel it's not right to query
} securelevel directly, adding new KAUTH_SYSTEM_BPFJIT would be
} a better approach. Not sure it's worth the effort.

 Keep in mind that securelevel is only one of many possible security
models.  A different security model could be loaded that doesn't
have securelevel or an analogue.  Poking around in the guts of a
security model is extremely bad form.
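
 To make the distinction concrete, here is a toy sketch of the
"ask, don't peek" pattern.  This is NOT the kauth(9) API; every name
in it (auth_listener_t, auth_set_listener(), auth_allowed(),
ACTION_JIT, example_model()) is invented, and allowing everything
when no model is registered is just a choice made for the example.

```c
#include <stddef.h>

enum auth_result { AUTH_ALLOW, AUTH_DENY };

/* A security model is represented only by the callback it registers. */
typedef enum auth_result (*auth_listener_t)(int action);

static auth_listener_t current_listener;	/* NULL: no model loaded */

static void
auth_set_listener(auth_listener_t l)
{
	current_listener = l;
}

/* Callers ask "may I do this?" and never look at the model's internals. */
static int
auth_allowed(int action)
{
	if (current_listener == NULL)
		return 1;
	return current_listener(action) == AUTH_ALLOW;
}

enum { ACTION_JIT = 1 };	/* hypothetical action identifier */

/* Example model with a securelevel-like variable private to itself. */
static int example_level = 1;

static enum auth_result
example_model(int action)
{
	if (action == ACTION_JIT && example_level > 0)
		return AUTH_DENY;
	return AUTH_ALLOW;
}
```

 The caller's code stays the same no matter which model is plugged
in, or whether that model has anything resembling a securelevel.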

}-- End of excerpt from Alexander Nasonov


Re: Specification of BTINFO_CONSOLE value in bootinfo.h

2015-06-03 Thread John Nemeth
On Jun 3,  9:36am, deco33...@yandex.com wrote:
} 
} I was reading the boot code to make netbsd multiboot compliant.

 Uh, it already is, or should be.  See sys/arch/i386/i386/multiboot.c.
I'm not certain if that is used for amd64.  But, if not, it would
probably be the place to start.

} What defines those values in arch/x86/include/bootinfo.h, e.g.
} BTINFO_CONSOLE,BTINFO_BOOTDISK.. is it the MBR ? the netbsd
} bootloader ?  I mean, the value of 6 for BTINFO_CONSOLE can be
} found in which specification ? Could not find out.

 None.  The definition is in bootinfo.h.  Anyways, I don't believe
it's relevant to multiboot, as that passes a string, not
a struct.

}-- End of excerpt from deco33...@yandex.com


Re: Specification of BTINFO_CONSOLE value in bootinfo.h

2015-06-03 Thread John Nemeth
On Jun 3,  9:57am, deco33...@yandex.com wrote:
} Thanks but,
} 
} lookup_bootinfo(BTINFO_CONSOLE); -> initiate the console.

 This stuff is related to the NetBSD native bootloader and has
absolutely nothing to do with multiboot.
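
 For reference, a lookup like that amounts to walking a list of
tagged, variable-length entries handed over by the bootloader.  The
following is a simplified model only; struct bi_common, BI_CONSOLE
and bi_lookup() are stand-in names, and the layout is an assumption
made for the sketch rather than a copy of the kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Every entry starts with its total length and a type tag. */
struct bi_common {
	int len;
	int type;
};

#define BI_CONSOLE 6	/* the value asked about in this thread */

/*
 * Walk "nentries" variable-length entries packed in "buf" and return a
 * pointer to the first one with the given type, or NULL.
 */
static const void *
bi_lookup(const unsigned char *buf, int nentries, int type)
{
	for (int i = 0; i < nentries; i++) {
		struct bi_common c;

		memcpy(&c, buf, sizeof(c));	/* avoid unaligned access */
		if (c.type == type)
			return buf;
		buf += c.len;
	}
	return NULL;
}
```

 The type numbers only have to agree between the bootloader and the
kernel, which is why there is no outside specification for them.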

}-- End of excerpt from deco33...@yandex.com


Re: Specification of BTINFO_CONSOLE value in bootinfo.h

2015-06-03 Thread John Nemeth
On Jun 3, 10:09am, Sheda wrote:
}
} I confirm the amd64 port is not multiboot compliant,
} arch/amd64/amd64/locore.S lacks the multiboot header you can see in its
} i386 counterpart:
} http://fxr.watson.org/fxr/source/arch/i386/i386/locore.S?v=NETBSD#L261

 I did a quick bit of digging and found a copy of the multiboot
specification.  It contains this sentence, "This specification is
targeted toward free 32-bit operating systems that can be fairly
easily modified to support the specification without going through
lots of bureaucratic rigmarole."  Is there information somewhere
about how multiboot works on 64-bit systems?

 One thing I'll note, is that the bootloader normally starts
the amd64 port in 32-bit mode.  During kernel startup, it switches
the system to 64-bit mode.

}-- End of excerpt from Sheda


Re: mount_checkdirs

2015-07-08 Thread John Nemeth
On Jul 9, 12:27am, Rhialto wrote:
} On Mon 06 Jul 2015 at 09:58:59 +, David Holland wrote:
} 
} > Also it's occasionally useful to mount over things and leave a process
} > underneath, which this logic seems to complicate.
} 
} If I read the code correctly, it looks for processes that have a current
} working directory or root directory exactly at the mount point. But the
} mount point directory does not need to be empty. A process could have a
} cwd or root in any directory inside it. So as-is, the code is
} insufficient for its intended purpose anyway.
} 
} Furthermore, the process can have open files from that directory tree.
} If its cwd or root gets changed (and into what exactly, if it isn't the
} exact mount point?) it has files open that it can't find anymore with
} another call to open(2). That seems like an inconsistency that we may
} want to avoid due to the POLA.

 The same process or another process could unlink the open
file.  There is no guarantee of being able to open(2) a file twice.
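
 A small userland sketch of that, using only standard POSIX calls.
The function name and the /tmp template are arbitrary choices for the
example.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a temporary file, keep it open, unlink it, then try to open it
 * by name again.  Returns 1 if the original descriptor still works while
 * the second open fails with ENOENT, 0 otherwise, -1 on setup failure.
 */
static int
reopen_after_unlink(void)
{
	char path[] = "/tmp/reopenXXXXXX";
	char c = 'x';
	int fd, fd2, ok;

	if ((fd = mkstemp(path)) == -1)
		return -1;
	if (unlink(path) == -1) {
		close(fd);
		return -1;
	}
	fd2 = open(path, O_RDONLY);
	ok = (fd2 == -1 && errno == ENOENT && write(fd, &c, 1) == 1);
	if (fd2 != -1)
		close(fd2);
	close(fd);
	return ok;
}
```

 The process keeps a perfectly usable descriptor for a file it can no
longer find by name, and nobody considers that a POLA violation.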

}-- End of excerpt from Rhialto


Re: Choice of SAS controller

2015-07-16 Thread John Nemeth
On Jul 16,  3:27pm, Edgar Fuß wrote:
}
} > I have been using Areca RAID controllers for several years now and 
} > I have been pretty happy with them.
} Can you drop me a part number?
} The intersection between devices supported by NetBSD and those actually 
} still available on the market seems to be approximately empty.
} 
} Of course, it may be as trivial as adding PCI IDs to sys/dev/pci/arcmsr.c
} to add support for newer controllers.

 I'm using this one:

arcmsr0: Areca ARC-1680 Host Adapter RAID controller
arcmsr0: 8 ports, 2048MB SDRAM, firmware 

}-- End of excerpt from Edgar Fuß


Re: POSIX.1 semaphores vs message queues

2015-11-07 Thread John Nemeth
On Nov 8,  7:22am, Paul Goyette wrote:
} On Sat, 7 Nov 2015, Joerg Sonnenberger wrote:
} > On Sun, Nov 08, 2015 at 06:35:36AM +0800, Paul Goyette wrote:
} >> On Sat, 7 Nov 2015, Joerg Sonnenberger wrote:
} >>> On Sat, Nov 07, 2015 at 10:55:49AM +0800, Paul Goyette wrote:
}  I'd like to understand the rationale that makes POSIX semaphores a
}  non-optional component of the kernel, while POSIX message queues are
}  still optional.  Both seem to be related specifically to use in the
}  librt real-time library.
} >>>
} >>> Semaphores are used quite a lot and not only required by librt, but
} >>> also by libpthread. I'm not sure what is using message queues.
} >>
} >> Hmmm, sounds like a great reason to include the semaphore code in
} >> every kernel by default.  But it doesn't sound sufficiently critical
} >> to _prevent_ it from being removed from custom kernels if explicitly
} >> requested by the user.
} >>
} >> I'd like to suggest that this code once again become an option.  Rather
} >> than adding an option to every kernel configuration file, however, we
} >> can simply add it to src/sys/conf/std where it will get included by
} >> default, in the same manner as MQUEUE.  (I also propose use of "option
} >> SEMAPHORE" rather than P1003_1B_SEMAPHORE, similar to MQUEUE.)
} >
} > I don't see the point in having options for every single system call or
} > the like. At best, it is a form of modularity masturbation and at worst,
} > it is asking for difficult to analyze bugs when someone actually insists
} > on removing them.
} 
} I do understand your position.  And I'm well aware of how difficult it
} can be to analyze any bugs that get introduced.  (Refer my recent issues
} that resulted from fixing the module dependencies for compat_netbsd32,
} or the issue with SYSVSEM, which took a couple of weeks to locate and
} fix.)
} 
} This isn't a request to modularize a single syscall, it's a complete set
} of ten syscalls for a self-contained set of functionality on which there
} are no other kernel or modular dependencies.  There is no functional
} impact on anyone who uses standard kernels.  It only impacts those who
} explicitly request the exclusion of this code from their kernels, and in
} the exact same manner as requesting the exclusion of MQUEUE or AIO.
} (And yes, I run with both of those removed from my kernels, loading the
} modules on-demand.)
} 
} Based on the (lack of) commentary I received in my recent bug-hunts, it
} seems that very few people would care about re-modularizing ksem.  I'm
} willing to do all the work (actually, it's already done, except for
} testing and fixing any bugs I find).
} 
} I'd really appreciate comments from others

 In general, I like the idea of modules.  However, in this
case, I pretty much agree with Joerg and have to ask, what is the
point of modularising basic functionality?  Is having it in the
kernel all the time causing some kind of issue?

}-- End of excerpt from Paul Goyette


Re: POSIX.1 semaphores vs message queues

2015-11-09 Thread John Nemeth
On Nov 9,  8:05am, Paul Goyette wrote:
} On Sun, 8 Nov 2015, Masao Uebayashi wrote:
} 
} >> I don't see the point in having options for every single system call or
} >> the like. At best, it is a form of modularity masturbation and at worst,
} >> it is asking for difficult to analyze bugs when someone actually insists
} >> on removing them.
} >
} > You need subsystem dependency information for decent initialization
} > (a.k.a. kctors) ordering anyway.
} >
} > I also think that creating (practically useless) reduced kernel helps
} > unit-testing.
} 
} I'm not sure about helping unit testing, but it definitely helps to
} keep the code organized.
} 
} If any bugs do arise, they should be fixed, just like any other bugs.
} One might argue that having this code as optional/removeable offers
} new opportunities for finding bugs which affect other modules and/or
} the loadable module mechanisms, and exposing and fixing these bugs
} is beneficial to the project as a whole.
} 
} So far, the only objections seem to be:
} 
}   * semaphores are pretty common, just about everyone will use
} them, so there's no reason for them to be optional or
} removeable
} 
}   * the semaphore code is fairly small, and having it as an
} optional/removable module doesn't gain very much, and
} isn't worth the effort.
} 
} Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_ 
} optional.  I would contend that EXEC_SCRIPT is much more widely used
} than the semaphore code, and much more critical to operation of nearly
} every NetBSD kernel, yet we still build them as optional/removable
} modules.  (And yes, I actually run my "production" machine with these
} two modules - as well as AIO, MQUEUE, and EXEC_ELF64! - removed from
} the baseline kernel;  they all get auto-loaded as needed.)
} 
} Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} these two optional/removeable modules together add up to just about
} the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} at 5186 bytes).  There are numerous other modules which are similar in
} size to the SEMAPHORE module.
} 
} So, unless there are strenuous objections, I'm planning to resurrect
} the SEMAPHORE module in about a week.

 You've had objections.  It serves no purpose.

}-- End of excerpt from Paul Goyette


Re: POSIX.1 semaphores vs message queues

2015-11-09 Thread John Nemeth
On Nov 9,  8:29am, Paul Goyette wrote:
} On Mon, 9 Nov 2015, Joerg Sonnenberger wrote:
} > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} >> optional.
} >
} > See part about modularity masturbation. Making things optional for the
} > sake of making them optional is just as wrong.
} >
} >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} >> these two optional/removeable modules together add up to just about
} >> the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} >> in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} >> at 5186 bytes).  There are numerous other modules which are similar in
} >> size to the SEMAPHORE module.
} >
} > Add in the page alignment and the cost becomes even larger. There is
} > nothing to be gained.
} 
} The gain is flexibility, for those who may want it.  And possibly
} finding (and eventually fixing) latent bugs, either in the semaphore
} code itself or in the module system, which can be exposed by making

 This second point is silly.  The module system has been pretty
thoroughly debugged at this point.  The primary sore point now is
modules that allow themselves to be unloaded when they have hooks
into the system.

} the code optional and loadable.
} 
} We also gain consistency with existing practice, evidenced by the
} various examples I've cited.

 Some of those examples might be considered to be errors, i.e.
EXEC_ELF, as the system can't run without it.  Propagating
errors is not generally a good thing.

}-- End of excerpt from Paul Goyette


Re: POSIX.1 semaphores vs message queues

2015-11-09 Thread John Nemeth
On Nov 9, 11:15am, Masao Uebayashi wrote:
} On Mon, Nov 9, 2015 at 9:21 AM, Joerg Sonnenberger
}  wrote:
} > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} >> optional.
} >
} > See part about modularity masturbation. Making things optional for the
} > sake of making them optional is just as wrong.
} >
} >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} >> these two optional/removeable modules together add up to just about
} >> the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} >> in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} >> at 5186 bytes).  There are numerous other modules which are similar in
} >> size to the SEMAPHORE module.
} >
} > Add in the page alignment and the cost becomes even larger. There is
} > nothing to be gained.
} 
} Please don't (intentionally) confuse module in general and dynamic loading.
} 
} For built-in modules, the extra size is code added by #ifdef _MODULE.
} In the long run, xxx_modcmd() functions are merged into kctors.  If

 Uh, I don't think so.  Not unless you have one heck of a good
reason.  xxx_modcmd() does more than just initialize the module.
Spreading that stuff all over the place would not be nice.  Also,
we need to be able to pass parameters to the initialization routine
and check the return code.  These are NOT fire and forget routines.

 There is a reason that planned major changes are supposed to
be discussed.  It is so that people know what is happening and to
give people a chance to point out things you might not have thought
of.  "By the way, this is what's going to happen," is not how you
start a discussion.

} other metadata consume more than expected, it will be addressed and
} reconsidered.  But that goes away in !MODULAR kernels.  So virtually
} you lose nothing.
}-- End of excerpt from Masao Uebayashi


Re: POSIX.1 semaphores vs message queues

2015-11-13 Thread John Nemeth
On Nov 13,  6:34pm, Masao Uebayashi wrote:
} On Mon, Nov 9, 2015 at 7:13 PM, John Nemeth  wrote:
} > On Nov 9, 11:15am, Masao Uebayashi wrote:
} > } On Mon, Nov 9, 2015 at 9:21 AM, Joerg Sonnenberger
} > }  wrote:
} > } > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} > } >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} > } >> optional.
} > } >
} > } > See part about modularity masturbation. Making things optional for the
} > } > sake of making them optional is just as wrong.
} > } >
} > } >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} > } >> these two optional/removeable modules together add up to just about
} > } >> the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} > } >> in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} > } >> at 5186 bytes).  There are numerous other modules which are similar in
} > } >> size to the SEMAPHORE module.
} > } >
} > } > Add in the page alignment and the cost becomes even larger. There is
} > } > nothing to be gained.
} > }
} > } Please don't (intentionally) confuse module in general and dynamic loading.
} > }
} > } For built-in modules, the extra size is code added by #ifdef _MODULE.
} > } In the long run, xxx_modcmd() functions are merged into kctors.  If
} >
} >  Uh, I don't think so.  Not unless you have one heck of a good
} > reason.
} 
} If you need only one reason: dynamically loadable modules help
} development and debugging.

 What does this have to do with xxx_modcmd()?  It also isn't
necessarily a good enough reason to turn everything and its dog
into a module.

} > xxx_modcmd() does more than just initialize the module.
} 
} I know I know...  That sentence should have been read as: *part of*
} xxx_modcmd() *might be* merged into kctors.

 That doesn't answer the concern that module init routines take
a parameter and return a result code.  If you yank the module init
routine out of xxx_modcmd(), you remove significant functionality.

} > Spreading that stuff all over the place would not be nice.  Also,
} > we need to be able to pass parameters to the initialization routine
} > and check the return code.  These are NOT fire and forget routines.
} >
} >  There is a reason that planned major changes are supposed to
} > be discussed.  It is so that people know what is happening and to
} > give people a chance to point out things you might not have thought
} > of.  "By the way, this is what's going to happen," is not how you
} > start a discussion.
} 
} I have tried to explain the need of kctors, instead of hardcoded
} sequence of xxx_init() functions in init_main.c:main(), generated by
} dependency.

 This is truly lame.  It's not like you have to make a gazillion
calls from init_main() to each module.  One call to a module routine
causes all modules to be inited.

 Also, I don't think I've seen any discussion here.  I've seen
people asking you to tell us what your intentions are, without any
kind of real response from you.

} > } other metadata consume more than expected, it will be addressed and
} > } reconsidered.  But that goes away in !MODULAR kernels.  So virtually
} > } you lose nothing.
} > }-- End of excerpt from Masao Uebayashi
}-- End of excerpt from Masao Uebayashi


Re: POSIX.1 semaphores vs message queues

2015-11-13 Thread John Nemeth
On Nov 13,  7:46pm, Masao Uebayashi wrote:
} On Fri, Nov 13, 2015 at 8:05 PM, John Nemeth  wrote:
} > On Nov 13,  6:34pm, Masao Uebayashi wrote:
} > } On Mon, Nov 9, 2015 at 7:13 PM, John Nemeth  wrote:
} > } > On Nov 9, 11:15am, Masao Uebayashi wrote:
} > } > } On Mon, Nov 9, 2015 at 9:21 AM, Joerg Sonnenberger
} > } > }  wrote:
} > } > } > On Mon, Nov 09, 2015 at 08:05:43AM +0800, Paul Goyette wrote:
} > } > } >> Well, both EXEC_SCRIPT and COREDUMP are modularized, and they _are_
} > } > } >> optional.
} > } > } >
} > } > } > See part about modularity masturbation. Making things optional for the
} > } > } > sake of making them optional is just as wrong.
} > } > } >
} > } > } >> Both EXEC_SCRIPT and COREDUMP are also much smaller than the ksem code;
} > } > } >> these two optional/removeable modules together add up to just about
} > } > } >> the size of a SEMAPHORE module.  (On amd64 we have exec_script weighing
} > } > } >> in at 1285 bytes and coredump at 3895 bytes, while ksem tips the scales
} > } > } >> at 5186 bytes).  There are numerous other modules which are similar in
} > } > } >> size to the SEMAPHORE module.
} > } > } >
} > } > } > Add in the page alignment and the cost becomes even larger. There is
} > } > } > nothing to be gained.
} > } > }
} > } > } Please don't (intentionally) confuse module in general and dynamic loading.
} > } > }
} > } > } For built-in modules, the extra size is code added by #ifdef _MODULE.
} > } > } In the long run, xxx_modcmd() functions are merged into kctors.  If
} > } >
} > } >  Uh, I don't think so.  Not unless you have one heck of a good
} > } > reason.
} > }
} > } If you need only one reason: dynamically loadable modules help
} > } development and debugging.
} >
} >  What does this have to do with xxx_modcmd()?  It also isn't
} > necessarily a good enough reason to turn everything and its dog
} > into a module.
} >
} > } > xxx_modcmd() does more than just initialize the module.
} > }
} > } I know I know...  That sentence should have been read as: *part of*
} > } xxx_modcmd() *might be* merged into kctors.
} >
} >  That doesn't answer the concern that module init routines take
} > a parameter and return a result code.  If you yank the module init
} > routine out of xxx_modcmd(), you remove significant functionality.
} >
} > } > Spreading that stuff all over the place would not be nice.  Also,
} > } > we need to be able to pass parameters to the initialization routine
} > } > and check the return code.  These are NOT fire and forget routines.
} > } >
} > } >  There is a reason that planned major changes are supposed to
} > } > be discussed.  It is so that people know what is happening and to
} > } > give people a chance to point out things you might not have thought
} > } > of.  "By the way, this is what's going to happen," is not how you
} > } > start a discussion.
} > }
} > } I have tried to explain the need of kctors, instead of hardcoded
} > } sequence of xxx_init() functions in init_main.c:main(), generated by
} > } dependency.
} >
} >  This is truly lame.  It's not like you have to make a gazillion
} > calls from init_main() to each module.  One call to a module routine
} > causes all modules to be inited.
} 
} Are you proposing to make everything a module and always use module
} init routine?

 I am most certainly not proposing to make everything a module.
I started out in this thread by objecting to the idea that basic
functionality should be modularised.

} >  Also, I don't think I've seen any discussion here.  I've seen
} > people asking you to tell us what your intentions are, without any
} > kind of real response from you.
} >
} > } > } other metadata consume more than expected, it will be addressed and
} > } > } reconsidered.  But that goes away in !MODULAR kernels.  So virtually
} > } > } you lose nothing.
} > } > }-- End of excerpt from Masao Uebayashi
} > }-- End of excerpt from Masao Uebayashi
}-- End of excerpt from Masao Uebayashi


Re: In-kernel units for block numbers, etc ...

2015-11-26 Thread John Nemeth
On Nov 27,  6:00am, Robert Elz wrote:
}
} Date:Fri, 27 Nov 2015 07:12:50 +1100
} From:matthew green 
} Message-ID:  <18094.1448568...@splode.eterna.com.au>
} 
}   | FWIW, i "fixed" raidframe on 4K disks a few years back.
} 
} Do we allow mirroring where one drive is 512 byte sectors, and the
} other is 4K ?
} 
} If so (and I'd hope the answer is yes) what happens if the 4K drive
} dies and is replaced by a 512 byte sector drive?

 I would hope the answer is no, considering how much that would
complicate things, not to mention the slowdown (i.e. doing a single
sector write on one drive would require an RMW cycle on the other).
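
 As a rough userland illustration of what such an RMW cycle involves,
here is a sh sketch that uses dd(1) on scratch files.  The "disk" is
just a temporary file standing in for a 4K-sector drive, and every name
in it (disk, buf, new, lsec) is an assumption for the example; it is
not how RAIDframe or any driver actually does it.  To update one
512-byte logical sector, the whole 4096-byte physical sector has to be
read, patched in memory, and written back:

```shell
#!/bin/sh
# Scratch files: a fake 4K-sector "disk", a sector buffer, the new data.
disk=$(mktemp) || exit 1
buf=$(mktemp) || exit 1
new=$(mktemp) || exit 1
trap 'rm -f "$disk" "$buf" "$new"' EXIT

# Two zero-filled physical sectors of 4096 bytes each.
dd if=/dev/zero of="$disk" bs=4096 count=2 2>/dev/null
# One 512-byte logical sector's worth of payload (all 'A').
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'A' > "$new"

lsec=9                  # logical 512-byte sector to be written
psec=$((lsec / 8))      # 4096-byte physical sector that contains it
off=$((lsec % 8))       # slot inside that sector, in 512-byte units

# Read: fetch the whole physical sector.
dd if="$disk" of="$buf" bs=4096 skip="$psec" count=1 2>/dev/null
# Modify: overlay the 512 bytes inside the buffer.
dd if="$new" of="$buf" bs=512 seek="$off" count=1 conv=notrunc 2>/dev/null
# Write: put the whole physical sector back.
dd if="$buf" of="$disk" bs=4096 seek="$psec" count=1 conv=notrunc 2>/dev/null

echo "logical sector $lsec -> physical sector $psec, slot $off"
```

One small write on the 512-byte drive thus turns into a full-sector
read plus a full-sector write on the 4K drive, which is where the
slowdown comes from.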

}-- End of excerpt from Robert Elz


Re: In-kernel units for block numbers, etc ...

2015-11-28 Thread John Nemeth
On Nov 29, 12:05am, Michael van Elst wrote:
} k...@munnari.oz.au (Robert Elz) writes:
} 
} >I haven't looked carefully yet, but does vnd have the RMW behaviour to
} >allow an emulated small sector drive to exist on a big sector underlying.
} 
} It doesn't need to, the backend is a file and you can access arbitrary
} byte positions. The "RMW behaviour" is what the underlying filesystem
} automatically provides.

 On a side note, if the backend is just a file, why doesn't
vnd(4) work with NFS?

}-- End of excerpt from Michael van Elst


Re: In-kernel units for block numbers, etc ...

2015-11-29 Thread John Nemeth
On Nov 29, 10:38am, Michael van Elst wrote:
} Subject: Re: In-kernel units for block numbers, etc ...
} jnem...@cue.bc.ca (John Nemeth) writes:
} 
} > On a side note, if the backend is just a file, why doesn't
} >vnd(4) work with NFS?
} 
} A quick test shows that it works with a NFS file. I don't know
} how stable that is.

 It's documented as not working, and I know from experience
that it doesn't work, unless something has changed recently.  My
test case, at least the most recent one from memory, has to do with
Xen.  I keep ISO images on a NAS.  I often want to feed an ISO
image to Xen when setting up a new domU or upgrading one.  When
Xen is told to use a file for backing store, a script sets up a
VND and then uses that as Xen really wants a device.  It doesn't
work when the ISO image is on a NAS; I have to copy it to the
dom0.  BTW, the dom0 is running 6.1.5.  I was just poking at it,
and may need to poke at it some more to try a couple of things.

}-- End of excerpt from Michael van Elst


Re: vnd.c 1.254

2016-01-16 Thread John Nemeth
On Jan 16,  7:21pm, Manuel Bouyer wrote:
}
} what problem are you trying to solve with this commit to sys/dev/vnd.c ?
} revision 1.251
} date: 2015/11/09 17:41:24;  author: christos;  state: Exp;  lines: +3 -5
} Return ENXIO if the get ioctl exceeds the number of configured devices.
} XXX: pullup-7

 The issue was that under some conditions, vnconfig -l would
loop forever, displaying:

vnd: not in use
vnd: not in use
vnd: not in use
...

I don't recall the exact trigger condition, but I have seen it happen.

} This broke vnconfig -l (and so Xen block-device scripts):
} xen1:/tmp#vnconfig -l
} vnd0: /domains (/dev/wd0f) inode 3
} vnconfig: VNDIOCGET: Device not configured

 It stops an older vnconfig with a newer kernel from looping
forever.  Exactly how old vnconfig has to be and how new the kernel
has to be is left as an exercise for the reader.  :->

} There are 7 more vnd devices in /dev/ waiting to be configured on this system.
} 
} This has been pulled up to netbsd-7 and netbsd-7-0 as part of
} ticket 1038, so vnconfig (and Xen dom0) is broken here too,
} as reported in PR 50659

 When trying to locate a free vnd(4), xl (technically, it calls
out to a script that) does this:

-
# Store the list of available vnd(4) devices in
#``available_disks'', and mark them as ``free''.
list=`ls -1 /dev/vnd[0-9]*d | sed "s,/dev/vnd,,;s,d,," | sort -n`
for i in $list; do
    disk="vnd$i"
    available_disks="$available_disks $disk"
    eval $disk=free
done
# Mark the used vnd(4) devices as ``used''.
for disk in `sysctl hw.disknames`; do
    case $disk in
    vnd[0-9]*) eval $disk=used ;;
    esac
done
# Configure the first free vnd(4) device.
for disk in $available_disks; do
    eval status=\$$disk
    if [ "$status" = "free" ] && \
        vnconfig /dev/${disk}d $xparams >/dev/null; then
        device=/dev/${disk}d
        break
    fi
done
if [ x$device = x ] ; then
    error "no available vnd device"
fi
-

It would appear that the call to vnconfig is failing.  The question
is, why?  What happens if you have 9 or fewer /dev/vnds?  My thought
here is about sort order where vnd10 would come before vnd2 and
what happens if you try to configure them out of order.

}-- End of excerpt from Manuel Bouyer


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17,  1:01pm, Robert Elz wrote:
}
} Date:Sat, 16 Jan 2016 23:27:51 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160116222751.ga2...@asim.lip6.fr>
} 
}   | Also, you don't address the problem that, as I understand it and if
}   | the code works properly, vnconfig -l won't show free devices if the
}   | first 4 are in use.
} 
} Arguably it shouldn't show any free devices at all, otherwise, where
} should it stop?   The correct answer to "which vnd is free?" is "any
} vnd that is not in use."   Attempting to enumerate them all is folly.
} 
} The current scheme (I believe) lists a vnd as free (not in use) if
} some higher vnd is (or has been) used, and stops when the highest one
} ever used is reached.   Or at least that's the intent.   But removing
} all of the output for unused vnds would probably be a good idea.
} 
} If you want to know what is configured in /dev, then "ls /dev/vnd*d"
} will show you that, but there is no particular reason that vnd's
} (or any devices) need to exist in /dev (consider in a chroot partition,
} which might have /dev/vnd23[a-p] only)
} 
} The original problem was caused with the way vn{d}config was hacked
} to handle -l when vnd was made cloning (that lost backward compat to
} netbsd 6, which was the bug reported which the fixes in question were
} handling).  But there was no way to fix vnd and vn{d}config that would
} retain 100% backward compat in all cases.  Since NetBSD 7 was so new,
} some compat was lost just for it, you really do not want to run vnd
} related stuff from netbsd 7 release except with everything from its
} own version - upgrade to what is now on the relevant branch, or what
} is in current, but do both vnd.c and vndconfig at the same time.
} 
} But if you have vndconfig & a kernel built from the same set of sources,
} it should work.   But various mismatches have different sets of problems.
} Which particular problem depends upon just which version of vn{d}config
} and which version of vnd.c happen to be in use.
} 
} jnem...@cue.bc.ca said:
}   | It would appear that the call to vnconfig is failing.
}   | The question is, why?
} 
} Yes, good question.   What is $xparams in [t]he script fragment quoted ?

 It's the path to the file to be used as backing store (confirmed
to exist and be a regular file by an earlier call to stat(1)).
Its original source is the config file for the domU.

} Currently, it is possible to configure any unused vnd (so if $xparams
} is doing that it should work) (it is also possible to vndconfig -l

 As shown in the script fragment, $xparams has nothing to do with
the choice of which vnd to use.

} any device) but other uses are likely to return an error when used on
} an unconfig'd vnd
} 
}   | What happens if you have 9 or fewer /dev/vnds?
} 
} Should be irrelevant.
} 
}   | My thought here is about sort order where vnd10 would come before vnd2
} 
} No, the script extracts just the N part of the vnd names, and uses sort -n
} so the sort will produce 0 1 2 3 ... 10 11 ...

 Oops, right.
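
 For the record, the difference is easy to see with a short sh
sketch.  The unit numbers below are made up for the example; they stand
in for what the sed step in the script leaves behind:

```shell
#!/bin/sh
# Hypothetical unit numbers, in the form the sed step would leave them.
nums='10
2
0
11
1'

# Plain sort compares character by character, so "10" lands before "2".
lexical=$(printf '%s\n' "$nums" | LC_ALL=C sort | paste -s -d ' ' -)
# sort -n compares numeric values, which is what the script relies on.
numeric=$(printf '%s\n' "$nums" | sort -n | paste -s -d ' ' -)

echo "sort:    $lexical"
echo "sort -n: $numeric"
```

Only the plain sort puts 10 ahead of 2; with -n the script tries the
devices in numeric order.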

} But in any case:
} 
}   | and what happens if you try to configure them out of order.
} 
} Nothing very interesting, vnds (or any similar cloning device) can be
} configured in any order you like.
} 
} Incidentally, possibly depending on just what $xparams is, that script
} fragment looks like it should work fine to me - it uses safe methods
} to work out which vnd is available from what I can see (the script
} wants to use /dev/vnd* so it looks to see what is there, it cannot
} use anything which isn't) and then it removes from consideration any
} vnd which is in use (for which it uses $( sysctl hw.disknames )
} which is the safest way to see what is actually in use.
} 
} It isn't using vnconfig -l, which is the only thing that was (or should
} have been) affected by the vnd.c (and related vndconfig) changes.  That
} is, unless it is attempting to set a geometry with a sector size that is
} not a power of 2 - another of the changes in the set causes that to error

 There is only one call to vnconfig that configures a vnd (there is
a vnconfig -u elsewhere in the script), and as you saw it is nothing
more than:

vnconfig /dev/vndd 

} out, whereas previously it would have been accepted (and who knows what
} would have happened had it been actually used that way - these days much of
} the kernel assumes only power of two sector sizes, shifting is used to
} adjust units.)
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17,  9:37pm, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 14:49:23 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160117134923.ga2...@asim.lip6.fr>
} 
}   | I mean, vnconfig -l (without other arguments) has been showing available
}   | devices for a long time:
} 
} Yes, I know, and agree, it has ... but that is only possible if it
} is possible to rationally enumerate the available devices.   When there
} were a fixed (small) number, it made sense.  That is no longer the case.
} 
} Do you really want it to list 4 billion free vnds ?

 Obviously not, unless somebody was silly enough to create 4
billion /dev entries, which is likely to cause other problems.

} Using what is in /dev is incorrect (always was) as /dev is just a
} convention (and particularly is not reliable when chroots are in use).

 It may be "just a convention", but it is also the best
approximation.

}   | this is a major behavior change, which may well break existing setups.
} 
} True, but there is little alternative, unless you'd like to return to
} the pre cloning days.   It can stay as it is now, listing free devices
} up to the highest used (but that really is hard to explain and makes
} little sense, and as you have observed, is not very reliable) or I guess
} we could just add a 
} 
}   for (n = highest_found; ++n < highest_found + 4; )
}   printf("vnd%d: not in use\n", n);
} 
} after it finishes printing, just to list a few more free ones.
} 
}   | You remove existing and working functionality to fix a marginal backward
}   | compatibility issue ?
} 
} Not marginal at all, and backwards compat has always been one of NetBSD's
} prime objectives.
} 
}   | But removing this functionality is breaking
}   | backward compat, in a much more important way.
} 
} Actually, I doubt it.  I suspect some other issue is the problem here,
} and the change to vnconfig -l is just confusing the issue.

 Possibly.

}   | we *are* already running an up to date vnconfig, dammit !
} 
} Ah, OK, I misread your description (I thought you meant one from 7.0)
} 
}   | not until this problem is fixed. Breaking XEN3_DOM0 support is a real
}   | problem.
} 
} Agreed, we need to work out what is causing that vnconfig to fail.
} 
}   | Unfortunately it's transient.
} 
} That does make it difficult to debug.
} 
}   | After a few vnconfig manipulations the
}   | problem is gone for me (and vnconfig -l again show all devices,
}   | used or free).
} 
} All 4 billion of them?
} 
}   | cd_ndevs is now at 8 (checked with gdb against /dev/mem)
} 
} Then at some stage you had vnd7 configured.
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17, 11:04pm, Robert Elz wrote:
}
} Date:Sun, 17 Jan 2016 15:52:38 +0100
} From:Manuel Bouyer 
} Message-ID:  <20160117145238.ga3...@asim.lip6.fr>
} 
}   | unless you run vnconfig in the chroot.
} 
} And /dev in the chroot has the same vnds in it that /dev has
} 
}   | listing what is available in /dev makes sense to me, as, unless you have a
}   | very special setup, you'll use what's in /dev/ anyway.
} 
} Usually, yes, but "usually works" isn't really good enough.
} 
}   | You could use an option to list other devices in other directories.
} 
} You'd also need an option to give their names.   Consider
} 
}   mknod mydir/foo-pt1 c 14 0
}   mknod mydir/bar-pt2 c 14 1
}   mknod mydir/xxx-pt3 c 14 2
}   mknod mydir/vnd-raw c 14 4
}   vnconfig $(pwd)/mydir/vnd-raw /some/image/file
}   mknod other/foo-pt1 c 14 16
}   mknod other/bar-pt2 c 14 17
}   mknod other/xxx-pt3 c 14 18
}   mknod other/vnd-raw c 14 19
}   rm -f /dev/vnd*
} 
} What would you like vnconfig -l to list, and how would you expect to
} achieve it?

 If you're going to do bonkers things, then you should expect
the system to behave in bonkers ways.  It is unreasonable to expect
the system to handle every corner case that a sysadmin on crack
can create.

}   | or just list what's in /dev/
} 
} That's not backward compat with any NetBSD prior to NetBSD7.
} Take your netbsd 5 that you used for the previous example, remove
} all the /dev/vnd* (or move them somewhere) and try vnconfig -l
} again.   I think you'll see the same output as you did before.
} Similarly if you MAKEDEV vnd{5,6,7} it will still just list vnd 0..3
} What is in /dev was always irrelevant.   NetBSD 7 is just broken in
} this area.
} 
}   | True, that's why I insist on vnconfig -l to list free devices as it used
}   | to (although I don't use it myself).
} 
} If you can work out what that really means (not looking at /dev) in a
} way that makes sense, that would be fine.  I cannot (other than listing
} all 4 billion.)
} 
}   | I'm talking about vnconfig -l not listing free devices, not about
}   | vnconfig getting spurious ENXIO
} 
} I know, and I still doubt that it matters.
} 
}   | it is a kernel and an userland from netbsd-7, not HEAD.
} 
} I understand.
} 
}   | Anyway vnconfig didn't change in netbsd-7 since 7.0.
} 
} It did, or should have.   The code that looked in /dev was ripped out.
} If a pullup of that didn't happen, it should have.
} 
}   | And even if it did, I would expect vnconfig from 7.0_RELEASE to work
}   | with a netbsd-7 kernel
} 
} Normally I would do, but ...
} 
}   | (for backward compat it's more important than a netbsd-6 vnconfig with
}   | a netbsd-7 kernel)
} 
} I disagree.   The number of people upgrading 7.0 to -7 (and doing
} it by only upgrading the kernel) is going to be far fewer than the
} number upgrading from 6 (and earlier.)   If a 7.0.1 had already

 That isn't necessarily true.  It is certainly feasible in many
cases and possibly even desirable in some cases to run with a 7.0
kernel on a 6.X userland for some time to make sure things are
going to work out okay.  It is much easier to change the kernel
than it is to downgrade userland, especially since there is no officially
supported method for doing the latter.

} been released it wouldn't even be an issue.

 This also isn't necessarily true.  7.0.1 won't see all pullups
that netbsd-7 does (7.0.1 will come from the netbsd-7-0 branch).
7.0.1 will only see security and critical bug fixes, whereas 7.1
will have general bug fixes, updated/new device drivers, etc.

}   | > All 4 billion of them?
}   | No, what's in /dev/ as it used to do in 7.0-RELEASE
} 
} I bet it isn't.   MAKEDEV a few more vnds in /dev and try
} again, changing nothing else.  If it appears to be listing
} all that is in /dev, that is just co-incidence.
} 
}   | When the problem did show up, only vnd0 and vnd1 were in use.
}   | vnconfig -l did show on vnd0 and failed with ENXIO on vnd1 (although the
}   | device was configured because it was, and is still, in use by a domU).
} 
} That would be a bug, that we need to find and fix.   If vnd1 is in use,
} it should be listed.
} 
} It may be the same bug that is causing the xen startup problem, or it
} might be a different one.   Was it a (bare) "vnconfig -l" that failed?
} If you (or anyone else) sees this again, also try "vnconfig -l vnd1"
} (or whichever one vnconfig -l fails on and is known to be in use.)
} 
}-- End of excerpt from Robert Elz


Re: vnd.c 1.254

2016-01-17 Thread John Nemeth
On Jan 17,  5:52pm, Manuel Bouyer wrote:
} On Sun, Jan 17, 2016 at 11:04:23PM +0700, Robert Elz wrote:
} > Date:Sun, 17 Jan 2016 15:52:38 +0100
} > From:Manuel Bouyer 
} > Message-ID:  <20160117145238.ga3...@asim.lip6.fr>
} > 
} >   | unless you run vnconfig in the chroot.
} > 
} > And /dev in the chroot has the same vnds in it that /dev has
} 
} I don't understand that. If you run in /, you get the busy/free devices
} in /dev, if you run in /chroot you get the busy/free devices in /chroot/dev.
} I can't see a problem with that.
} 
} > 
} >   | listing what is available in /dev makes sense to me, as, unless you have a
} >   | very special setup, you'll use what's in /dev/ anyway.
} > 
} > Usually, yes, but "usually works" isn't really good enough.
} 
} As long as the limitations are known and documented I don't have a
} problem with that. If we removed all software that only "usually works",
} we could just throw computers away.
} 
} > 
} >   | You could use an option to list other devices in other directories.
} > 
} > You'd also need an option to give their names.   Consider
} > 
} > mknod mydir/foo-pt1 c 14 0
} > mknod mydir/bar-pt2 c 14 1
} > mknod mydir/xxx-pt3 c 14 2
} > mknod mydir/vnd-raw c 14 4
} > vnconfig $(pwd)/mydir/vnd-raw /some/image/file
} > mknod other/foo-pt1 c 14 16
} > mknod other/bar-pt2 c 14 17
} > mknod other/xxx-pt3 c 14 18
} > mknod other/vnd-raw c 14 19
} > rm -f /dev/vnd*
} > 
} > What would you like vnconfig -l to list, and how would you expect to
} > achieve it?
} > 
} >   | or just list what's in /dev/
} > 
} > That's not backward compat with any NetBSD prior to NetBSD7.
} > Take your netbsd 5 that you used for the previous example, remove
} > all the /dev/vnd* (or move them somewhere) and try vnconfig -l
} > again.   I think you'll see the same output as you did before.
} 
} yes, but that's not how one would use it. One would use vnconfig -l
} to find a usable device in /dev, so you need the /dev entry.
} 
} > Similarly if you MAKEDEV vnd{5,6,7} it will still just list vnd 0..3
} 
} yes, and that's fine because the others are not usable even if they exist.
} But now that this limitation is gone I don't have a problem with
} listing all /dev entries.
} 
} > What is in /dev was always irrelevant.   NetBSD 7 is just broken in
} > this area.
} 
} I say that what's in /dev/ is now relevant because this is what limits
} the number of vnd you can use (and this limit can easily be raised if needed).
} Older vnconfig -l listing devices without checking that a /dev/ entry
} exists may also be seen as a bug.
} 
} > 
} >   | True, that's why I insist on vnconfig -l to list free devices as it used
} >   | to (although I don't use it myself).
} > 
} > If you can work out what that really means (not looking at /dev) in a
} > way that makes sense, that would be fine.  I cannot (other than listing
} > all 4 billion.)
} 
} The only limit is what's in /dev/ so listing what's in /dev is fine.
} 
} > 
} >   | I'm talking about vnconfig -l not listing free devices, not about
} >   | vnconfig getting spurious ENXIO
} > 
} > I know, and I still doubt that it matters.
} > 
} >   | it is a kernel and an userland from netbsd-7, not HEAD.
} > 
} > I understand.
} > 
} >   | Anyway vnconfig didn't change in netbsd-7 since 7.0.
} > 
} > It did, or should have.   The code that looked in /dev was ripped out.
} > If a pullup of that didn't happen, it should have.
} 
} You can check that. But a pullup that removes functionality that
} has been there for at least 2 releases should be rejected.
} 
} > 
} >   | And even if it did, I would expect vnconfig from 7.0_RELEASE to work
} >   | with a netbsd-7 kernel
} > 
} > Normally I would do, but ...
} > 
} >   | (for backward compat it's more important than a netbsd-6 vnconfig with
} >   | a netbsd-7 kernel)
} > 
} > I disagree.   The number of people upgrading 7.0 to -7 (and doing
} > it by only upgrading the kernel) is going to be far fewer than the
} > number upgrading from 6 (and earlier.)
} 
} of course not. I guess it's common to run userland from a release and
} kernel from the corresponding stable branch. Running a kernel from a
} different stable branch than userland is much less common (because you
} expect things to break, e.g. ipf).

 Actually, ipf has backwards compat these days.  There is very
little left that doesn't have backwards compat.

} > If a 7.0.1 had already
} > been released it wouldn't even be an issue.
} 
} That wouldn't change the problem at all.
} 
} > 
} >   | > All 4 billion of them?
} >   | No, what's in /dev/ as it used to do in 7.0-RELEASE
} > 
} > I bet it isn't.   MAKEDEV a few more vnds in /dev and try
} > again, changing nothing else.  If it appears to be listing
} > all that is in /dev, that is just co-incidence.
} 
} You didn't look at the code I guess.
} xen1:/root#uname -a
} NetBSD xen1.soc.lip6.fr 7.0_STABLE NetBSD 7.0_STABLE (XEN3_DOM0) #12: Wed Jan  6 16:47
