Re: Communications kernel -> userland

2003-07-22 Thread Adam Migus

Robert Watson said:
>
> Well, the case I had particularly in mind was the rapid
> flow of packets
> from the kernel to the user process; Pawel's suggestion
> handles the flow
> of new data from the user process to the kernel well,
> and has substantial
> similarity to some of the IO Lite mechanisms I pointed
> at (and hopefully
> with many of the same performance benefits).  In the
> kernel-to-userspace
> case, we want to avoid the copy of what is originally
> kernel-owned memory
> (from the mbuf allocator) to the user process memory.
> If you didn't care
> about stuff like confidentiality of kernel memory, etc,
> the simplest
> approach would be to actually map the mbuf memory (and
> possibly cluster)
> into userspace, and then notify the user process in
> some form of the new
> mapping.  However, because mbufs and their meta-data
> aren't page aligned
> (etc, etc, etc), you really don't want to do it
> explicitly that way, I
> suspect.

Ok, I think I understand a little better.  The DMA
analogy combined with the somewhat obscure bracketed
requirements below it caused me to get a little
confused.  As for the page alignment issue, have you
checked out the new MBUMA stuff Bosko's doing?  It uses
(and abstracts) mbuf allocation over UMA.  Perhaps it
could be tailored to fit your requirements.

>
> By synchronization, I had in mind a mechanism by which
> the process and
> kernel would communicate about memory ownership in the
> shared memory
> space: "I'm done with this packet", "I'm done with
> these packets", "I want
> to continue delivery of that packet", "I modified this
> packet", "I'm
> inserting a new packet here", "I'm dropping this
> packet", all without
> extensive memory copying, and with a moderate amount of
> asynchrony (and
> possibly parallelism).  In terms of functionality, it
> might be similar to
> some of the current services that forward between
> IPDIVERT "in" and "out"
> (such as natd), or between BPF pseudo-devices.  This
> sounds like something
> that likely exists in a few commercial products
> already, so my question to
> Terry was whether he knew of any in the literature.
> IOLite is the
> closest I know of, as it supports the zero-copy page
> and memory ownership
> bits, although I don't know if they allowed it to
> handle packets, perhaps
> just datagrams and streams.
>

Given my comments above, would it not be possible to
offer this mechanism as an extension to the mbuf's own
meta-data?

-- 
Adam Migus - Research Scientist
Network Associates Laboratories (http://www.nailabs.com)
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Communications kernel -> userland

2003-07-22 Thread Robert Watson

On Tue, 22 Jul 2003, Adam Migus wrote:

> Perhaps I'm not understanding you right but I think Pawel's idea is
> cool.  It seems to fulfill your requirements (except being network
> specific).  I suppose if it were network specific we could optimize it
> for packet streams and if we made it complicated enough it would require
> quite an elaborate synchronization and notification mechanism.  Is that
> closer to what you have in mind?

Well, the case I had particularly in mind was the rapid flow of packets
from the kernel to the user process; Pawel's suggestion handles the flow
of new data from the user process to the kernel well, and has substantial
similarity to some of the IO Lite mechanisms I pointed at (and hopefully
with many of the same performance benefits).  In the kernel-to-userspace
case, we want to avoid the copy of what is originally kernel-owned memory
(from the mbuf allocator) to the user process memory.  If you didn't care
about stuff like confidentiality of kernel memory, etc, the simplest
approach would be to actually map the mbuf memory (and possibly cluster)
into userspace, and then notify the user process in some form of the new
mapping.  However, because mbufs and their meta-data aren't page aligned
(etc, etc, etc), you really don't want to do it explicitly that way, I
suspect. 

By synchronization, I had in mind a mechanism by which the process and
kernel would communicate about memory ownership in the shared memory
space: "I'm done with this packet", "I'm done with these packets", "I want
to continue delivery of that packet", "I modified this packet", "I'm
inserting a new packet here", "I'm dropping this packet", all without
extensive memory copying, and with a moderate amount of asynchrony (and
possibly parallelism).  In terms of functionality, it might be similar to
some of the current services that forward between IPDIVERT "in" and "out" 
(such as natd), or between BPF pseudo-devices.  This sounds like something
that likely exists in a few commercial products already, so my question to
Terry was whether he knew of any in the literature.  IOLite is the
closest I know of, as it supports the zero-copy page and memory ownership
bits, although I don't know if they allowed it to handle packets, perhaps
just datagrams and streams.
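
The ownership messages listed above ("I'm done with this packet", "I
modified this packet", "I'm dropping this packet") can be pictured as state
changes on slots in a shared region.  What follows is only a sketch under
assumptions, not an existing FreeBSD interface: pkt_ring, pkt_slot,
slot_state, ring_deliver(), ring_verdict() and ring_reclaim() are invented
names, both sides are modelled in one userland program with C11 atomics, and
the memcpy() merely stands in for the kernel placing a packet in the slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-slot ownership states; none of these exist in FreeBSD. */
enum slot_state {
    SLOT_KERNEL = 0,  /* kernel owns the slot and may fill it */
    SLOT_USER,        /* packet delivered; the user process owns it */
    SLOT_CONTINUE,    /* user: "continue delivery of that packet" */
    SLOT_MODIFIED,    /* user: "I modified this packet" */
    SLOT_DROP         /* user: "I'm dropping this packet" */
};
static_assert(SLOT_KERNEL == 0, "a zeroed ring must start out kernel-owned");

#define RING_SLOTS 8
#define SLOT_BYTES 2048

struct pkt_slot {
    _Atomic uint32_t state;   /* one of enum slot_state */
    uint32_t len;
    uint8_t data[SLOT_BYTES];
};

struct pkt_ring {
    struct pkt_slot slot[RING_SLOTS];
};

/* Kernel side: fill slot i and pass ownership to the user process. */
int ring_deliver(struct pkt_ring *r, size_t i, const void *pkt, uint32_t len)
{
    struct pkt_slot *s = &r->slot[i % RING_SLOTS];

    if (len > SLOT_BYTES ||
        atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_KERNEL)
        return -1;                       /* too big, or slot still in use */
    memcpy(s->data, pkt, len);           /* stand-in for the stack filling it */
    s->len = len;
    atomic_store_explicit(&s->state, SLOT_USER, memory_order_release);
    return 0;
}

/* User side: record a verdict, which gives the slot back to the kernel. */
int ring_verdict(struct pkt_ring *r, size_t i, enum slot_state verdict)
{
    struct pkt_slot *s = &r->slot[i % RING_SLOTS];
    uint32_t expected = SLOT_USER;

    if (verdict != SLOT_CONTINUE && verdict != SLOT_MODIFIED &&
        verdict != SLOT_DROP)
        return -1;
    /* Only succeeds if the user process really owns the slot right now. */
    return atomic_compare_exchange_strong(&s->state, &expected,
                                          (uint32_t)verdict) ? 0 : -1;
}

/* Kernel side: collect the verdict and make the slot reusable. */
int ring_reclaim(struct pkt_ring *r, size_t i)
{
    struct pkt_slot *s = &r->slot[i % RING_SLOTS];
    uint32_t st = atomic_load_explicit(&s->state, memory_order_acquire);

    if (st == SLOT_KERNEL || st == SLOT_USER)
        return -1;                       /* nothing to reclaim yet */
    atomic_store_explicit(&s->state, SLOT_KERNEL, memory_order_release);
    return (int)st;
}
```

No packet data is copied when ownership changes hands; only the state word
moves.  A real design would still need the notification half (how each side
learns that a state changed) and protection against a misbehaving process,
neither of which this sketch attempts.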

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: Communications kernel -> userland

2003-07-22 Thread Adam Migus

Robert Watson said:
>
> On Mon, 21 Jul 2003, Pawel Jakub Dawidek wrote:
>
>> For example, a syscall marks some range with the mark()
>> function.  From now on, this range isn't accessible from
>> userland.  If the process tries to write to this page, the
>> page is copied (copy-on-write).  If the page is modified by
>> the kernel, it is marked as MODIFIED.  Now when the syscall
>> calls unmark() on this range, we could get two scenarios:
>>
>>  1. The page is marked as MODIFIED (by the kernel), so the
>>     userland copy of this page (if it exists, of course) is
>>     destroyed and this page is put in its place.
>>     This is a replacement for copyin() and then copyout(),
>>     or just copyout().
>>  2. The page isn't marked as MODIFIED, so the kernel version
>>     of the page is destroyed (if there is a userland version).
>>     This is a replacement for just copyin().
>>
>> There could be other ways.  The thread/process could be
>> locked if it tries to access memory marked with the mark()
>> function.  And this, I think, wouldn't hurt performance,
>> because it happens really rarely.  So maybe it is better to
>> lock the thread for a moment instead of duplicating the
>> page, but I don't think so.
>
> This sounds a bit like some of the IO Lite stuff --
> moving to a
> page-centric model for IO interfaces to avoid copy
> operations, in many
> cases able to share pages between applications, buffer
> cache, network
> buffers, etc. Take a look at:
>
>   http://www.cs.princeton.edu/~vivek/
>
> For some details.  Some of the benefits of this
> approach are captured in
> the common case through sendfile(), in practice, but
> it's definitely worth
> a read.
>
> I guess what I had in mind was something more
> network-specific, with
> interfaces optimized for memory mapped network packet
> streams.  In the
> simplest case, something like memory-mapping the BPF
> buffer from kernel
> space to userspace, with some sort of simple stream
> synchronization so
> that the user application could notify the kernel as to
> when it could
> reuse bits of the buffer, but avoiding copy operations
> and lots of context
> switching.
>
> Robert N M Watson FreeBSD Core Team,
> TrustedBSD Projects
> [EMAIL PROTECTED]  Network Associates
> Laboratories
>
>
>

Perhaps I'm not understanding you right but I think
Pawel's idea is cool.  It seems to fulfill your
requirements (except being network specific).  I
suppose if it were network specific we could optimize
it for packet streams and if we made it complicated
enough it would require quite an elaborate
synchronization and notification mechanism.  Is that
closer to what you have in mind?

-- 
Adam Migus - Research Scientist
Network Associates Laboratories (http://www.nailabs.com)


Re: Communications kernel -> userland

2003-07-21 Thread Robert Watson

On Mon, 21 Jul 2003, Pawel Jakub Dawidek wrote:

> For example, a syscall marks some range with the mark() function.  From
> now on, this range isn't accessible from userland.  If the process tries
> to write to this page, the page is copied (copy-on-write).  If the page is
> modified by the kernel, it is marked as MODIFIED.  Now when the syscall
> calls unmark() on this range, we could get two scenarios:
>
>   1. The page is marked as MODIFIED (by the kernel), so the userland copy
>      of this page (if it exists, of course) is destroyed and
>      this page is put in its place.
>      This is a replacement for copyin() and then copyout(), or
>      just copyout().
>   2. The page isn't marked as MODIFIED, so the kernel version of the page
>      is destroyed (if there is a userland version).
>      This is a replacement for just copyin().
>
> There could be other ways.  The thread/process could be locked if it
> tries to access memory marked with the mark() function.  And this, I
> think, wouldn't hurt performance, because it happens really rarely.  So
> maybe it is better to lock the thread for a moment instead of duplicating
> the page, but I don't think so.

This sounds a bit like some of the IO Lite stuff -- moving to a
page-centric model for IO interfaces to avoid copy operations, in many
cases able to share pages between applications, buffer cache, network
buffers, etc. Take a look at:

  http://www.cs.princeton.edu/~vivek/

For some details.  Some of the benefits of this approach are captured in
the common case through sendfile(), in practice, but it's definitely worth
a read.

I guess what I had in mind was something more network-specific, with
interfaces optimized for memory mapped network packet streams.  In the
simplest case, something like memory-mapping the BPF buffer from kernel
space to userspace, with some sort of simple stream synchronization so
that the user application could notify the kernel as to when it could
reuse bits of the buffer, but avoiding copy operations and lots of context
switching. 
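
One way to picture the "simple stream synchronization" described above is a
single-producer, single-consumer buffer with two counters: the kernel
advances one as it writes, and the application advances the other to say
which bytes may be reused.  The sketch below is an assumption-laden model,
not how BPF works: shared_buf, buf_produce(), buf_at() and buf_release() are
invented names, and both ends run in one userland program where a real
version would place the structure in memory mapped into both sides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BUF_SIZE 4096u
static_assert((BUF_SIZE & (BUF_SIZE - 1u)) == 0, "BUF_SIZE must be a power of two");

/* Hypothetical layout of the region shared by kernel and application. */
struct shared_buf {
    _Atomic uint32_t head;  /* total bytes written; only the producer advances it */
    _Atomic uint32_t tail;  /* total bytes released; only the consumer advances it */
    uint8_t data[BUF_SIZE];
};

/* Consumer view: bytes written but not yet released. */
uint32_t buf_readable(struct shared_buf *b)
{
    uint32_t head = atomic_load_explicit(&b->head, memory_order_acquire);
    uint32_t tail = atomic_load_explicit(&b->tail, memory_order_relaxed);
    return head - tail;     /* unsigned wraparound keeps this correct */
}

/* Producer view: bytes that may be overwritten. */
uint32_t buf_free(struct shared_buf *b)
{
    uint32_t head = atomic_load_explicit(&b->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&b->tail, memory_order_acquire);
    return BUF_SIZE - (head - tail);
}

/* Producer (the kernel, in the thread's scenario): append packet bytes. */
int buf_produce(struct shared_buf *b, const uint8_t *src, uint32_t len)
{
    uint32_t head, i;

    if (len > buf_free(b))
        return -1;          /* consumer has not released enough space yet */
    head = atomic_load_explicit(&b->head, memory_order_relaxed);
    for (i = 0; i < len; i++)
        b->data[(head + i) & (BUF_SIZE - 1u)] = src[i];
    atomic_store_explicit(&b->head, head + len, memory_order_release);
    return 0;
}

/* Consumer: look at byte `off` of the unreleased data in place, no copy. */
const uint8_t *buf_at(struct shared_buf *b, uint32_t off)
{
    uint32_t tail = atomic_load_explicit(&b->tail, memory_order_relaxed);
    return &b->data[(tail + off) & (BUF_SIZE - 1u)];
}

/* Consumer: tell the producer that `len` bytes can be reused. */
int buf_release(struct shared_buf *b, uint32_t len)
{
    uint32_t tail;

    if (len > buf_readable(b))
        return -1;
    tail = atomic_load_explicit(&b->tail, memory_order_relaxed);
    atomic_store_explicit(&b->tail, tail + len, memory_order_release);
    return 0;
}
```

The release step is a plain store to shared memory, so telling the kernel
"you can reuse this" costs no system call; a wakeup is only needed when one
side has gone to sleep waiting for the other.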

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: Communications kernel -> userland

2003-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2003 at 02:20:40PM -0400, Robert Watson wrote:
+> For one of our research projects, here at NAI, we did a fair amount of
+> userland network code prototyping.  We started out with IPDIVERT, then
+> pushed down to BPF using a partial network stack in userspace.  We've
+> found it's a lot easier on competent network developers who are unfamiliar
+> with the FreeBSD kernel code, not to mention easier on debugging.  We
+> never got so far on that project as to do shared memory between the kernel
+> and userspace, but I know that that's been done by at least a couple of
+> companies at various points to reduce copying and context switch costs for
+> userspace test frameworks.  One of the things I'd really like to see is
+> some decent "throw packets between kernel and userspace" primitive bits,
+> such that the kernel has a useful and logical way to expose buffer data
+> into directly mapped user pages, and an appropriate notification and
+> management system to reuse memory, etc.  Something that looks a bit like
+> the relationship between kernel device drivers and devices when it comes
+> to DMA management.  Do you know if any such framework exists? 
+> (Specifically targeted at exposing network packets...)  (Ideally not
+> requiring privilege in the user process, nor involving nasty integrity or
+> confidentiality problems :-)

It would be cool to have something like this:

mark(vm_map_t map, vm_offset_t start, vm_offset_t end);
unmark(vm_map_t map, vm_offset_t start, vm_offset_t end);

It would be used instead of the copyin()/copyout() functions.

For example, a syscall marks some range with the mark() function.
From now on, this range isn't accessible from userland.  If the process
tries to write to this page, the page is copied (copy-on-write).
If the page is modified by the kernel, it is marked as MODIFIED.
Now when the syscall calls unmark() on this range, we could get two
scenarios:

1. The page is marked as MODIFIED (by the kernel), so the userland copy
   of this page (if it exists, of course) is destroyed and
   this page is put in its place.
   This is a replacement for copyin() and then copyout(), or
   just copyout().
2. The page isn't marked as MODIFIED, so the kernel version of the page
   is destroyed (if there is a userland version).
   This is a replacement for just copyin().

There could be other ways.  The thread/process could be locked if it tries
to access memory marked with the mark() function.  And this, I think,
wouldn't hurt performance, because it happens really rarely.  So maybe it is
better to lock the thread for a moment instead of duplicating the page, but
I don't think so.
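
The two unmark() outcomes can be modelled in a few lines of userland C.
This is a toy model only, under loud assumptions: struct marked_page,
page_mark(), page_user_write(), page_kernel_write() and page_unmark() are
invented for illustration, the proposed mark()/unmark() would take a
vm_map_t range rather than a single page, and the page-protection faults
that would trigger the copy in a real VM system are replaced here by
explicit function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical bookkeeping for one page covered by mark(). */
struct marked_page {
    unsigned char *page;       /* the page the kernel works on */
    unsigned char *user_copy;  /* copy-on-write copy, NULL if none was made */
    bool marked;
    bool modified;             /* the MODIFIED flag: kernel wrote while marked */
};

void page_mark(struct marked_page *p)
{
    p->marked = true;
    p->modified = false;
}

/* A userland write: while the page is marked it lands in a private copy. */
int page_user_write(struct marked_page *p, size_t off, unsigned char val)
{
    assert(off < PAGE_BYTES);
    if (p->marked && p->user_copy == NULL) {
        p->user_copy = malloc(PAGE_BYTES);
        if (p->user_copy == NULL)
            return -1;
        memcpy(p->user_copy, p->page, PAGE_BYTES);  /* the copy-on-write */
    }
    (p->user_copy != NULL ? p->user_copy : p->page)[off] = val;
    return 0;
}

/* A kernel write: goes to the original page and sets MODIFIED if marked. */
void page_kernel_write(struct marked_page *p, size_t off, unsigned char val)
{
    assert(off < PAGE_BYTES);
    p->page[off] = val;
    if (p->marked)
        p->modified = true;
}

void page_unmark(struct marked_page *p)
{
    if (p->modified) {
        /* Scenario 1: the kernel's page wins; any userland copy is destroyed. */
        free(p->user_copy);
    } else if (p->user_copy != NULL) {
        /* Scenario 2: the userland copy wins; the kernel version is destroyed. */
        free(p->page);
        p->page = p->user_copy;
    }
    p->user_copy = NULL;
    p->marked = false;
    p->modified = false;
}
```

In the common case where neither side writes during the marked window, no
copy is made at all, which is the saving over an unconditional
copyin()/copyout().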

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Re: Communications kernel -> userland

2003-07-21 Thread Robert Watson

On Mon, 21 Jul 2003, Terry Lambert wrote:

> Robert Watson wrote:
> > Of these approaches, my favorites are writing directly to a file and using
> > a pseudo-device, depending on the requirements.  They have fairly
> > well-defined security semantics (especially if you properly cache the
> > open-time credentials in the file case).  I don't really like the Fifo
> > case as it has to re-look-up the fifo each time, and has some odd blocking
> > semantics.  Sockets, as I said, involve a lot of special casing, so unless
> > you're already dealing with network code, you probably don't want to drag
> > it into the mix.  If you're creating big new infrastructure for a feature,
> > I suppose you could also hook it up as a first class object at the file
> > descriptor level, in the style of kqueue.  If it's relatively minor event
> > data, you could hook up a new kqueue event type.  You could also just use
> > a special-purpose system call or sysctl if you don't mind a lot of context
> > switching and lack of buffering.
> 
> I like setting the PG_G bit on the page involved, which maps it into the
> address space of all processes.  8-). 

For one of our research projects, here at NAI, we did a fair amount of
userland network code prototyping.  We started out with IPDIVERT, then
pushed down to BPF using a partial network stack in userspace.  We've
found it's a lot easier on competent network developers who are unfamiliar
with the FreeBSD kernel code, not to mention easier on debugging.  We
never got so far on that project as to do shared memory between the kernel
and userspace, but I know that that's been done by at least a couple of
companies at various points to reduce copying and context switch costs for
userspace test frameworks.  One of the things I'd really like to see is
some decent "throw packets between kernel and userspace" primitive bits,
such that the kernel has a useful and logical way to expose buffer data
into directly mapped user pages, and an appropriate notification and
management system to reuse memory, etc.  Something that looks a bit like
the relationship between kernel device drivers and devices when it comes
to DMA management.  Do you know if any such framework exists? 
(Specifically targeted at exposing network packets...)  (Ideally not
requiring privilege in the user process, nor involving nasty integrity or
confidentiality problems :-)

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: Communications kernel -> userland

2003-07-21 Thread Terry Lambert
Robert Watson wrote:
> Of these approaches, my favorites are writing directly to a file and using
> a pseudo-device, depending on the requirements.  They have fairly
> well-defined security semantics (especially if you properly cache the
> open-time credentials in the file case).  I don't really like the Fifo
> case as it has to re-look-up the fifo each time, and has some odd blocking
> semantics.  Sockets, as I said, involve a lot of special casing, so unless
> you're already dealing with network code, you probably don't want to drag
> it into the mix.  If you're creating big new infrastructure for a feature,
> I suppose you could also hook it up as a first class object at the file
> descriptor level, in the style of kqueue.  If it's relatively minor event
> data, you could hook up a new kqueue event type.  You could also just use
> a special-purpose system call or sysctl if you don't mind a lot of context
> switching and lack of buffering.

I like setting the PG_G bit on the page involved, which maps it
into the address space of all processes.  8-).

-- Terry


Re: Communications kernel -> userland

2003-07-21 Thread Marc Ramirez

Cool.  Thanks, everyone!  Messrs. Watson and Lambert have convinced me to
go the pseudo-device route.  I think that's really going to clean up a lot
of the code.

I'm so excited!

Thanks!

Marc.

On Sun, 20 Jul 2003, Robert Watson wrote:

>
> On Sat, 19 Jul 2003, Pawel Jakub Dawidek wrote:
>
> > Your choices are:
> > - device,
> > - sysctl,
> > - syscall.
>
> There are actually a few other more obscure ways to push information from
> the kernel to userspace, depending on what you want to accomplish.
>
> Write directly to a file from the kernel.  ktrace, system accounting, and
> ktr with alq all stream data directly to a file provided by an authorized
> user process.  Quotas and UFS1 extended attribute data are also written
> directly to a file.  On other operating systems, audit implementations
> frequently take the same approach -- when the goal is long term storage of
> data in a user-accessible form, but you don't want to stream it through a
> user process live, this is usually the preference.  Typically, when taking
> this approach, a special system call is used to notify the kernel of the
> target file to write to -- the file is created by the user process with
> appropriate protections.  Often, but not always, the system call is
> non-blocking and simply returns once the file is hooked up as a target,
> and delivery continues until another system call cancels it, or switches
> it to a new target.
>
> Stream it through a device node.  If you need only one or a small number
> of processes to listen for events from the kernel, a common approach
> is a pseudo-device that acts like a file.  For example, syslogd listens
> on /dev/klog for log events from the kernel; some audit implementations
> also take this approach.  Our devd, usbd, and others similarly listen
> for system events that are exposed to user processes as data on a
> blocking pseudo-device.  One nice thing about this approach is that you
> can combine it with select(), kqueue(), et al, to do centralized event
> management in the application.  BPF also does this.  Both Arla and
> Coda take this approach for LPC'ing to userspace to request events
> as a result of VFS operations by processes.
>
> Expose it using a special socket type.  We expose routing data and
> network stack administrative controls as special reads, writes, and
> ioctls on various socket types.  I'm not a big fan of this approach,
> as it special cases a lot of bits, and requires you to get caught
> up in socket semantics.  However, one advantage of this approach is
> it makes the notion of multicast of events to multiple listeners easier
> to deal with, since each socket endpoint has automatic message buffering.
>
> There are some other odd cases in use as well.  The NFS locking code
> opens a specially named fifo (/var/run/lock) and writes messages to
> it, which are picked up by rpc.lockd.  The lock daemon pushes events
> back into the kernel using a special system call.  I don't really
> like this approach, as it has some odd semantics -- especially since
> it reopens the fifo for each operation, and there are credential/
> file system namespace inconsistencies.
>
> Of these approaches, my favorites are writing directly to a file and using
> a pseudo-device, depending on the requirements.  They have fairly
> well-defined security semantics (especially if you properly cache the
> open-time credentials in the file case).  I don't really like the Fifo
> case as it has to re-look-up the fifo each time, and has some odd blocking
> semantics.  Sockets, as I said, involve a lot of special casing, so unless
> you're already dealing with network code, you probably don't want to drag
> it into the mix.  If you're creating big new infrastructure for a feature,
> I suppose you could also hook it up as a first class object at the file
> descriptor level, in the style of kqueue.  If it's relatively minor event
> data, you could hook up a new kqueue event type.  You could also just use
> a special-purpose system call or sysctl if you don't mind a lot of context
> switching and lack of buffering.
>
> Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> [EMAIL PROTECTED]  Network Associates Laboratories
>
>
>
>


--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


Re: Communications kernel -> userland

2003-07-21 Thread Marc Ramirez
On Sat, 19 Jul 2003, Pawel Jakub Dawidek wrote:

> On Fri, Jul 18, 2003 at 03:47:05PM -0400, Marc Ramirez wrote:
> +> I have a remote datastore that I want to present as a filesystem.  There
> +> are two parts to this: fetching raw data over the network, and doing some
> +> processing on the data.  For purposes of maintainability, I'd like to do
> +> as little of this as possible inside the kernel, so I've currently got a
> +> daemon to fetch and process the data, and then pipes it over a socket to
> +> the kernel FS layer.
>
> Your choices are:
> - device,
> - sysctl,
> - syscall.
>
> You need to think about what exactly you need and which option will be
> the best.  Creating a new syscall isn't a good idea, and creating a device
> is more complicated than a sysctl, but of course it's up to you and your
> needs.

Okay, thanks.  Syscall seems completely counter-intuitive for my needs,
anyway.

Marc.

--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


Re: Communications kernel -> userland

2003-07-21 Thread Marc Ramirez
On Fri, 18 Jul 2003, Terry Lambert wrote:

> Marc Ramirez wrote:
> > I asked this in -questions, but got no response; sorry for the repost.
> >
> > I have a device driver that needs to make requests for data from a
> > userland daemon.  What's the preferred method for doing this in 4.8R and
> > 5.1R?  I'm assuming the answer is Unix-domain sockets...
>
> It depends on the application.  In most cases these are set up
> as request/response protocols.
>
> In that case, the best method is to use an ioctl() or fcntl()
> (which you use depends on what in the kernel is talking to
> userland), and then "returning" to user space with the request.
> The userland then makes another call back down with the response,
> and the next wait-for-request.  This saves you fully 50% of the
> protection domain crossing system calls from an ordinary callback,
> and it saves you 300% of the protection domain crossings of what
> you would need for a pipe/FIFO/unix-domain-socket.

I understand.  Thanks!

> E.g.:
>
>       user                            kernel
>       ------------------------------------------------------------
> REQ1                                  make_req()
>                                       sleep_waiting_for_available()
>       ioctl(fd, MY_GETREQ, &req)
>                                       sleep_waiting_for_req()
>                                       copyout()
>                                       sleep_waiting_for_rsp()
>       ioctl(fd, MY_RSPREQ, &req)
>                                       sleep_waiting_for_req()
>                                       copyin()
>       ...
> REQ2                                  make_req()
>                                       copyout()
>                                       sleep_waiting_for_rsp()
>       ioctl(fd, MY_RSPREQ, &req)
>                                       sleep_waiting_for_req()
>                                       copyin()
>       ...
> ...
>
> -- Terry
>
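
The daemon side of the exchange quoted above might be sketched as follows.
Only the command names MY_GETREQ and MY_RSPREQ come from the thread;
everything else is an assumption made for illustration: the numeric command
values, the layout of struct my_req, and the helper names serve(),
handle_req() and fake_xcall().  fake_xcall() stands in for the driver so the
loop can be exercised without a device; a real daemon would pass a wrapper
around ioctl(2) on the opened device node instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Command names from the thread; the values are made up for this sketch. */
enum { MY_GETREQ = 1, MY_RSPREQ = 2 };

/* Hypothetical request layout: the request comes in and the response goes
 * back out in the same buffer. */
struct my_req {
    int id;
    char payload[64];
};

/* The call that crosses into the kernel (ioctl(fd, cmd, req) in real use). */
typedef int (*xcall_fn)(int fd, unsigned long cmd, struct my_req *req);

/* Serve requests until the crossing call fails; returns the number served. */
int serve(int fd, xcall_fn xcall, void (*handle)(struct my_req *))
{
    struct my_req req;
    int served = 0;

    assert(xcall != NULL && handle != NULL);
    memset(&req, 0, sizeof(req));
    if (xcall(fd, MY_GETREQ, &req) != 0)     /* wait for the first request */
        return 0;
    for (;;) {
        handle(&req);                        /* build the response in place */
        served++;
        /* One crossing hands back the response AND waits for the next
         * request, which is where the saving over a pipe or socket is. */
        if (xcall(fd, MY_RSPREQ, &req) != 0)
            break;
    }
    return served;
}

/* ---- Stand-in for the driver, for demonstration only ---- */
static int fake_pending = 2;    /* requests the "kernel" still wants to make */
static int fake_responses = 0;  /* responses it has received */
static int fake_crossings = 0;  /* user/kernel crossings so far */

int fake_xcall(int fd, unsigned long cmd, struct my_req *req)
{
    (void)fd;
    fake_crossings++;
    if (cmd == MY_RSPREQ)
        fake_responses++;       /* a driver would copyin() the response here */
    if (fake_pending == 0)
        return -1;              /* a driver would sleep here instead */
    req->id = fake_pending--;   /* a driver would copyout() the request here */
    strcpy(req->payload, "ping");
    return 0;
}

void handle_req(struct my_req *req)
{
    strcpy(req->payload, "pong");
}
```

With the stand-in driver, serve(-1, fake_xcall, handle_req) handles two
requests in three crossings: one MY_GETREQ to start, then one MY_RSPREQ per
request.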


--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


Re: Communications kernel -> userland

2003-07-21 Thread Marc Ramirez
On Sat, 19 Jul 2003, M. Warner Losh wrote:

> In message: <[EMAIL PROTECTED]>
> Marc Ramirez <[EMAIL PROTECTED]> writes:
> : I have a device driver that needs to make requests for data from a
> : userland daemon.  What's the preferred method for doing this in 4.8R and
> : 5.1R?  I'm assuming the answer is Unix-domain sockets...
>
> what's wrong with a simple read channel?

Nothing except that this is my first real foray into non-trivial kernel
programming. :)

Marc.

> why complicate things by
> bringing sockets into it?
>
> Warner
>
>


--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


Re: Communications kernel -> userland

2003-07-20 Thread Robert Watson

On Sat, 19 Jul 2003, Pawel Jakub Dawidek wrote:

> Your choices are:
> - device,
> - sysctl,
> - syscall.

There are actually a few other more obscure ways to push information from
the kernel to userspace, depending on what you want to accomplish.

Write directly to a file from the kernel.  ktrace, system accounting, and
ktr with alq all stream data directly to a file provided by an authorized
user process.  Quotas and UFS1 extended attribute data are also written
directly to a file.  On other operating systems, audit implementations
frequently take the same approach -- when the goal is long term storage of
data in a user-accessible form, but you don't want to stream it through a
user process live, this is usually the preference.  Typically, when taking
this approach, a special system call is used to notify the kernel of the
target file to write to -- the file is created by the user process with
appropriate protections.  Often, but not always, the system call is
non-blocking and simply returns once the file is hooked up as a target, and
delivery continues until another system call cancels it, or switches it to a
new target.

Stream it through a device node.  If you need only one or a small number
of processes to listen for events from the kernel, a common approach
is a pseudo-device that acts like a file.  For example, syslogd listens
on /dev/klog for log events from the kernel; some audit implementations
also take this approach.  Our devd, usbd, and others similarly listen
for system events that are exposed to user processes as data on a
blocking pseudo-device.  One nice thing about this approach is that you
can combine it with select(), kqueue(), et al, to do centralized event
management in the application.  BPF also does this.  Both Arla and
Coda take this approach for LPC'ing to userspace to request events
as a result of VFS operations by processes.
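
On the application side, the "centralized event management" mentioned above
comes down to waiting on several descriptors at once and reading from
whichever one has data.  A minimal sketch using select(), with helper names
wait_readable() and read_ready() that are invented here; the descriptors
could come from opening a pseudo-device such as /dev/klog, but any readable
descriptor behaves the same way, and kqueue() would be the FreeBSD-specific
alternative.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for any descriptor in fds[] to become readable.
 * Returns the index of the first readable one, or -1 on timeout or error. */
int wait_readable(const int *fds, int nfds, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int i, maxfd = -1;

    assert(fds != NULL);
    FD_ZERO(&rfds);
    for (i = 0; i < nfds; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;              /* select() cannot watch this descriptor */
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return -1;
    for (i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rfds))
            return i;
    return -1;
}

/* Wait for an event source, then read one chunk of event data from it.
 * Stores the index of the source in *which; returns read()'s result,
 * or -1 if nothing became readable in time. */
ssize_t read_ready(const int *fds, int nfds, void *buf, size_t len,
                   int timeout_ms, int *which)
{
    int i = wait_readable(fds, nfds, timeout_ms);

    if (i < 0)
        return -1;
    if (which != NULL)
        *which = i;
    return read(fds[i], buf, len);
}
```

Because a device node is just another descriptor, it can sit in the same
loop as the daemon's sockets and pipes.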

Expose it using a special socket type.  We expose routing data and
network stack administrative controls as special reads, writes, and
ioctls on various socket types.  I'm not a big fan of this approach,
as it special cases a lot of bits, and requires you to get caught
up in socket semantics.  However, one advantage of this approach is
it makes the notion of multicast of events to multiple listeners easier
to deal with, since each socket endpoint has automatic message buffering.

There are some other odd cases in use as well.  The NFS locking code
opens a specially named fifo (/var/run/lock) and writes messages to
it, which are picked up by rpc.lockd.  The lock daemon pushes events
back into the kernel using a special system call.  I don't really
like this approach, as it has some odd semantics -- especially since
it reopens the fifo for each operation, and there are credential/
file system namespace inconsistencies.

Of these approaches, my favorites are writing directly to a file and using
a pseudo-device, depending on the requirements.  They have fairly
well-defined security semantics (especially if you properly cache the
open-time credentials in the file case).  I don't really like the Fifo
case as it has to re-look-up the fifo each time, and has some odd blocking
semantics.  Sockets, as I said, involve a lot of special casing, so unless
you're already dealing with network code, you probably don't want to drag
it into the mix.  If you're creating big new infrastructure for a feature,
I suppose you could also hook it up as a first class object at the file
descriptor level, in the style of kqueue.  If it's relatively minor event
data, you could hook up a new kqueue event type.  You could also just use
a special-purpose system call or sysctl if you don't mind a lot of context
switching and lack of buffering. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories





Re: Communications kernel -> userland

2003-07-19 Thread M. Warner Losh
In message: <[EMAIL PROTECTED]>
Marc Ramirez <[EMAIL PROTECTED]> writes:
: I have a device driver that needs to make requests for data from a
: userland daemon.  What's the preferred method for doing this in 4.8R and
: 5.1R?  I'm assuming the answer is Unix-domain sockets...

what's wrong with a simple read channel?  why complicate things by
bringing sockets into it?

Warner



Re: Communications kernel -> userland

2003-07-19 Thread Pawel Jakub Dawidek
On Sat, Jul 19, 2003 at 09:47:08AM +0200, Pawel Jakub Dawidek wrote:
+> On Fri, Jul 18, 2003 at 03:47:05PM -0400, Marc Ramirez wrote:
+> +> I have a remote datastore that I want to present as a filesystem.  There
+> +> are two parts to this: fetching raw data over the network, and doing some
+> +> processing on the data.  For purposes of maintainability, I'd like to do
+> +> as little of this as possible inside the kernel, so I've currently got a
+> +> daemon to fetch and process the data, and then pipes it over a socket to
+> +> the kernel FS layer.
+> 
+> Your choices are:
+> - device,
+> - sysctl,
+> - syscall.
+> 
+> You need to think about what exactly you need and which option will be
+> the best.  Creating a new syscall isn't a good idea, and creating a device
+> is more complicated than a sysctl, but of course it's up to you and your
+> needs.

Hmm, there is a chance that I've reversed the direction :)

But if you don't use kevent/kqueue, you need to tell the kernel that you
want to read data.

For example, logs are exported from the kernel to userland (to syslogd) via
the device /dev/klog.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Re: Communications kernel -> userland

2003-07-19 Thread Pawel Jakub Dawidek
On Fri, Jul 18, 2003 at 03:47:05PM -0400, Marc Ramirez wrote:
+> I have a remote datastore that I want to present as a filesystem.  There
+> are two parts to this: fetching raw data over the network, and doing some
+> processing on the data.  For purposes of maintainability, I'd like to do
+> as little of this as possible inside the kernel, so I've currently got a
+> daemon to fetch and process the data, and then pipes it over a socket to
+> the kernel FS layer.

Your choices are:
- device,
- sysctl,
- syscall.

You need to think about exactly what you need and which option will be
the best. Creating a new syscall isn't a good idea, and creating a device
is more complicated than a sysctl, but of course it's up to you and your
needs.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net




Re: Communications kernel -> userland

2003-07-18 Thread Terry Lambert
Marc Ramirez wrote:
> I asked this in -questions, but got no response; sorry for the repost.
> 
> I have a device driver that needs to make requests for data from a
> userland daemon.  What's the preferred method for doing this in 4.8R and
> 5.1R?  I'm assuming the answer is Unix-domain sockets...

It depends on the application.  In most cases these are set up
as request/response protocols.

In that case, the best method is to use an ioctl() or fcntl()
(which you use depends on what in the kernel is talking to
userland), "returning" to user space with the request.
The userland then makes another call back down with the response,
and the next wait-for-request.  This saves you fully 50% of the
protection domain crossing system calls from an ordinary callback,
and it saves you 300% of the protection domain crossings of what
you would need for a pipe/FIFO/unix-domain-socket.

E.g.:

user                            kernel
------------------------------------------------------------
REQ1                            make_req()
                                sleep_waiting_for_available()
ioctl(fd, MY_GETREQ, &req)
                                sleep_waiting_for_req()
                                copyout()
                                sleep_waiting_for_rsp()
ioctl(fd, MY_RSPREQ, &req)
                                sleep_waiting_for_req()
                                copyin()
                                ...
REQ2                            make_req()
                                copyout()
                                sleep_waiting_for_rsp()
ioctl(fd, MY_RSPREQ, &req)
                                sleep_waiting_for_req()
                                copyin()
                                ...
...
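
The userland half of the exchange above might be sketched roughly as
follows. This is an illustration under stated assumptions, not code from
the thread: only the names MY_GETREQ and MY_RSPREQ come from the diagram.
The struct my_req layout, the ioctl group letter and numbers, the xfer_fn
hook, and the mock_xfer stand-in for the driver are all hypothetical. The
hook exists only so that the loop can be tried without a real device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical request record shared by the driver and the daemon. */
struct my_req {
    unsigned int id;       /* request number assigned by the kernel side */
    char         data[64]; /* request on the way up, response on the way down */
};

/* Command names follow the diagram; group 'M' and the numbers are made up. */
#define MY_GETREQ _IOR('M', 1, struct my_req)   /* wait for the first request */
#define MY_RSPREQ _IOWR('M', 2, struct my_req)  /* send response, wait for next */

/* Transport hook so the loop can run against a real fd or a stand-in. */
typedef int (*xfer_fn)(int fd, unsigned long cmd, struct my_req *req);

/* Real transport: one ioctl(2) call, i.e. one trip into the kernel. */
int ioctl_xfer(int fd, unsigned long cmd, struct my_req *req)
{
    return ioctl(fd, cmd, req);
}

/* Placeholder for the daemon's real work (fetching and processing data). */
void handle_request(struct my_req *req)
{
    snprintf(req->data, sizeof(req->data), "rsp-%u", req->id);
}

/* Serve until the transport fails; returns the number of requests handled. */
int serve(int fd, xfer_fn xfer)
{
    struct my_req req;
    int served = 0;

    assert(xfer != NULL);
    memset(&req, 0, sizeof(req));
    if (xfer(fd, MY_GETREQ, &req) != 0)       /* block for the first request */
        return -1;
    for (;;) {
        handle_request(&req);
        served++;
        if (xfer(fd, MY_RSPREQ, &req) != 0)   /* respond and block for next */
            break;
    }
    return served;
}

/* Stand-in "kernel" with two queued requests, for trying the loop out. */
int  mock_calls;
char mock_last_rsp[64];

int mock_xfer(int fd, unsigned long cmd, struct my_req *req)
{
    static const char *queue[] = { "lookup /a", "read /a" };
    static unsigned int next;

    (void)fd;
    mock_calls++;
    if (cmd == MY_RSPREQ)                     /* plays the role of copyin() */
        memcpy(mock_last_rsp, req->data, sizeof(mock_last_rsp));
    if (next >= 2)
        return -1;                            /* nothing left to hand out */
    req->id = next + 1;                       /* plays the role of copyout() */
    snprintf(req->data, sizeof(req->data), "%s", queue[next]);
    next++;
    return 0;
}
```

Against a real driver one would call serve(fd, ioctl_xfer) on an open
device descriptor. Against the stand-in, serve(-1, mock_xfer) handles two
requests in three calls, because every call after the first both delivers
a response and waits for the next request.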

-- Terry


Re: Communications kernel -> userland

2003-07-18 Thread Marc Ramirez
On Fri, 18 Jul 2003, Julian Elischer wrote:

>
>
> On Fri, 18 Jul 2003, Marc Ramirez wrote:
>
> > On Fri, 18 Jul 2003, Daniel Eischen wrote:
> >
> > > On Fri, 18 Jul 2003, Marc Ramirez wrote:
> > > > I asked this in -questions, but got no response; sorry for the repost.
> > > >
> > > > I have a device driver that needs to make requests for data from a
> > > > userland daemon.  What's the preferred method for doing this in 4.8R and
> > > > 5.1R?  I'm assuming the answer is Unix-domain sockets...
> > >
> > > I think you got it backwards.  Not that you can't
> > > do what you want to do, but it's usually the other
> > > way around.
> > >
> > > Your daemon should listen on the device (blocking
> > > ioctl or read) and send data to the device when
> > > it is ready for it (using write or ioctl).
> >
> > Sorry - I'll be more specific.
> >
> > I have a remote datastore that I want to present as a filesystem.  There
> > are two parts to this: fetching raw data over the network, and doing some
> > processing on the data.  For purposes of maintainability, I'd like to do
> > as little of this as possible inside the kernel, so I've currently got a
> > daemon to fetch and process the data, and then pipes it over a socket to
> > the kernel FS layer.
> >
> > Anyway I'm trying to move on from the "accurate" stage of development to
> > the "accurate and speedy" stage, so I'm asking around... :)
>
> Isn't that what the 'portalfs' is for?

Actually, I just read the manpage, and I'm a little confused about what
portalfs is for... :) but it appears that it's just for letting you
establish network connections via the FS... maybe... (plus that 'fs'
thing).

I actually have large sets of data that I dynamically want to present as a
hierarchy (even different hierarchies based on, say, environment
variables, but I haven't had quite that need yet). I'm constantly writing
software to do all kinds of weird things to this data that's in several
big blobs.  It'll save me much time in the long run if I can just
dynamically view it as an FS.  I worry about regularizing the data and the
users worry about find and perl scripts.

Marc.

--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


Re: Communications kernel -> userland

2003-07-18 Thread Julian Elischer


On Fri, 18 Jul 2003, Marc Ramirez wrote:

> On Fri, 18 Jul 2003, Daniel Eischen wrote:
> 
> > On Fri, 18 Jul 2003, Marc Ramirez wrote:
> > > I asked this in -questions, but got no response; sorry for the repost.
> > >
> > > I have a device driver that needs to make requests for data from a
> > > userland daemon.  What's the preferred method for doing this in 4.8R and
> > > 5.1R?  I'm assuming the answer is Unix-domain sockets...
> >
> > I think you got it backwards.  Not that you can't
> > do what you want to do, but it's usually the other
> > way around.
> >
> > Your daemon should listen on the device (blocking
> > ioctl or read) and send data to the device when
> > it is ready for it (using write or ioctl).
> 
> Sorry - I'll be more specific.
> 
> I have a remote datastore that I want to present as a filesystem.  There
> are two parts to this: fetching raw data over the network, and doing some
> processing on the data.  For purposes of maintainability, I'd like to do
> as little of this as possible inside the kernel, so I've currently got a
> daemon to fetch and process the data, and then pipes it over a socket to
> the kernel FS layer.
> 
> Anyway I'm trying to move on from the "accurate" stage of development to
> the "accurate and speedy" stage, so I'm asking around... :)

Isn't that what the 'portalfs' is for?

> 
> Thanks,
> 
> Marc.
> 
> 
> --
> Marc Ramirez
> Blue Circle Software Corporation
> 513-688-1070 (main)
> 513-382-1270 (direct)
> www.bluecirclesoft.com



Re: Communications kernel -> userland

2003-07-18 Thread Marc Ramirez
On Fri, 18 Jul 2003, Daniel Eischen wrote:

> On Fri, 18 Jul 2003, Marc Ramirez wrote:
> > I asked this in -questions, but got no response; sorry for the repost.
> >
> > I have a device driver that needs to make requests for data from a
> > userland daemon.  What's the preferred method for doing this in 4.8R and
> > 5.1R?  I'm assuming the answer is Unix-domain sockets...
>
> I think you got it backwards.  Not that you can't
> do what you want to do, but it's usually the other
> way around.
>
> Your daemon should listen on the device (blocking
> ioctl or read) and send data to the device when
> it is ready for it (using write or ioctl).

Sorry - I'll be more specific.

I have a remote datastore that I want to present as a filesystem.  There
are two parts to this: fetching raw data over the network, and doing some
processing on the data.  For purposes of maintainability, I'd like to do
as little of this as possible inside the kernel, so I've currently got a
daemon that fetches and processes the data, and then pipes it over a
socket to the kernel FS layer.

Anyway I'm trying to move on from the "accurate" stage of development to
the "accurate and speedy" stage, so I'm asking around... :)

Thanks,

Marc.


--
Marc Ramirez
Blue Circle Software Corporation
513-688-1070 (main)
513-382-1270 (direct)
www.bluecirclesoft.com


Re: Communications kernel -> userland

2003-07-18 Thread Daniel Eischen
On Fri, 18 Jul 2003, Marc Ramirez wrote:
> I asked this in -questions, but got no response; sorry for the repost.
> 
> I have a device driver that needs to make requests for data from a
> userland daemon.  What's the preferred method for doing this in 4.8R and
> 5.1R?  I'm assuming the answer is Unix-domain sockets...

I think you got it backwards.  Not that you can't
do what you want to do, but it's usually the other
way around.

Your daemon should listen on the device (blocking
ioctl or read) and send data to the device when
it is ready for it (using write or ioctl).
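
A minimal sketch of that arrangement, assuming a hypothetical fixed-size
message (struct fs_msg) and a helper named serve_one, neither of which
comes from this thread: the daemon blocks in read() until the driver has
a request, processes it, and write()s the result back. With a character
device both descriptors would normally be the same one; they are separate
parameters here only so the function can be exercised with a pair of pipes.

```c
#include <assert.h>
#include <ctype.h>
#include <unistd.h>

/* Hypothetical fixed-size message exchanged with the driver. */
struct fs_msg {
    unsigned int id;       /* lets the driver match a response to a request */
    char         data[32]; /* request payload in, processed payload out */
};

/* Handle one request: returns 1 if served, 0 on end-of-file, -1 on error. */
int serve_one(int in_fd, int out_fd)
{
    struct fs_msg msg;
    ssize_t n;

    assert(in_fd >= 0 && out_fd >= 0);

    n = read(in_fd, &msg, sizeof(msg));   /* blocks until a request arrives */
    if (n == 0)
        return 0;                         /* other side closed */
    if (n != (ssize_t)sizeof(msg))
        return -1;                        /* error or short record */

    /* Placeholder processing: upper-case the payload in place. */
    msg.data[sizeof(msg.data) - 1] = '\0';
    for (char *p = msg.data; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);

    /* Send the result back down once it is ready. */
    if (write(out_fd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        return -1;
    return 1;
}
```

A daemon would call serve_one(fd, fd) in a loop until it returns 0 or -1.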

-- 
Dan Eischen
