On Wed, 18 Apr 2001, Poul-Henning Kamp wrote:

> I have not examined the full details of doing the shift yet, but it is
> my impression that it actually will reduce the amount of code
> duplication and special casing. 

..

> The only places we will need new magic is
>       open, which needs to fix the plumbing for us.
>       mmap, which may have to be added to the fileops vector.
> 
> The amount of special-casing code this would remove from the vnode
> layer is rather astonishing.
> 
> If we merge vm-objects and vnodes without taking devices out of the
> mix, we will need even more special-case code for devices.

Let me expand a bit on what I want to object to, and then comment a bit on
what I have mixed feelings about but am not actively objecting to.

I believe it is necessary to retain a reference to the vnode used to
access the device in f_data, and an f_type of DTYPE_VNODE.  This is used
extensively with ttys, where it is desirable to open /dev/ttyfoo and then
perform file system operations on it, such as fchflags(), fchmod(),
fchown(), revoke(), et al.  Those operations rely on reaching the vnode
via the open file entry associated with the file descriptor designated by
the invoking process.  This behavior is needed for a variety of race-free
operations at login, et al.  Changing this would require *extensive*
modification to the syscall service layer (that is, what sits above VFS).
Assuming the modifications were made so that the fileops array provided
these services (making the struct file be the entire abstraction, hiding
VFS from the system call service layer), you've now completely rewritten
the large majority of system calls, as well as introduced a whole new
category of inter-abstraction synchronization that must occur when a
change is made to any abstraction (e.g., adding ACLs, MAC, ...).  So it
seems to me that access to the vnode must be maintained in struct file,
and that we cannot totally replace references to the vnode with references
to, for example, the device abstraction.
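
To make the descriptor-based pattern concrete, here is a minimal userland
sketch.  The helper names set_mode_by_fd() and demo_set_mode() are made up
for illustration, and the demo uses a scratch file from mkstemp() rather
than a real tty so it can run unprivileged; a login-style program would
pass a descriptor obtained by opening the tty instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: change the mode of an already-open object.
 * Because it works on the descriptor, the object cannot be replaced by a
 * rename between the open and the mode change.
 * Returns the resulting permission bits, or -1 on error.
 */
int
set_mode_by_fd(int fd, mode_t mode)
{
	struct stat sb;

	if (fchmod(fd, mode) == -1)
		return (-1);
	if (fstat(fd, &sb) == -1)
		return (-1);
	return ((int)(sb.st_mode & 0777));
}

/* Demonstration on a scratch file (an assumption; not a real tty). */
int
demo_set_mode(mode_t mode)
{
	char path[] = "/tmp/fdops.XXXXXX";
	int fd, result;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	result = set_mode_by_fd(fd, mode);
	close(fd);
	unlink(path);
	return (result);
}
```

Both fchmod() and fstat() reach the object through the open file entry,
which is the path that has to keep working if the fileops change.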

So with these assumptions in place, it's still possible to consider what
you were suggesting: replacing the vnode fileops array with a device
fileops array, so that these calls would be short-circuited directly to the
device abstraction rather than passing through the VFS abstractions on the
way.  In some ways, this makes sense: many of the device services map
poorly into the file-like abstraction of the vnode.  For example, devices
may have a notion of a stateful seeking position: tape drives, for
example, really *do* seek to a particular location where the next read or
write must be performed.  Similarly, some devices really do act like
streaming data sources or sinks: especially with regard to
pseudo-devices, they may behave much more like sockets, with a notion of a
discrete transmission unit, a maximum transmission unit, or addressability
(imagine if you could open a device representing a bus, and use socket
addressing calls to set the bus address being targeted -- say for a
/dev/usb0, you could say "address the following messages to USB address
4", or you could open /dev/ed0, set the target address of the device
instance to an ethernet address, and send).  We already have this problem
to some extent with sockets: we use the file system vnode for two
purposes: first, as a namespace in which to identify the IPC object, and
second, as a means for storing protection properties.  It's arguable that
devices might work that way also, which I think is what you're asserting.
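
As a rough userland model of that arrangement (every name below is a
stand-in invented for illustration, not the real FreeBSD struct file or
struct fileops), I/O can be dispatched through a device-supplied operations
vector while the file entry still holds the vnode for the f*() calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file;

/* Hypothetical operations vector; not the real FreeBSD struct fileops. */
struct fileops {
	size_t (*fo_read)(struct file *fp, char *buf, size_t len);
};

/* Stand-in for a vnode: only carries the mode that fchmod() would change. */
struct vnode_stub {
	unsigned v_mode;
};

/* Stand-in for device state: a fixed buffer the "driver" serves reads from. */
struct dev_stub {
	const char *d_data;
};

struct file {
	const struct fileops *f_ops;	/* I/O goes straight to the device */
	struct vnode_stub *f_vnode;	/* kept so f*() calls reach the vnode */
	struct dev_stub *f_dev;
};

/* Device read: copies at most len bytes of the device buffer. */
static size_t
dev_read(struct file *fp, char *buf, size_t len)
{
	size_t n = strlen(fp->f_dev->d_data);

	if (n > len)
		n = len;
	memcpy(buf, fp->f_dev->d_data, n);
	return (n);
}

static const struct fileops dev_fileops = { dev_read };

/* The "plumbing" done at open: device ops installed, vnode retained. */
void
file_open_device(struct file *fp, struct vnode_stub *vp, struct dev_stub *dev)
{
	fp->f_ops = &dev_fileops;
	fp->f_vnode = vp;
	fp->f_dev = dev;
}

/* read() path: bypasses the vnode entirely. */
size_t
file_read(struct file *fp, char *buf, size_t len)
{
	return (fp->f_ops->fo_read(fp, buf, len));
}

/* fchmod() path: still needs the vnode behind the descriptor. */
void
file_fchmod(struct file *fp, unsigned mode)
{
	fp->f_vnode->v_mode = mode;
}
```

The point of the model is only that the two paths diverge: file_read()
never touches the vnode, while file_fchmod() cannot work without it.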

I'm not strictly opposed to this viewpoint, but it begins to make me
wonder a bit about the current structuring of that whole section of the
kernel: to me, a vnode really does seem like a decent abstraction of the
file system concept.  The socket seems like a less decent abstraction of
the IPC concept, but a better abstraction of a send/receive stream.  This
is all complicated by long-standing interfaces and notions about how the
abstractions are to be used.  I guess I'd rather see it look something
like this:

                                 +-----------------+
                                 | file descriptor |
                                 +-------+---------+
                                         |
                             +-----------+-------------+
                             | kernel object reference |
                             +-----------+-------------+
                                         |
                         +---------------+-----------------+
                         |               |                 |
                      vfile           kqueue            vstream           
                                                           |
                                        +--------+------+--+--------+
                                   IPC Socket  FIFO   Pipe   Stream Device


(note the above, and below, are highly fictional)

Where "kernel object reference" is the equivalent of today's "struct
file", "vfile" is the equivalent of today's "vnode", and "vstream" is a
new abstraction for discrete or streamed, ordered, message/event-oriented
services.  Devices might choose to appear as a file-like service, offering
an ordered data address space where all points of the address space have
fairly similar properties, a memory mapping service (possibly a generic
vfile pager), arbitrary reads and writes of data, and so on.
They could also choose to appear as a stream-oriented service which would
offer send/receive primitives, possibly as a stream with discrete message
boundaries, with addressing management, etc.  Ideally, I'd actually rather
kqueue fit under an abstraction like that, although it's currently a
first-class object.  You could imagine:

struct kernel_object {
        struct vnode    *ko_vp;         /* Optional vnode that provided
                                         * access to the object. */
        int              ko_type;       /* Which service abstraction. */
        union {
                struct vfile    *kso_vfile;
                struct vstream  *kso_vstream;
                struct kqueue   *kso_kqueue;
        } ko_service;
};

The optional vnode (possibly NULL) is maintained so that the caller can
perform file-system f* operations on the file descriptor pointing at the
object, but wouldn't apply for things like pipes, where there is no file
system object.  Presumably most operations would go either to the ko_vp,
or to the ko_service; some might be propagated to both, such as open and
close operations.
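
A sketch of how that routing might look, in the same highly fictional
spirit.  The stub structure bodies, the KO_* type constants, and the
ko_fchmod()/ko_close() functions are all assumptions made up for this
example; only the layout of struct kernel_object comes from the
declaration above.

```c
#include <assert.h>
#include <stddef.h>

/* Stub bodies (assumed) so the sketch compiles outside a kernel. */
struct vnode   { unsigned v_mode; int v_closed; };
struct vfile   { int vf_closed; };
struct vstream { int vs_closed; };
struct kqueue  { int kq_closed; };

/* Hypothetical values for ko_type. */
enum { KO_VFILE, KO_VSTREAM, KO_KQUEUE };

struct kernel_object {
	struct vnode    *ko_vp;         /* Optional vnode that provided
	                                 * access to the object. */
	int              ko_type;       /* Which service abstraction. */
	union {
		struct vfile    *kso_vfile;
		struct vstream  *kso_vstream;
		struct kqueue   *kso_kqueue;
	} ko_service;
};

/* File-system f* operation: goes to ko_vp only; fails if there is none. */
int
ko_fchmod(struct kernel_object *ko, unsigned mode)
{
	if (ko->ko_vp == NULL)
		return (-1);	/* e.g. a pipe: no file system object */
	ko->ko_vp->v_mode = mode;
	return (0);
}

/* Close: propagated to the service and, if present, to the vnode. */
void
ko_close(struct kernel_object *ko)
{
	switch (ko->ko_type) {
	case KO_VFILE:
		ko->ko_service.kso_vfile->vf_closed = 1;
		break;
	case KO_VSTREAM:
		ko->ko_service.kso_vstream->vs_closed = 1;
		break;
	case KO_KQUEUE:
		ko->ko_service.kso_kqueue->kq_closed = 1;
		break;
	}
	if (ko->ko_vp != NULL)
		ko->ko_vp->v_closed = 1;
}
```

A pipe would be a KO_VSTREAM object with a NULL ko_vp, so ko_fchmod()
fails on it, while a stream device opened through /dev would carry both a
vstream and the vnode it was opened through.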

Another thing to keep in mind, btw, is that security services are poorly
divided between the device system and file system right now.  File system
permissions are applied on device open, and used by many consumers -- in
fact, one cool thing about using BPF with a /dev/bpf is being able to give
out read/write access to unprivileged users.  This doesn't work for a
number of devices, which enforce their own protections in the open
operation...
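
To illustrate the "applied on open" part only, here is a small userland
sketch.  It uses a scratch file from mkstemp() rather than a device (an
assumption made so it runs anywhere), and the function name is invented.
It shows that once a descriptor is open, clearing the permission bits does
not stop I/O through it, because the file system check happened at open
time.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo: open a scratch file, write one byte, clear all
 * permission bits, then read the byte back through the same descriptor.
 * Returns 1 if the read still succeeds, 0 if any step fails, and -1 if
 * the scratch file could not be created.
 */
int
read_after_clearing_mode(void)
{
	char path[] = "/tmp/openperm.XXXXXX";
	char c = 0;
	int fd, ok;

	fd = mkstemp(path);		/* opened read/write */
	if (fd == -1)
		return (-1);
	ok = write(fd, "x", 1) == 1 &&
	    fchmod(fd, 0) == 0 &&	/* no permissions left on the file */
	    lseek(fd, 0, SEEK_SET) == 0 &&
	    read(fd, &c, 1) == 1 && c == 'x';
	close(fd);
	unlink(path);
	return (ok ? 1 : 0);
}
```

A device that enforces its own policy inside its open routine sits outside
this model, which is the split being complained about.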

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]      NAI Labs, Safeport Network Services



