Re: Improving the Unix API

1999-06-29 Thread Allen Briggs
> > Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
> > then implemented them natively in UFS and LFS; other non-native
> > filesystems have to map their concepts of other file attributes (e.g.,
> > dates, permissions, etc.,) into the native VFS concepts.
> 
>   Right. Except that UFS has not only generic attributes. For example,
> you have UF_NODUMP and SF_ARCHIVED. The *only* place in the /sys you
> mention the former is sys/stat.h (BTW, you don't even map it on
> EXT2_NODUMP_FL). The latter is mentioned only in the msdosfs/msdosfs_vnops.c.
> Hardly a VFS flag, right?

It sounds like the implementation is what you are complaining about here,
not the design or the interface.

If I understand you correctly, the level argument in the proposed API is
used to extend the namespace for the attributes.  Do you see a real need
right now to extend your namespace this way?  It seems to provide more
room for error on the programmer's side by increasing the complexity
(albeit not much) of the function/system call.

Additionally, using the same name as the BSD entry point but changing the
calling conventions could confuse programmers who use both systems. Even
if it doesn't confuse them, it's an additional thing that they have
to keep in mind when writing portable software.  I think we should be
working in the other direction--reducing the differences between the
systems and making it easier to write portable software.  Maybe that's
just me, though...  ;-)

-allen



To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-hackers" in the body of the message



Re: Improving the Unix API

1999-06-28 Thread Bill Studenmund
On Mon, 28 Jun 1999, Francois-Rene Rideau wrote:

> On Sun, Jun 27, 1999 at 12:58:05PM -0400, der Mouse wrote:
> > See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
> > "umount -f".  (Last I tried, the latter didn't work as it should, but
> > that's a matter of fixing bugs rather than introducing new features.)
> If you re-read the original message, the problem is what to do
> about processes with open file descriptors on the partition:
> stop them at once? stop them at first file access?
> block them instead? kill them? Will you do it atomically?
> How will you allow for such large table-walking to be compatible
> with real-time kernel response? [Hint: either use incremental
> data-structures, or don't be atomic and be interruptible instead.]

umount -f is intended more for oh-sh*t situations, so harshness is OK.

The way it's done is that all of the vnodes in that fs's vnode list get
either vgone'd or vcleaned (in the -f case). This will have the effect of
mapping them to deadfs vnodes, so all future access will either fail or do
nothing (close works, read returns an error). There aren't any big table
walks. :-)
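
A minimal user-space sketch of that idea, not actual kernel code: each vnode carries a pointer to an operations vector, and a forced unmount walks only that filesystem's own vnode list, repointing every entry at a "dead" vector. The names toy_vnode_init, toy_read, toy_close and toy_force_unmount are invented for illustration, and returning EIO from a dead read is an assumption about how the error might be reported.

```c
#include <errno.h>
#include <stddef.h>

struct vnode;

/* Per-filesystem operations vector (toy version with two operations). */
struct vnodeops {
    int (*vop_read)(struct vnode *vp, char *buf, size_t len);
    int (*vop_close)(struct vnode *vp);
};

struct vnode {
    const struct vnodeops *v_op;   /* operations currently in effect */
    const char *v_data;            /* fs-private data, NULL once dead */
};

/* Live filesystem: read copies the (string) contents, close succeeds. */
static int live_read(struct vnode *vp, char *buf, size_t len)
{
    size_t i = 0;

    if (len == 0)
        return EINVAL;
    while (i + 1 < len && vp->v_data[i] != '\0') {
        buf[i] = vp->v_data[i];
        i++;
    }
    buf[i] = '\0';
    return 0;
}

static int live_close(struct vnode *vp) { (void)vp; return 0; }

/* Dead vnode: close still works, read reports an error. */
static int dead_read(struct vnode *vp, char *buf, size_t len)
{
    (void)vp; (void)buf; (void)len;
    return EIO;   /* assumed error code for this sketch */
}

static int dead_close(struct vnode *vp) { (void)vp; return 0; }

static const struct vnodeops live_ops = { live_read, live_close };
static const struct vnodeops dead_ops = { dead_read, dead_close };

void toy_vnode_init(struct vnode *vp, const char *contents)
{
    vp->v_op = &live_ops;
    vp->v_data = contents;
}

int toy_read(struct vnode *vp, char *buf, size_t len)
{
    return vp->v_op->vop_read(vp, buf, len);
}

int toy_close(struct vnode *vp)
{
    return vp->v_op->vop_close(vp);
}

/* Forced unmount: walk this filesystem's own vnode list only. */
void toy_force_unmount(struct vnode *list, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        list[i].v_data = NULL;
        list[i].v_op = &dead_ops;
    }
}
```

Callers holding a vnode never need to be found or notified in this model; they simply hit the dead vector on their next operation, which is why no global table walk is needed.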

Take care,

Bill






Re: Improving the Unix API

1999-06-28 Thread Patrick Schaaf
> > Something which always confused me about Linux' procfs - what have all
> > these kernel variables got to do with process state?  We used to have a
> > kernfs which was intended for this kind of thing but it rotted after
> > people started extending sysctl for the purpose.
> 
> About as much as having a /usr/bin for the slower binaries on the 40Mbyte
> moving head disk has relationship to /usr nowadays. /proc is basically
> both process and machine state in Linux. It got expanded on.

Maybe nobody noticed yet that 'proc' is an acronym, and has nothing
to do with processes per se.

Hmm.

'Portable Runtime Operation Control' might be a useful name expansion,
alluding to the fact that the interface works across all supported
platforms without byte order problems etc.

:-)
  Patrick





Re: Improving the Unix API

1999-06-28 Thread Alexander Viro



On Mon, 28 Jun 1999, David S. Miller wrote:

>Date:  Mon, 28 Jun 1999 06:12:44 -0400 (EDT)
>From: Alexander Viro <[EMAIL PROTECTED]>
> 
>3) openpromfs - sparc only (?), AFAICS not actively maintained.
> 
> Oh, it's maintained and used every day, believe me.

Cool ;-) There is a lot of stuff that is apparently not used in the main
tree and vger CVS also gives zero. I'd like to ask a couple of questions
about that code, but let's take it to e-ma^W oh, hell... out of
crossposting. And postpone till the evening - I'm going down now...
Oh, dear... Integrating all this stuff once the page/buffer cache
stuff settles down is going to be something ;-/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Improving the Unix API

1999-06-28 Thread David S. Miller

   Date: Mon, 28 Jun 1999 06:12:44 -0400 (EDT)
   From: Alexander Viro <[EMAIL PROTECTED]>

   3) openpromfs - sparc only (?), AFAICS not actively maintained.

Oh, it's maintained and used every day, believe me.

Later,
David S. Miller
[EMAIL PROTECTED]




Re: Improving the Unix API

1999-06-28 Thread Alexander Viro



On Mon, 28 Jun 1999, Doug Rabson wrote:

> As far as I know, only FreeBSD has a string-based sysctl implementation.
> Something which always confused me about Linux' procfs - what have all
> these kernel variables got to do with process state?  We used to have a

Nothing. procfs is a union of 4 filesystems. Historical reasons ;-/
There are:
1) /* - per-process stuff. Procfs proper.
2) sys/ - what kernfs should be. I.e. fs interface for sysctl tree.
3) openpromfs - sparc only (?), AFAICS not actively maintained.
4) the rest - mostly information advertised by drivers + kcore + kmsg,
etc. Stuff that is not covered by sysctls (/dev/core is a symlink to
/proc/kcore. 'nuff said.)

They are different code-wise and ought to be separated. As soon as we
have a working unionfs (or at least non-opaque mounts) they *will* be
separated.

> kernfs which was intended for this kind of thing but it rotted after
> people started extending sysctl for the purpose.

/proc/sys on Linux. It was stuffed into procfs because at that moment
procfs was the only virtual filesystem (and because they shared some
code).
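
For illustration, reading a sysctl value through that filesystem interface might look like the sketch below. It assumes a Linux system with procfs mounted on /proc; read_proc_sys is a made-up helper name, and "kernel/ostype" in the usage note is just one entry commonly present under /proc/sys.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of /proc/sys/<name> into buf (NUL-terminated,
 * trailing newline stripped). Returns 0 on success, -1 on any failure,
 * including the entry not existing on this system.
 */
int read_proc_sys(const char *name, char *buf, size_t len)
{
    char path[256];
    FILE *fp;
    int n;

    if (len == 0)
        return -1;

    /* The sysctl tree is addressed by path, not by numeric OID. */
    n = snprintf(path, sizeof path, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A call such as read_proc_sys("kernel/ostype", buf, sizeof buf) needs nothing beyond ordinary file I/O, which is the practical appeal of exposing the tree as files.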





Re: Improving the Unix API

1999-06-28 Thread Alan Cox
> As far as I know, only FreeBSD has a string-based sysctl implementation.

Nod.

> Something which always confused me about Linux' procfs - what have all
> these kernel variables got to do with process state?  We used to have a
> kernfs which was intended for this kind of thing but it rotted after
> people started extending sysctl for the purpose.

About as much as having a /usr/bin for the slower binaries on the 40Mbyte
moving head disk has relationship to /usr nowadays. /proc is basically
both process and machine state in Linux. It got expanded on.

Alan








Re: Improving the Unix API

1999-06-28 Thread Alexander Viro


On Mon, 28 Jun 1999, Alan Cox wrote:

> > As far as sysctl goes, FreeBSD deprecates the use of numbers for OIDs and
> > has a string-based mechanism for exploring the sysctl tree.
> 
> So we are actually both going the same way. Linus with /proc/sys and his
> official dislike of sysctl (Oh well I think sysctl using number spaces is the
> right idea - like snmp is), and BSD going to names

Yup. kernfs (we'd also better keep it in a separate fs instead of
cluttering procfs, but that's another story).

OK, then. I've looked at it and it seems that strings will be easy to do
(for new chflags, that is). For the time being I simply #define them to
"vfs" and fs names - choosing decent names will wait until the thing
actually works ;-)
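
A rough sketch of what string-named levels could look like, purely as an assumption about the shape of such an interface: the macro names, the flag bit values and the level_supports helper below are all hypothetical and do not come from any posted patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical level names: "vfs" for generic flags, fs names otherwise. */
#define FL_LEVEL_VFS   "vfs"
#define FL_LEVEL_EXT2  "ext2"
#define FL_LEVEL_UFS   "ufs"

struct flag_level {
    const char *name;          /* namespace selected by the level argument */
    unsigned long supported;   /* flag bits understood at this level */
};

/* Illustrative table; the bit values are made up for the example. */
static const struct flag_level levels[] = {
    { FL_LEVEL_VFS,  0x0003UL },
    { FL_LEVEL_EXT2, 0x00ffUL },
    { FL_LEVEL_UFS,  0x000fUL },
};

/*
 * Returns 1 if every bit in flags is known at the named level,
 * 0 if some bit is not, and -1 if the level name itself is unknown.
 */
int level_supports(const char *level, unsigned long flags)
{
    size_t i;

    for (i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        if (strcmp(levels[i].name, level) == 0)
            return (flags & ~levels[i].supported) == 0;
    }
    return -1;
}
```

With names as keys, a new filesystem would add a row under its own name, and nobody has to act as a central authority handing out numbers.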






Re: Improving the Unix API

1999-06-28 Thread Doug Rabson
On Mon, 28 Jun 1999, Alan Cox wrote:

> > As far as sysctl goes, FreeBSD deprecates the use of numbers for OIDs and
> > has a string-based mechanism for exploring the sysctl tree.
> 
> So we are actually both going the same way. Linus with /proc/sys and his
> official dislike of sysctl (Oh well I think sysctl using number spaces is the
> right idea - like snmp is), and BSD going to names

As far as I know, only FreeBSD has a string-based sysctl implementation.
Something which always confused me about Linux' procfs - what have all
these kernel variables got to do with process state?  We used to have a
kernfs which was intended for this kind of thing but it rotted after
people started extending sysctl for the purpose.

--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037







Re: Improving the Unix API

1999-06-28 Thread Alan Cox

> As far as sysctl goes, FreeBSD deprecates the use of numbers for OIDs and
> has a string-based mechanism for exploring the sysctl tree.

So we are actually both going the same way. Linus with /proc/sys and his
official dislike of sysctl (Oh well I think sysctl using number spaces is the
right idea - like snmp is), and BSD going to names







Re: Improving the Unix API

1999-06-28 Thread Doug Rabson

On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> 
> On Mon, 28 Jun 1999, Doug Rabson wrote:
> > I'm talking about the concept of a header file containing something like:
> > 
> > #define FL_VFS    0
> > #define FL_FOOFS  1
> > #define FD_BARFS  2
> > ...
> > 
> > not being scalable.
> > 
> > Do you have a complete list of filesystem types? Are you prepared to act
> > as an Assigned Number authority for that list. For this kind of problem,
> > strings are a damn sight easier to manage in the long term.
> 
> Augh... It's ugly, indeed, but... sysctl() is not much nicer and all
> systems in question manage to deal with it somehow. OTOH doing it as
> strings... Hell knows. I'll look at it. Considering that HFS folks
> had already asked for more than one value here (creator and type?) it may
> be reasonable. I'm afraid that doing that may open the hell gates ;-/
> 'N' in *ANA can be 'namespace' as well as 'number'...

It's a tough one, all right. Some of my friends at Microsoft would suggest
using UUIDs for this job. They might be clumsy but at least they are never
going to collide and they are easy to generate.

As far as sysctl goes, FreeBSD deprecates the use of numbers for OIDs and
has a string-based mechanism for exploring the sysctl tree.

> 
> [1]
> BTW, how does NetBSD deal with HFS  forks?
> 
> 
> [1] cue current flamew^Wthreads on l-k regarding files-as-directories
> hell.

:-)

--
Doug Rabson Mail:  [EMAIL PROTECTED]
Nonlinear Systems Ltd.  Phone: +44 181 442 9037







Re: Improving the Unix API

1999-06-28 Thread Jan-Simon Pendry

Bodo Rueskamp wrote:
> 
> > >> flink (make a new directory link for file given by descriptor),
> > flink() combined with the ability to create an unlinked file
> > in a given filesystem would allow for safe temporaries
> > without race conditions, that could be "published" when ready.
> 
> The System V people (Solaris, Unixware) call this fattach().

fattach is used to implement stream mounts.  it does not attach an
arbitrary file back to the filesystem.  fattach is a library
function that specifically mounts a stream pipe using the
"namefs" filesystem.  the effect of fattach does not persist
across a reboot.
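
Since fattach() does not provide it and flink() is not a standard call, a commonly used portable approximation of the "publish when ready" pattern is to write under a temporary name in the target directory and then rename() it into place. The sketch below shows that substitute technique, not flink() itself: unlike the proposal quoted above, the temporary does have a name while it is being written. publish_file and the ".tmp-" prefix are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write data to a uniquely named temporary in dir, then atomically
 * rename it to dir/name. Returns 0 on success, -1 on failure.
 * Readers see either the old dir/name or the complete new one.
 */
int publish_file(const char *dir, const char *name, const char *data)
{
    char tmp[512], final[512];
    size_t len = strlen(data);
    int fd, n;

    n = snprintf(tmp, sizeof tmp, "%s/.tmp-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    n = snprintf(final, sizeof final, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof final)
        return -1;

    /* mkstemp() creates the file exclusively, avoiding the name race. */
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* A short write is treated as failure; fine for small payloads. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, final) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The temporary must live on the same filesystem as the final name, because rename() does not work across mount points.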

jan-simon.

> 
> ; Bodo
> 
> --
> Bodo Rüskamp, [EMAIL PROTECTED], 51°55' N 7°41' E





Re: Improving the Unix API

1999-06-28 Thread Bodo Rueskamp


> >> flink (make a new directory link for file given by descriptor),
> flink() combined with the ability to create an unlinked file
> in a given filesystem would allow for safe temporaries
> without race conditions, that could be "published" when ready.

The System V people (Solaris, Unixware) call this fattach().

; Bodo

-- 
Bodo Rüskamp, [EMAIL PROTECTED], 51°55' N 7°41' E
(1) Elvis is alive.
(2) Dinosaurs too. 
(3) The next millenium starts on January 1st 2000.





Re: Improving the Unix API

1999-06-27 Thread der Mouse
>>  -f  The filesystem is forcibly unmounted.  Active special devices
>>  continue to work, but all other files return errors if further
>>  accesses are attempted.
> I think that returning errors is WRONG, unless [...]
> It means that you can't fix the problem with the filesystem and
> resume operations nicely afterwards;

I think I see part of the problem here.

You are thinking "unmount to fix problem, will remount later".

"umount -f" is more like "it's going away, dammit, and I'd rather crash
a few processes than have to take down the whole system".

It might be worthwhile having an option that causes attempted accesses
to hang until the filesystem comes back online, somewhat akin to
Auspex's filesystem "isolation".

der Mouse

   mo...@rodents.montreal.qc.ca
 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B





Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999 allb...@ece.cmu.edu wrote:

> On 27 Jun, Jason Thorpe wrote:
> +-
> |   Alexander Viro  wrote:
> |   > doesn't unmap the stuff. Oh, shit, there is such thing as pending
> |   > unlink... Does vgone() force it?
> |  
> |  Regarding unlink()... those aren't operations on vnodes.  Those are
> |  operations on the filesystem namespace, and are thus (correctly)
> |  unaffected.
> +--->8
> 
> I believe what he meant is "how is deallocation of a pending-unlink
> file whose only reference is an open fd which has been revoked dealt
> with"?
> 
> (To which my own answer would be:  "deallocated on close as usual, no
> reason to treat this case specially that I know of".)

When it's already remounted r/o?






Re: Improving the Unix API

1999-06-27 Thread allbery
On 27 Jun, To: thor...@nas.nasa.gov wrote:
+-
|  (To which my own answer would be:  "deallocated on close as usual, no
|  reason to treat this case specially that I know of".)
+--->8

Strike that, I was on the wrong page.  (Crossed threads re: general
revoke() on Linux)

-- 
brandon s. allbery  [os/2][linux][solaris][japh] allb...@kf8nh.apk.net
system administrator [WAY too many hats]   allb...@ece.cmu.edu
carnegie mellon / electrical and computer engineeringKF8NH
 We are Linux. Resistance is an indication that you missed the point.






Re: Improving the Unix API

1999-06-27 Thread Brian F. Feldman
On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> As for the opening with no permissions - well, it would make *big* sense
> if we could narrow down the API and move chown(), chmod(), etc. into libc
> leaving f-variants in the kernel. Binary compatibility... Extreme variant
> might include {set,get}sockopt extended to files and doing both *stat and
> *ch{mod,own,flags} via that. Out of curiosity - did somebody on *BSD side
> play with that?
> 

Actually, instead of *big* sense, that makes *no* sense.


 Brian Fundakowski Feldman  _ __ ___   ___ ___ ___  
 gr...@freebsd.org   _ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!_ __ | _ \._ \ |) |
   http://www.FreeBSD.org/  _ |___/___/___/ 






Re: Improving the Unix API

1999-06-27 Thread allbery
On 27 Jun, Jason Thorpe wrote:
+-
|   Alexander Viro  wrote:
|   > doesn't unmap the stuff. Oh, shit, there is such thing as pending
|   > unlink... Does vgone() force it?
|  
|  Regarding unlink()... those aren't operations on vnodes.  Those are
|  operations on the filesystem namespace, and are thus (correctly)
|  unaffected.
+--->8

I believe what he meant is "how is deallocation of a pending-unlink
file whose only reference is an open fd which has been revoked dealt
with"?

(To which my own answer would be:  "deallocated on close as usual, no
reason to treat this case specially that I know of".)

-- 
brandon s. allbery  [os/2][linux][solaris][japh] allb...@kf8nh.apk.net
system administrator [WAY too many hats]   allb...@ece.cmu.edu
carnegie mellon / electrical and computer engineeringKF8NH
 We are Linux. Resistance is an indication that you missed the point.






Re: Improving the Unix API

1999-06-27 Thread Colin Wood

Alexander Viro wrote:
> [1]
> BTW, how does NetBSD deal with HFS  forks?
> 

easy, it doesn't :-)  we don't currently have HFS support, mainly b/c the
only freeware implementations of it (that i'm aware of) are GPL'd, and no
one has been able to devote enough time to it to get a BSD-licensed
version.  although the darwin stuff is now available.  i'm not too sure
how much of it is useful (i haven't looked at it either, tho).

later.

colin






Re: Improving the Unix API

1999-06-27 Thread Alexander Viro



On Sun, 27 Jun 1999, Jason Thorpe wrote:

> Regarding unlink()... those aren't operations on vnodes.  Those are
> operations on the filesystem namespace, and are thus (correctly)
> unaffected.

Eh, wait. Those are operations on the namespace, but at some point you need
to clear the bit in the inode bitmap. You can't do it before the last close()
and it definitely alters the filesystem. fsck will pick them up, but that
may *not* be the desired result. A dirty filesystem is definitely not desired
anyway.
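
The pending-unlink state can be observed from user space. The sketch below uses POSIX calls only; the function name and the /tmp path are assumptions made for the example. It unlinks a freshly created file while keeping its descriptor open, writes to it, and returns the link count reported by fstat(). On common local filesystems that count is expected to be 0 while the inode stays allocated until the final close().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file, remove its only name, and keep using the descriptor.
 * Returns the link count seen through the open descriptor, or -1 on
 * error. The on-disk inode can only be freed at the close() below.
 */
long nlink_after_unlink(void)
{
    char path[] = "/tmp/pending-unlink-XXXXXX";
    struct stat st;
    long nlink = -1;
    int fd;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* After unlink() the name is gone, but the open file is still usable. */
    if (unlink(path) == 0 &&
        write(fd, "data", 4) == 4 &&
        fstat(fd, &st) == 0)
        nlink = (long)st.st_nlink;

    close(fd);   /* last reference: the filesystem releases the inode here */
    return nlink;
}
```

That final close() is a write to the filesystem's allocation state, which is exactly what becomes awkward if the filesystem has been switched to read-only in the meantime.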






Re: Improving the Unix API

1999-06-27 Thread Alexander Viro



On Mon, 28 Jun 1999, Doug Rabson wrote:
> I'm talking about the concept of a header file containing something like:
> 
>   #define FL_VFS    0
>   #define FL_FOOFS  1
>   #define FD_BARFS  2
>   ...
> 
> not being scalable.
> 
> Do you have a complete list of filesystem types? Are you prepared to act
> as an Assigned Number authority for that list. For this kind of problem,
> strings are a damn sight easier to manage in the long term.

Augh... It's ugly, indeed, but... sysctl() is not much nicer and all
systems in question manage to deal with it somehow. OTOH doing it as
strings... Hell knows. I'll look at it. Considering that HFS folks
had already asked for more than one value here (creator and type?) it may
be reasonable. I'm afraid that doing that may open the hell gates ;-/
'N' in *ANA can be 'namespace' as well as 'number'...

[1]
BTW, how does NetBSD deal with HFS  forks?


[1] cue current flamew^Wthreads on l-k regarding files-as-directories
hell.






Re: Improving the Unix API

1999-06-27 Thread Jason Thorpe

On Sun, 27 Jun 1999 20:43:28 -0400 (EDT) 
 Alexander Viro <[EMAIL PROTECTED]> wrote:

 > Forced revoke()? But then there is mmap() and IIRC revoke() on *BSD
 > doesn't unmap the stuff. Oh, shit, there is such thing as pending
 > unlink... Does vgone() force it?

It doesn't unmap the region, but it doesn't allow any more page faults
from that backing vnode (the pager will get an error from the file system,
and thus send a SIGSEGV to the process), and no dirty pages can be cleaned
to that vnode.

I mean, you wouldn't invalidate any buffers the user read the file into
when the file was revoke()'d, would you? :-)

Regarding unlink()... those aren't operations on vnodes.  Those are
operations on the filesystem namespace, and are thus (correctly)
unaffected.

-- Jason R. Thorpe <[EMAIL PROTECTED]>



Re: Improving the Unix API

1999-06-27 Thread Alexander Viro



On Sun, 27 Jun 1999, der Mouse wrote:

> >> (clri didn't work?)
> > Never heard about clri (was under Linux).
> 
> May not have existed, then, which *would* explain it. :-)

# debugfs -w /dev/sda1
debugfs:  clri file
debugfs:  close

It exists, all right ;-) Even documented - man 8 debugfs and there you go.

> The NetBSD manpage doesn't say what happens if you "mount -o
> update,force,rdonly" when there are writeable descriptors open onto the
> filesystem, and then try to use those fds.  I would assume further
> attempts to write would produce errors (EROFS?), unless of course the
> filesystem has been re-remounted read/write.

Forced revoke()? But then there is mmap() and IIRC revoke() on *BSD
doesn't unmap the stuff. Oh, shit, there is such thing as pending
unlink... Does vgone() force it?



Re: Improving the Unix API

1999-06-27 Thread Francois-Rene Rideau

On Sun, Jun 27, 1999 at 07:33:32PM -0400, der Mouse wrote:
>> If you re-read the original message, the problem is what to do about
>> processes with open file descriptors on the partition [...]
> Yes, that's the most difficult part. [...] NetBSD manpage:
>  -f  The filesystem is forcibly unmounted.  Active special devices
>  continue to work, but all other files return errors if further
>  accesses are attempted.
I think that returning errors is WRONG,
unless specifically requested by fcntl().
It means that processes will get unexpected errors
from an otherwise validly open file descriptor.
It means that you can't fix the problem with the filesystem
and resume operations nicely afterwards;
or you will have to manually stop processes from userland before unmounting,
which would not be atomic and generate yet another race condition.
Robert seemed to favor atomically stopping processes.
I am personally in favor of defaulting to a blocking behavior.

>> How will you allow for such large table-walking to be compatible with
>> real-time kernel response?
> *What* large table-walking?  All this means you have to do is have
> every write check the relevant mount point to see if it's mounted
> read-only, for downgrading remounts, and mark the filesystem as gone,
> for forced unmounts.  (I suspect this is what deadfs is for.)
That's typical incremental behavior. Again, it's a matter of tradeoff:
do you want a big atomic operation once in a while and
simple operations every time, or complex incremental operations every time?
It's real-time response vs overall-time duration.
See GC for a field of CS where this trade-off has been beaten to death.
Also, the worry was most important in the case
of atomically stopping processes as recommended by Robert.

Hum. It looks like the need to avoid losing file descriptor information
and pending I/O requests would make it a good idea that there be a
mount mode without either read or write permissions,
similarly to opening files without read or write permissions.
Looks to me like an interesting alternative to deadfs, anyway...

>> Competition is _not_ about taunting each other for pride;
> I know this.  I even think most of the people involved know it.
Cool.

> But there seem to be a few - not many, but very poisonous - who seem to
> take any competition - indeed, almost any *difference* - as an
> opportunity for "we're better than you" egoboo.
"Hey, stupid, my underwear is nicer than yours!"
Hum. Let's just send those kiddies to /dev/null; uh, I mean, er, whatever.

[ "Faré" | VN: Ð£ng-Vû Bân | Join the TUNES project!   http://www.tunes.org/  ]
[ FR: François-René Rideau | TUNES is a Useful, Nevertheless Expedient System ]
[ Reflection&Cybernethics  | Project for  a Free Reflective  Computing System ]
My opinions may have changed, but not the fact that I am right.


Re: Improving the Unix API

1999-06-27 Thread der Mouse

>> (clri didn't work?)
> Never heard about clri (was under Linux).

May not have existed, then, which *would* explain it. :-)

>>> Another problem was the ability to change the mount status of a
>>> partition from read-write to read-only or to unmounted,
>> See NetBSD (and presumably other BSD) "mount -o update,rdonly"
>> and/or "umount -f".
> If you re-read the original message, the problem is what to do about
> processes with open file descriptors on the partition: stop them at
> once? stop them at first file access? block them instead? kill them?

Yes, that's the most difficult part.

The NetBSD manpage doesn't say what happens if you "mount -o
update,force,rdonly" when there are writeable descriptors open onto the
filesystem, and then try to use those fds.  I would assume further
attempts to write would produce errors (EROFS?), unless of course the
filesystem has been re-remounted read/write.

The manpage for umount says

 -f  The filesystem is forcibly unmounted.  Active special devices
 continue to work, but all other files return errors if further
 accesses are attempted.

I haven't looked at the relevant kernel code to see what *really*
happens.

> How will you allow for such large table-walking to be compatible with
> real-time kernel response?

*What* large table-walking?  All this means you have to do is have
every write check the relevant mount point to see if it's mounted
read-only, for downgrading remounts, and mark the filesystem as gone,
for forced unmounts.  (I suspect this is what deadfs is for.)
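
The per-write check described here can be modelled in a few lines. This is only a toy sketch: the structures, the flag names and `toy_write` are hypothetical stand-ins rather than kernel code, and the choice of EIO for a forcibly unmounted file system is an assumption modelled on what a deadfs-style layer might return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel structures. */
enum { TOY_MNT_RDONLY = 0x1, TOY_MNT_GONE = 0x2 };

struct toy_mount { unsigned flags; };
struct toy_file  { struct toy_mount *mnt; size_t size; };

/*
 * Model of a write entry point: one flag test per call, with no walk
 * over the table of open files.  Returns the number of bytes "written"
 * or a negative errno value.
 */
long toy_write(struct toy_file *fp, size_t len)
{
    assert(fp != NULL && fp->mnt != NULL);
    if (fp->mnt->flags & TOY_MNT_GONE)
        return -EIO;        /* forced unmount: file system is gone */
    if (fp->mnt->flags & TOY_MNT_RDONLY)
        return -EROFS;      /* remounted read-only */
    fp->size += len;
    return (long)len;
}
```

A downgrade then amounts to setting one bit on the mount structure; every descriptor that points at it sees the change on its next write, which is the incremental side of the trade-off discussed in the reply.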

>>> I intend to put free unices in competition [...]
>> Reasonable as this sounds, I think the last thing we need is yet
>> another ground on which one free-unix can be doing the "nana nana
>> boo boo" taunt at another.
> Competition is _not_ about taunting each other for pride;

I know this.  I even think most of the people involved know it.

But there seem to be a few - not many, but very poisonous - who seem to
take any competition - indeed, almost any *difference* - as an
opportunity for "we're better than you" egoboo.

der Mouse

   [EMAIL PROTECTED]
 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Improving the Unix API

1999-06-27 Thread Doug Rabson

On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> 
> On Sun, 27 Jun 1999, Doug Rabson wrote:
> 
> > This looks viable as long as you don't use small integers to represent
> > FL_UFS etc. Having a single header defining constants for all filesystems
> 
>   Erm... sizeof(int)==4. I doubt that you will need more.
> 
> > just doesn't scale at all.
>   Sure. If you don't need fs-specific stuff -  and there
> you go. If you need some particular fs -  and 

I'm talking about the concept of a header file containing something like:

#define FL_VFS    0
#define FL_FOOFS  1
#define FL_BARFS  2
...

not being scalable.

Do you have a complete list of filesystem types? Are you prepared to act
as an Assigned Number authority for that list? For this kind of problem,
strings are a damn sight easier to manage in the long term.
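
One way to picture the string-based alternative is a registry in which each filesystem announces itself by name and any number it receives stays private to the running system. A minimal sketch follows; `fs_register`, `fs_lookup` and the fixed table size are assumptions made for illustration, not an interface from any of the systems discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FSTYPES 16

/* Hypothetical registry: names are the only shared identifiers. */
static const char *fs_names[MAX_FSTYPES];
static int fs_count;

/* Look a file system up by name; returns its slot or -1 if unknown. */
int fs_lookup(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < fs_count; i++)
        if (strcmp(fs_names[i], name) == 0)
            return i;
    return -1;
}

/* Register a name; returns its slot, or -1 if taken or the table is full. */
int fs_register(const char *name)
{
    if (fs_lookup(name) >= 0 || fs_count == MAX_FSTYPES)
        return -1;
    fs_names[fs_count] = name;
    return fs_count++;
}
```

No header has to list every filesystem in advance, and two independently written filesystems collide only if they pick the same name.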

> 
> > You still want a clearly defined set of FS independent flags so that the
> > application doesn't need to care what filesystem it is sitting on.
> 
>   And that's exactly the reason for FL_VFS vs. FL_FOOFS separation -
> some applications should be able to talk with the filesystem in the
> filesystem's terms *and* be sure that they will not mess with another fs;
> the rest shouldn't care for fs differences at all (aside from "did the
> sucker set the bits I wanted?" that you already have for SUID/SGID/sticky).
> 
>   I don't think that porting it to 4.4 will be difficult - all you
> need is a way to tell VOP_SETATTR what level you are talking to (most
> likely the same way as on our side - add a field to the structure and
> let the methods scratch their heads). I'm going to do the Linux variant
> and see how it will work. If somebody wants to do it with *BSD - fine, it
> shouldn't be a problem.

I'm sure the api would be easy to port.  I wouldn't accept any api for
FreeBSD which involved assigning numbers to filesystem types. It was too
painful to rid it of the last set of numbers from the old mount(2) call.

--
Doug Rabson Mail:  [EMAIL PROTECTED]
Nonlinear Systems Ltd.  Phone: +44 181 442 9037



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Improving the Unix API

1999-06-27 Thread Francois-Rene Rideau
On Sun, Jun 27, 1999 at 12:58:05PM -0400, der Mouse wrote:
> As I think someone already mentioned, BSD has chflags(), [...]
Yup.

>> Robert had to hand-remove the immutable flag
>> (I guess, by accessing the relevant block directly).
> (clri didn't work?)
Never heard about clri (was under Linux). And I dunno what Robert did.
I will ask him, if it matters.

> funlink makes no sense [...] unlink() operates on names, not files [...]
Oops. Indeed. The thinko is purely mine.

> I've often wanted open-with-no-access in conjunction with fchdir().
> This is because you need only execute access to set your cwd to a
> directory, but there's no way to get an fd on a mode-111 directory.
Again and again, open-with-no-access definitely seems
to have lots of applications.
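
For reference, this is roughly how fchdir() is used with the calls that exist today. It is a sketch under the assumption of a POSIX system; `visit_and_return` is an illustrative name. The open() uses O_RDONLY, so it needs read permission on the directory, which is exactly the restriction being complained about.

```c
#define _XOPEN_SOURCE 700   /* for fchdir() under strict compiler modes */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Visit `dir`, then return to the starting directory through a saved
 * descriptor rather than a saved path name.  Returns 0 on success.
 */
int visit_and_return(const char *dir)
{
    assert(dir != NULL);
    /* Fails with EACCES if "." is a mode-111 directory. */
    int saved = open(".", O_RDONLY);
    if (saved < 0)
        return -1;
    if (chdir(dir) != 0) {
        close(saved);
        return -1;
    }
    /* ... work inside dir would go here ... */
    int rc = fchdir(saved);
    close(saved);
    return rc;
}
```

The descriptor keeps working even if the starting directory is renamed while the process is away, which a saved path name would not.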

>> flink (make a new directory link for file given by descriptor),
flink() combined with the ability to create an unlinked file
in a given filesystem would allow for safe temporaries
without race conditions, that could be "published" when ready.
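
Since flink() is only a proposal here, the closest portable approximation is to prepare the file under a private temporary name and publish it with rename(). The sketch below assumes a POSIX system; `publish_file` and the `.tmp-XXXXXX` naming are made up for illustration. Unlike the flink() idea, the file does have a (hard to guess) name while it is being prepared.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() under strict compiler modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `data` to a private temporary file in `dir`, then publish it
 * atomically as dir/name.  Returns 0 on success, -1 on failure.
 */
int publish_file(const char *dir, const char *name, const char *data)
{
    char tmp[512], dst[512];
    size_t len = strlen(data);

    int n = snprintf(tmp, sizeof tmp, "%s/.tmp-XXXXXX", dir);
    if (n < 0 || n >= (int)sizeof tmp)
        return -1;
    n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || n >= (int)sizeof dst)
        return -1;

    int fd = mkstemp(tmp);      /* exclusive create, mode 0600 */
    if (fd < 0)
        return -1;
    int ok = write(fd, data, len) == (ssize_t)len && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    /* Within one file system, rename() replaces dst in a single step. */
    if (!ok || rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Readers of dir/name see either the old contents or the complete new ones, never a half-written file.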

>> freadlink (read link from a file descriptor opened with O_NULL),
>> fexec (execute the binary that we checked), etc.
> freadlink() implies that open() with O_NULL has the peculiar property
> that, unlike all other open()s, it doesn't follow terminal symlinks.
I suggested that there could be a flag O_DONTFOLLOWLINK in such cases;
I'm not fully sure about the feature, but it would allow setting flags
on symlinks, and other goodies.

> While I think there are ways symlinks could be improved, I don't think
> this is one of them.  I can't see any use for opening a symlink except
> use of write() to atomically make the link point somewhere different,
> and I'd prefer to do that by making symlink() do that when the link
> already exists and some appropriate condition is met.
Well, I can imagine opening them to lock them,
so as to prevent other people from making them point somewhere else,
as well as change some filesystem attributes on the right thing, etc.
Again, open() allows locking and prevents race conditions.
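
For comparison, repointing a symlink atomically is already possible without opening it, by creating the new link under a scratch name and renaming it over the old one. This is a sketch of that common technique, not of either proposal above; `repoint_symlink` and the `.new.<pid>` scratch name are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700   /* for symlink() under strict compiler modes */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Make `linkpath` point at `target`, replacing any existing link in a
 * single step.  Returns 0 on success, -1 on failure.
 */
int repoint_symlink(const char *target, const char *linkpath)
{
    char tmp[512];
    int n = snprintf(tmp, sizeof tmp, "%s.new.%ld", linkpath, (long)getpid());
    if (n < 0 || n >= (int)sizeof tmp)
        return -1;

    unlink(tmp);                        /* clear a stale scratch link */
    if (symlink(target, tmp) != 0)
        return -1;
    /* rename() acts on the link itself; it does not follow it. */
    if (rename(tmp, linkpath) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

This covers the atomic-update case but not locking a link against other writers, which is the part that an open()-based interface would add.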

>> Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
>> or something equivalent, to upgrade your access mode
>> on a file you opened with O_NULL.
> The security weenie in me is _really_ unsure that the ability to
> increase the access modes on an open fd is a good idea.
Well, there could be a flag O_NOINCREASEACCESS to prevent
further increasing of access modes (by e.g. children),
if that makes you feel safer.
And of course, increasing access mode
is subject to usual permission checking.
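
As things stand, POSIX says that access-mode bits passed to F_SETFL are ignored, so the upgrade being proposed would be a real change in behaviour. A short sketch that shows the current behaviour, assuming a POSIX system; `access_mode_after_upgrade` is an illustrative name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` read-only, ask fcntl() to "upgrade" to O_RDWR, and report
 * the access mode the descriptor ends up with (O_RDONLY, O_WRONLY or
 * O_RDWR), or -1 on error.
 */
int access_mode_after_upgrade(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Access-mode bits in the F_SETFL argument are ignored. */
    (void)fcntl(fd, F_SETFL, O_RDWR);
    int flags = fcntl(fd, F_GETFL);
    close(fd);
    return flags < 0 ? -1 : (flags & O_ACCMODE);
}
```

On /dev/null, which anyone may open for writing, the result is still O_RDONLY, so the refusal has nothing to do with file permissions.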

>> Another problem was the ability to change the mount status of a partition
>> from read-write to read-only or to unmounted,
> See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
> "umount -f".  (Last I tried, the latter didn't work as it should, but
> that's a matter of fixing bugs rather than introducing new features.)
If you re-read the original message, the problem is what to do
about processes with open file descriptors on the partition:
stop them at once? stop them at first file access?
block them instead? kill them? Will you do it atomically?
How will you allow for such large table-walking to be compatible
with real-time kernel response? [Hint: either use incremental
data-structures, or don't be atomic and be interruptible instead.]

>> Finally, we discussed about saving _and restoring_ the state of a process,
>> another hack that he did once to preserve a long-winded calculation
>> from the service shutdown of a big unix computer.
> I did this once, long long ago, under (I think) 4.3.  I found that I
> couldn't just dump core, though I forget why.  As for the open file
> descriptor question, I punted - I made the relevant call fail unless
> the process had no fds open.
Again, the difficult part is precisely about fd handling;
and the suggested feature of whole-computer save&restore
(where external connections will still be a problem)
similarly required that device drivers be able to dump restorable state.

>> By posting on all free unix kernel mailing-list I know,
>> I intend to put free unices in competition as to which
>> will implement these features first.
> Reasonable as this sounds, I think the last thing we need is yet
> another ground on which one free-unix can be doing the "nana nana boo
> boo" taunt at another.
Competition is _not_ about taunting each other for pride;
it's about striving to be the best we can in an atmosphere
of creative diversity whereby people copy each other's good ideas
and drop everyone's bad ideas. Diversity and free competition
increase the odds of good and bad ideas being recognized as what they are,
first by one, then by everyone,
which benefits everyone in the form of positive evolution.
But let's reserve such meta-technical discussions to another forum.

>> As for the opening with no permissions - well, it would make *big*
>> sense if we could narrow down the API and move chown(), chmod(), etc.
>> into libc leaving f-variants in the kernel.
> I re

Re: Improving the Unix API

1999-06-27 Thread Gandhi woulda smacked you
On Sun, 27 Jun 1999, der Mouse wrote:

# > Robert had to hand-remove the immutable flag
# > (I guess, by accessing the relevant block directly).
# 
# (clri didn't work?)

Obviously the guy thinks along the lines that you need a file descriptor
to do things to files.  That, or he didn't want to do an fsck on the
partition once he was done.

# 
# > Indeed, the "open without access rights"
# > is useful not only to modify attributes and do other ioctl's,
# > but also to effect all operations that should be done w/o the ability
# > to open for either read or write
# > (fstat, funlink, ioctl, fchown, fchmod, fsync),

You mean like stat, unlink, chown, chmod?

Why in the world are you going to fsync a file with which you haven't
done anything?

The only one up there that makes sense is ioctl.

# 
# funlink makes no sense, unless the fd it takes is the fd of a directory
# and you pass in the name of the entry to be removed - which I imagine
# is not what most people will think when they think of an fd-based
# variant of unlink.  unlink() operates on names, not files, after all.

I've wanted an fclri(fd) which would clear the dev/ino attached to the
fd, but there's no clean way to do that as the system would then have
to search for all instances of that ino on that dev, and that's something
the system has no business doing.  namei (name-to-inode) is necessary,
and also easier than doing the reverse since name-to-inode mapping
is unique (because pathnames are unique) while inode-to-name mapping
is not (because there are hard links, i.e. multiple names can refer
to the same inode).  [read that carefully, it looks contradictory but
isn't.]
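
The asymmetry is easy to reproduce: two names, one inode. A minimal sketch, assuming a POSIX system whose /tmp allows hard links; `count_names` and the scratch paths are illustrative.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() under strict compiler modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Give one file two names and check that both resolve to the same
 * inode.  Returns the link count seen through the second name, or -1
 * if anything fails.
 */
int count_names(void)
{
    char a[] = "/tmp/hardlink-demo-XXXXXX";
    char b[sizeof a + 4];
    struct stat sa, sb;
    int result = -1;

    int fd = mkstemp(a);
    if (fd < 0)
        return -1;
    close(fd);
    snprintf(b, sizeof b, "%s.ln", a);

    if (link(a, b) == 0) {
        /* name -> inode is a function; inode -> name is not. */
        if (stat(a, &sa) == 0 && stat(b, &sb) == 0 &&
            sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino)
            result = (int)sb.st_nlink;
        unlink(b);
    }
    unlink(a);
    return result;
}
```

The inode records only how many names it has (here 2), not what they are, so an fd-based fclri() would have no directory entries to go and clear.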

# I've often wanted open-with-no-access in conjunction with fchdir().
# This is because you need only execute access to set your cwd to a
# directory, but there's no way to get an fd on a mode-111 directory.

Playing the Daemon's advocate, here...
What use is a descriptor into a directory you can't read?  What's
the point of fchdir(dd->fd) if you can't figure out where you're going
from there?  You may as well use chdir(dir).

# While I think there are ways symlinks could be improved, I don't think
# this is one of them.  I can't see any use for opening a symlink except
# use of write() to atomically make the link point somewhere different,
# and I'd prefer to do that by making symlink() do that when the link
# already exists and some appropriate condition is met.

That's a dicey proposition.  We already have quite a few "appropriate
condition" cases, and I think we want to avoid special-casing a whole
slew of conditions.

# > Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
# > or something equivalent, to upgrade your access mode
# > on a file you opened with O_NULL.
# 
# The security weenie in me is _really_ unsure that the ability to
# increase the access modes on an open fd is a good idea.

Nah.  fd's are inevitably associated with vnodes (which don't get freed
until the last close()); if the vnode doesn't map out to the appropriate
permissions, the fcntl() would fail.

# > About namei() and large directories, Robert suggested
# > that news servers, and other large databases
# > (terminfo, that web cache, and many more come to my mind),
# > should use special database libraries with a well-defined API
# > (possibly inspired by the filesystem interface),
# > rather than abuse the filesystem API as they do;
# 
# At least one news system does this now, I think - instead of keeping
# each post in a separate file, it uses one huge file and does its own
# space allocation out of it.

Another problem with making filesystems for news was that you had to tune
cpg and bpi way down and ipg way up.

When fscking the block device was actually possible, it was also faster
to fsck the block device on a device full of symlinks (but that's a horse
of a different colour, I realise...).  Go figure.

# > Another problem was the ability to change the mount status of a partition
# > from read-write to read-only or to unmounted,
# 
# See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
# "umount -f".  (Last I tried, the latter didn't work as it should, but
# that's a matter of fixing bugs rather than introducing new features.)

...really?  umount -f always works for me.

There's a bug running around, though, at the end of a shutdown which prevents
me from umounting /var for some reason (fstat shows nothing).

# > Finally, we discussed about saving _and restoring_ the state of a process,
# > another hack that he did once to preserve a long-winded calculation
# > from the service shutdown of a big unix computer.
# 
# I did this once, long long ago, under (I think) 4.3.  I found that I
# couldn't just dump core, though I forget why.  As for the open file
# descriptor question, I punted - I made the relevant call fail unless
# the process had no fds open.

Yeah, there's just no way to restore fd state from saved state, since
that would require locking that particular set 

Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Jan-Simon Pendry wrote:

> Alexander Viro wrote:
> > Proposed API on the Linux side being
> > int chflags(name, level, oldp, newp); where level is FL_VFS for generic
> > attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
> > flags - corresponding filesystem is free to interpret the thing as it
> > likes and should set the generic attributes in the right way. 
> 
> if linux introduces a different API (ie not just an extension of
> the existing bsd API) then please do *not* call it "chflags".
;-/ Yes, it makes sense.

> it took years just to get over the bsd vs. svr4 gettimeofday()
> fiasco.  btw, what's the proposed API for getting the current
> attribute set?

oldp == NULL ;-)



Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, der Mouse wrote:

> > Another problem was the ability to change the mount status of a partition
> > from read-write to read-only or to unmounted,
> 
> See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
> "umount -f".  (Last I tried, the latter didn't work as it should, but
> that's a matter of fixing bugs rather than introducing new features.)

mount -o remount,ro on Linux. What was the problem? Indeed you can't do it
if you have files opened for write there (or pending removal of files
from unlinks), but that limitation is reasonable, IMHO.

> > As for the opening with no permissions - well, it would make *big*
> > sense if we could narrow down the API and move chown(), chmod(), etc.
> > into libc leaving f-variants in the kernel.
> 
> I really don't like that.  The reasons why are (1) this means you have
> to have an fd free to do them; (2) it triples the number of user/kernel
> crossings involved.

The former is not too terrible, but the latter... Yup.
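
To make both costs concrete, here is what a library-level chmod() built on the f-variant would look like. It is a sketch only: `libc_style_chmod` is a made-up name, and O_RDONLY stands in for the proposed no-access open, so this version fails on files the caller cannot read.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * chmod() as a library routine on top of fchmod(): three kernel
 * entries and one descriptor slot, versus a single entry for the
 * native call.  Returns 0 on success, -1 with errno set on failure.
 */
int libc_style_chmod(const char *path, mode_t mode)
{
    int fd = open(path, O_RDONLY);  /* crossing 1, needs a free fd */
    if (fd < 0)
        return -1;
    int rc = fchmod(fd, mode);      /* crossing 2 */
    int saved = errno;
    close(fd);                      /* crossing 3 */
    errno = saved;                  /* report fchmod()'s error, not close()'s */
    return rc;
}
```

A process that has run out of descriptors gets EMFILE from this wrapper even though the native chmod() would have succeeded.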

> > Extreme variant might include {set,get}sockopt extended to files and
> > doing both *stat and *ch{mod,own,flags} via that.
> 
> If done, I think the name should be changed.  They are ?etSOCKopt,
> after all.  I'm not fond of this, though; it amounts to returning to
> using ioctl() for the tasks - albeit with a slightly different name.

The *only* way to make it reasonable would be to have a hierarchical
namespace for the options. Otherwise you are just getting the ioctl()
mess, and that's the last thing I'd like to see.



Re: Improving the Unix API

1999-06-27 Thread Jan-Simon Pendry
Alexander Viro wrote:
> Proposed API on the Linux side being
> int chflags(name, level, oldp, newp); where level is FL_VFS for generic
> attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
> flags - corresponding filesystem is free to interpret the thing as it
> likes and should set the generic attributes in the right way. 

if linux introduces a different API (ie not just an extension of
the existing bsd API) then please do *not* call it "chflags".
it took years just to get over the bsd vs. svr4 gettimeofday()
fiasco.  btw, what's the proposed API for getting the current
attribute set?

jan-simon.


Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Doug Rabson wrote:

> This looks viable as long as you don't use small integers to represent
> FL_UFS etc. Having a single header defining constants for all filesystems

Erm... sizeof(int)==4. I doubt that you will need more.

> just doesn't scale at all.
Sure. If you don't need fs-specific stuff -  and there
you go. If you need some particular fs -  and 

> You still want a clearly defined set of FS independent flags so that the
> application doesn't need to care what filesystem it is sitting on.

And that's exactly the reason for FL_VFS vs. FL_FOOFS separation -
some applications should be able to talk with the filesystem in the
filesystem's terms *and* be sure that they will not mess with another fs;
the rest shouldn't care for fs differences at all (aside from "did the
sucker set the bits I wanted?" that you already have for SUID/SGID/sticky).

I don't think that porting it to 4.4 will be difficult - all you
need is a way to tell VOP_SETATTR what level you are talking to (most
likely the same way as on our side - add a field to the structure and
let the methods scratch their heads). I'm going to do the Linux variant
and see how it will work. If somebody wants to do it with *BSD - fine, it
shouldn't be a problem.






Re: Improving the Unix API

1999-06-27 Thread Bill Sommerfeld
>   Right. Except that UFS has not only generic attributes. For example,
> you have UF_NODUMP and SF_ARCHIVED. The *only* place in the /sys you
> mention the former is sys/stat.h

Well, right, because backup/restore aren't part of the kernel...

> (BTW, you don't even map it on EXT2_NODUMP_FL).

This was presumably an oversight; I've reported it as a bug.

- Bill





Re: Improving the Unix API

1999-06-27 Thread Doug Rabson
On Sun, 27 Jun 1999, Alexander Viro wrote:

> 
> 
> On Sun, 27 Jun 1999, Bill Sommerfeld wrote:
> 
> > > Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
> > > or less in the same direction, not exactly the same - 4.4 chflags() works
> > > fine for UFS and leaves other filesystems to map what they can into the
> > > UFS set. 
> > 
> > > Which is bogus - immutable is not a UFS attribute, it's VFS one.
> > 
> > Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
> > then implemented them natively in UFS and LFS; other non-native
> > filesystems have to map their concepts of other file attributes (e.g.,
> > dates, permissions, etc.,) into the native VFS concepts.
> 
>   Right. Except that UFS has not only generic attibutes. For example,
> you have UF_NODUMP and SF_ARCHIVED. The *only* place in the /sys you
> mention the former is sys/stat.h (BTW, you don't even map it on
> EXT2_NODUMP_FL). The latter is mentioned only in the msdosfs/msdosfs_vnops.c.
> Hardly a VFS flag, right?
>   Proposed API on the Linux side being
> int chflags(name, level, oldp, newp); where level is FL_VFS for generic
> attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
> flags - corresponding filesystem is free to interpret the thing as it
> likes and should set the generic attributes in the right way. If you are
> trying to talk with the wrong filesystem (i.e. the level is not FL_VFS and
> not FL_) you are getting an error. If
> oldp is not NULL *oldp contains the attributes to set. If newp is not
> NULL *newp will contain the attributes *after* operation. IMO it's cleaner
> than pushing all attributes into the single bitmap.

This looks viable as long as you don't use small integers to represent
FL_UFS etc. Having a single header defining constants for all filesystems
just doesn't scale at all.

You still want a clearly defined set of FS independent flags so that the
application doesn't need to care what filesystem it is sitting on.

--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037







Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Bill Sommerfeld wrote:

> > Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
> > or less in the same direction, not exactly the same - 4.4 chflags() works
> > fine for UFS and leaves other filesystems to map what they can into the
> > UFS set. 
> 
> > Which is bogus - immutable is not a UFS attribute, it's VFS one.
> 
> Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
> then implemented them natively in UFS and LFS; other non-native
> filesystems have to map their concepts of other file attributes (e.g.,
> dates, permissions, etc.,) into the native VFS concepts.

Right. Except that UFS has not only generic attributes. For example,
you have UF_NODUMP and SF_ARCHIVED. The *only* place in the /sys you
mention the former is sys/stat.h (BTW, you don't even map it on
EXT2_NODUMP_FL). The latter is mentioned only in the msdosfs/msdosfs_vnops.c.
Hardly a VFS flag, right?
Proposed API on the Linux side being
int chflags(name, level, oldp, newp); where level is FL_VFS for generic
attributes (fs may map them on its own set) and FL_{UFS,EXT2,...} for raw
flags - corresponding filesystem is free to interpret the thing as it
likes and should set the generic attributes in the right way. If you are
trying to talk with the wrong filesystem (i.e. the level is not FL_VFS and
not FL_) you are getting an error. If
oldp is not NULL *oldp contains the attributes to set. If newp is not
NULL *newp will contain the attributes *after* operation. IMO it's cleaner
than pushing all attributes into the single bitmap.
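
The level dispatch described above can be sketched in user space. This is
only an illustration under stated assumptions, not the patch mentioned in
this thread: the function is called chflags_sketch() to avoid clashing with
the BSD chflags(), it works on an in-memory stand-in (struct fake_inode)
instead of a pathname, the numeric values of the FL_* levels are made up,
and EINVAL is an arbitrary choice for the "wrong filesystem" error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical level identifiers; the thread names FL_VFS, FL_UFS and
 * FL_EXT2 but gives no values for them. */
enum fl_level { FL_VFS = 1, FL_UFS, FL_EXT2 };

/* Stand-in for an inode: which filesystem owns it, plus two flag words. */
struct fake_inode {
    enum fl_level fs;       /* FL_UFS or FL_EXT2 */
    unsigned vfs_flags;     /* generic attributes */
    unsigned raw_flags;     /* filesystem-private attributes */
};

/* Returns 0 on success, -1 with errno = EINVAL on a level mismatch. */
int chflags_sketch(struct fake_inode *ino, enum fl_level level,
                   const unsigned *oldp, unsigned *newp)
{
    unsigned *word;

    assert(ino != NULL);
    if (level == FL_VFS) {
        word = &ino->vfs_flags;
    } else if (level == ino->fs) {
        word = &ino->raw_flags;
    } else {
        errno = EINVAL;     /* talking to the wrong filesystem */
        return -1;
    }
    if (oldp != NULL)
        *word = *oldp;      /* attributes to set, as the text words it */
    if (newp != NULL)
        *newp = *word;      /* attributes after the operation */
    return 0;
}
```

In this sketch a call with oldp == NULL changes nothing and only reports
the current flag word through newp. A real implementation would also have
the filesystem update the generic attributes when raw flags change, which
the sketch leaves out.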






Re: Improving the Unix API

1999-06-27 Thread der Mouse
> He realized that the device had an immutable attribute.
> He tried to change the attribute with open() and ioctl()

As I think someone already mentioned, BSD has chflags(), which takes a
pathname.

> Robert had to hand-remove the immutable flag
> (I guess, by accessing the relevant block directly).

(clri didn't work?)

> Indeed, the "open without access rights"
> is useful not only to modify attributes and do other ioctl's,
> but also to effect all operations that should be done w/o the ability
> to open for either read or write
> (fstat, funlink, ioctl, fchown, fchmod, fsync),

funlink makes no sense, unless the fd it takes is the fd of a directory
and you pass in the name of the entry to be removed - which I imagine
is not what most people will think when they think of an fd-based
variant of unlink.  unlink() operates on names, not files, after all.
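
The form described here, a directory fd plus the name of the entry to
remove, is the shape that the later POSIX.1-2008 unlinkat() call takes. A
minimal sketch, assuming a system that provides unlinkat() and O_DIRECTORY;
the helper name unlink_in_dir() is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Remove the entry `name` from the directory `dirpath`, going through a
 * directory fd rather than a full pathname. Returns 0 or -1 with errno. */
int unlink_in_dir(const char *dirpath, const char *name)
{
    int dfd, rc, saved;

    assert(dirpath != NULL && name != NULL);
    dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    rc = unlinkat(dfd, name, 0);    /* flags 0: behave like unlink() */
    saved = errno;
    close(dfd);
    errno = saved;                  /* report unlinkat()'s error */
    return rc;
}
```

Note that the fd names the directory, not the file being removed, which
matches the point that unlink() operates on names.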

I've often wanted open-with-no-access in conjunction with fchdir().
This is because you need only execute access to set your cwd to a
directory, but there's no way to get an fd on a mode-111 directory.

> and could be used with new syscalls like
> flink (make a new directory link for file given by descriptor),
> freadlink (read link from a file descriptor opened with O_NULL),
> fexec (execute the binary that we checked), etc.

freadlink() implies that open() with O_NULL has the peculiar property
that, unlike all other open()s, it doesn't follow terminal symlinks.

While I think there are ways symlinks could be improved, I don't think
this is one of them.  I can't see any use for opening a symlink except
use of write() to atomically make the link point somewhere different,
and I'd prefer to do that by making symlink() do that when the link
already exists and some appropriate condition is met.
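
For comparison, an existing link can be repointed atomically today without
opening it and without changing symlink(): create the new link under a
temporary name and rename() it over the old one. This is a different
technique from the one proposed above, sketched here under the assumption
that both names live in the same directory; replace_symlink() and the
".tmp.<pid>" suffix are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Make `linkpath` point at `target`, replacing any existing link in one
 * atomic step. Returns 0 on success, -1 on failure. */
int replace_symlink(const char *target, const char *linkpath)
{
    char tmp[4096];
    int n, saved;

    assert(target != NULL && linkpath != NULL);
    n = snprintf(tmp, sizeof tmp, "%s.tmp.%ld", linkpath, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (symlink(target, tmp) < 0)
        return -1;
    if (rename(tmp, linkpath) < 0) {    /* acts on the link, not its target */
        saved = errno;
        unlink(tmp);                    /* clean up the temporary link */
        errno = saved;
        return -1;
    }
    return 0;
}
```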

> Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
> or something equivalent, to upgrade your access mode
> on a file you opened with O_NULL.

The security weenie in me is _really_ unsure that the ability to
increase the access modes on an open fd is a good idea.

> About namei() and large directories, Robert suggested
> that news servers, and other large databases
> (terminfo, that web cache, and many more come to my mind),
> should use special database libraries with a well-defined API
> (possibly inspired by the filesystem interface),
> rather than abuse the filesystem API as they do;

At least one news system does this now, I think - instead of keeping
each post in a separate file, it uses one huge file and does its own
space allocation out of it.

> Another problem was the ability to change the mount status of a partition
> from read-write to read-only or to unmounted,

See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
"umount -f".  (Last I tried, the latter didn't work as it should, but
that's a matter of fixing bugs rather than introducing new features.)

> Finally, we discussed about saving _and restoring_ the state of a process,
> another hack that he did once to preserve a long-winded calculation
> from the service shutdown of a big unix computer.

I did this once, long long ago, under (I think) 4.3.  I found that I
couldn't just dump core, though I forget why.  As for the open file
descriptor question, I punted - I made the relevant call fail unless
the process had no fds open.

> By posting on all free unix kernel mailing-list I know,
> I intend to put free unices in competition as to which
> will implement these features first.

Reasonable as this sounds, I think the last thing we need is yet
another ground on which one free-unix can be doing the "nana nana boo
boo" taunt at another.  Once upon a time I would have hoped the people
involved were sufficiently mature to avoid doing that, or responding
when on the receiving end of it - and many of them *are*, but I've been
involved in this scene too long to retain any real hope that *all* of
them are.

[And replying to another message...]

> 4.4 chflags() works fine for UFS and leaves other filesystems to map
> what they can into the UFS set.  Which is bogus - immutable is not a
> UFS attribute, it's VFS one.

Perhaps, but it's still something that the underlying filesystem has to
support.  Just because the API bit definitions happen to match what FFS
filesystems save on disk doesn't mean it's inherently an FFS thing.

> As for the opening with no permissions - well, it would make *big*
> sense if we could narrow down the API and move chown(), chmod(), etc.
> into libc leaving f-variants in the kernel.

I really don't like that.  The reasons why are (1) this means you have
to have an fd free to do them; (2) it triples the number of user/kernel
crossings involved.
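
To make the count concrete, here is a minimal sketch of what a libc-side
chmod() layered on fchmod() might look like. It is not taken from any real
libc; the name chmod_via_fd() is made up. It uses a plain O_RDONLY open,
since no portable no-access open mode exists, so it also needs read
permission on the file, which the real chmod() does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical libc-side chmod() layered on fchmod(). */
int chmod_via_fd(const char *path, mode_t mode)
{
    int fd, rc, saved;

    assert(path != NULL);
    fd = open(path, O_RDONLY);   /* crossing 1: uses up an fd slot */
    if (fd < 0)
        return -1;
    rc = fchmod(fd, mode);       /* crossing 2: the actual change */
    saved = errno;
    close(fd);                   /* crossing 3 */
    errno = saved;               /* report fchmod()'s error, not close()'s */
    return rc;
}
```

The kernel-side chmod() does the same job in one crossing and fails
neither when the process is out of descriptors nor when the file is
unreadable.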

> Extreme variant might include {set,get}sockopt extended to files and
> doing both *stat and *ch{mod,own,flags} via that.

If done, I think the name should be changed.  They are ?etSOCKopt,
after all.  I'm not fond of this, though; it amounts to returning to
using ioctl() for the tasks - albeit with a slightly different name.

Re: Improving the Unix API

1999-06-27 Thread Bill Sommerfeld
> Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
> or less in the same direction, not exactly the same - 4.4 chflags() works
> fine for UFS and leaves other filesystems to map what they can into the
> UFS set. 

> Which is bogus - immutable is not a UFS attribute, it's VFS one.

Well, I'd argue that Berkeley defined a bunch of VFS attributes, and
then implemented them natively in UFS and LFS; other non-native
filesystems have to map their concepts of other file attributes (e.g.,
dates, permissions, etc.,) into the native VFS concepts.

- Bill





Re: Improving the Unix API

1999-06-27 Thread Alexander Viro


On Sun, 27 Jun 1999, Bill Sommerfeld wrote:

> > .. but there remained one that garbled meta-data had made into a
> > non-existing block device, that would resist rm -f.  He realized
> > that the device had an immutable attribute.  However, the problem is
> > that to change the attribute, you have to open the file before you
> > can ioctl() on it;
> 
> BSD4.4 and its progeny deal with this by providing both chflags() and
> fchflags() system calls; as you don't need to be able to do an open()
> call to use chflags(), you can just fix the immutable attribute once
> you have the system running at an appropriate securelevel.

Usage of ioctl() on Linux was a bad idea and it's going to be fixed. More
or less in the same direction, not exactly the same - 4.4 chflags() works
fine for UFS and leaves other filesystems to map what they can into the
UFS set. Which is bogus - immutable is not a UFS attribute, it's VFS one.
I have a patch (still pre-alpha) and I'll post it tomorrow or on Wednesday
when I'll be back from CA.

As for the opening with no permissions - well, it would make *big* sense
if we could narrow down the API and move chown(), chmod(), etc. into libc
leaving f-variants in the kernel. Binary compatibility... Extreme variant
might include {set,get}sockopt extended to files and doing both *stat and
*ch{mod,own,flags} via that. Out of curiosity - did somebody on *BSD side
play with that?






Re: Improving the Unix API

1999-06-27 Thread Werner Almesberger
Francois-Rene Rideau wrote:
> Robert told me that in some Unix flavors of old,
> it was possible to open a file by path with a null access mode (O_NULL ?)

E.g. Linux. Very undocumented, but has been around for ages ('92 or
such). The main purpose is to keep the floppy drive from spinning up
to check for a media change when you open it to access parameters and
such. E.g. fdformat, setfdprm, and LILO use this. (NB: some versions
of strace print the flags argument in this case as "0x4", although
it's really 3.)
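
A minimal sketch of the trick, assuming Linux: the flags value 3 is passed
to open() directly, since it is none of O_RDONLY (0), O_WRONLY (1) or
O_RDWR (2). As far as I know the kernel still checks read and write
permission on the file, but the resulting descriptor cannot be used with
read() or write(). The names ACCMODE_NONE and open_no_io() are made up for
this example, and other systems may treat the value differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Linux-specific access mode: neither read nor write on the fd.
 * The macro name is invented for this sketch. */
#define ACCMODE_NONE 3

/* Open `path` for ioctl()/fstat()-style use only. Returns an fd or -1. */
int open_no_io(const char *path)
{
    assert(path != NULL);
    return open(path, ACCMODE_NONE);
}
```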

- Werner

-- 
  _
 / Werner Almesberger, ICA, EPFL, CH   werner.almesber...@ica.epfl.ch /
/_IN_R_131__Tel_+41_21_693_6621__Fax_+41_21_693_6610_/





Re: Improving the Unix API

1999-06-27 Thread Bill Sommerfeld
> .. but there remained one that garbled meta-data had made into a
> non-existing block device, that would resist rm -f.  He realized
> that the device had an immutable attribute.  However, the problem is
> that to change the attribute, you have to open the file before you
> can ioctl() on it;

BSD4.4 and its progeny deal with this by providing both chflags() and
fchflags() system calls; as you don't need to be able to do an open()
call to use chflags(), you can just fix the immutable attribute once
you have the system running at an appropriate securelevel.

- Bill

